Note: this project does not need to be run in the VM, and runs with *python3*, not
Python 2.x. If you have access to a Mac or a Unix/Linux machine, run
it there. Windows may or may not work. You are always free to run it in the VM from previous
assignments.
### Setup
Download the files for Assignment 5 [here](https://ceres.cs.umd.edu/424/assign/assignment5Dist.tgz?1).
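From the directory containing the `Dockerfile`, build the image (`docker build --rm -t 424 .`) and then run it (`docker run -it -v $(pwd):/424 424`) to create your container. The `-v $(pwd):/424` option mounts the current directory at `/424` inside the container.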
### Overview
In this project, you will modify our simple database to illustrate some query processing algorithms.
The database system is written in Python and simulates how
a real database system would work, including which blocks it would read from
disk.
* `disk_relations.py`: This module contains the classes Block,
RelationBlock, and Relation, along with a few other helper classes such as
Pointer. Block is a base class that is subclassed by RelationBlock
and BTreeBlock (in the `btree.py` file). A RelationBlock contains a set of
tuples, and a Relation contains a list of RelationBlocks.
* `queryprocessing.py`: This module contains naive implementations of some
query processing operators, including SequentialScan,
NestedLoopsJoin, HashJoin, and SortMergeJoin. The operators are
written using the iterator (`get_next`) interface discussed
in Chapter 15.7.
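The iterator interface mentioned above can be illustrated with a small, self-contained sketch. Only the name `get_next` comes from this project; the classes `ListScan` and `Select`, the `open`/`close` methods, and the convention that `get_next()` returns `None` once the input is exhausted are assumptions made for illustration, so check `queryprocessing.py` for the actual method names and end-of-input convention.

```python
# Hypothetical sketch of the iterator (get_next) model; these are NOT the
# classes in queryprocessing.py.


class ListScan:
    """Leaf operator over an in-memory list of tuples."""

    def __init__(self, tuples):
        self.tuples = tuples
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        # Return the next tuple, or None when the input is exhausted
        # (the None convention is an assumption of this sketch).
        if self.pos >= len(self.tuples):
            return None
        t = self.tuples[self.pos]
        self.pos += 1
        return t

    def close(self):
        pass


class Select:
    """Filter operator: pulls tuples from its child until one qualifies."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def get_next(self):
        while True:
            t = self.child.get_next()
            if t is None or self.predicate(t):
                return t

    def close(self):
        self.child.close()


def run(plan):
    """Drive a plan: open it, call get_next() until None, then close it."""
    plan.open()
    results = []
    t = plan.get_next()
    while t is not None:
        results.append(t)
        t = plan.get_next()
    plan.close()
    return results


rs = [(1, 2), (1, 2), (2, 2), (3, 3), (4, 3)]
print(run(Select(ListScan(rs), lambda t: t[1] == 2)))
# prints [(1, 2), (1, 2), (2, 2)]
```

Each call to `get_next()` does only as much work as needed to produce one tuple, which is what lets operators be stacked into a plan without materializing intermediate results.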
### How to Use the Files
The directory also contains two other files:
* `create_sample_databases.py`: Creates a sample database with 2 relations and 2 indexes.
* `testing-queryprocessing.py`: Shows the execution of some simple query processing tasks using sample data.
The simplest way to run the code is `python3 testing-queryprocessing.py`, which executes everything in the `testing-queryprocessing.py` file.
A better option is `python3 -i testing-queryprocessing.py`: it executes all the code in `testing-queryprocessing.py` and then opens an interactive Python shell, where you can run further operations.
### Your Task
Your task is to finish a few of the unfinished pieces in the `queryprocessing.py` file.
* (20pt) Function `SortMergeJoin.get_next()` in `queryprocessing.py`: Your task is to implement SEMI JOIN and ANTI JOIN variants of the SortMergeJoin algorithm.
* (10pt) Function `SetMinus.get_next()` in `queryprocessing.py`: Here you have to finish the implementation of the SetMinus operation.
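One way to think about the SEMI and ANTI variants is sketched below, assuming both inputs fit in memory as plain Python lists. The function `sort_merge_semi_anti` and its parameters are hypothetical and do not mirror the `SortMergeJoin` class in `queryprocessing.py`, which produces tuples through `get_next()`. The idea it shows: after sorting, each left tuple only needs to know *whether* a matching right key exists, so a semi join emits it at most once (an inner join would emit it once per matching pair), and an anti join emits it only when no match exists.

```python
# Hypothetical sketch: SEMI and ANTI variants of sort-merge join over
# in-memory lists. It does not use the classes in queryprocessing.py.


def sort_merge_semi_anti(left, right, left_key, right_key, anti=False):
    """Return left tuples that have (semi) or lack (anti) a match in right.

    Each left tuple is emitted at most once, and duplicate left tuples are
    all kept (bag semantics). Right-side duplicates never multiply output.
    """
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    out = []
    j = 0
    for t in left:
        k = left_key(t)
        # Both inputs are sorted, so the right cursor only moves forward;
        # left tuples with equal keys see the same cursor position.
        while j < len(right) and right_key(right[j]) < k:
            j += 1
        matched = j < len(right) and right_key(right[j]) == k
        # Semi join keeps matched tuples; anti join keeps unmatched ones.
        if matched != anti:
            out.append(t)
    return out


# rs(a, b) and st(b, c), joined on b (same data as the worksheet).
rs = [(1, 2), (1, 2), (2, 2), (3, 3), (4, 3)]
st = [(2, 1), (2, 2), (5, 1), (6, 2)]

semi = sort_merge_semi_anti(rs, st, lambda r: r[1], lambda s: s[0])
anti = sort_merge_semi_anti(rs, st, lambda r: r[1], lambda s: s[0], anti=True)
print(semi)  # [(1, 2), (1, 2), (2, 2)]
print(anti)  # [(3, 3), (4, 3)]
```

Note that the duplicate `(1, 2)` in `rs` appears twice in the semi join output, while the two matching `st` rows with `b = 2` do not cause any extra copies.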
### Hints
- Look at the INNER_JOIN variant of the sort-merge join for clues on how to structure your code.
- Note that SQL relations can contain duplicate tuples (they are multisets, or bags, rather than sets).
- Use the queries in `worksheet.sql` to get a sense of how these variants work:
```sql
-- createdb tst
-- psql tst
drop table if exists rs;
drop table if exists st;
create table rs (a int, b int);
create table st (b int, c int);
insert into rs values (1, 2);
insert into rs values (1, 2);
insert into rs values (2, 2);
insert into rs values (3, 3);
insert into rs values (4, 3);
insert into st values (2, 1);
insert into st values (2, 2);
insert into st values (5, 1);
insert into st values (6, 2);
-- semi join (wrong: the outer EXCEPT removes duplicates, so (1, 2) appears only once)
(SELECT * FROM rs) EXCEPT ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
-- semi join (right)
(SELECT * FROM rs) EXCEPT ALL ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
-- anti join
(SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st);
```
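The difference between the worksheet queries can be reproduced in Python with `collections.Counter`, which supports bag (multiset) difference. This is a hedged illustration of SQL semantics only: `except_set` and `except_all` are hypothetical helpers, and whether `SetMinus` in this project should behave like `EXCEPT` or like `EXCEPT ALL` is not stated here, so check the starter code.

```python
from collections import Counter

# Same data as the rs and st tables in the worksheet.
rs = [(1, 2), (1, 2), (2, 2), (3, 3), (4, 3)]  # (a, b)
st = [(2, 1), (2, 2), (5, 1), (6, 2)]          # (b, c)


def except_set(a, b):
    """Like SQL EXCEPT: distinct tuples of a that do not occur in b."""
    exclude = set(b)
    return sorted({t for t in a if t not in exclude})


def except_all(a, b):
    """Like SQL EXCEPT ALL: each copy of a tuple in b cancels one copy in a."""
    return sorted((Counter(a) - Counter(b)).elements())


# SELECT rs.* FROM rs NATURAL JOIN st: one rs row per matching st row.
joined = [r for r in rs for s in st if r[1] == s[0]]

no_match = except_set(rs, joined)
semi_wrong = except_set(rs, no_match)
semi_right = except_all(rs, no_match)
anti = except_set(rs, joined)

print(semi_wrong)  # [(1, 2), (2, 2)]  (duplicate (1, 2) lost)
print(semi_right)  # [(1, 2), (1, 2), (2, 2)]
print(anti)        # [(3, 3), (4, 3)]
```

The outputs are sorted only to make the printed results deterministic; SQL itself does not guarantee row order without `ORDER BY`.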
### Submission
Submit the modified `queryprocessing.py` file to [gradescope](https://www.gradescope.com/courses/811728/assignments/4669999).