
Commit f780ac5f authored 7 months ago by Peter J. Keleher
--- a/assign5.md
+++ b/assign5.md
+## Assignment 5: Query Processing
+
+*The assignment is to be done by yourself.*
+
+    Note: this project does not need to be run in the VM, and it runs with *python3*, not
+    Python 2.x. If you have access to a Mac or a Unix/Linux box, run
+    it there. Windows may or may not work. You are always free to run in the VM from previous
+    assignments.
+
+
+### Setup
+
+Download files for Assignment 5 [here](https://ceres.cs.umd.edu/424/assign/assignment5Dist.tgz?1).
+Build (`docker build --rm -t 424 .`) and run (`docker run -it -v
+$(pwd):/424 424`) to create your container.
+
+
+### Overview
+
+In this project, you will modify our simple database to illustrate some query processing algorithms. 
+The database system is written in Python and attempts to simulate how
+a database system would work, including what blocks it would read from
+disk, etc. 
+
+* `disk_relations.py`: This module contains the classes Block,
+RelationBlock, and Relation, plus a few helper classes such as
+Pointer. Block is a base class that is subclassed by RelationBlock
+and BTreeBlock (in the `btree.py` file). A RelationBlock contains a set of
+tuples, and a Relation contains a list of RelationBlocks.
+* `queryprocessing.py`: This contains naive implementations of some of
+  the query processing operators, including SequentialScan,
+  NestedLoopsJoin, HashJoin, and SortMergeJoin. The operators are
+  written using the iterator `get_next` interface, which is discussed
+  in Chapter 15.7. 
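
To make the iterator-style `get_next` interface concrete, here is a minimal, self-contained sketch. The classes `ListScan` and `Filter` are hypothetical and do not come from `queryprocessing.py`; the sketch also assumes `get_next` is written as a Python generator, which may differ from the distributed code.

```python
class ListScan:
    """Hypothetical leaf operator: produces tuples from an in-memory list."""

    def __init__(self, tuples):
        self.tuples = tuples

    def get_next(self):
        # Hand tuples to the parent one at a time.
        for t in self.tuples:
            yield t


class Filter:
    """Hypothetical operator: passes through child tuples that satisfy a predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def get_next(self):
        # Pull from the child only as the parent asks for more tuples.
        for t in self.child.get_next():
            if self.predicate(t):
                yield t


# Operators compose into a tree; the root is drained by its consumer.
plan = Filter(ListScan([(1, 2), (2, 2), (3, 3)]), lambda t: t[1] == 2)
result = list(plan.get_next())
print(result)
```

Each operator only talks to its child through `get_next`, so operators can be stacked without materializing intermediate results.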
+
+
+### How to Use the Files
+
+The directory also contains two other files:
+* `create_sample_databases.py`: Creates a sample database with 2 relations and 2 indexes.
+* `testing-queryprocessing.py`: Shows the execution of some simple query processing tasks using sample data. 
+
+The simplest way to run this is `python3 testing-queryprocessing.py`, which runs all the code in `testing-queryprocessing.py`.
+
+A better option is `python3 -i testing-queryprocessing.py`. This executes all the code in `testing-queryprocessing.py` and then opens a Python shell, where you can run further operations interactively.
+
+### Your Task
+
+Your task is to complete the unfinished pieces of `queryprocessing.py`:
+* (20pt) `SortMergeJoin.get_next()`: Implement the SEMI JOIN and ANTI JOIN variants of the SortMergeJoin algorithm.
+* (10pt) `SetMinus.get_next()`: Finish the implementation of the SetMinus operation.
+
+### Hints
+- Look at the INNER_JOIN variant of the sort-merge join for clues on how to structure your code.
+- Note that SQL relations can have duplicates.
+- Use the code from `worksheet.sql` to get a sense of how these variants work:
+
+```
+-- createdb tst
+-- psql tst
+
+drop table if exists rs;
+drop table if exists st;
+create table rs (a int, b int);
+create table st (b int, c int);
+
+insert into rs values (1, 2);
+insert into rs values (1, 2);
+insert into rs values (2, 2);
+insert into rs values (3, 3);
+insert into rs values (4, 3);
+
+insert into st values (2, 1);
+insert into st values (2, 2);
+insert into st values (5, 1);
+insert into st values (6, 2);
+
+-- semi join (wrong: the outer EXCEPT removes duplicate rows of rs)
+(SELECT * FROM rs) EXCEPT ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
+
+-- semi join (right: the outer EXCEPT ALL keeps duplicate rows of rs)
+(SELECT * FROM rs) EXCEPT ALL ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
+
+-- anti join
+(SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st);
+```
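
The worksheet queries can also be mirrored on plain Python lists to check the expected results by hand. This is only a sketch of the semantics on the sample data, not of the sort-merge algorithm; the helpers `except_distinct` and `except_all` are hypothetical names that model SQL `EXCEPT` and `EXCEPT ALL`.

```python
from collections import Counter

rs = [(1, 2), (1, 2), (2, 2), (3, 3), (4, 3)]  # columns (a, b)
st = [(2, 1), (2, 2), (5, 1), (6, 2)]          # columns (b, c)

# Join-attribute values present in st.
st_keys = {b for (b, _c) in st}

# Semi join keeps each rs row (with its duplicates) that has a match in st;
# anti join keeps each rs row that has no match.
semi_join = [r for r in rs if r[1] in st_keys]
anti_join = [r for r in rs if r[1] not in st_keys]


def except_distinct(left, right):
    """Model of SQL EXCEPT: set difference, duplicates removed."""
    return sorted(set(left) - set(right))


def except_all(left, right):
    """Model of SQL EXCEPT ALL: bag difference, duplicates counted."""
    return sorted((Counter(left) - Counter(right)).elements())


# rs.* from rs NATURAL JOIN st: one copy of the rs row per matching st row.
joined = [r for r in rs for s in st if r[1] == s[0]]

wrong = except_distinct(rs, except_distinct(rs, joined))
right = except_all(rs, except_distinct(rs, joined))

print("semi join:", semi_join)
print("wrong    :", wrong)
print("right    :", right)
print("anti join:", anti_join)
```

On this data the "wrong" query returns `(1, 2)` only once, while the semi join should return it twice. The anti-join query uses plain `EXCEPT`, which would likewise drop duplicates among unmatched rows; the sample data happens to contain none.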
+
+
+### Submission
+Submit the modified `queryprocessing.py` file to [Gradescope](https://www.gradescope.com/courses/811728/assignments/4669999).