From f780ac5f6ab609f3a0b5d5c9cd1edd7d748bff29 Mon Sep 17 00:00:00 2001
From: "Peter J. Keleher" <keleher@cs.umd.edu>
Date: Sat, 19 Oct 2024 16:25:10 -0400
Subject: [PATCH] auto

---
 assign5.md | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 assign5.md

diff --git a/assign5.md b/assign5.md
new file mode 100644
index 0000000..114813f
--- /dev/null
+++ b/assign5.md
@@ -0,0 +1,90 @@
+## Assignment 5: Query Processing
+
+*The assignment is to be done by yourself.*
+
+ Note: this project does not need to be run in the VM, and runs with *python3*, not
+ Python 2.x. If you have access to a Mac or a Unix/Linux box, run
+ it there. Windows may or may not work. You are always free to run in the VM from previous
+ assignments.
+
+
+### Setup
+
+Download files for Assignment 5 [here](https://ceres.cs.umd.edu/424/assign/assignment5Dist.tgz?1).
+Build (`docker build --rm -t 424 .`) and run (`docker run -it -v
+$(pwd):/424 424`) to create your container.
+
+
+### Overview
+
+In this project, you will modify our simple database to illustrate some query processing algorithms.
+The database system is written in Python and attempts to simulate how
+a database system would work, including what blocks it would read from
+disk, etc.
+
+* `disk_relations.py`: This module contains the classes Block,
+RelationBlock, Relation, and a few other helper classes like
+Pointer. Block is a base class that is subclassed by RelationBlock
+and BTreeBlock (in the `btree.py` file). A RelationBlock contains a set of
+tuples, and a Relation contains a list of RelationBlocks.
+* `queryprocessing.py`: This contains naive implementations of some of
+  the query processing operators, including SequentialScan,
+  NestedLoopsJoin, HashJoin, and SortMergeJoin. The operators are
+  written using the iterator `get_next` interface, which is discussed
+  in Chapter 15.7.
+
+
+### How to Use the Files
+
+The directory also contains two other files:
+* `create_sample_databases.py`: Creates a sample database with 2 relations and 2 indexes.
+* `testing-queryprocessing.py`: Shows the execution of some simple query processing tasks using sample data.
+
+The simplest way to run this is `python3 testing-queryprocessing.py`, which runs all the code in the `testing-queryprocessing.py` file.
+
+A better option is `python3 -i testing-queryprocessing.py`. This executes all the code in the `testing-queryprocessing.py` file and then opens a Python shell, where you can run further operations.
+
+### Your Task
+
+Your task is to complete a few unfinished pieces in the `queryprocessing.py` file.
+* (20pt) Function `SortMergeJoin.get_next()` in `queryprocessing.py`: Implement the SEMI JOIN and ANTI JOIN variants of the SortMergeJoin algorithm.
+* (10pt) Function `SetMinus.get_next()` in `queryprocessing.py`: Finish the implementation of the SetMinus operation.
+
+### Hints
+- Look at the INNER_JOIN variant of the sort-merge join for clues on how to structure your code.
+- Note that SQL relations can have duplicates.
+- Use the code from `worksheet.sql` to get a sense of how these variants work:
+
+```
+-- createdb tst
+-- psql tst
+
+drop table rs;
+drop table st;
+create table rs (a int, b int);
+create table st (b int, c int);
+
+insert into rs values (1, 2);
+insert into rs values (1, 2);
+insert into rs values (2, 2);
+insert into rs values (3, 3);
+insert into rs values (4, 3);
+
+insert into st values (2, 1);
+insert into st values (2, 2);
+insert into st values (5, 1);
+insert into st values (6, 2);
+
+-- semi join (wrong)
+(SELECT * FROM rs) EXCEPT ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
+
+-- semi join (right)
+(SELECT * FROM rs) EXCEPT ALL ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
+
+-- anti join
+(SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st);
+```
+
+
+### Submission
+Submit the modified `queryprocessing.py` file to [Gradescope](https://www.gradescope.com/courses/811728/assignments/4669999).
-- 
GitLab
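
For intuition about the duplicate-preserving semantics that the worksheet queries illustrate, here is a minimal Python sketch over the same `rs` and `st` rows. The helper names `semi_join` and `anti_join` are hypothetical and not part of `queryprocessing.py`. The sketch assumes the join attribute is `b`, as in the NATURAL JOIN above, and it uses a plain set lookup rather than the sort-merge algorithm the assignment asks for, so it shows only the expected output, not how to produce it with `get_next`.

```python
# Sample rows copied from the worksheet: rs(a, b) and st(b, c).
rs = [(1, 2), (1, 2), (2, 2), (3, 3), (4, 3)]
st = [(2, 1), (2, 2), (5, 1), (6, 2)]


def semi_join(left, right):
    """Return each left row, duplicates included, whose b value appears in right."""
    right_keys = {b for (b, _c) in right}  # distinct join-attribute values in st
    return [row for row in left if row[1] in right_keys]


def anti_join(left, right):
    """Return each left row, duplicates included, whose b value never appears in right."""
    right_keys = {b for (b, _c) in right}
    return [row for row in left if row[1] not in right_keys]


print(semi_join(rs, st))       # [(1, 2), (1, 2), (2, 2)]
print(anti_join(rs, st))       # [(3, 3), (4, 3)]

# Collapsing to a set loses the duplicate (1, 2), which is the problem
# with the "semi join (wrong)" query.
print(len(set(semi_join(rs, st))))  # 2
```

Each `rs` row appears in the semi-join output at most once per occurrence in `rs`, no matter how many `st` rows it matches. This is why `(1, 2)` shows up twice rather than four times.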