Commit f780ac5f authored by Peter J. Keleher
auto

parent 06e04a4c
## Assignment 5: Query Processing
*The assignment is to be done by yourself.*
Note: this project does not need to be run in the VM, and it runs with *python3*, not
Python 2.x. If you have access to a Mac or a Unix/Linux box, run
it there. Windows may or may not work. You are always free to run it in the VM from previous
assignments.
### Setup
Download the files for Assignment 5 [here](https://ceres.cs.umd.edu/424/assign/assignment5Dist.tgz?1).
Build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`) to create your container.
### Overview
In this project, you will modify our simple database to illustrate some query processing algorithms.
The database system is written in Python and attempts to simulate how
a database system would work, including what blocks it would read from
disk, etc.
* `disk_relations.py`: This module contains the classes Block,
RelationBlock, and Relation, plus a few other helper classes such as
Pointer. Block is a base class that is subclassed by RelationBlock
and BTreeBlock (in `btree.py`). A RelationBlock contains a set of
tuples, and a Relation contains a list of RelationBlocks.
* `queryprocessing.py`: This contains naive implementations of some of
the query processing operators, including SequentialScan,
NestedLoopsJoin, HashJoin, and SortMergeJoin. The operators are
written using the iterator `get_next` interface, which is discussed
in Chapter 15.7.
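
To make the iterator idea concrete, here is a minimal sketch of two operators chained through a `get_next` method written as a Python generator. The class names `ListScan` and `Select` are hypothetical and exist only for illustration; they are not the classes in `queryprocessing.py`, whose exact interface may differ.

```python
# Hypothetical operators for illustration only; NOT the classes in queryprocessing.py.

class ListScan:
    """Leaf operator: produces tuples one at a time from an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    def get_next(self):
        for row in self.rows:
            yield row


class Select:
    """Pulls tuples from a child operator and passes on those matching a predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def get_next(self):
        # Each tuple is requested from the child only when the consumer asks for it.
        for row in self.child.get_next():
            if self.predicate(row):
                yield row


# Build a tiny plan: scan three (a, b) tuples, keep those with b == 2.
plan = Select(ListScan([(1, 2), (2, 2), (3, 3)]), lambda t: t[1] == 2)
result = list(plan.get_next())
print(result)  # [(1, 2), (2, 2)]
```

The point of the pattern is that each operator only knows how to ask its child for the next tuple, so operators can be composed into a plan without materializing intermediate results.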
### How to Use the Files
The directory also contains two other files:
* `create_sample_databases.py`: Creates a sample database with 2 relations and 2 indexes.
* `testing-queryprocessing.py`: Shows the execution of some simple query processing tasks using sample data.
The simplest way to run this is `python3 testing-queryprocessing.py`, which runs all the code in the `testing-queryprocessing.py` file.
A better option is `python3 -i testing-queryprocessing.py`. This executes all the code in `testing-queryprocessing.py` and then opens a Python shell, where you can run further operations.
### Your Task
Your task is to complete the unfinished pieces in the `queryprocessing.py` file.
* (20pt) Function `SortMergeJoin.get_next()` in `queryprocessing.py`: Your task is to implement SEMI JOIN and ANTI JOIN variants of the SortMergeJoin algorithm.
* (10pt) Function `SetMinus.get_next()` in `queryprocessing.py`: Here you have to finish the implementation of the SetMinus operation.
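
As a rough illustration of the merge idea behind a set difference over sorted inputs, the sketch below works on plain sorted Python lists. The function name `sorted_set_minus` is hypothetical, and the sketch assumes set semantics (distinct output); the `SetMinus` operator in `queryprocessing.py` works on tuples from child operators and may be expected to treat duplicates differently.

```python
def sorted_set_minus(left, right):
    """Return the distinct values of sorted `left` that do not occur in sorted `right`.

    Assumption: both inputs are already sorted in ascending order.
    """
    out = []
    j = 0
    for value in left:
        # Advance the right cursor past everything smaller than the current left value.
        while j < len(right) and right[j] < value:
            j += 1
        matched = j < len(right) and right[j] == value
        # Emit unmatched values once; equal values are adjacent because input is sorted.
        if not matched and (not out or out[-1] != value):
            out.append(value)
    return out


result = sorted_set_minus([1, 1, 2, 3, 5, 5], [2, 4, 5])
print(result)  # [1, 3]
```

Because both cursors only move forward, each input is read once, which is the same property the sort-merge join relies on.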
### Hints
- Look at the INNER_JOIN variant of the sort-merge join for clues on how to structure your code.
- Note that SQL relations can have duplicates.
- Use the code from `worksheet.sql` to get a sense of how these variants work:
```
-- createdb tst
-- psql tst
drop table rs;
drop table st;
create table rs (a int, b int);
create table st (b int, c int);
insert into rs values (1, 2);
insert into rs values (1, 2);
insert into rs values (2, 2);
insert into rs values (3, 3);
insert into rs values (4, 3);
insert into st values (2, 1);
insert into st values (2, 2);
insert into st values (5, 1);
insert into st values (6, 2);
-- semi join (wrong)
(SELECT * FROM rs) EXCEPT ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
-- semi join (right)
(SELECT * FROM rs) EXCEPT ALL ((SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st));
-- anti join
(SELECT * FROM rs) EXCEPT (SELECT rs.* FROM rs NATURAL JOIN st);
```
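
The expected results of the worksheet queries can also be checked with a small Python sketch over the same sample rows. This is a reference for the semantics only: it uses a set lookup rather than the sort-merge algorithm, and the helper names `semi_join` and `anti_join` are hypothetical. Both helpers keep duplicates from the left input; the worksheet's anti-join query uses `EXCEPT`, which would collapse duplicates, but the unmatched rows in this sample are distinct, so the two agree here.

```python
# Same sample rows as the worksheet: rs(a, b) and st(b, c).
rs = [(1, 2), (1, 2), (2, 2), (3, 3), (4, 3)]
st = [(2, 1), (2, 2), (5, 1), (6, 2)]


def semi_join(left, right):
    """Left rows with at least one match on b; each left row appears at most once per copy."""
    right_keys = {b for (b, _c) in right}
    return [row for row in left if row[1] in right_keys]


def anti_join(left, right):
    """Left rows with no match on b."""
    right_keys = {b for (b, _c) in right}
    return [row for row in left if row[1] not in right_keys]


semi = semi_join(rs, st)
anti = anti_join(rs, st)
print(semi)  # [(1, 2), (1, 2), (2, 2)]
print(anti)  # [(3, 3), (4, 3)]
```

Note that the semi join returns `(1, 2)` twice because it appears twice in `rs`, not because `st` has two rows with `b = 2`; this is the difference between the "wrong" and "right" queries above.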
### Submission
Submit the modified `queryprocessing.py` file to [gradescope](https://www.gradescope.com/courses/811728/assignments/4669999).