Commit 976c42b5 authored by Peter J. Keleher

auto

parent f780ac5f
## Project 2: Advanced SQL, Python DB app
### Setup
Download the startup files [here](https://ceres.cs.umd.edu/424/assign/assignment2Dist.tgz).
Build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`)
the container as before, then `cd /424` (inside the container).
The main differences here are that we have loaded a skewed database
(`flightsskewed`), and also installed Java.
Ensure that the `flightsskewed` database has been created, together with tables populated from `large-skewed.sql`. Use
`flightsskewed` for all parts of this assignment.
## Part 1: Query Construction (2 pts)
Consider the following query which finds the number of flights
taken by users whose name starts with 'William'.
```
select c.customerid, c.name, count(*)
from customers c join flewon f on (c.customerid = f.customerid and c.name like 'William%')
group by c.customerid, c.name
order by c.customerid;
```
The result, however, does not contain the users whose name starts with 'William' but who did
not fly at all (e.g., `cust733`). So we may consider
modifying this query to use a left outer join instead, so we get those users as well:
```
select c.customerid, c.name, count(*)
from customers c left outer join flewon f on (c.customerid = f.customerid and c.name like 'William%')
group by c.customerid, c.name
order by c.customerid;
```
Briefly explain why this query does not return the expected answer (as below), and rewrite the query so that it does.
The final answer should look like this:
```
customerid | name | count
------------+--------------------------------+-------
cust727 | William Harris | 4
cust728 | William Hill | 6
cust729 | William Jackson | 6
cust730 | William Johnson | 5
cust731 | William Lee | 0
cust732 | William Lopez | 6
cust733 | William Martinez | 0
cust734 | William Mitchell | 6
cust735 | William Moore | 5
cust736 | William Parker | 4
cust737 | William Roberts | 8
cust738 | William Robinson | 7
cust739 | William Rodriguez | 5
cust740 | William Wright | 8
cust741 | William Young | 5
(15 rows)
```
Save your query in `queries.py` as the definition of `queryWilliam`.
Include your explanation as a comment above this definition.
---
## Part 2: Trigger (3 pts)
We have built a table `NumberOfFlightsTaken(customerid, customername,
numflights)` to keep track of the total number of flights taken by each
customer:
```
create table NumberOfFlightsTaken as
select c.customerid, c.name as customername, count(*) as numflights
from customers c join flewon fo on c.customerid = fo.customerid
group by c.customerid, c.name;
```
Since this is a derived table (and not a view), it will not be kept
up-to-date by the database system. We (you) will therefore
write a `trigger` to keep this new table updated when a new entry is inserted
into, or a row is deleted from, the `flewon` table. Remember that the `customerid`
corresponding to a new `flewon` insertion may not yet exist in the
`NumberOfFlightsTaken` table. In that case, it should be added to `NumberOfFlightsTaken`
with a count of 1.
Similarly, if deletion of a row in `flewon`
results in a user not having any flights, then the corresponding tuple for
that user in `NumberOfFlightsTaken` should be deleted.
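To see why the derived table goes stale, here is a minimal sketch using Python's built-in `sqlite3` module rather than the course's PostgreSQL setup. The toy rows, the `flightid` column, and the `NumberOfFlightsView` view are assumptions made only for this illustration. It shows the stale-table problem and does not implement the trigger.

```python
import sqlite3

# In-memory toy database; the rows and the flightid column are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    create table customers (customerid text primary key, name text);
    create table flewon (flightid text, customerid text);
    insert into customers values ('c1', 'Ann'), ('c2', 'Bob');
    insert into flewon values ('f1', 'c1'), ('f2', 'c1'), ('f3', 'c2');

    -- Derived table: a one-time copy of the query result.
    create table NumberOfFlightsTaken as
    select c.customerid, c.name as customername, count(*) as numflights
    from customers c join flewon fo on c.customerid = fo.customerid
    group by c.customerid, c.name;

    -- View (hypothetical, for contrast): re-evaluated on every query.
    create view NumberOfFlightsView as
    select c.customerid, c.name as customername, count(*) as numflights
    from customers c join flewon fo on c.customerid = fo.customerid
    group by c.customerid, c.name;
""")

# A new flight for c2 arrives after the derived table was built.
cur.execute("insert into flewon values ('f4', 'c2')")


def counts(source):
    """Return {customerid: numflights} read from the given table or view."""
    rows = cur.execute(
        f"select customerid, numflights from {source} order by customerid"
    )
    return dict(rows.fetchall())


derived_counts = counts("NumberOfFlightsTaken")  # still the old count for c2
view_counts = counts("NumberOfFlightsView")      # reflects the new flight
print(derived_counts)
print(view_counts)
```

The derived table still reports one flight for `c2` while the view reports two. The trigger has to close exactly this gap for both insertions and deletions.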
The trigger code should be submitted as the definition of `queryTrigger` in
the `queries.py` file, as straight SQL. This file has an incorrect and
incomplete version of such a trigger commented out. Uncomment this version,
fix it, and test by running `SQLTesting.py`.
Look inside this file to see
the insertions and deletions being tested, and think about what the proper
actions should be.
Notes:
- `python3 SQLTesting.py` will clean the db, set up `NumberOfFlightsTaken`,
and run both your queries, printing their outputs.
- We will again be using automated testing. If your queries appear to
generate correct data w/ `SQLTesting.py`, the
autograder should produce correct results as well.
### Non-VM instructions (e.g. Macs)
- Create db w/ `createdb flightsskewed`.
- Ensure that the *user* in SQLTesting.py matches your user account.
## Submission
Submit the `queries.py` file [on Gradescope](https://www.gradescope.com/courses/811728/assignments/4669976/review_grades).
## Assignment 6: Snapshot Isolation
*The assignment is to be done by yourself.*
### Setup
Download files for Assignment 6 [here](https://ceres.cs.umd.edu/424/assign/assignment6Dist.tgz?1).
Build (`docker build --rm -t 424 .`) and run (`docker run -it -v
$(pwd):/424 424`) to create your container.
### Overview
In this assignment you will modify our simple database to emulate first-committer *snapshot
isolation* instead of lock-based two-phase locking.
Snapshot isolation is usually implemented using a *multi-versioned* database;
snapshot reads can be done at any time by specifying older timestamps. We do
not have a multi-versioned database, so instead we will handle snapshot reads
by specifying the readset and writeset ahead of time, and *stashing* a snapshot of the
relevant data. Note that this technique is also used in some high-performance
distributed databases<sup>1</sup>. Subsequent reads by the transaction will read from the stashed
copies, rather than directly from the database.
Writes in snapshot isolation are buffered, and only pushed to the database
when the transaction *validates*, i.e. is guaranteed to succeed. A transaction *i*
validates if and only if no other transaction commits between the time *i*
starts and attempts to validate. This is straightforward in a multi-versioned
database which, again, we do not have. We will emulate this by allowing *i* to validate if none of
the items in its writeset change values during the course of *i*'s execution.
This essentially means that (1) we also stash all data in the writeset at the
start of transaction execution, and (2) we compare the stashed value w/ the
"current" value at commit time to see if it has changed.
Transactions that fail validation are aborted.
Note that in a real database validation and commitment should be a single
atomic action; we will not go that far in this system. The example
transactions should never execute close enough together for
this discrepancy to be apparent.
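The value-comparison check described above can be sketched with plain dictionaries. This is only an illustration under stated assumptions: `database`, `stash`, and `validate` are hypothetical names, not part of `transactions.py`, and the keys follow an `(attr, id)` layout chosen for the example.

```python
# Toy stand-in for the relation: (attr, id) -> value. Not the course's storage layer.
database = {("A", 1): "10", ("A", 2): "90"}


def stash(db, keys):
    """Copy the current values of the given keys at transaction start."""
    return {key: db[key] for key in keys}


def validate(db, stashed, written_keys):
    """True if every item we intend to write still holds its stashed value."""
    return all(db[key] == stashed[key] for key in written_keys)


# The transaction starts and stashes its readset and writeset.
stashed = stash(database, {("A", 1), ("A", 2)})
ok_before = validate(database, stashed, {("A", 1)})

# Meanwhile, another transaction commits a new value for ("A", 1).
database[("A", 1)] = "55"

ok_after = validate(database, stashed, {("A", 1)})  # our write target changed
ok_other = validate(database, stashed, {("A", 2)})  # untouched item still validates
print(ok_before, ok_after, ok_other)
```

Because only values are compared, an item that is changed and then changed back to its original value would still pass this check. That is one way the emulation differs from real versioning.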
### Details
As in prior assignments, you will only have to write a small amount of
code. Your changes will be confined to `transactions.py`, and *only* that file
should be uploaded to Gradescope. Search for "SNAPSHOT_ISO" or "YOUR CODE
HERE" to find places that need to be modified.
The following is an example transaction:
```
tstate = TransactionState()
tstate.setMode(TransactionState.SNAPSHOT_ISO)
tstate.takeSnapshot(relation, "A", {primary_id1, primary_id2})
v = tstate.getAttribute(relation, primary_id2, attr)
writeVal = str(100 - int(v))
tstate.setAttribute(relation, primary_id1, attr, writeVal)
res = tstate.commitTransaction()
```
Our approach will be the following:
1. The transaction explicitly takes a snapshot of all values in
the readset and writeset. Real systems use a multi-versioned database to access the
correct data.
2. Reads are satisfied from the snapshot, *or writes from the same transaction*.
3. Written values are tracked and added to a writebuffer
rather than being written to the relation.
4. At commit time, we check to see whether data for which we have updates in our writebuffers
has been changed in the relation by any other transaction. Since our system does not currently have
explicit versioning,
we approximate this by checking to see whether any *values* of the data we are writing
have changed.
Logically, you will have to add three pieces of state to the
transaction's initialization:
- *snapshot*: a `dict` of database values indexed by `(attr, id)`
- *writeBuffer*: an ordered list of values to be written out at
successful commit
- *writeset*: a set describing which data has been modified. This can be
extracted from the writeBuffer at commit time, and so does not have to
be tracked explicitly.
You do not have to structure your data this way, and there is some
redundancy in the above.
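As a rough sketch of how these pieces of state might fit together, the snippet below uses module-level variables and two hypothetical helpers, `read` and `write`. None of these names come from `transactions.py`, and the tuple layout of the write buffer is an assumption.

```python
# Snapshot taken at transaction start, indexed by (attr, id). Values are made up.
snapshot = {("A", 1): "10", ("A", 2): "60"}

# Ordered list of buffered writes as (attr, id, value) tuples.
write_buffer = []


def write(attr, pid, value):
    """Buffer a write instead of applying it to the relation."""
    write_buffer.append((attr, pid, value))


def read(attr, pid):
    """Return the transaction's own latest write if any, else the snapshot value."""
    for w_attr, w_id, w_val in reversed(write_buffer):
        if (w_attr, w_id) == (attr, pid):
            return w_val
    return snapshot[(attr, pid)]


v = read("A", 2)                   # served from the snapshot
write("A", 1, str(100 - int(v)))   # buffered, not yet in the relation
own_write = read("A", 1)           # sees the transaction's own buffered write

# The writeset can be derived from the write buffer at commit time.
writeset = {(attr, pid) for attr, pid, _ in write_buffer}
print(v, own_write, writeset)
```

Keeping the buffer ordered means later writes to the same item take precedence, both when reading back and when the buffer is eventually applied.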
### Testing
Use `python3 testing.py` to test your implementation locally prior to
gradescope submission. Remove `relation1` before testing:
```
rm relation1 ; python3 testing.py
```
### Submission
Submit the modified `transactions.py` file to [Gradescope](https://www.gradescope.com/courses/811728/assignments/4828403/review_grades).
You should not change any other files.
<sup>1</sup>Thomson, A., Diamond, T., Weng, S. C., Ren, K., Shao, P., & Abadi, D. J. "Calvin: fast distributed transactions for partitioned database systems." *Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data*. 2012.