diff --git a/#assign2.md# b/#assign2.md# new file mode 100644 index 0000000000000000000000000000000000000000..4744b59b51dd2c8dc64d603c436b792c213e1ac3 --- /dev/null +++ b/#assign2.md# @@ -0,0 +1,114 @@ +## Project 2: Advanced SQL, Python DB app + +### Setup + +Download the startup files [here](https://ceres.cs.umd.edu/424/assign/assignment2Dist.tgz). + +Build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`) +the container as before, then `cd /424` (inside the container). +The main differences here are that we have loaded a skewed database +(`flightsskewed`), and also installed Java. + +Ensure that the `flightsskewed` database has been created, together with tables populated from `large-skewed.sql`. Use +`flightsskewed` for all parts of this assignment. + +## Part 1: Query Construction (2 pts) + +Consider the following query, which finds the number of flights + taken by users whose name starts with 'William'. + +``` +select c.customerid, c.name, count(*) +from customers c join flewon f on (c.customerid = f.customerid and c.name like 'William%') +group by c.customerid, c.name +order by c.customerid; +``` + +The result, however, does not contain the users whose name starts with 'William' but who did +not fly at all (e.g., `cust733`). So we may consider +modifying this query to use a left outer join instead, so we get those users as well: + +``` +select c.customerid, c.name, count(*) +from customers c left outer join flewon f on (c.customerid = f.customerid and c.name like 'William%') +group by c.customerid, c.name +order by c.customerid; +``` + +Briefly explain why this query does not return the expected answer (as below), and rewrite the query so that it does. 
+ +The final answer should look like this: +``` + customerid | name | count + ------------+--------------------------------+------- + cust727 | William Harris | 4 + cust728 | William Hill | 6 + cust729 | William Jackson | 6 + cust730 | William Johnson | 5 + cust731 | William Lee | 0 + cust732 | William Lopez | 6 + cust733 | William Martinez | 0 + cust734 | William Mitchell | 6 + cust735 | William Moore | 5 + cust736 | William Parker | 4 + cust737 | William Roberts | 8 + cust738 | William Robinson | 7 + cust739 | William Rodriguez | 5 + cust740 | William Wright | 8 + cust741 | William Young | 5 + (15 rows) +``` + +Save your query in `queries.py` as the definition of `queryWilliam`. +Include your explanation as a comment above this definition. + +--- +## Part 2: Trigger (3 pts) + +We have built a table `NumberOfFlightsTaken(customerid, customername, +numflights)` to keep track of the total number of flights taken by each +customer: +``` +create table NumberOfFlightsTaken as +select c.customerid, c.name as customername, count(*) as numflights +from customers c join flewon fo on c.customerid = fo.customerid +group by c.customerid, c.name; +``` + +Since this is a derived table (and not a view), it will not be kept +up-to-date by the database system. We (you) will therefore +write a `trigger` to keep this new table updated when a new entry is inserted +into, or a row is deleted from, the `flewon` table. Remember that the `customerid` +corresponding to a new `flewon` insertion may not yet exist in the +`NumberOfFlightsTaken` table. In that case, it should be added to `NumberOfFlightsTaken` +with a count of 1. + +Similarly, if deletion of a row in `flewon` +results in a user not having any flights, then the corresponding tuple for +that user in `NumberOfFlightsTaken` should be deleted. + +The trigger code should be submitted as the definition of `queryTrigger` in +the `queries.py` file, as straight SQL. 
This file has an incorrect and +incomplete version of such a trigger commented out. Uncomment this version, +fix it, and test by running `SQLTesting.py`. + +Look inside `SQLTesting.py` to see +the insertions and deletions being tested, and think about what the proper +actions should be. + +Notes: +- `python3 SQLTesting.py` will clean the db, set up `NumberOfFlightsTaken`, + and run both your queries, printing their outputs. +- We will again be using automated testing. If your queries appear to + generate correct data w/ `SQLTesting.py`, the + autograder should produce correct results as well. + +### Non-VM instructions (e.g. Macs) + +- Create db w/ `createdb flightsskewed`. +- Ensure that the *user* in `SQLTesting.py` matches your user account. + +## Submission + +Submit the `queries.py` file [on Gradescope](https://www.gradescope.com/courses/811728/assignments/4669976/review_grades). + diff --git a/assign6.md b/assign6.md new file mode 100644 index 0000000000000000000000000000000000000000..a1156ad68544a902c874be425f2417bef7e96e45 --- /dev/null +++ b/assign6.md @@ -0,0 +1,98 @@ +## Assignment 6: Snapshot Isolation + +*The assignment is to be done by yourself.* + +### Setup + +Download files for Assignment 6 [here](https://ceres.cs.umd.edu/424/assign/assignment6Dist.tgz?1). +Build (`docker build --rm -t 424 .`) and run (`docker run -it -v +$(pwd):/424 424`) to create your container. + + + +### Overview + +In this assignment you will modify our simple database to emulate first-committer *snapshot +isolation* instead of lock-based two-phase locking. + +Snapshot isolation is usually implemented using a *multi-versioned* database; +snapshot reads can be done at any time by specifying older timestamps. We do +not have a multi-versioned database, so instead we will handle snapshot reads +by specifying the readset and writeset ahead of time, and *stashing* a snapshot of the +relevant data. 
Note that this technique is also used in some high-performance +distributed databases<sup>1</sup>. Subsequent reads by the transaction will be served from the stashed +copies, rather than directly from the database. + +Writes in snapshot isolation are buffered, and only pushed to the database +when the transaction *validates*, i.e. is guaranteed to succeed. A transaction *i* +validates if and only if no other transaction commits between the time *i* +starts and the time it attempts to validate. This is straightforward in a multi-versioned +database which, again, we do not have. We will emulate this by allowing *i* to validate if none of +the items in its writeset change values during the course of *i*'s execution. +This essentially means that (1) we also stash all data in the writeset at the +start of transaction execution, and (2) we compare the stashed value w/ the +"current" value at commit time to see if it has changed. + +Transactions that fail validation are aborted. + +Note that in a real database, validation and commitment should be a single +atomic action; we will not go that far in this system. The example +transactions should never execute close enough together for +this discrepancy to be apparent. + +### Details +As in prior assignments, you will only have to write a small amount of +code. Your changes will be confined to `transactions.py`, and *only* that file +should be uploaded to Gradescope. Search for "SNAPSHOT_ISO" or "YOUR CODE +HERE" to find places that need to be modified. + +The following is an example transaction: +``` + tstate = TransactionState() + tstate.setMode(TransactionState.SNAPSHOT_ISO) + tstate.takeSnapshot(relation, "A", {primary_id1, primary_id2}) + v = tstate.getAttribute(relation, primary_id2, attr) + writeVal = str(100 - int(v)) + tstate.setAttribute(relation, primary_id1, attr, writeVal) + res = tstate.commitTransaction() +``` + +Our approach will be the following: +1. 
The transaction explicitly takes a snapshot of all values in +the readset and writeset. Real systems use a multi-versioned database to access the +correct data. +2. Reads are satisfied from the snapshot, *or from writes by the same transaction*. +3. Written values are tracked and added to a writebuffer +rather than being written to the relation. +4. At commit time, we check to see whether data for which we have updates in our writebuffer +has been changed in the relation by any other transaction. Since our system does not currently have +explicit versioning, +we approximate this by checking to see whether any *values* of the data we are writing +have changed. + +Logically, you will have to add three pieces of state to the +transaction's initialization: +- *snapshot*: a `dict` of database values indexed by `(attr, id)` +- *writeBuffer*: an ordered list of values to be written out at +successful commit +- *writeset*: a set describing which data has been modified. This can be +extracted from the writeBuffer at commit time, and so does not have to +be tracked explicitly. + +You do not have to structure your data this way, and there is some +redundancy in the above. + +### Testing +Use `python3 testing.py` to test your implementation locally prior to +Gradescope submission. Remove `relation1` before testing: +``` + rm relation1 ; python3 testing.py +``` + +### Submission +Submit the modified `transactions.py` file to [Gradescope](https://www.gradescope.com/courses/811728/assignments/4828403/review_grades). + +You should not change any other files. + +<sup>1</sup>Thomson, A., Diamond, T., Weng, S. C., Ren, K., Shao, P., & Abadi, D. J. "Calvin: fast distributed transactions for partitioned database systems." *Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data*. 2012. +
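As a rough illustration of the approach described under Details in `assign6.md` (stash a snapshot, buffer writes, validate by comparing values at commit), here is a minimal standalone sketch. It is a toy model only: the class `ToySnapshotTxn`, its method names, and the dict-backed `db` are hypothetical and are not the `TransactionState` API in `transactions.py`. It also assumes every written item was included in the snapshot beforehand.

```python
class ToySnapshotTxn:
    """Toy model of snapshot / write buffer / value-based validation.

    Hypothetical illustration; not the assignment's TransactionState API.
    """

    def __init__(self, db):
        self.db = db            # shared store: {(attr, id): value}
        self.snapshot = {}      # values stashed when the snapshot is taken
        self.write_buffer = []  # ordered ((attr, id), value) pairs

    def take_snapshot(self, attr, ids):
        # Stash the current value of every item in the readset and writeset.
        for i in ids:
            self.snapshot[(attr, i)] = self.db[(attr, i)]

    def read(self, attr, i):
        key = (attr, i)
        # A transaction sees its own most recent buffered write first.
        for k, v in reversed(self.write_buffer):
            if k == key:
                return v
        return self.snapshot[key]

    def write(self, attr, i, value):
        # Buffer the write; the shared store is untouched until commit.
        self.write_buffer.append(((attr, i), value))

    def commit(self):
        # The writeset is derived from the buffer at commit time.
        writeset = {k for k, _ in self.write_buffer}
        # Value-based validation: fail if any written item no longer
        # matches the value stashed in the snapshot.
        if any(self.db[k] != self.snapshot[k] for k in writeset):
            return False
        for k, v in self.write_buffer:
            self.db[k] = v
        return True


# Two overlapping transactions write the same item; the first to commit wins.
db = {("A", 1): "10", ("A", 2): "30"}
t1, t2 = ToySnapshotTxn(db), ToySnapshotTxn(db)
t1.take_snapshot("A", {1, 2})
t2.take_snapshot("A", {1, 2})
t1.write("A", 1, str(100 - int(t1.read("A", 2))))
t2.write("A", 1, "5")
first_ok = t1.commit()   # validates: ("A", 1) is unchanged since the snapshot
second_ok = t2.commit()  # fails: ("A", 1) was changed by t1's commit
```

Because validation compares values rather than versions, a concurrent write that stores the same value it found would go undetected here; that is the approximation the handout describes for a system without explicit versioning.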