Commit 976c42b5 authored 7 months ago by Peter J. Keleher
--- a/#assign2.md#
+++ b/#assign2.md#
+## Project 2: Advanced SQL, Python DB app
+
+### Setup
+
+Download the startup files [here](https://ceres.cs.umd.edu/424/assign/assignment2Dist.tgz).
+
+Build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`)
+the container as before, then `cd /424` (inside the container).
+The main differences here are that we have loaded a skewed database
+(`flightsskewed`), and also installed Java. 
+
+Ensure that the `flightsskewed` database has been created, together with tables populated from `large-skewed.sql`. Use
+`flightsskewed` for all parts of this assignment.
+
+## Part 1: Query Construction (2 pts)
+
+Consider the following query, which finds the number of flights
+taken by users whose name starts with 'William'.
+
+```
+select c.customerid, c.name, count(*)
+from customers c join flewon f on (c.customerid = f.customerid and c.name like 'William%')
+group by c.customerid, c.name
+order by c.customerid;
+```
+
+However, the result does not contain users whose name starts with 'William' but who did
+not fly at all (e.g., `cust733`). So we may consider
+modifying this query to use a left outer join instead, so that we get those users as well:
+
+```
+select c.customerid, c.name, count(*)
+from customers c left outer join flewon f on (c.customerid = f.customerid and c.name like 'William%')
+group by c.customerid, c.name
+order by c.customerid;
+```
+
+Briefly explain why this query does not return the expected answer (as below), and rewrite the query so that it does.
+
+The final answer should look like this:
+```
+	customerid |              name              | count
+	------------+--------------------------------+-------
+	cust727    | William Harris                 |     4
+	cust728    | William Hill                   |     6
+	cust729    | William Jackson                |     6
+	cust730    | William Johnson                |     5
+	cust731    | William Lee                    |     0
+	cust732    | William Lopez                  |     6
+	cust733    | William Martinez               |     0
+	cust734    | William Mitchell               |     6
+	cust735    | William Moore                  |     5
+	cust736    | William Parker                 |     4
+	cust737    | William Roberts                |     8
+	cust738    | William Robinson               |     7
+	cust739    | William Rodriguez              |     5
+	cust740    | William Wright                 |     8
+	cust741    | William Young                  |     5
+	(15 rows)
+```
+
+Save your query in `queries.py` as the definition of `queryWilliam`.
+Include your explanation as a comment above this definition.
+
+---
+## Part 2: Trigger (3 pts)
+
+We have built a table `NumberOfFlightsTaken(customerid, customername,
+numflights)` to keep track of the total number of flights taken by each
+customer:
+```
+create table NumberOfFlightsTaken as
+select c.customerid, c.name as customername, count(*) as numflights
+from customers c join flewon fo on c.customerid = fo.customerid
+group by c.customerid, c.name;
+```
+
+Since this is a derived table (and not a view), it will not be kept
+up-to-date by the database system. You will therefore
+write a `trigger` to keep this new table updated when a new entry is inserted
+into, or a row is deleted from, the `flewon` table. Remember that the `customerid`
+of a newly inserted `flewon` row may not yet exist in the
+`NumberOfFlightsTaken` table. In that case, it should be added to `NumberOfFlightsTaken`
+with a count of 1.
+
+Similarly, if deletion of a row in `flewon`
+results in a user not having any flights, then the corresponding tuple for
+that user in `NumberOfFlightsTaken` should be deleted. 
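
To see why a trigger is needed at all, here is a minimal sketch using Python's built-in `sqlite3` module on a tiny made-up dataset. It is not the assignment's database: the rows, the `flightid` column, and the helper names `derived_count` and `actual_count` are assumptions for illustration. It only shows that a table built with `create table ... as select` is a one-time copy that goes stale; it is not a trigger implementation.

```python
import sqlite3

# In-memory toy database; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
create table customers (customerid text primary key, name text);
create table flewon (flightid text, customerid text);
insert into customers values ('c1', 'Ann'), ('c2', 'Bob');
insert into flewon values ('f1', 'c1'), ('f2', 'c1'), ('f3', 'c2');

create table NumberOfFlightsTaken as
select c.customerid, c.name as customername, count(*) as numflights
from customers c join flewon fo on c.customerid = fo.customerid
group by c.customerid, c.name;
""")


def derived_count(customerid):
    """Count stored in the derived table (None if the customer is absent)."""
    row = cur.execute(
        "select numflights from NumberOfFlightsTaken where customerid = ?",
        (customerid,),
    ).fetchone()
    return row[0] if row else None


def actual_count(customerid):
    """Count computed directly from the base table."""
    return cur.execute(
        "select count(*) from flewon where customerid = ?", (customerid,)
    ).fetchone()[0]


before = derived_count("c1")
cur.execute("insert into flewon values ('f4', 'c1')")
stale = derived_count("c1")   # unchanged: nothing maintains the derived table
fresh = actual_count("c1")    # reflects the new flight
print(before, stale, fresh)
```

After the insert, the derived table still reports the old count while the base table reflects the new flight, which is the gap the trigger has to close.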
+
+The trigger code should be submitted as the definition of `queryTrigger` in
+the `queries.py` file, as straight SQL. This file has an incorrect and
+incomplete version of such a trigger commented out. Uncomment this version,
+fix it, and test by running `SQLTesting.py`. 
+
+Look inside `SQLTesting.py` to see
+the insertions and deletions being tested, and think about what the proper
+actions should be.
+
+Notes:
+- `python3 SQLTesting.py` will clean the db, set up `NumberOfFlightsTaken`,
+  and run both your queries, printing their outputs.
+- We will again be using automated testing. If your queries appear to
+  generate correct data with `SQLTesting.py`, the
+  autograder should produce correct results as well.
+
+### Non-VM instructions (e.g. Macs)
+
+- Create the database with `createdb flightsskewed`.
+- Ensure that the *user* in `SQLTesting.py` matches your user account.
+
+## Submission
+
+Submit the `queries.py` file [on gradescope](https://www.gradescope.com/courses/811728/assignments/4669976/review_grades).
+
--- a/assign6.md
+++ b/assign6.md
+## Assignment 6: Snapshot Isolation
+
+*The assignment is to be done by yourself.*
+
+### Setup
+
+Download files for Assignment 6 [here](https://ceres.cs.umd.edu/424/assign/assignment6Dist.tgz?1).
+Build (`docker build --rm -t 424 .`) and run (`docker run -it -v
+$(pwd):/424 424`) to create your container.
+
+
+
+### Overview
+
+In this assignment you will modify our simple database to emulate first-committer *snapshot
+isolation* instead of two-phase locking.
+
+Snapshot isolation is usually implemented using a *multi-versioned* database;
+snapshot reads can be done at any time by specifying older timestamps. We do
+not have a multi-versioned database, so instead we will handle snapshot reads
+by specifying the readset and writeset ahead of time, and *stashing* a snapshot of the
+relevant data. Note that this technique is also used in some high-performance
+distributed databases<sup>1</sup>. Subsequent reads by the transaction will read from the stashed
+copies, rather than directly from the database.
+
+Writes in snapshot isolation are buffered, and only pushed to the database
+when the transaction *validates*, i.e. is guaranteed to succeed. A transaction *i*
+validates if and only if no other transaction that wrote an item in *i*'s writeset
+commits between the time *i* starts and the time it attempts to validate. This is straightforward in a multi-versioned
+database which, again, we do not have. We will emulate this by allowing *i* to validate if none of
+the items in its writeset change values during the course of *i*'s execution.
+This essentially means that (1) we also stash all data in the writeset at the
+start of transaction execution, and (2) we compare the stashed value with the
+"current" value at commit time to see if it has changed.
+
+Transactions that fail validation are aborted.
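
The stash-and-compare idea can be sketched on a toy in-memory store. This is a sketch under assumptions, not the code expected in `transactions.py`: the dictionary `db`, the `(attr, id)` keys, and the functions `take_snapshot`, `validate`, and `commit` are all hypothetical names chosen for illustration.

```python
def take_snapshot(db, keys):
    """Stash the current value of every key in the readset and writeset."""
    return {k: db[k] for k in keys}


def validate(db, snapshot, write_buffer):
    """Succeed only if no item we intend to write has changed value."""
    return all(db[k] == snapshot[k] for k in write_buffer)


def commit(db, snapshot, write_buffer):
    """Push buffered writes on successful validation; otherwise abort."""
    if not validate(db, snapshot, write_buffer):
        return False
    db.update(write_buffer)
    return True


# Toy store keyed by (attr, id); values are strings as in the example data.
db = {("A", 1): "10", ("A", 2): "20"}

# Two transactions start from the same state and buffer conflicting writes.
snap1 = take_snapshot(db, list(db))
snap2 = take_snapshot(db, list(db))
t1_writes = {("A", 1): "11"}
t2_writes = {("A", 1): "12"}

t1_ok = commit(db, snap1, t1_writes)  # first committer wins
t2_ok = commit(db, snap2, t2_writes)  # sees a changed value and aborts
print(t1_ok, t2_ok, db[("A", 1)])
```

Because the check compares values rather than versions, it cannot detect a concurrent write that happens to store the same value that was stashed.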
+
+Note that in a real database validation and commitment should be a single
+atomic action; we will not go that far in this system. The example
+transactions should never execute close enough together for
+this discrepancy to be apparent.
+
+### Details
+As in prior assignments, you will only have to write a small amount of
+code. Your changes will be confined to `transactions.py`, and *only* that file
+should be uploaded to Gradescope.  Search for "SNAPSHOT_ISO" or "YOUR CODE
+HERE" to find places that need to be modified.
+
+The following is an example transaction:
+```
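+    tstate = TransactionState()
+    tstate.setMode(TransactionState.SNAPSHOT_ISO)
+    # Stash the readset and writeset before doing any reads or writes.
+    tstate.takeSnapshot(relation, "A", {primary_id1, primary_id2})
+    # Reads come from the snapshot; writes are buffered until commit.
+    v = tstate.getAttribute(relation, primary_id2, attr)
+    writeVal = str(100 - int(v))
+    tstate.setAttribute(relation, primary_id1, attr, writeVal)
+    # Validation happens at commit time.
+    res = tstate.commitTransaction()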
+```
+
+Our approach will be the following:
+1. The transaction explicitly takes a snapshot of all values in
+the readset and writeset. Real systems use a multi-versioned database to access the
+correct data.
+2. Reads are satisfied from the snapshot, *or from writes by the same transaction*.
+3. Written values are tracked and added to a writebuffer
+rather than being written to the relation.
+4. At commit time, we check to see whether data for which we have updates in our writebuffer
+has been changed in the relation by any other transaction. Since our system does not currently have
+explicit versioning,
+we approximate this by checking to see whether any *values* of the data we are writing
+have changed.
+
+Logically, you will have to add three pieces of state to the
+transaction's initialization:
+- *snapshot*: a `dict` of database values indexed by `(attr, id)`
+- *writeBuffer*: an ordered list of values to be written out at
+successful commit
+- *writeset*: a set describing which data has been modified. This can be
+extracted from the writeBuffer at commit time, and so does not have to
+be tracked explicitly. 
+
+You do not have to structure your data this way, and there is some
+redundancy in the above.
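
One possible arrangement of this state, shown on a toy dictionary store rather than the assignment's relation classes, is sketched below. The class `ToyTxn` and its methods are hypothetical and do not mirror the `TransactionState` API; the sketch only illustrates reads that prefer the transaction's own buffered writes and a writeset derived from the buffer.

```python
class ToyTxn:
    """Toy transaction state: a snapshot dict plus an ordered write buffer."""

    def __init__(self, db, keys):
        self.db = db
        # Snapshot of database values indexed by (attr, id).
        self.snapshot = {k: db[k] for k in keys}
        # Ordered list of ((attr, id), value) pairs awaiting commit.
        self.write_buffer = []

    def get(self, attr, id_):
        key = (attr, id_)
        # Prefer the most recent write made by this transaction.
        for k, v in reversed(self.write_buffer):
            if k == key:
                return v
        # Otherwise read the stashed copy, never the live database.
        return self.snapshot[key]

    def set(self, attr, id_, value):
        self.write_buffer.append(((attr, id_), value))

    @property
    def writeset(self):
        # Derived from the buffer, so it needs no separate bookkeeping.
        return {k for k, _ in self.write_buffer}


db = {("A", 1): "10", ("A", 2): "60"}
txn = ToyTxn(db, [("A", 1), ("A", 2)])
txn.set("A", 1, str(100 - int(txn.get("A", 2))))

db[("A", 2)] = "0"  # a change made outside this transaction

read_own = txn.get("A", 1)    # served from the write buffer
read_snap = txn.get("A", 2)   # served from the snapshot, not the live value
print(read_own, read_snap, txn.writeset)
```

Keeping the buffer as an ordered list preserves the order of writes to the same item, at the cost of a linear scan on each read.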
+
+### Testing
+Use `python3 testing.py` to test your implementation locally prior to
+gradescope submission. Remove `relation1` before testing:
+```
+    rm relation1 ; python3 testing.py
+```
+
+### Submission
+Submit the modified `transactions.py` file to [Gradescope](https://www.gradescope.com/courses/811728/assignments/4828403/review_grades). 
+
+You should not change any other files.
+
+<sup>1</sup>Thomson, A., Diamond, T., Weng, S. C., Ren, K., Shao, P., & Abadi, D. J. "Calvin: fast distributed transactions for partitioned database systems." *Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data*. 2012.
+