Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.
#assign2.md# 4.52 KiB
## Project 2: Advanced SQL, Python DB app

### Setup

Download the startup files [here](https://ceres.cs.umd.edu/424/assign/assignment2Dist.tgz).

Build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`)
the container as before, then `cd /424` (inside the container).
The main differences here are that we have loaded a skewed database
(`flightsskewed`), and also installed Java. 

Ensure that the `flightsskewed` database has been created, together with tables populated from  `large-skewed.sql`. Use
`flightsskewed` for all parts of this assignment.

## Part 1: Query Construction (2 pts)

Consider the following query which finds the number of flights
  taken by users whose name starts with 'William'.

```
select c.customerid, c.name, count(*)
from customers c join flewon f on (c.customerid = f.customerid and c.name like 'William%')
group by c.customerid, c.name
order by c.customerid;
```

The result however does not contain the users whose name contains 'William' but who did
not fly at all (e.g., `cust733`). So we may consider
modifying this query to use a left outer join instead, so we get those users as well:

```
select c.customerid, c.name, count(*)
from customers c left outer join flewon f on (c.customerid = f.customerid and c.name like 'William%')
group by c.customerid, c.name
order by c.customerid;
```

Briefly explain why this query does not return the expected answer (as below), and rewrite the query so that it does.

The final answer should look like this:
```
	customerid |              name              | count
	------------+--------------------------------+-------
	cust727    | William Harris                 |     4
	cust728    | William Hill                   |     6
	cust729    | William Jackson                |     6
	cust730    | William Johnson                |     5
	cust731    | William Lee                    |     0
	cust732    | William Lopez                  |     6
	cust733    | William Martinez               |     0
	cust734    | William Mitchell               |     6
	cust735    | William Moore                  |     5
	cust736    | William Parker                 |     4
	cust737    | William Roberts                |     8
	cust738    | William Robinson               |     7
	cust739    | William Rodriguez              |     5
	cust740    | William Wright                 |     8
	cust741    | William Young                  |     5
	(15 rows)
```

Save your query in  `queries.py` as the definition of `queryWilliam`.
Include your explanation as a comment above this definition.

---
## Part 2: Trigger (3 pt)

We have built a table `NumberOfFlightsTaken(customerid, customername,
numflights)` to keep track of the total number of flights taken by each
customer:
```
create table NumberOfFlightsTaken as
select c.customerid, c.name as customername, count(*) as numflights
from customers c join flewon fo on c.customerid = fo.customerid
group by c.customerid, c.name;
```

Since this is a derived table (and not a view), it will not be kept
up-to-date by the database system.  We (you) will therefore
write a `trigger` to keep this new table updated when a new entry is inserted
into, or a row is deleted from, the `flewon` table. Remember that the `customerid`
corresponding to a new `flewon` insertion update may not yet exist in the
`NumberOfFlightsTaken` table. In that case, it should be added to `NumberOfFlightsTaken`
with a count of 1. 

Similarly, if deletion of a row in `flewon`
results in a user not having any flights, then the corresponding tuple for
that user in `NumberOfFlightsTaken` should be deleted. 

The trigger code should be submitted as the definition of `queryTrigger` in
the `queries.py` file, as straight SQL. This file has an incorrect and
incomplete version of such a trigger commented out. Uncomment this version,
fix it, and test by running `SQLTesting.py`. 

Look inside this file to see
the insertions and deletions being tested, and think about what the proper
actions should be.

Notes:
- `python3 SQLTesting.py` will clean the db, set up `NumberOfFlightsTaken`,
  and run both your queries, printing their outputs.
- We will again be using automated testing. Ensuring that your queries looks to
  be generating correct data w/ `SQLTesting.py` should ensure that the
  autograder will produce correct results as well.

### Non-VM instructions (e.g. Macs)

- Create db w/ `createdb flightsskewed`.
- Ensure that the *user* in SQLTesting.py matches your user account.

## Submission

Submit the `queries.py` file [on gradescope](https://www.gradescope.com/courses/811728/assignments/4669976/review_grades).