diff --git a/assign1.md b/assign1.md
new file mode 100644
index 0000000000000000000000000000000000000000..080501195a37e186b0cefeceba04a9e539fd3564
--- /dev/null
+++ b/assign1.md
@@ -0,0 +1,86 @@
+## CMSC424 Fall 2023 Assignment 1: SQL
+### Due Sep 8, 11:59pm.
+*The assignment is to be done by yourself; do not turn in AI-generated work.*
+You **will** need to create SQL queries on the first exam.
+
+The following assumes you have gone through the PostgreSQL instructions and have run some queries on the `university` database.
+
+Download the Assignment 1 distribution <a href="https://ceres.cs.umd.edu/424/assign/assignment1Dist.tgz">here</a>.
+The resulting files are:
+
+1. populate.sql: The SQL script for creating the data.
+1. queries.py: The file in which to enter your answers.
+1. SQLTesting.py: The file used to run the queries (in `queries.py`) against the database and to generate the file to be submitted.
+1. Dockerfile: A modified Dockerfile for this project.
+
+### Getting started
+Start the container as in Assignment 0.
+- Create a new database called `elections` and switch to it (see the PostgreSQL setup instructions).
+- Run `\i populate.sql` to create and populate the tables.
+
+Note that, as usual, you do not *have* to use Docker. However, it might make things easier for you.
+
+### Schema
+The dataset contains results of `senate` and `presidential` elections for a subset of the years. For the `senate`, it contains only the statewide results from 1976 to 2018, whereas for the `presidential` elections, it contains county-level data going back to 2000.
+
+The schema of the tables should be self-explanatory.
+
+The data was collected from https://electionlab.mit.edu/data.
+
+Some things to remember:
+- The `special senate` elections are problematic. Typically, senate elections take place every 6 years, with the two elections for a given state staggered. So generally speaking, in any given year (say 2018), there would be only one senate election per state. However, because of special circumstances, there are
+sometimes 2 elections in a given year for the same state. These two can be disambiguated using the `specialelections` boolean flag in the database.
+
+In many cases (especially for complex queries or queries involving
+`max` or `min`), you will find it easier to create temporary tables
+using the `with` construct. This also allows you to break down the full
+query and makes it easier to debug.
+
+You don't have to use the "hints" if you don't want to; there might
+be simpler ways to solve the questions.
+
+### Testing using SQLTesting.py
+Build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`)
+the container as before, then `cd /424` (inside the container).
+
+You will be writing a series of queries to implement the prompts in
+`queries.py`. Your answers (i.e., SQL queries) should be added to
+the `queries.py` file at the appropriate line. A simple query is provided
+for the first answer to show you how it works. You are also provided
+with a Python file `SQLTesting.py` for testing your answers.
+
+- We recommend that you use `psql` to design your queries, then paste them into `queries.py` and confirm that they work.
+
+- SQLTesting takes quite a few options: use `python3 SQLTesting.py -h` to see them.
+
+- To get started with SQLTesting, run `python3 SQLTesting.py -i`; that will run each of the queries and show you your answers.
+
+- To run a single query, for example the one for Question 1, use `python3 SQLTesting.py -q 1`.
+
+- The `-i` flag makes SQLTesting run all the queries, one at a time (waiting for you to press Enter after each query).
+
+- **Note**: We will essentially run a modified version of `SQLTesting.py` that compares the returned answers against the correct answers. So it is imperative that `python3 SQLTesting.py` runs without errors.
+
+### Notes/Errata
+- Database changes **do not** persist across container restarts,
+i.e., across distinct run commands. This is because database state is
+stored in the container's file system, which is ephemeral. This should
+not be an issue as `queries.py` is mounted from the distribution directory.
+
+### Apple Silicon (M1, M2...) Macs
+
+You can **use postgres directly** via **Homebrew**:
+```
+ brew install postgresql@14
+ brew services restart postgresql@14
+```
+
+Before running `SQLTesting.py`, you also need to install `psycopg2`:
+```
+ pip3 install psycopg2
+```
+
+## Submission Instructions
+See `queries.py` for queries to write.
+Submit your altered `queries.py` to Gradescope at <a href="https://www.gradescope.com/courses/535193/assignments/2852219">Assignment 1</a>.
+
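A note on the `with` construct recommended under Schema for queries involving `max` or `min`: the sketch below shows the general pattern of computing an aggregate in a named temporary table and then joining back to it. It is only an illustration under stated assumptions. The `results` table, its columns, and its rows are hypothetical and are not part of the assignment schema, and the example uses Python's built-in `sqlite3` module rather than PostgreSQL so that it runs anywhere; the `WITH` syntax used here should behave the same way in `psql`.

```python
import sqlite3

# Hypothetical toy table, unrelated to the assignment's elections schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (state TEXT, candidate TEXT, votes INTEGER);
    INSERT INTO results VALUES
        ('MD', 'Avery', 120), ('MD', 'Blake', 95),
        ('VA', 'Casey', 80),  ('VA', 'Drew', 130);
""")

# Step 1 (the WITH part): compute the maximum vote count per state.
# Step 2 (the main query): join back to find who achieved that maximum.
query = """
    WITH state_max AS (
        SELECT state, MAX(votes) AS max_votes
        FROM results
        GROUP BY state
    )
    SELECT r.state, r.candidate, r.votes
    FROM results r
    JOIN state_max m
      ON r.state = m.state AND r.votes = m.max_votes
    ORDER BY r.state;
"""
winners = conn.execute(query).fetchall()
print(winners)  # [('MD', 'Avery', 120), ('VA', 'Drew', 130)]
```

Because `state_max` is named, it can be run on its own (as a plain `SELECT`) to check the intermediate result before the join is added, which is what makes this style easier to debug.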