Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.

Project 1: SQL Assignment, CMSC424, Fall 2019

The assignment is to be done by yourself.

Start by cd-ing into 424f19. Do a git pull origin master. You should see a project1 directory with the following files:

  1. Vagrantfile: Use this to create a new VM for project1.
  2. README.md: This file.
  3. small.sql: The SQL script for creating the data.
  4. queries.py: The file where to enter your answer; this is the file to be submitted
  5. answers.py: The results from correct queries on the small dataset.
  6. SQLTesting.py: File to be used for testing your submission -- see below.
  7. Go back into your VM and verify that the new project directory is visible. Note: The testing will be done on a different, larger dataset.

Schema

The flights dataset contains synthetic air flight data. Specifically it contains the following tables:

  1. airports: airportid, city, name, total2011, total2012
  2. customers: customerid, name, birthdate, frequentflieron
  3. airlines: airlineid, name, hub
  4. flights: flightid, source, dest, airlineid, local_departing_time, local_arrival_time
  5. flewon: flightid, customerid, flightdate

See the provided SQL file for the table definitions.

The dataset was generated synthetically: the airport ids and the cities were chosen from the biggest airports in the US, but the rest of the data is populated randomly. The data will not make sense. For example, two different flights between the same cities may have very different flight durations. The flight times between the cities may not correspond to geographical distances that you know. Some other information about the data:

  • The dates in the large database might be different than in the small.
  • Each customer may at most take one flight every day.
  • The flight times were chosen between 30 minutes to 5 hours randomly.
  • All flights are daily (start and end on a single day), and none are overnight.
  • For every flight from city A to city B, there is corresponding return flight from B to A.
  • The "flewon" table only contains the flight date -- the flight times must be extracted from the flights table.

In many cases (especially for complex queries or queries involving max or min), you will find it easier to create temporary tables using the with construct. This also allows you to break down the full query and makes it easier to debug.

You don't have to use the "hints" if you don't want to; there might be simpler ways to solve the questions.

Testing and submitting using SQLTesting.py

Your answers (i.e., SQL queries) should be added to the queries.py file. A simple query is provided for the first answer to show you how it works. You are also provided with a Python file SQLTesting.py for testing your answers.

  • We recommend that you use psql (as in psql flights) to design your queries, and then paste the queries to the queries.py file, and confirm it works.

  • SQLTesting takes quite a few options: use python3 SQLTesting.py -h to see the options.

  • To get started with SQLTesting, do: python3 SQLTesting.py -v -i -- that will run each of the queries and show you your answer and correct answer.

  • If you want to test your answer to Question 1, use: python3 SQLTesting.py -q 1. The program compares the result of running your query against the provided answer (in the answers.py file).

  • The -v flag will print out more information, including the correct and submitted answers etc.

  • If you want to test your answers to all questions (this is what we will do), use: python3 SQLTesting.py and look at the final total score.

  • -i flag to SQLTesting will run all the queries, one at a time (waiting for you to press Enter after each query).

  • Note that: We will basically run this same program on your submitted queries.py file, but with the larger dataset; your score on the assignment will be score output by the program. The program tries to do partial credits (as you can see in the code). It is very unlikely that your score on the larger, hidden dataset will be higher than your score on the provided dataset.

Submission Instructions

Submit the queries.py file using ELMS here. Due September 17.