diff --git a/assign2.md b/assign2.md index 4744b59b51dd2c8dc64d603c436b792c213e1ac3..4d69e3c98faaae0db380da646278ecb1f4dbbaa4 100644 --- a/assign2.md +++ b/assign2.md @@ -106,7 +106,6 @@ Notes: ### Non-VM instructions (e.g. Macs) - Create db w/ `createdb flightsskewed`. -- Ensure that the *user* in SQLTesting.py matches your user account. ## Submission diff --git a/assign3.md b/assign3.md new file mode 100644 index 0000000000000000000000000000000000000000..c43b51d2f00ba9c2704495b5c4eafc5dcb2cd3fc --- /dev/null +++ b/assign3.md @@ -0,0 +1,186 @@ +## Project 3: Object-Relational Mappings + +*The assignment is to be done by yourself.* + +Two parts: `psycopg` and `peewee`. + +### Setup + +Download the startup files [here](https://ceres.cs.umd.edu/424/assign/assignment3Dist.tgz?2). + +As before, use the Dockerfile to create and start an image: build (`docker build --rm -t 424 .`) and run (`docker run -it -v $(pwd):/424 424`) . +Ensure that the database that +the `flightsskewed` database exists, and that tables are populated from `large-skewed.sql`. Use +`flightsskewed` for all parts of this assignment. + +## Your Tasks + +**Task 1: External clients (25 pts)**: One of more prominent ways to use a database system is using an +external client, using APIs such as ODBC and JDBC, or the +Python DB-API 2.0 specification. + +We will be using the [psycopg](http://initd.org/psycopg/) instantiation of the Python DB spec to access the +database. There are many good tutorials, such as this +[Postgres/psycopg Tutorial](http://www.postgresqltutorial.com/postgresql-python), +the [default documentation](http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries) is quite good, +and you can also +get see a working example of queries in *SQLTesting.py* from Project 1. + +For those new to Python, I recommend the +[Python Tutorial](https://docs.python.org/3/tutorial/index.html). + +Your task to write a single Python function `runPsy()` function, in +file `psy.py`, that reads in a JSON file and updates the database +appropriately. `runPsy()` should take three parameters: +- `conn`: the database connection, already opened and initialized +- `curs`: a *cursor* for the connection. +- `jsonFile`: the JSON file name + +Database cursors implement the notion of a *session* with respect to a single +connection. Cursors are used to execute commands and return results, whether +one by one or all at once. + +Each line in the input file will consist of a single JSON object in one of the following two formats: + +- **New customer**, where information about a customer is provided in the following format + (though our example file has each input in a single line). You can assume that the + frequent flier airline name matches exactly what is there in the 'airlines' table. + +``` +{ "newcustomer": { + "customerid": "cust1000", + "name": "XYZ", + "birthdate": "1991-12-06", + "frequentflieron": "Spirit Airlines" + } +} +``` + +- **Flew On**, where information about the passengers in a flight is +provided. Create new rows in `flewon` for each customer, as well as the `customers` +table if the `customerid` does not already exist. + - In some cases the `customerid` provided may not be present in the database (cust1000 as seen below). In this case, first update the `customers` table (all the info is guaranteed to be +there), and then add tuples to the `flewon` table. + +``` +{ "flewon": { + "flightid": "DL108", + "flightdate": "2015-09-25", + "customers": [ + {"customerid": "cust1001", "name": "XYZ", "birthdate": "1991-12-06", "frequentflieron": "AA"}, + {"customerid": "cust25"}, {"customerid": "cust33"} + ] + } +} +``` + +### Testing + +`python3 testPsy.py` + +Notes: + +1. In the case of either of the following errors, you should just print "Error424" and then exit. You do not need to handle any other errors. + - If the `customerid` for a `newcustomer` update is already present, or + - if the `frequentflieron` does not have a match in the airlines table +2. Be sure to *commit()* the database connection at the end. Otherwise, no data will be modified. +3. `example.json` is an example input file consisting of the JSON input + above. We will test on a slightly different input file. +4. There are many Python JSON parsing libraries, but the simplest to use is the +[json module](https://docs.python.org/3/library/json.html) from the +[Python standard library](https://docs.python.org/3/library/). +5. `./clean-example.py` will remove the tuples added from `example.py`. + + +## Task 2: Object-Relational Mappings (`peewee`) (35 pts) + +Another way to use a database is through an object-relational-mapping (ORM), +which maps table rows onto objects that are visible in a programming +language. The result is that you can write a database application without +using SQL at all. + +We will re-write the previous program with exactly the same semantics, but using an object +model approach instead. There are many Python ORMs, including Django, but we +will use the simpler [Peewee](http://docs.peewee-orm.com/en/latest/peewee/quickstart.html#quickstart). + +Your goal will to create a function `runOrm()`, parameterized by JSON +file name, and implements the same functionality as above. + +Create your file by using a Peewee distribution tool called `pwiz`, which uses database + *introspection* to create python classes mirroring an existing postgres schema. Use as follows: +``` + pwiz.py -e postgresql -u root -P flightsskewed > orm.py # password is `root` +``` +This will create the scaffolding of your ORM program in `orm.py`. + + +Test this code as follows: +- Add `from datetime import date` to the top of `orm.py`. +- Add the following lines to the end of the file: +``` +def runORM(jsonFile): + Customers.delete().where(Customers.name == 'bob').execute() + Airports.delete().where(Airports.airportid == 'PET').execute() + + bob = Customers(name="bob", customerid='cust1010', birthdate='1960-01-15', frequentflieron='SW') + bob.save(force_insert=True) + + bwi = Airports(airportid='PET', city='Takoma', name='Pete', total2011=2, total2012=4) + bwi.save(force_insert=True) + + for port in Airports.select().order_by(Airports.name): + print (port.name) +``` + +Run by typing `python3 testORM.py` from the shell. You should have a record added to each of +*customers* and *airports*, followed by a listing of all airport +names. Verify that `bob` and `PET` were added using `psql`. + +### SubTask 1: Replicate the work in `psy.py` (25 pts) +Remove all the lines of `runORM()`. Replace them with code to +implement the same changes to +the database as in `psy.py` (read JSON strings, insert corresponding tuples into the +database), but do so entirely through the **Peewee** ORM interface. + +### SubTask 2: Update `NumberOfFlightsTaken` (10 pts) +Add code in `runORM` to update `NumberOfFlightsTaken` as in assign2, which has +already been defined (but not populated). In other words, clear out the table and repopulate it from scratch each time runORM() is called. + + + +## Notes +- You must add `.save(force_insert=True)` for new tuples or they will not be committed to the + database. +- `python3 SQLTesting.py` is once again your last step. Run this and it will call both + of your functions. Getting this to work means that your code will probably + also work on Gradescope. +- You might or might not need to eliminate blank lines before " class Meta:" lines in + order to avoid syntax errors, i.e. get rid of any blank lines + within class definitions (or ensure that proper indentation is maintained by adding spaces). + + +## Native mac / non-docker installations +On an apple-silicon mac you must: +- ensure python3/pip3 are installed +- `sudo pip3 install psycopg2` +- `sudo pip3 install peewee` +- ensure that the DB connections in the `test*.py` files are using a local name rather than vagrant, i.e.: +``` +conn = psycopg2.connect("dbname=flightsskewed user=keleher") +``` +instead of: +``` +conn = psycopg2.connect("dbname=flightsskewed user=vagrant password=vagrant") +``` +Password for the local name should be the empty password. However, note that you still have to use the root version when submitting to gradescope! + +Also, be sure you have: +``` + database = PostgresqlDatabase('flightsskewed', **{'host': 'localhost', 'user': 'root', 'password': 'root'}) +``` +in `orm.py`. + +## Submit Instructions +Submit `psy.py` and `orm.py` files by compressing (zip: on a mac +select them both and right-click "select") them to an archive with two +files, and then dropping that archive on to the <a href="https://www.gradescope.com/courses/811728/assignments/4669997">Assignment 3 submission</a>.