From 31c2afc568e1977d762ed655da44f3a7b777bac6 Mon Sep 17 00:00:00 2001
From: "Peter J. Keleher" <keleher@cs.umd.edu>
Date: Mon, 19 Aug 2024 15:07:51 -0400
Subject: [PATCH] auto

---
 assign0.md | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 assign0.md

diff --git a/assign0.md b/assign0.md
new file mode 100644
index 0000000..1b8ede7
--- /dev/null
+++ b/assign0.md
@@ -0,0 +1,116 @@
+# CMSC424 Fall 2024 Assignment 0: Computing Environment (ungraded)
+### Finish by Sep 1, 11:59pm.
+
+Over the course of the semester, you will work with a variety of
+software packages, possibly including PostgreSQL, Apache Spark, Python
+Django, Mongo, and others. Installing these packages and getting
+started can often be a hassle because of software dependencies. This
+semester we have simplified things by creating *Dockerfiles* that
+install everything and eliminate most of the hassles.
+
+The procedure for each assignment will be as follows:
+- access the assignment page from the schedule
+- download the install folder
+- create and run the assignment's Docker container on your machine
+- log in to the container and create your solution
+- test and submit through Gradescope.
+
+This Assignment 0 is just a complete end-to-end example of the
+process, except submitting to Gradescope.
+
+
+We will use [Docker
+Desktop](https://www.docker.com/products/docker-desktop/) for our
+projects this semester. The specification for the Docker container is
+in `Dockerfile`.
+
+1. Build the container image as follows: `docker build --rm -t 424 .`
+1. Verify that the image has been built in Docker
+   Desktop. Assuming you are using bash, run the image: `docker run -it -v $(pwd):/424 424` (or for *tcsh* use ``docker run -it -v `pwd`:/424 424``).
+1. Run `cd /424` from within the container.
+
+You are now inside the Docker container in a directory (`/424`) that
+mirrors your host directory where you ran the docker commands.
+**Changes
+made to this directory either from inside or outside the container are
+reflected on the other side.** You can now edit or create files from
+outside the container and see the changes immediately from inside.
+
+Use the `docker run` command each time you wish to get back into the container and its database.
+
+
+### PostgreSQL
+
+PostgreSQL is a powerful, full-fledged relational database system, and will be used for several assignments.
+The installed version of PostgreSQL should be 12.2 (verify by running the query
+`select version();`).
+
+The following steps will get you started with creating a database and populating it with the `University` dataset provided on the book website: http://www.db-book.com
+
+* You will be using PostgreSQL in client-server mode. Recall that the server is a continuously running process that listens on a specific port (the actual port can vary, and you can usually choose it when starting the server). To connect to the server, the client needs to know the port. The client and server are often on different machines, but for you, it may be easiest if they are on the same machine (i.e., the Docker container).
+
+* Using the **psql** client is the easiest -- it provides command-line access to the database. But there are other clients too, including a GUI (although that would require starting the container in a GUI mode, which is a bit more involved). We will assume **psql** here. If you really want to use the graphical interfaces, we recommend trying to install PostgreSQL directly on your machine.
+
+* Important: The server should already be running in your container -- you do not need to start it. However, the following two help pages discuss how to start the
+  server: [Creating a database cluster](http://www.postgresql.org/docs/current/static/creating-cluster.html) and [Starting the server](http://www.postgresql.org/docs/current/static/server-start.html)
+
+* The PostgreSQL server has a default superuser called **postgres**.
+  You can do everything under that username, or you can create a different username for yourself. If you run a command (say `createdb`) without any options, it uses the same username that you are logged in under (i.e., `root` inside the container). However, if you haven't created a PostgreSQL user with that name, the command will fail. You can either create a user (by logging in as the superuser), or run everything as a superuser (typically with the option: **-U postgres**).
+
+* We will use the **root** user, which has already been created for you.
+
+
+* After the server has started, the first step is to **create** a database, using the **createdb** command. PostgreSQL automatically creates one database for its own purpose, called **postgres**. It is preferable to create a different database for your data. Here are more details on **createdb**:
+  http://www.postgresql.org/docs/current/static/tutorial-createdb.html
+
+* We will create a database called **university**.
+  ```
+  createdb university
+  ```
+* Once the database is created, you can connect to it. There are many ways to connect to the server. The easiest is to use the command-line tool called **psql**. Start it by:
+  ```
+  psql university
+  ```
+  **psql** takes quite a few other options: you can specify a different user, a specific port, another server, etc. See the documentation: http://www.postgresql.org/docs/current/static/app-psql.html
+
+* Note: you don't need a password here because PostgreSQL uses what's called `peer authentication` by default. You would typically need a password for other types of connections to the server (e.g., through JDBC).
+
+Now you can start using the database.
+
+ - The psql program has a number of internal commands that are not SQL commands; such commands are often client- and database-specific. For psql, they begin with the
+   backslash character: `\`. For example, you can get help on the syntax of various PostgreSQL SQL commands by typing: `\h`.
+
+ - `\d`: lists the tables in the database.
+
+ - All commands like this can be found at: http://www.postgresql.org/docs/current/static/app-psql.html. `\?` will also list them out.
+
+ - To populate the database using the provided university dataset, use the following:
+    ```
+    \i DDL.sql
+    \i smallRelationsInsertFile.sql
+    ```
+
+ - For this to work, the two .sql files must be in the directory where you started psql. The first command creates the tables, and the
+   second one inserts tuples into them.
+
+ - Run a query, e.g.: `select count(*) from instructor;`
+
+ - Create a different database ```university_large``` for the larger dataset provided (`largeRelationsInsertFile.sql`). Since the table names
+   are identical, we need a separate database. You will need this for Quiz 1.
+
+---
+
+## FAQs
+You may be able to use PostgreSQL directly on a Mac via **Homebrew**:
+```
+ brew install postgresql@14
+ brew services restart postgresql@14
+```
+
+Before running `SQLTesting.py`, you also need to install `psycopg2`:
+```
+ pip3 install psycopg2
+```
+
+
+### Submit
+
+This assignment is ungraded; there is nothing to submit.
--
GitLab