Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.

CMSC424 Fall 2024 Assignment 0: Computing Environment (ungraded)

Over the course of the semester, you will work with a variety of software packages, possibly including PostgreSQL, Apache Spark, Python Django, Mongo, and others. Installing these packages and getting started can often be a hassle because of software dependencies. This sesmester we have simplified things by installing everything by creating Dockerfiles that eliminate most of the hassles.

The procedure for each assignment will be as follows:

  • access the assignment page from the schedule (this page)
  • download the install folder (assignment0Dist.tgz).
  • create and run the Assignment's Docker container on your machine
  • log in to the container and create your solution
  • test and submit through gradescope.

This Assignment 0 is just a complete end-to-end example of the process, except submitting to Gradescope.

We will use Docker Desktop for our projects this semester. The specification for the docker container is in Dockerfile.

  1. Install docker

  2. Insert WORKDIR /424 as the third line of your Dockerfile, if it is not already there.

  3. Build the container image as follows: docker build --rm -t 424 .

  4. Verify that the imeage has been build in your docker Desktop. Assuming you are using bash, run the image: docker run -it -v $(pwd):/424 424

    1. For tcsh use docker run -it -v `pwd`:/424 424.
    2. On windows, the following worked for me: docker run -it -v "${PWD}:/424" 424. You might have to install WSL 2, and for this powershell will have to be running as administrator as well.
  5. cd /424 from within the container.

You are now inside the docker container in a directory (/424) that mirrors your host directory where you ran the docker commands. Changes made to this directory either from inside or outside the container are reflected on the other side. You can now edit or create files from outside the container and see the changes immediately from inside.

Use the "run" command each time you wish to get into the database.

PostgreSQL

PostgreSQL is a powerful, full-fledged relational database system, and will be used for several assignments. The current version of PostgreSQL is 12.2 (verify by running the query select version();

Following steps will get you started with creating a database and populating it with the University dataset provided on the book website: http://www.db-book.com

  • You will be using PostgreSQL in client-server mode. Recall that the server is a continuously running process that listens on a specific port (the actual port would differ, and you can usually choose it when starting the server). In order to connect to the server, the client will need to know the port. The client and server are often on different machines, but for you, it may be easiest if they are on the same machine (i.e., the virtual machine).

  • Using the psql client is the easiest -- it provides a command-line access to the database. But there are other clients too, including a GUI (although that would require starting the VM in a GUI mode, which is a bit more involved). We will assume psql here. If you really want to use the graphical interfaces, we recommend trying to install PostgreSQL directly on your machine.

  • Important: The server should be already started on your virtual machine -- you do not need to start it. However, the following two help pages discuss how to start the server: Creating a database cluster and Starting the server

  • PostgreSQL server has a default superuser called postgres. You can do everything under that username, or you can create a different username for yourself. If you run a command (say createdb) without any options, it uses the same username that you are logged in under (i.e., vagrant). However, if you haven't created a PostgreSQL user with that name, the command will fail. You can either create a user (by logging in as the superuser), or run everything as a superuser (typically with the option: -U postgres).

  • We will use the root user, which has already been created for you.

  • After the server has started, the first step is to create a database, using the createdb command. PostgreSQL automatically creates one database for its own purpose, called postgres. It is preferable you create a different database for your data. Here are more details on createdb: http://www.postgresql.org/docs/current/static/tutorial-createdb.html

  • We will create a database called university.

     createdb university
  • Once the database is created, you can connect to it. There are many ways to connect to the server. The easiest is to use the commandline tool called psql. Start it by:

     psql university

    psql takes quite a few other options: you can specify different user, a specific port, another server etc. See documentation: http://www.postgresql.org/docs/current/static/app-psql.html

  • Note: you don't need a password here because PostgreSQL uses what's called peer authentication by default. You would typically need a password for other types of connections to the server (e.g., through JDBC).

Now you can start using the database.

  • The psql program has a number of internal commands that are not SQL commands; such commands are often client and database specific. For psql, they begin with the backslash character: \. For example, you can get help on the syntax of various PostgreSQL SQL commands by typing: \h.

  • \d: lists out the tables in the database.

  • All commands like this can be found at: http://www.postgresql.org/docs/current/static/app-psql.html. \? will also list them out.

  • To populate the database using the provided university dataset, use the following:

    \i DDL.sql
    \i smallRelationsInsertFile.sql
  • For this to work, the two .sql files must be in the same directory as the one where you started psql. The first command creates the tables, and the second one inserts tuples in it.

  • Run a query, e.g.: select count(*) from instructor;

  • Create a different database university_large for the larger dataset provided (largeRelationsInsertFile.sql). Since the table names are identical, we need a separate database. You would need this for Quiz 1.


FAQs

You may be able use PostgreSQL directly on a mac via Homebrew:

   brew install postgresql@14
   brew services restart postgresql@14

Before running SQLTesting.py, you also need to install psycopg2:

   pip3 install psycopg2

Submit

Ungraded, no submit.