-
Peter J. Keleher authoredPeter J. Keleher authored
CMSC424 Fall 2024 Assignment 0: Computing Environment (ungraded)
Over the course of the semester, you will work with a variety of software packages, possibly including PostgreSQL, Apache Spark, Python Django, Mongo, and others. Installing these packages and getting started can often be a hassle because of software dependencies. This sesmester we have simplified things by installing everything by creating Dockerfiles that eliminate most of the hassles.
The procedure for each assignment will be as follows:
- access the assignment page from the schedule (this page)
- download the install folder (assignment0Dist.tgz).
- create and run the Assignment's Docker container on your machine
- log in to the container and create your solution
- test and submit through gradescope.
This Assignment 0 is just a complete end-to-end example of the process, except submitting to Gradescope.
We will use Docker
Desktop for our
projects this semester. The specification for the docker container is
in Dockerfile
.
-
Insert
WORKDIR /424
as the third line of your Dockerfile, if it is not already there. -
Build the container image as follows:
docker build --rm -t 424 .
-
Verify that the imeage has been build in your docker Desktop. Assuming you are using bash, run the image:
docker run -it -v $(pwd):/424 424
- For tcsh use
docker run -it -v `pwd`:/424 424
. - On windows, the following worked for me:
docker run -it -v "${PWD}:/424" 424
. You might have to install WSL 2, and for this powershell will have to be running as administrator as well.
- For tcsh use
-
cd /424
from within the container.
You are now inside the docker container in a directory (/424
) that
mirrors your host directory where you ran the docker commands. Changes
made to this directory either from inside or outside the container are
reflected on the other side. You can now edit or create files from
outside the container and see the changes immediately from inside.
Use the "run" command each time you wish to get into the database.
PostgreSQL
PostgreSQL is a powerful, full-fledged relational database system, and will be used for several assignments.
The current version of PostgreSQL is 12.2 (verify by running the query
select version();
Following steps will get you started with creating a database and populating it with the University
dataset provided on the book website: http://www.db-book.com
-
You will be using PostgreSQL in client-server mode. Recall that the server is a continuously running process that listens on a specific port (the actual port would differ, and you can usually choose it when starting the server). In order to connect to the server, the client will need to know the port. The client and server are often on different machines, but for you, it may be easiest if they are on the same machine (i.e., the virtual machine).
-
Using the psql client is the easiest -- it provides a command-line access to the database. But there are other clients too, including a GUI (although that would require starting the VM in a GUI mode, which is a bit more involved). We will assume psql here. If you really want to use the graphical interfaces, we recommend trying to install PostgreSQL directly on your machine.
-
Important: The server should be already started on your virtual machine -- you do not need to start it. However, the following two help pages discuss how to start the server: Creating a database cluster and Starting the server
-
PostgreSQL server has a default superuser called postgres. You can do everything under that username, or you can create a different username for yourself. If you run a command (say
createdb
) without any options, it uses the same username that you are logged in under (i.e.,vagrant
). However, if you haven't created a PostgreSQL user with that name, the command will fail. You can either create a user (by logging in as the superuser), or run everything as a superuser (typically with the option: -U postgres). -
We will use the root user, which has already been created for you.
-
After the server has started, the first step is to create a database, using the createdb command. PostgreSQL automatically creates one database for its own purpose, called postgres. It is preferable you create a different database for your data. Here are more details on createdb: http://www.postgresql.org/docs/current/static/tutorial-createdb.html
-
We will create a database called university.
createdb university
-
Once the database is created, you can connect to it. There are many ways to connect to the server. The easiest is to use the commandline tool called psql. Start it by:
psql university
psql takes quite a few other options: you can specify different user, a specific port, another server etc. See documentation: http://www.postgresql.org/docs/current/static/app-psql.html
-
Note: you don't need a password here because PostgreSQL uses what's called
peer authentication
by default. You would typically need a password for other types of connections to the server (e.g., through JDBC).
Now you can start using the database.
-
The psql program has a number of internal commands that are not SQL commands; such commands are often client and database specific. For psql, they begin with the backslash character:
\
. For example, you can get help on the syntax of various PostgreSQL SQL commands by typing:\h
. -
\d
: lists out the tables in the database. -
All commands like this can be found at: http://www.postgresql.org/docs/current/static/app-psql.html.
\?
will also list them out. -
To populate the database using the provided university dataset, use the following:
\i DDL.sql \i smallRelationsInsertFile.sql
-
For this to work, the two .sql files must be in the same directory as the one where you started psql. The first command creates the tables, and the second one inserts tuples in it.
-
Run a query, e.g.:
select count(*) from instructor;
-
Create a different database
university_large
for the larger dataset provided (largeRelationsInsertFile.sql
). Since the table names are identical, we need a separate database. You would need this for Quiz 1.
FAQs
You may be able use PostgreSQL directly on a mac via Homebrew:
brew install postgresql@14
brew services restart postgresql@14
Before running SQLTesting.py
, you also need to install psycopg2
:
pip3 install psycopg2
Submit
Ungraded, no submit.