Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.

Project 5: Transactions, CMSC424, Fall 2019

(Due Dec 8, midnight)

The assignment is to be done by yourself.

Overview

In this project, you will modify a very simple database system that we have written to illustrate some of the transactions functionality. The database system is written in Python and attempts to simulate how a database system would work, including what blocks it would read from disk, etc.

NOTE: This codebase is different from the Project 4 codebase, with simpler relation schema and querying interface, but a more complex Buffer Pool Manager, and file management. Also, this project requires python 3.X.

Another Important Note: We have taken a few shortcuts in this code to simplify it, which unfortunately means that there may be important synchronization failures we missed. Let me know if you see any unexpected behavior.

Synchronization in Python

Although you won't need to do any synchronization-related coding, it would be necessary for you to understand the basic Python synchronization primitives. The manual explains this quite well: https://docs.python.org/3/library/threading.html. Some basics:

  • Each transaction for us will be started in a sepearate thread (see testingLock.py for some examples). The main command for doing so is: threading.Thread, which takes a function name as an argument.
  • The main synchronization primitives are: Lock and RLock (Sections 16.2.2 and 16.2.3 in the manual above). You create them by calling threading.Lock() or threading.RLock(), and you use them by acquiring and releasing them. Only one thread can take a Lock or a RLock at any time.
  • Two other primitives build on top of the above: Conditions and Events. See the manual for the details on those. We use Conditions in transactions.py to signal threads.
  • Using with simplifies this: a Lock/RLock/Condition/Event is passed as an argument, and only one thread can be in the body of the with.

disk_relations.py

A Relation is backed by a file in the file system, i.e., all the data in the relation is written to a file and is read from a file. If a Relation is created with a non-existing fileName, then a new Relation is created with pre-populated 100 tuples. The file is an ASCII file -- the reading and writing is using json module.

You can open the relation1 file and see it's contents. The file is logically divided into blocks. The first block defines the column names, specifies the number of data blocks following, and gives a simple index (implemented as a hash). The index maps primary key values (always the first attribute) to blocks. Following this block are the blocks containing the tuples.

Relation files are opened at startup and only the first block is read. This provides to the db column names, the number of disk blocks, and the index. The file is then closed. When a tuple is later accessed the index is used to map the tuple to a block, and the relation file is reopened and only that block is read.

disk_relations.py also contains a LRU Buffer Pool Implementation, with a fixed size Buffer Pool.

transactions.py

This contains the implementaion of the Lock Table, and a Log Manager. Some details:

  • class LockTable: This class implements a standard hash-based lock table. Object names are used as the identifiers for locking (so objectids must be unique across all
    relations and tuples). The code here and elsewhere implicitly assumes that there is a single relation in the database. For each such object name, we keep track of the transations that currently have the lock (at most one in case of X locks), and a list of transactions that are waiting for a lock.
  • To avoid starvation: if T1 wants an S lock and T2 currently has one on that object, but T3 is waiting (say because it wants an X lock), we make T1 also wait, even though the locks would be compatible.
  • class LogManager: This should be relatively straightforward. The transactions create log records (see below). revertChanges undoes a transaction in case of aborts. You can use Transaction4 (in exampleTransactions.py) to test this.
  • class TransactionManager: manages the currently running transactions, basically doing some bookkeeping.
  • class TransactionState: This class encapsulates some of the basic functionality of transactions, including some helper functions to create log records, keeping track of what locks the transaction currently holds, etc.

Development

There are two testing files, testingLocks.py and testingRecovery.py ("testingX" means both), one for working w/ locks and one for recover.

Both contain code for testing. Running python3 testingLocks.py should get you started. Note that the first time you run it, it will create the two files relation1 and logfile, but after you kill it, the logfile will be inconsistent (we never write out CHECKPOINT records in normal course). So the second time you run it, it will error out since the restartRecovery code is not implemented. So if you want to work on the other two tasks, you should remove those two files every time.

Currently the only way to stop a runaway testingX is through killing it through Ctrl-C. If that doesn't work, try stopping it (Ctrl-Z), and then killing it using kill %1.

Your Tasks

Your task is to finish a few of the unfinished pieces in the two files (10 points for each). Search for TO BE IMPLEMENTED in transactions.py

  • [10 points] Implementing IX Locks in transactions.py: Currently the code supports S, X, and IS locks. So if a transaction needs to lock a tuple in the X mode, the entire relation is locked. This is of course not ideal and we would prefer to use IX locks (on the relation) to speed up concurrency. The code for this will largely mirror the code for IS. You will have to change the getXLockTuple() function and also the compatibility_list appropriately.
  • [20 points] Function restartRecovery() in transaction.py: This function is called if the log file indicates an inconsistency (i.e., if the logfile does not end with an empty CHECKPOINT record). If that's the case, then you must analyze the logfile and do a recovery on that.

For the recovery, you will need to write the code for both revertChanges and restartRecovery. revertChanges works by writing a compensating log record (CLR) for each update that a failed transaction does, in the correct order. See recoverytests-original for examples of the format.

"Redo"-ing a transaction does not add any new log records, but it does modify the database.

Note that modifying the database a la tup.setAttribute() modifies the in-memory copy of the tuple, but does not push it to disk. You need to ensure that all such tuples make it to disk before adding the checkpoint record to the log.

Testing

Locks For locking, you should verify that your IX locks are working as intended. For example, you should ensure that:

  • distinct tuples can be simultaneously locked exclusively by different transactions
  • a tuple locked exclusively prevents a shared lock on the entire table.

You can use testingLocks.py as the basis for your testing. The provided version will create three transactions that contend for the same lock (on the entire relation) and pause for differing amounts of time. Running python3 testingLocks.py should produce something like:

    hub:~/project5> python3 testingLocks.py
    Transaction 1 able to         get this lock on relation1
    Transaction 1 able to get this lock on 20
    Transaction 2: compatible so able to get this lock on relation1
    Transaction 2 able to get this lock on 30
    Transaction 3: compatible so able to get this lock on relation1
    Transaction 3 able to get this lock on 40
    Transaction 3: Releasing lock on 40
    Transaction 3: Releasing lock on relation1
    Transaction 2: Releasing lock on 30
    Transaction 2: Releasing lock on relation1
    Transaction 1: Releasing lock on 20
    Transaction 1: Releasing lock on relation1
    hub:~/project5> 

Note that you should modify this test (change the transaction, or create a new one) such that it tests tuple locks as discussed above. This test works fine w/ tuple locks defaulting to table locks, but our tests will not.

We will test locks by running transactions that delay through sleeps and use timing effects to ensure that your locks are conflicting when necessary, and not when not. We will also check for the lock messages ("able to get this lock on...", etc.).

Recovery Our testing will check both the contents of the database (the final versions of the relation file) and the logfile.

For recovery there are two artifacts: the logfile and the state of the database. We will test both. We have provided three test logfiles, together w/ correct answers. We have not provided the correct databases but you should be able to determine correctness just be looking in the relation file. To test the lock files, do the following, for example:

    hub:~/project5> cp recoverytests-original/recoverytest2_relation relation1
	hub:~/project5> cp recoverytests-original/recoverytest2_logfile logfile
	hub:~/project5> python3 testingRecovery.py
    Setting the last_tranasction_id to be 7
	Starting Restart Recovery.......
	hub:~/project5> 
	hub:~/project5> diff logfile recoverytests-answers/recoverytest2_logfile
    hub:~/project5> 

Submission

We will test your functionality in a (partially) automated fashion, using a set of test cases. You should only need to change transactions.py. Submit your modified transactions.py file here.