-
Peter J. Keleher authoredPeter J. Keleher authored
Project 5: Transactions, CMSC424, Fall 2019
(Due Dec 8, midnight)
The assignment is to be done by yourself.
Overview
In this project, you will modify a very simple database system that we have written to illustrate some of the transactions functionality. The database system is written in Python and attempts to simulate how a database system would work, including what blocks it would read from disk, etc.
NOTE: This codebase is different from the Project 4 codebase, with simpler relation schema and querying interface, but a more complex Buffer Pool Manager, and file management. Also, this project requires python 3.X.
Another Important Note: We have taken a few shortcuts in this code to simplify it, which unfortunately means that there may be important synchronization failures we missed. Let me know if you see any unexpected behavior.
Synchronization in Python
Although you won't need to do any synchronization-related coding, it would be necessary for you to understand the basic Python synchronization primitives. The manual explains this quite well: https://docs.python.org/3/library/threading.html. Some basics:
- Each transaction for us will be started in a sepearate thread (see
testingLock.py
for some examples). The main command for doing so is:threading.Thread
, which takes a function name as an argument. - The main synchronization primitives are:
Lock
andRLock
(Sections 16.2.2 and 16.2.3 in the manual above). You create them by callingthreading.Lock()
orthreading.RLock()
, and you use them by acquiring and releasing them. Only one thread can take a Lock or a RLock at any time. - Two other primitives build on top of the above:
Conditions
andEvents
. See the manual for the details on those. We useConditions
intransactions.py
to signal threads. - Using
with
simplifies this: a Lock/RLock/Condition/Event is passed as an argument, and only one thread can be in the body of thewith
.
disk_relations.py
A Relation
is backed by a file in the file system, i.e., all the data in the relation is written to a file and is read from a file. If a Relation is created with a non-existing fileName, then a new Relation is created with pre-populated 100 tuples. The file is an ASCII file -- the reading and writing is using json
module.
You can open the relation1
file and see it's contents. The file is logically divided into blocks. The first block defines the column names, specifies the number of data blocks following, and gives a simple index (implemented as a hash). The index maps primary key values (always the first attribute) to blocks. Following this block are the blocks containing the tuples.
Relation files are opened at startup and only the first block is read. This provides to the db column names, the number of disk blocks, and the index. The file is then closed. When a tuple is later accessed the index is used to map the tuple to a block, and the relation file is reopened and only that block is read.
disk_relations.py
also contains a LRU Buffer Pool Implementation, with a fixed size Buffer Pool.
transactions.py
This contains the implementaion of the Lock Table, and a Log Manager. Some details:
-
class LockTable
: This class implements a standard hash-based lock table. Object names are used as the identifiers for locking (soobjectid
s must be unique across all
relations and tuples). The code here and elsewhere implicitly assumes that there is a single relation in the database. For each such object name, we keep track of the transations that currently have the lock (at most one in case of X locks), and a list of transactions that are waiting for a lock. - To avoid starvation: if T1 wants an S lock and T2 currently has one on that object, but T3 is waiting (say because it wants an X lock), we make T1 also wait, even though the locks would be compatible.
-
class LogManager
: This should be relatively straightforward. The transactions create log records (see below).revertChanges
undoes a transaction in case of aborts. You can use Transaction4 (inexampleTransactions.py
) to test this. -
class TransactionManager
: manages the currently running transactions, basically doing some bookkeeping. -
class TransactionState
: This class encapsulates some of the basic functionality of transactions, including some helper functions to create log records, keeping track of what locks the transaction currently holds, etc.
Development
There are two testing files, testingLocks.py
and testingRecovery.py
("testingX" means both), one for working
w/ locks and one for recover.
Both contain code for testing. Running python3 testingLocks.py
should get you
started. Note that the first time you run it, it will create the two files relation1
and
logfile
, but after you kill it, the logfile will be inconsistent (we never write out
CHECKPOINT records in normal course). So the second time you run it, it will error out
since the restartRecovery code is not implemented. So if you want to work on the other two
tasks, you should remove those two files every time.
Currently the only way to stop a runaway testingX
is through killing it through Ctrl-C. If that doesn't work, try stopping it (Ctrl-Z), and then killing it using kill %1
.
Your Tasks
Your task is to finish a few of the unfinished pieces in the two files (10
points for each). Search for TO BE IMPLEMENTED in transactions.py
- [10 points] Implementing IX Locks in
transactions.py
: Currently the code supports S, X, and IS locks. So if a transaction needs to lock a tuple in the X mode, the entire relation is locked. This is of course not ideal and we would prefer to use IX locks (on the relation) to speed up concurrency. The code for this will largely mirror the code for IS. You will have to change thegetXLockTuple()
function and also thecompatibility_list
appropriately. - [20 points] Function
restartRecovery()
intransaction.py
: This function is called if the log file indicates an inconsistency (i.e., if the logfile does not end with an empty CHECKPOINT record). If that's the case, then you must analyze the logfile and do a recovery on that.
For the recovery, you will need to write the code for both revertChanges
and restartRecovery
. revertChanges
works by writing a compensating log
record (CLR) for each update that a failed transaction does, in the correct
order. See recoverytests-original
for examples of the format.
"Redo"-ing a transaction does not add any new log records, but it does modify the database.
Note that modifying the database a la tup.setAttribute()
modifies the in-memory copy of the tuple, but does not push it to disk. You need to ensure that all such tuples make it to disk before adding the checkpoint record to the log.
Testing
Locks For locking, you should verify that your IX locks are working as intended. For example, you should ensure that:
- distinct tuples can be simultaneously locked exclusively by different transactions
- a tuple locked exclusively prevents a shared lock on the entire table.
You can use testingLocks.py
as the basis for your testing. The provided version will create three transactions that contend for the same lock (on the entire relation) and pause for differing amounts of time. Running python3 testingLocks.py
should produce something like:
hub:~/project5> python3 testingLocks.py
Transaction 1 able to get this lock on relation1
Transaction 1 able to get this lock on 20
Transaction 2: compatible so able to get this lock on relation1
Transaction 2 able to get this lock on 30
Transaction 3: compatible so able to get this lock on relation1
Transaction 3 able to get this lock on 40
Transaction 3: Releasing lock on 40
Transaction 3: Releasing lock on relation1
Transaction 2: Releasing lock on 30
Transaction 2: Releasing lock on relation1
Transaction 1: Releasing lock on 20
Transaction 1: Releasing lock on relation1
hub:~/project5>
Note that you should modify this test (change the transaction, or create a new one) such that it tests tuple locks as discussed above. This test works fine w/ tuple locks defaulting to table locks, but our tests will not.
We will test locks by running transactions that delay through sleep
s and use timing
effects to ensure that your locks are conflicting when necessary, and not when not. We
will also check for the lock messages ("able to get this lock on...", etc.).
Recovery
Our testing will check both the contents of the database (the final versions of
the relation file) and the logfile
.
For recovery there are two artifacts: the logfile and the state of the database. We will test both. We have provided three test logfiles, together w/ correct answers. We have not provided the correct databases but you should be able to determine correctness just be looking in the relation file. To test the lock files, do the following, for example:
hub:~/project5> cp recoverytests-original/recoverytest2_relation relation1
hub:~/project5> cp recoverytests-original/recoverytest2_logfile logfile
hub:~/project5> python3 testingRecovery.py
Setting the last_tranasction_id to be 7
Starting Restart Recovery.......
hub:~/project5>
hub:~/project5> diff logfile recoverytests-answers/recoverytest2_logfile
hub:~/project5>
Submission
We will test your functionality in a (partially) automated fashion, using a set of test cases. You should only need to change transactions.py
. Submit your modified transactions.py
file here.