# Project 5: Transactional Support From a Log
v1.12

# Overall Project Description

You will be implementing a fully-replicated transactional key-value store. This implies:
- reads and writes are only valid if in a *committed* transaction
- clients parse all new log entries, applying committed transactions locally.

We will be using the log as a dumb consistent ledger. To this end we need to
(1) create a way to commit arbitrary strings in the log, and (2) expose the shared
log to clients.

We are going to handle both of these through a new method in the `Client`
service: `Cmd`. `Cmd` takes identity, a string command, and a `seen`
value that specifies the prefix of the shared log seen by the client previously.

This function:
- commits the string onto the shared log,
- synchronously waits until it has been committed, and then
- returns everything in the log from `slot` until the slot containing the new command.

The client can use this to create local copy of the log that
it will use to implement and interpret gets, puts, and transactions.

`Cmd` and associated type definitions have already been added to `client.proto`.

## The Software Stack
The straightforward way to integrate
transactions is to build them directly into the log implementation. Primarily, this allows
dependencies between commits and transactional writes.

However, a shared log is a powerful abstraction that can be used to implement all sorts of
functionality even in it's cleanest, most generic form. We are therefore going to layer
the transactional functionality on top of the generic string interface described above.

Our simplified system is going to implement both a key-value store *and* transactions
by layering them over the simple string interface supported by the `Cmd` interface
described above. The client will construct a local representation of the log
by retrieving and piecing together segments of the shared log.

Once the local log is created, the client can process operations, in order, to interpret
transaction commands, determine whether they commit, and update a local copy of a
key-value store that reflects committed transactions.

Unfortunately, Epaxos does not support a totally ordered log, instead it has a 2-D log and
only supports a partial order over those entries. We are going to fix this in
order to support the *strict serializability* used in Tango. We are going to address this through two
mechanisms:
- Create per-replica total orderings by adding something like `inst.Deps[inst.Rep] =
inst.Slot - 1` (aka "Chuck's Hack") before creating edges in the execution graph. This ensures that
all instances of a single replica execute in order.
- Send all client operations to replica 0. This is essentially dumbing down Epaxos to make
it execute more like MultiPaxos.

Aside from Chuck's Hack, `replica.go` should only be modified to deal with the
new `Cmd` request. This means ensure that the command will be sent through the
BigChan and processed like gets and puts, and ensure that it will be
executed. Execution of a `Cmd` means only that the log segment delimited is
added to `ClientReply`.

### The Log
The log will be a totally ordered list of strings, each in one of the following four formats:
```
string = <client id>,<transaction id>,w,<key>,<value>
string = <client id>,<transaction id>,r,<key>
string = <client id>,<transaction id>,commit
string = <client id>,<transaction id>,abort
```
The first two formats define reads and writes in the log, while the last two are commit
and abort records.

A "client id" is something you make up in your input strings. For instance, if you are
running two instances of `transactionalClient.go` (both talking to
replica 0), you might type "1,1,w,k,v" into the first and "2,1,w,k,v" into the
second. These writes are in different transactions.

## Part 1: Serializable Transactions

Our transaction substrate is modeled on that in Tango. Update transactions are
*optimistic*, and commit only if isolation is preserved for the entire transaction
execution. Said another way:

&nbsp;&nbsp;&nbsp;**Definition 1:** *&nbsp;&nbsp;&nbsp;Ti commits iff for each read r in Ti,
the value read by* r *is not stale by the time Ti's commit record is encountered.*

Transactions that are observed to commit have their writes reflected in the clients local
key-value store.

### Write-only Transactions
Write-only transactions are those with no reads. They always commit.

### Read-only Transactions
Read-only transactions append only read commands to the shared log; there is no commit
record. Read transactions commit using the criteria in Definition 1. The transaction's
span in the log is defined as the log segment from the transaction's first read until
it's last.

Reads from read-only transactions are committed to the log with transaction 0, which is
ignored by other replicas.

## Part 2: Snapshot Isolation

Your decision as to how to implement this, but it need not involve any changes to the
log abstraction in `replica.go`

## Use of the `transactionalClient`
`transactionalClient.go` differs from `client.go` in that it reads commands
interactively. Once started, it reads commands (arbitrary strings) from STDIN
and sends them to replica 0. The sole exceptions are:
- `pause <n>` is interpreted as a command to pause for `n` seconds.
- an empty string (just carriage return) is ignored.

All other strings are sent to the replica.
At end-of-text (CTRL-D), `transactionalClient` prints the contents of the K/V
store and committed transactions.

## Testing

Your `transactionalClient.go` takes several arguments (see code), but the two important ones
are:
- `-p` - tells your code to do two things:
- write a commit/abort notification for each transaction as it's fate is decided
- at end-of-text write out the contents of your KV store, ordered lexicographically by key value. Only
written values need be printed.
- `-s` use snapshot isolation instead serializable isolation

### Strict Serializability
I will run the replica set as `run.rb 3`, and the client as `go run transactionalClient.go
-p` or `go run transactionalClient.go -p -s`.

One example. Time is going downwards. Note that transaction *1* of client *1* is different
from transaction *1* of client *2*. You can type these in interactively. Note that the
commands could be written by different instantiations of `transactionalClient.go`, but
equivalently can by written just by a single client. Single-client is what I will use.
```
1,1,w,A,0
1,1,w,B,0
1,1,commit

1,2,r,A
1,2,w,B,nice

2,1,r,A
2,1,w,B,foo

2,2,w,A,bar
2,2,r,B
2,2,commit

2,1,commit

1,2,commit
```

If one of the clients was invoked with option `-p`, it should print:
```
trans 1.1 commit
trans 2.2 commit
trans 2.1 abort
trans 1.2 abort
A="bar"
B="0"
```
---
And another:
```
1,1,w,A,0
1,1,w,B,0
1,1,commit

1,2,w,A,1
1,2,r,B

2,1,w,B,1
2,1,r,A
2,1,commit

1,2,commit
```

Your client w/ `-p` should print:
```
trans 1.1 commit
trans 2.1 commit
trans 1.2 abort
A="0"
B="1"
```

### Snapshot Isolation
Under snapshot isolation, the above two examples will print:

```
trans 1.1 commit
trans 2.2 commit
trans 2.1 commit
trans 1.2 abort
A="bar"
B="foo"
```
and
```
trans 1.1 commit
trans 2.1 commit
trans 1.2 abort
A="1"
B="1"
```

## Submitting
Submit by uploading tarball `cd /vagrant/go/src/818f19/p5; tar cvfz DIRID.tgz DIRID`, w/
your directory ID substituted in for "DIRID".

Upload [here on ELMS](https://umd.instructure.com/courses/1265666/assignments/5024673).