# Project 6: Supporting High-level Abstractions From a Shared Log
**v1.0**<br>
**Due Dec 10**

## Setup

Download files [here](https://ceres.cs.umd.edu/818/projects/p6.tgz?1).

## Overview

<img src=tango-raft width=600>

This project requires you to build three *conflict-free replicated
data types* on top of the shared, replicated log you built in P5.
For all three types, *writes* are in the log, *reads* are not.

For example, with the *intCRDT*, an integer increment
conflictfree-replicated-data-type (CRDT), each increment is written to
the log. Reading the type consists of traversing the log and adding
all the increments.

With the *transactional key store*, writes to the KV are in the log,
reads are not. This is complicated by the transactional nature of the
KV. Like w/ Tango, we expect transactions to support strict
serializability. Writes from transactions that abort do not affect the
KV.

Finally, in the *tree* type, you will create a shared tree that
supports concurrent modifications via simple mutations. You build this
entirely from scratch.

You will build your tango support in a new `p6/tango` module. Your
applications will call the tango module API to access the shared log.


## Building and Testing the Tango Interface
Our "tango" implementation is based relatively closely on the the
Tango paper [1] we discussed in class. In particular, the paper
defines a Tango runtime abstraction that provides `update_helper`s and
`query_helper`s. The former adds a command to the shared log, while
the latter brings a local object up-to-date with respect to the most
recent local view of the log.

We will dispense w/ the object wrapper and define:
```
func TangoQueryHelper(obj TangoObject) ()
func TangoUpdateHelper(obj TangoObject, cmd string) string ()
```
Our shared log commands are strings, so the update helper merely adds a string
to the log.

We also define a *TangoObject*:
```
type TangoObject interface {
	Oid() int64          // object ID
	Tid() int64          // transaction ID
	Apply(data string)
}
```
All three of your abstractions will conform to the TangoObject
interface, allowing your tango system layer to be oblivious to the
application semantics.


## Details

### Changes to the RPC definitions
1. `CommandRequest` includes fields `Oid` and `Tid`. 
1. `LogEntry` also includes `Oid` and `Tid` fields. 
1. New `RetrieveCommitted` RPC to retrieve all *committed* log entries
   after a particular log slot.

### The *tango log* 
The tango module keeps a local copy of the shared log, updated
whenever `TangoQueryHelper()` is called. Much of the time this log is
therefore only a prefix of the full shared log. 

To make this more concrete, consider the workflow of `intCRDT`, with
implements reads and conflict-free increments (updates). Each
`intCRDT` object consists only of it's object ID ("oid") and its state
("state"). An increment is written to the log via `UpdateHelper`,
which you will implement in the tango module. 

`UpdateHelper`
packages the increment and the OID into a new `pb.LogEntry`, and sends
to the local raft instantiation via the `Command()` RPC, which has
been enhanced to allow both object and transaction IDs to be
specified. `intCRDT` is not transactional, so the TID can be left
blank (the "zero value" of an int is...0). 
`Command()` is synchronous, so it does not return until the increment
has been committed.

Reads of `intCRDTs` are implemented by calling `QueryHelper`,
parameterized by OID, which
has the following tasks:
- Sync the local copy of the log w/ the shared version via the new
  `RetrieveCommitted()` raft RPC. The request specifies the library's
  last local entry; the RPC returns everything after it.
- Parse through each of the log entries, in order, calling
  `TangoObject.Apply()` for each entry updating the object w/ OID.
The object's value does not have to be returned because the `Apply()` calls
will have already updated the `intCRDT`.

The parser in `intCRDT.go` updates several objects, displaying the value
of each at the end. The simple input script `scriptInt.1` has two fields per line:
the *oid* and the increment to be applied to that object. If multiple
applications run the same script concurrently, the exact interleaving
is non-deterministic, but the final value should remain unchanged if
we run the same two copies multiple times.


### Transactional semantics
The above is relatively straightforward, but transactional semantics
require a bit more mechanism. First, every line in `scriptKV.1` consists of
an *oid*, a *tid* (transaction ID), and a command. The commands have
the following implications:
- "START" (transaction): Each client application, together with it's
  instantiation of the tango library, as assumed to be
  single-threaded. Only a single transaction is in progress from the
  client at any time. The Tango transactional implementation relies on
  maintaining readsets for the current transaction. Since only one
  local transaction can be active at a time, only one readset need be
  maintained. This readset is re-initialized at each transaction start.
- "FINISH" (transaction): The app signals a commit request by issuing
  "FINISH" to the raft abstraction, annotated with the current readset. 
- "READ": calls `QueryHelper()`, and then returns the current value. 
- "SLEEP": sleeps an integer number of seconds.
- All others are strings that should be copied verbatim to the oid as
  new values for the associate object.

All transactional fates are determined independently, but
deterministically, by each tango library. Recall (from the paper),
that an object version can be specified by the index of the last log
slot that modified the object. The "FINISH" command is annotated with
the complete transactional readset by `UpdateHelper()` when called
with a transaction "FINISH". The readset specifies each object read
(as signaled by calls to `QueryHelper`), and the version seen by each
read. For example, "FINISH,2-3,5-6,5-9" says that objects "2" and "5"
were read during the transaction. The read of object 2 saw version 3,
while there were two distinct reads of object 5, seeing versions 6 and
9, respectively.

Transactions commit only if no read objects are subsequently modified
before the transaction attempts to commit. For example, the above
transaction would be aborted if object 2 is modified to version 14 by
a remote transaction *before* the local transaction finishes.
Read objects *may* be modified by the local transaction and re-read
with different result *without* causing the transaction to abort.

These semantics are implemented in the query helper, which downloads
the most recent shared log suffix and parses the log entries to
determine the fate of any new transactions. Once a transaction's fate
is determined, The "FINISH" is changed to either "COMMIT" or "ABORT"
*in the local copy of the log*. 

To summarize from the application point-of-view: transaction "starts"
and "finishes" are sent to the tango module via `UpdateHelper()`, but
the app sees no other application details, and is not part of the
determination any transaction's fate. The app objects are affected
only by `Apply()` calls. These calls are immediate for
non-transactional updates, but *calls for transactional updates are
delayed until a transaction is known to have committed*.


## Testing.
1. I will test by running two copies of `intCRDT.go` against "scriptInt.1"
concurrently, multiple times. Each time the end result should be the
same regardless of interleaving.
2. I will test the transaction KV store by running `kv.go` with
   "scriptKV.1" with one app. As the third READ is seen, I will start
   another instance of `kv.go` running "scriptKV.2", which should
   cause transaction 2 to abort. Transactions 1 and 3 will commit.
3. You should come up scripts, similar to the KV scripts, to test your
   log-based tree implementation. Details should be in your README.md file.
   
## Random Details
- The tree should support mutations such as:
  - add a child
  - move a child
  - delete a child
  Transactional semantics allow these to be combined atomically.
  
## Submitting
Submit by pushing to your repository. 
- DO update the `README.md` to
reflect what works, and what does not. 
- Describe your tree
implementation, and specify how I can recreate your demonstration.
- Upload a video `demo.mp4` that shows you demonstrating all of your functionality,
  as if this is the only thing I see. Note that I *will* look at your
  code, and attempt to duplicate some of the functionality shown in
  your video, but your video should be complete. If you are on a
  mac, please use the "HandBrake" app to remove some of the bloat
  (default configuration is fine). For example, my 629MB .mov turned
  into an 87MB .mp4 w/ no noticable degradation. Do NOT upload a
  `.mov`.

## Video Walkthrough
[walkthrough](p6walkthrough.mp4)!

## Bibliography

```
[1] Balakrishnan, Mahesh, et al. "Tango: Distributed data structures
    over a shared log." Proceedings of the twenty-fourth ACM symposium
    on operating systems principles. 2013.