Commit 80e8e3cf authored 1 year ago by Peter J. Keleher
--- a/p6.md
+++ b/p6.md
+# Project 6: Supporting High-level Abstractions From a Shared Log
+**v1.0**<br>
+**Due Dec 10**
+## Setup
+Download files [here](https://ceres.cs.umd.edu/818/projects/p6.tgz?1).
+## Overview
+![tango-raft](tangoRaft.jpg)
+This project will require you to build three *conflict-free replicated
+data types* on top of the shared, replicated log you built in P5.
+For all three types, *writes* are in the log, *reads* are not.
+For example, with the *intCRDT*, an integer-increment
+conflict-free replicated data type (CRDT), each increment is written to
+the log. Reading the type consists of traversing the log and summing
+all the increments.
+With the *transactional key store*, writes to the KV are in the log,
+reads are not. This is complicated by the transactional nature of the
+KV. As with Tango, we expect transactions to support strict
+serializability. Writes from transactions that abort do not affect the
+KV.
+Finally, in the *tree* type, you will create a shared tree that
+supports concurrent modifications via simple mutations.
+You will build your tango support in a new `p6/tango` module. Your
+applications will call the tango module API to access the shared log.
+## Building and Testing the Tango Interface
+Our "tango" implementation is based relatively closely on the
+Tango paper [1] we discussed in class. In particular, the paper
+defines a Tango runtime abstraction that provides `update_helper`s and
+`query_helper`s. The former adds a command to the shared log, while
+the latter brings a local object up-to-date with respect to the most
+recent local view of the log.
+We will dispense with the object wrapper and define:
+```
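+// Brings obj up to date with the most recent local view of the log.
+func TangoQueryHelper(obj TangoObject)
+// Adds cmd to the shared log on behalf of obj.
+func TangoUpdateHelper(obj TangoObject, cmd string) string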
+```
+Our shared log commands are strings, so the update helper merely adds a string
+to the log.
+We also define a *TangoObject*:
+```
+type TangoObject interface {
+	Oid() int64          // object ID
+	Tid() int64          // transaction ID
+	Apply(data string)
+}
+```
+All three of your abstractions will conform to the TangoObject
+interface, allowing your tango system layer to be oblivious to the
+application semantics.
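As a rough illustration of what conforming to `TangoObject` might look like, the sketch below implements an increment-only counter. The `IntCRDT` struct, its field names, and the choice to encode each increment as a decimal string are assumptions made for this example; they are not taken from the provided files.

```go
package main

import (
	"fmt"
	"strconv"
)

// TangoObject is the interface given in the handout.
type TangoObject interface {
	Oid() int64 // object ID
	Tid() int64 // transaction ID
	Apply(data string)
}

// IntCRDT is a hypothetical increment-only counter. The field names are
// illustrative only.
type IntCRDT struct {
	oid   int64
	state int64
}

func (c *IntCRDT) Oid() int64 { return c.oid }

// Tid returns 0 because this type is not transactional.
func (c *IntCRDT) Tid() int64 { return 0 }

// Apply assumes data is a decimal integer such as "3" or "-2".
// Malformed commands are ignored in this sketch.
func (c *IntCRDT) Apply(data string) {
	n, err := strconv.ParseInt(data, 10, 64)
	if err != nil {
		return
	}
	c.state += n
}

func main() {
	c := &IntCRDT{oid: 7}
	var obj TangoObject = c // compile-time check that IntCRDT fits the interface
	for _, cmd := range []string{"1", "2", "3"} {
		obj.Apply(cmd)
	}
	fmt.Println(c.state)
}
```

Because addition is commutative, the order in which increments from different clients land in the log does not change the final sum, which is what makes this type conflict-free.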
+## Details
+### Changes to the RPC definitions
+1. `CommandRequest` includes fields `Oid` and `Tid`. 
+1. `LogEntry` also includes `Oid` and `Tid` fields. 
+1. New `RetrieveCommitted` RPC to retrieve all *committed* log entries
+   after a particular log slot.
+### The *tango log* 
+The tango module keeps a local copy of the shared log, updated
+whenever `TangoQueryHelper()` is called. Much of the time this log is
+therefore only a prefix of the full shared log. 
+To make this more concrete, consider the workflow of `intCRDT`, which
+implements reads and conflict-free increments (updates). Each
+`intCRDT` object consists only of its object ID ("oid") and its state
+("state"). An increment is written to the log via `UpdateHelper`,
+which you will implement in the tango module. 
+`UpdateHelper`
+packages the increment and the OID into a new `pb.LogEntry`, and sends
+it to the local raft instantiation via the `Command()` RPC, which has
+been enhanced to allow both object and transaction IDs to be
+specified. `intCRDT` is not transactional, so the TID can be left
+at its zero value (0). 
+`Command()` is synchronous, so it does not return until the increment
+has been committed.
+Reads of `intCRDTs` are implemented by calling `QueryHelper`,
+parameterized by OID, which
+has the following tasks:
+- Sync the local copy of the log with the shared version via the new
+  `RetrieveCommitted()` raft RPC. The request specifies the library's
+  last local entry; the RPC returns everything after it.
+- Parse through each of the log entries, in order, calling
+  `TangoObject.Apply()` for each entry that updates the object with the given OID.
+The object's value does not have to be returned because the `Apply()` calls
+will have already updated the `intCRDT`.
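The two tasks above can be sketched as follows. Everything here is a simplified stand-in: `LogEntry` mimics only the relevant fields of `pb.LogEntry`, the `fetch` function plays the role of the `RetrieveCommitted()` RPC using an in-memory slice, and the per-OID `scanned` cursor is one assumed way to avoid applying an entry twice. Transactions are ignored entirely.

```go
package main

import (
	"fmt"
	"strconv"
)

// LogEntry is a stand-in for pb.LogEntry with only the fields this sketch needs.
type LogEntry struct {
	Oid  int64
	Tid  int64
	Data string
}

type TangoObject interface {
	Oid() int64
	Tid() int64
	Apply(data string)
}

// Tango holds the local copy of the log. fetch is a hypothetical stand-in for
// the RetrieveCommitted RPC: it returns all committed entries after the first n.
type Tango struct {
	log     []LogEntry
	scanned map[int64]int // per OID: how many local entries were already examined
	fetch   func(n int) []LogEntry
}

// QueryHelper syncs the local log, then applies the not-yet-seen entries
// that belong to obj's OID, in log order.
func (t *Tango) QueryHelper(obj TangoObject) {
	t.log = append(t.log, t.fetch(len(t.log))...)
	for _, e := range t.log[t.scanned[obj.Oid()]:] {
		if e.Oid == obj.Oid() {
			obj.Apply(e.Data)
		}
	}
	t.scanned[obj.Oid()] = len(t.log)
}

// counter is a minimal increment-only object used to exercise QueryHelper.
type counter struct{ oid, state int64 }

func (c *counter) Oid() int64 { return c.oid }
func (c *counter) Tid() int64 { return 0 }
func (c *counter) Apply(data string) {
	if n, err := strconv.ParseInt(data, 10, 64); err == nil {
		c.state += n
	}
}

// newTango wires a Tango to an in-memory slice that plays the shared log.
func newTango(shared *[]LogEntry) *Tango {
	return &Tango{
		scanned: map[int64]int{},
		fetch:   func(n int) []LogEntry { return (*shared)[n:] },
	}
}

func main() {
	shared := []LogEntry{{Oid: 1, Data: "5"}, {Oid: 2, Data: "10"}, {Oid: 1, Data: "2"}}
	tango := newTango(&shared)
	a, b := &counter{oid: 1}, &counter{oid: 2}
	tango.QueryHelper(a)
	tango.QueryHelper(b)
	fmt.Println(a.state, b.state)
}
```

Note that the query for object 1 also pulls the entry for object 2 into the local log; that entry is only applied later, when object 2 is itself queried.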
+The parser in `intCRDT.go` updates several objects, displaying the value
+of each at the end. The simple input script `scriptInt.1` has two fields per line:
+the *oid* and the increment to be applied to that object. If multiple
+applications run the same script concurrently, the exact interleaving
+is non-deterministic, but the final values should be identical every
+time we run the same two copies.
+### Transactional semantics
+The above is relatively straightforward, but transactional semantics
+require a bit more mechanism. First, every line in `scriptKV.1` consists of
+an *oid*, a *tid* (transaction ID), and a command. The commands have
+the following implications:
+- "START" (transaction): Each client application, together with its
+  instantiation of the tango library, is assumed to be
+  single-threaded. Only a single transaction is in progress from the
+  client at any time. The Tango transactional implementation relies on
+  maintaining readsets for the current transaction. Since only one
+  local transaction can be active at a time, only one readset need be
+  maintained. This readset is re-initialized at each transaction start.
+- "FINISH" (transaction): The app signals a commit request by issuing
+  "FINISH" to the raft abstraction, annotated with the current readset. 
+- "READ": calls `QueryHelper()`, and then returns the current value. 
+- "SLEEP": sleeps an integer number of seconds.
+- All others are strings that should be written verbatim to the object
+  named by the oid, as the new value of that object.
+All transactional fates are determined independently, but
+deterministically, by each tango library. Recall (from the paper),
+that an object version can be specified by the index of the last log
+slot that modified the object. The "FINISH" command is annotated with
+the complete transactional readset by `UpdateHelper()` when called
+with a transaction "FINISH". The readset specifies each object read
+(as signaled by calls to `QueryHelper`), and the version seen by each
+read. For example, "FINISH,2-3,5-6,5-9" says that objects "2" and "5"
+were read during the transaction. The read of object 2 saw version 3,
+while there were two distinct reads of object 5, seeing versions 6 and
+9, respectively.
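One possible way to build and parse the annotated "FINISH" string is sketched below. The `Read` type and the `FormatFinish` and `ParseFinish` helpers are hypothetical names introduced for this example; only the `FINISH,oid-version,...` layout comes from the text above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Read records one read: which object, and the version (log slot) it saw.
type Read struct {
	Oid     int64
	Version int64
}

// FormatFinish renders a readset as "FINISH,oid-version,...".
func FormatFinish(reads []Read) string {
	parts := []string{"FINISH"}
	for _, r := range reads {
		parts = append(parts, fmt.Sprintf("%d-%d", r.Oid, r.Version))
	}
	return strings.Join(parts, ",")
}

// ParseFinish is the inverse of FormatFinish.
func ParseFinish(cmd string) ([]Read, error) {
	parts := strings.Split(cmd, ",")
	if parts[0] != "FINISH" {
		return nil, fmt.Errorf("not a FINISH command: %q", cmd)
	}
	var reads []Read
	for _, p := range parts[1:] {
		f := strings.SplitN(p, "-", 2)
		if len(f) != 2 {
			return nil, fmt.Errorf("malformed readset item %q", p)
		}
		oid, err := strconv.ParseInt(f[0], 10, 64)
		if err != nil {
			return nil, err
		}
		ver, err := strconv.ParseInt(f[1], 10, 64)
		if err != nil {
			return nil, err
		}
		reads = append(reads, Read{Oid: oid, Version: ver})
	}
	return reads, nil
}

func main() {
	reads, err := ParseFinish("FINISH,2-3,5-6,5-9")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reads)
	fmt.Println(FormatFinish(reads))
}
```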
+Transactions commit only if no read objects are subsequently modified
+before the transaction attempts to commit. For example, the above
+transaction would be aborted if object 2 is modified to version 14 by
+a remote transaction *before* the local transaction finishes.
+Read objects *may* be modified by the local transaction and re-read
+with different results *without* causing the transaction to abort.
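The commit rule can be sketched as a scan over the write entries in the local log. This is a deliberately simplified reading of the rule, under several assumptions: `Entry`, `Read`, and `Decide` are hypothetical names, each write carries the slot it occupies, and *any* write by a different transaction inside the window between the read version and the "FINISH" slot counts as a conflict (the sketch does not check whether that other transaction itself committed).

```go
package main

import "fmt"

// Entry is a write in the local log, tagged with its log slot.
type Entry struct {
	Slot int64 // position in the shared log
	Oid  int64
	Tid  int64
}

// Read is one readset item: the object and the version (slot) that was seen.
type Read struct{ Oid, Version int64 }

// Decide returns "COMMIT" or "ABORT" for transaction tid, whose FINISH sits
// at finishSlot. A read conflicts if another transaction wrote the same
// object after the version that was read and before the FINISH.
func Decide(writes []Entry, tid, finishSlot int64, reads []Read) string {
	for _, r := range reads {
		for _, w := range writes {
			inWindow := w.Slot > r.Version && w.Slot < finishSlot
			if w.Oid == r.Oid && w.Tid != tid && inWindow {
				return "ABORT"
			}
		}
	}
	return "COMMIT"
}

func main() {
	// Readset from the example: "FINISH,2-3,5-6,5-9", issued by transaction 4
	// and placed (by assumption) at slot 20.
	reads := []Read{{2, 3}, {5, 6}, {5, 9}}

	// Transaction 4 itself wrote object 5 at slot 9: no conflict.
	own := []Entry{{Slot: 9, Oid: 5, Tid: 4}}
	fmt.Println(Decide(own, 4, 20, reads))

	// Remote transaction 8 wrote object 2 at slot 14, before the FINISH.
	remote := append(own, Entry{Slot: 14, Oid: 2, Tid: 8})
	fmt.Println(Decide(remote, 4, 20, reads))
}
```

Since every library sees the same log in the same order, running a deterministic check like this one independently on each replica yields the same fate everywhere.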
+These semantics are implemented in the query helper, which downloads
+the most recent shared log suffix and parses the log entries to
+determine the fate of any new transactions. Once a transaction's fate
+is determined, the "FINISH" is changed to either "COMMIT" or "ABORT"
+*in the local copy of the log*. 
+To summarize from the application point-of-view: transaction "starts"
+and "finishes" are sent to the tango module via `UpdateHelper()`, but
+the app sees no other application details, and is not part of the
+determination of any transaction's fate. The app objects are affected
+only by `Apply()` calls. These calls are immediate for
+non-transactional updates, but *calls for transactional updates are
+delayed until a transaction is known to have committed*.
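A minimal sketch of the delayed-apply behavior follows. It assumes the local log has already had each "FINISH" rewritten to "COMMIT" or "ABORT", that a TID of 0 marks a non-transactional entry, and that "START" appears in the log as an ordinary entry. The `Applier` type and the `register` object are hypothetical and exist only to make the example runnable.

```go
package main

import "fmt"

// LogEntry is a stand-in for pb.LogEntry.
type LogEntry struct {
	Oid  int64
	Tid  int64 // 0 means non-transactional in this sketch
	Data string
}

type TangoObject interface {
	Oid() int64
	Tid() int64
	Apply(data string)
}

// Applier delivers entries to objects, buffering transactional writes
// until the fate of their transaction is known.
type Applier struct {
	objs    map[int64]TangoObject
	pending map[int64][]LogEntry // buffered writes, keyed by TID
}

func NewApplier(objs ...TangoObject) *Applier {
	a := &Applier{objs: map[int64]TangoObject{}, pending: map[int64][]LogEntry{}}
	for _, o := range objs {
		a.objs[o.Oid()] = o
	}
	return a
}

func (a *Applier) apply(e LogEntry) {
	if obj, ok := a.objs[e.Oid]; ok {
		obj.Apply(e.Data)
	}
}

// Process handles one entry of the local log, in log order.
func (a *Applier) Process(e LogEntry) {
	switch {
	case e.Tid == 0: // non-transactional: apply immediately
		a.apply(e)
	case e.Data == "START": // nothing to apply
	case e.Data == "COMMIT": // release the buffered writes in order
		for _, w := range a.pending[e.Tid] {
			a.apply(w)
		}
		delete(a.pending, e.Tid)
	case e.Data == "ABORT": // discard the buffered writes
		delete(a.pending, e.Tid)
	default: // transactional write: hold it back
		a.pending[e.Tid] = append(a.pending[e.Tid], e)
	}
}

// register is a toy object whose state is the last value applied.
type register struct {
	oid   int64
	value string
}

func (r *register) Oid() int64        { return r.oid }
func (r *register) Tid() int64        { return 0 }
func (r *register) Apply(data string) { r.value = data }

func main() {
	r := &register{oid: 1}
	a := NewApplier(r)
	entries := []LogEntry{
		{Oid: 1, Tid: 0, Data: "a"},
		{Oid: 1, Tid: 5, Data: "START"},
		{Oid: 1, Tid: 5, Data: "b"},
		{Oid: 1, Tid: 5, Data: "ABORT"},
		{Oid: 1, Tid: 6, Data: "START"},
		{Oid: 1, Tid: 6, Data: "c"},
		{Oid: 1, Tid: 6, Data: "COMMIT"},
	}
	for _, e := range entries {
		a.Process(e)
		fmt.Printf("%-6s -> %q\n", e.Data, r.value)
	}
}
```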
+## Testing
+1. I will test by running two copies of `intCRDT.go` against "scriptInt.1"
+   concurrently, multiple times. Each time the end result should be the
+   same regardless of interleaving.
+2. I will test the transaction KV store by running `kv.go` with
+   "scriptKV.1" with one app. As the third READ is seen, I will start
+   another instance of `kv.go` running "scriptKV.2", which should
+   cause transaction 2 to abort. Transactions 1 and 3 will commit.
+3. You should come up with scripts, similar to the KV scripts, to test your
+   log-based tree implementation. Details should be in your README.md file.
+## Random Details
+- The tree should support mutations such as:
+  - add a child
+  - move a child
+  - delete a child
+  Transactional semantics allow these to be combined atomically.
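The three mutations listed above could be encoded as log commands along the following lines. This is only one possible design: the command strings (`add,<parent>,<child>`, `move,<child>,<newParent>`, `delete,<child>`), the `Tree` type, and the fixed root name "root" are all assumptions for this sketch, and transactions are not shown. Invalid commands are ignored so that every replica reaches the same state from the same log.

```go
package main

import (
	"fmt"
	"strings"
)

// Tree is a hypothetical shared tree. parent maps each node to its parent;
// the root is named "root" and has no entry of its own.
type Tree struct {
	oid    int64
	parent map[string]string
}

func NewTree(oid int64) *Tree { return &Tree{oid: oid, parent: map[string]string{}} }

func (t *Tree) Oid() int64 { return t.oid }
func (t *Tree) Tid() int64 { return 0 }

func (t *Tree) exists(n string) bool {
	_, ok := t.parent[n]
	return ok || n == "root"
}

// under reports whether n is anc itself or lies in anc's subtree.
func (t *Tree) under(n, anc string) bool {
	for {
		if n == anc {
			return true
		}
		p, ok := t.parent[n]
		if !ok {
			return false
		}
		n = p
	}
}

// Apply executes one mutation command; invalid commands are ignored.
func (t *Tree) Apply(data string) {
	f := strings.Split(data, ",")
	switch {
	case len(f) == 3 && f[0] == "add":
		if t.exists(f[1]) && !t.exists(f[2]) {
			t.parent[f[2]] = f[1]
		}
	case len(f) == 3 && f[0] == "move":
		// Refuse to move the root, unknown nodes, or a node into its own subtree.
		if _, ok := t.parent[f[1]]; ok && t.exists(f[2]) && !t.under(f[2], f[1]) {
			t.parent[f[1]] = f[2]
		}
	case len(f) == 2 && f[0] == "delete":
		if _, ok := t.parent[f[1]]; !ok {
			return
		}
		// Collect the whole subtree first, then remove it.
		var doomed []string
		for n := range t.parent {
			if t.under(n, f[1]) {
				doomed = append(doomed, n)
			}
		}
		for _, n := range doomed {
			delete(t.parent, n)
		}
	}
}

func main() {
	tr := NewTree(1)
	for _, cmd := range []string{"add,root,a", "add,a,b", "add,root,c", "move,b,c", "delete,a"} {
		tr.Apply(cmd)
	}
	fmt.Println(tr.parent["b"], len(tr.parent))
}
```

Unlike integer increments, these mutations do not commute in general (a move and a delete of the same node, for example), which is where wrapping several of them in one transaction becomes useful.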
+## Submitting
+Submit by pushing to your repository. 
+- DO update the `README.md` to
+  reflect what works, and what does not. 
+- Describe your tree
+  implementation, and specify how I can recreate your demonstration.
+- Upload a video `demo.mp4` that shows you demonstrating all of your functionality,
+  as if this is the only thing I see. Note that I *will* look at your
+  code, and attempt to duplicate some of the functionality shown in
+  your video, but your video should be complete. If you are on a
+  Mac, please use the "HandBrake" app to remove some of the bloat
+  (default configuration is fine). Do NOT upload a `.mov`.
+## Bibliography
+```
+[1] Balakrishnan, Mahesh, et al. "Tango: Distributed data structures
+    over a shared log." Proceedings of the Twenty-Fourth ACM Symposium
+    on Operating Systems Principles. 2013.
+```