# Project 4: Epaxos
v1.01

**Goals:**
1. Correct, executing epaxos
2. Efficient, correctly synchronized maintenance of the log(s).
1. Actually building the 2-d log, populating it accurately.
1. Both fast and slow paths, based on dependences returned by PREACCEPT replies. This implies:
   - accurate counting of PREACCEPT and ACCEPT acknowledgements, with full
      concurrency. In other words, you cannot assume there is only a single
      operation with outstanding RPCs at a time.
   - accurate detection of conflicts, dependency tracking
1. Operation execution, which can take place only once all of an operation's dependencies have been
    satisfied (i.e., the operations it depends on have executed).
1. Simple key-value store that will illustrate execution timings.
1. *Thrifty* consensus.

**Non-Goals:**
1. Garbage collection, long-lived replicas.
1. Fault tolerance and recovery.
1. Batching.

## Setup
The directory structure is provided [here](p4.tgz). The supplied files
are the following:
- `config.json`: defines replica endpoints and nothing else
- `client/client.go`: skeleton of client.go
- `pb/epaxos.proto`: skeleton of protobuf file
- `replica/replica.go`: skeleton of a replica. Put all replica code
  in this file.
- `replica/run.rb`: starts up multiple replicas at once. Kill them with `killall replica`.

## Key Value Store
The epaxos consensus group will be used to create a simple key-value (KV)
store with three operations: *put*, *get*, and *del*, with the obvious
meanings. Implement the key-value store by keeping a local copy on
each replica, updating it as write and delete operations are executed.
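
A minimal sketch of the per-replica store, assuming the hypothetical names `KVStore`, `Op`, and `Apply` (none of these come from the skeleton). What it illustrates is that the map changes only when an operation is *executed*, not when it is merely preaccepted, accepted, or committed.

```go
package main

// KVStore is a hypothetical per-replica copy of the key-value store.
type KVStore struct {
	data map[string]string
}

func NewKVStore() *KVStore {
	return &KVStore{data: make(map[string]string)}
}

// Op is a hypothetical decoded operation taken from an executed instance.
type Op struct {
	Kind  string // "put", "get", or "del"
	Key   string
	Value string // only meaningful for "put"
}

// Apply executes op against the local copy. It returns the value a client
// should see (the stored value for "get", the written value for "put") and
// whether the key was present (for "get" and "del").
func (s *KVStore) Apply(op Op) (string, bool) {
	switch op.Kind {
	case "put":
		s.data[op.Key] = op.Value
		return op.Value, true
	case "get":
		v, ok := s.data[op.Key]
		return v, ok
	case "del":
		_, ok := s.data[op.Key]
		delete(s.data, op.Key)
		return "", ok
	}
	return "", false // unknown operation kind
}
```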

By default, all three operation types must be serialized by epaxos before
returning a response to the client. The client's asynchronous read (the `a`
operation, as in `-o <id>,a,<key>`) instead reads the local replica's version of a data item
without waiting until serialization completes. Such a read can return stale
results, but is faster.

I suggest implementing the project in two phases. First, implement the
overall communication patterns; second, implement the Epaxos logic,
which will of course require some changes to the communication.

## Part 1: Communicating Processes

Build the entire communication infrastructure for the
project. Clients communicate with replicas via unary (non-streaming) gRPC calls.
Replicas communicate with each other through the *streaming API* ([overview](https://grpc.io/docs/guides/concepts.html)). 

## Replica

The general architecture of your replica is shown below for a five-replica system. 
Replicas talk to other replicas (to implement the Epaxos logic) and to 
clients.
You need only test with a single client, though the code should work seamlessly with multiple clients.

Note that this figure shows six handlers (including "proposition"), but it's probably easier to implement with eight.
![here](arch.png)

In this example, replicas accept unary requests from clients via a `Propose()` handler,
and streamed messages from other replicas via a `Dispatch()` handler; you must define
both in your protobuf file. Despite the asynchronous,
multi-threaded nature of the gRPC interface, this diagram structures the replica as a
single-threaded, non-blocking event handler reading messages one at a time from the `bigChannel`.
Messages from other replicas and from clients are all shoved down this
single large channel.

You do *not* have to implement your system exactly like this, but it
has the advantage of reducing synchronization requirements
and making debugging much simpler because only a single message
handler (for client requests, or for requests or replies from other replicas) ever
runs at a time.
Note that for this to work, all message handlers *must* be non-blocking.

All outgoing messages to replica *i* are serialized by pushing them through `outchannels[i]`. The other
end of `outchannels[i]` is a single goroutine that reads from `outchannels[i]` and writes
via a bi-directional stream to that other replica.
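
A sketch of the goroutine at the far end of `outchannels[i]`, assuming a hypothetical `Msg` type and a `Sender` interface standing in for the client side of the generated bi-directional stream (which has a `Send(*T) error` method). Funneling all sends through one goroutine also matters because grpc-go documents that it is not safe to call `SendMsg` on the same stream from different goroutines. `chanSender` is only a test double.

```go
package main

// Msg is a hypothetical outgoing replica-to-replica message.
type Msg struct {
	Kind string
}

// Sender abstracts the sending half of a bi-directional gRPC stream.
type Sender interface {
	Send(*Msg) error
}

// startSender launches the single goroutine that owns one peer's stream.
// Only this goroutine calls Send, so messages to that peer go out in the
// order they were pushed into out. The returned channel is closed once out
// has been closed and drained, or after the first send error.
func startSender(out <-chan *Msg, s Sender) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for m := range out {
			if err := s.Send(m); err != nil {
				return // a real replica would log and perhaps redial here
			}
		}
	}()
	return done
}

// chanSender is a test double that records sent messages on a channel.
type chanSender chan *Msg

func (c chanSender) Send(m *Msg) error {
	c <- m
	return nil
}
```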


### Message Types and Definitions
I am not telling you what messages to send, nor am I defining them for
you in the protobuf files. However, the following is one relatively
straightforward approach, assuming you follow the "bigChannel"
approach outlined above.

Replicas have internal and external messages. *Internal*
messages are those that go through bigChannel, and there are
eight of them: three client requests,
"put", "get", and "del", and five replica-replica requests: "preaccept
request", "preaccept reply", "accept request", "accept reply", and
"commit".

*External* messages go between replicas, and between replicas and
clients.  External messages largely correspond to the internal
messages, though we must also define client reply messages.

Given that the eight internal messages all have corresponding external
messages, we can eliminate some tedium by using
protobuf definitions of the external messages in the internal messages
as well. However, they must all be pushed through
bigChannel, and a Go channel carries values of only one type.

One straightforward way of coercing these eight messages into a single
type is to use the protobuf "oneof" clause to define a
"BigChannelMessage" that includes one of eight other messages. The
bigChannel reader checks "BigChannelMessage"'s "Type" field (or, with a
oneof, which case is set) and calls the appropriate handler.

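The sketch below is a hand-written analog of the oneof approach, assuming hypothetical message types: `protoc-gen-go` represents a oneof as an interface-typed field with one wrapper type per case, so the reader can dispatch with a Go type switch rather than a separate "Type" field. Only five of the eight cases are shown; the remaining ones (del, accept, accept reply) follow the same pattern.

```go
package main

import "fmt"

// Hypothetical payload types; in the project these would be generated
// from your .proto file and wrapped by a oneof in BigChannelMessage.
type PutRequest struct{ Key, Value string }
type GetRequest struct{ Key string }
type PreacceptRequest struct{ Seq int32 }
type PreacceptReply struct{ Seq int32 }
type Commit struct{ Seq int32 }

// BigChannelMessage carries exactly one payload, mirroring a oneof.
type BigChannelMessage struct {
	Payload interface{}
}

// dispatch selects the handler by the payload's dynamic type. Here each
// "handler" just returns a label so the routing can be checked.
func dispatch(m BigChannelMessage) string {
	switch p := m.Payload.(type) {
	case *PutRequest:
		return "put " + p.Key
	case *GetRequest:
		return "get " + p.Key
	case *PreacceptRequest:
		return fmt.Sprintf("preaccept seq %d", p.Seq)
	case *PreacceptReply:
		return fmt.Sprintf("preacceptReply seq %d", p.Seq)
	case *Commit:
		return fmt.Sprintf("commit seq %d", p.Seq)
	default:
		return fmt.Sprintf("unknown %T", p)
	}
}
```
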
### Replica communication setup
Each replica *dials* every other replica and creates a bi-directional stream. Since both replicas in a pair dial each other, this gives
two bi-directional streams (four one-way paths) per replica pair, but **we only use two** of those paths. Messages sent
*to* another replica go through `outchannels` to one end of the stream created by
*dialing* the other replica. Messages *from* the other replica come in through
`Dispatch()` and get pushed into `bigChannel`.

**Details**

### Handling Client Requests
For this first part of the project, I suggest you merely get the message architecture working,
with message data nulled out. 
There may be many concurrent client proposals outstanding at any given time, but the
"bigDispatcher" (the reader at the end of *bigChannel*) must return
the reply to the correct client request handler (gRPC may run
each request's handler on a different goroutine). When a proposal arrives from a client, a
replica should:
- receive it in the proper handler.
- create a "replyChannel" of type `chan *pb.ProposeReply` (`pb.ProposeReply` is the client
  reply message you define); it holds the client reply until it is used.
- wrap the request and the replyChannel into a "BigChannelMessage" (again,
you define), and push down bigChannel.
- block on the "replyChannel".

This "BigChannelMessage" will then be read from bigChannel and cause:
- the replyChannel to be saved in a `map`, indexed by the client
request's unique identifier.
- "PreacceptRequest" messages to be sent.

"PreacceptRequest" messages are read at the other replicas, and "PreacceptReply" messages
immediately returned. Once a quorum of replies has been collected:
- all other replicas should be sent "CommitRequest" messages, and
- a reply should be sent to the **correct** client.
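
Because many instances may have RPCs outstanding at once, replies have to be counted per instance. A minimal sketch, assuming a hypothetical `InstanceID` keyed by (replica, slot) and a tracker used only from the single dispatcher goroutine, so it needs no locks. It fires exactly once per instance, so stragglers arriving after the quorum cannot trigger a second round of COMMITs.

```go
package main

// InstanceID names a slot in the 2-D log (hypothetical).
type InstanceID struct {
	Replica int
	Slot    int
}

// ackTracker counts replies per instance so that any number of instances
// can have RPCs outstanding at the same time.
type ackTracker struct {
	need  int                 // replies needed, not counting the leader itself
	count map[InstanceID]int  // replies received so far
	done  map[InstanceID]bool // instances whose quorum is already complete
}

func newAckTracker(need int) *ackTracker {
	return &ackTracker{
		need:  need,
		count: map[InstanceID]int{},
		done:  map[InstanceID]bool{},
	}
}

// ack records one reply for id and returns true exactly once: on the reply
// that completes the quorum. Later (straggler) replies return false.
func (t *ackTracker) ack(id InstanceID) bool {
	if t.done[id] {
		return false
	}
	t.count[id]++
	if t.count[id] >= t.need {
		t.done[id] = true
		return true
	}
	return false
}
```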

How do we contact the correct client?  Remember that the original gRPC
client request handler is still blocked on the "replyChannel". So the
commit handler just needs to create an appropriate client reply
message, push it back down to the gRPC handler through the channel,
and then the gRPC handler can wake up and reply to the client.
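
The round trip described above can be sketched as follows. `ProposeReply`, `clientReq`, and `commitDone` are hypothetical stand-ins for your protobuf and internal messages, and the dispatcher is reduced to the two cases that matter for routing replies.

```go
package main

// ProposeReply is a hypothetical stand-in for the generated pb.ProposeReply.
type ProposeReply struct {
	ID    string
	Value string
}

// clientReq is what a gRPC Propose handler pushes down bigChannel.
type clientReq struct {
	ID    string
	Reply chan *ProposeReply
}

// commitDone is a hypothetical internal event meaning "the instance for
// request ID has committed (and, for reads, executed) with this result".
type commitDone struct {
	ID    string
	Value string
}

// propose is the body of a gRPC Propose handler, which may run on any
// goroutine. It pairs the request with a fresh reply channel, hands both to
// the dispatcher, and blocks until the dispatcher answers.
func propose(bigChannel chan<- interface{}, id string) *ProposeReply {
	reply := make(chan *ProposeReply, 1) // buffered so the dispatcher never blocks on it
	bigChannel <- clientReq{ID: id, Reply: reply}
	return <-reply
}

// dispatcher is the single reader of bigChannel. It remembers each waiting
// handler's reply channel by request ID and answers when that request's
// commitDone event arrives.
func dispatcher(bigChannel <-chan interface{}) {
	waiting := map[string]chan *ProposeReply{}
	for msg := range bigChannel {
		switch m := msg.(type) {
		case clientReq:
			waiting[m.ID] = m.Reply
			// a real replica would now start the PREACCEPT phase
		case commitDone:
			if ch, ok := waiting[m.ID]; ok {
				ch <- &ProposeReply{ID: m.ID, Value: m.Value}
				delete(waiting, m.ID)
			}
		}
	}
}
```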

Straightforward, no?

### Client

The client communicates with replicas via unary RPC. For this project,
the only message a client sends to a replica is a request (via `-o`) that a KV command be
serialized by the replica group.

Note that each request needs a unique "identity" string so that the above
bigChannel approach can match replies to the waiting request handlers.
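
One possible way to build such an identity string on the client, assuming the hypothetical helper `newRequestID`; any scheme works as long as no two outstanding requests share an ID.

```go
package main

import (
	"fmt"
	"os"
	"sync/atomic"
	"time"
)

// reqCounter distinguishes requests issued by the same process.
var reqCounter uint64

// newRequestID combines hostname and PID (distinguishing client processes),
// a nanosecond timestamp (guarding against PID reuse across runs), and an
// atomic counter (distinguishing requests within one process).
func newRequestID() string {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown"
	}
	n := atomic.AddUint64(&reqCounter, 1)
	return fmt.Sprintf("%s-%d-%d-%d", host, os.Getpid(), time.Now().UnixNano(), n)
}
```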


## Part 2: Full System
Your system must fully track and commit accesses across all replicas, with commits
eventually bringing all replicas up to date.  This means that all operations must end up
consistently replicated across all replicas.

## Dependency and Conflict Tracking

New operations must be assigned a set of *dependencies*. The dependencies of a given
operation are defined as the nearest
conflicting operations *from each system replica* known at the time the
operation is handled in `bigDispatcher`. A "conflicting operation" in this
project is one accessing the same key in the key-value store, *even if that operation is a
read*. 
The status of an operation (`PREACCEPTED`, `ACCEPTED`, `COMMITTED`) does not matter.
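
A sketch of computing a new operation's attributes from the local 2-D log, assuming a hypothetical `Instance` type that holds only a key and a sequence number. Following the usual EPaxos rule, the sequence number is one more than the largest sequence number among conflicting instances, which yields `seq 1` when there are no conflicts, as in the `-d 2` log in the grading section.

```go
package main

// Instance is a hypothetical 2-D log entry; only the fields needed for
// conflict detection are shown.
type Instance struct {
	Key string
	Seq int32
}

// computeAttrs returns, for a new operation on key, the slot of the most
// recent conflicting instance from each replica (-1 if none) and a sequence
// number one greater than the largest Seq among all conflicting instances.
// Every access to the same key conflicts, reads included, and the
// instance's status is ignored.
func computeAttrs(log [][]*Instance, key string) ([]int32, int32) {
	deps := make([]int32, len(log))
	var maxSeq int32
	for r, row := range log {
		deps[r] = -1
		for s, inst := range row {
			if inst == nil || inst.Key != key {
				continue
			}
			deps[r] = int32(s) // later slots overwrite earlier ones
			if inst.Seq > maxSeq {
				maxSeq = inst.Seq
			}
		}
	}
	return deps, maxSeq + 1
}
```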

PREACCEPT and COMMIT messages make these dependencies known to other
replicas. PREACCEPTREPLY messages may add dependencies (or raise the sequence number) known to the leader,
in which case the *slow path* (an extra ACCEPT round) must be executed before commits can be sent.
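
A sketch of how a leader might fold PREACCEPTREPLY attributes into its own, assuming a hypothetical `Attrs` type. The fast-path test here is a conservative choice (take the fast path only if no reply changed the leader's attributes); falling back to the slow path is always safe. The test values mirror the `-d 2` log in the grading section, where replica 3 ends up sending ACCEPT with deps `[0 -1 -1 -1 -1]`, seq 2.

```go
package main

// Attrs are the ordering attributes carried by PREACCEPT messages.
type Attrs struct {
	Deps []int32 // per-replica slot of nearest conflicting instance, -1 if none
	Seq  int32
}

// mergeReplies combines the leader's attributes with the replies': deps are
// merged element-wise with max, keeping the most recent conflicting slot
// known for each replica, and seq becomes the largest seq seen. fast is true
// only if no reply differed from the leader's attributes; otherwise the
// leader must run the ACCEPT phase with the merged attributes. All Deps
// slices are assumed to have one entry per replica.
func mergeReplies(leader Attrs, replies []Attrs) (merged Attrs, fast bool) {
	merged.Deps = append([]int32(nil), leader.Deps...) // copy; don't mutate leader
	merged.Seq = leader.Seq
	fast = true
	for _, r := range replies {
		if r.Seq != leader.Seq {
			fast = false
		}
		if r.Seq > merged.Seq {
			merged.Seq = r.Seq
		}
		for i, d := range r.Deps {
			if d != leader.Deps[i] {
				fast = false
			}
			if d > merged.Deps[i] {
				merged.Deps[i] = d
			}
		}
	}
	return merged, fast
}
```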


## Operation Execution
	
Write operations must be *executed*, i.e. applied to the object store, before
the value written can be returned as the result of a read. The paper describes
operation execution in the pseudo-code below. In our system, execution is
implemented by `executeSome()`. This method builds the set of unexecuted
operations in the 2-D log, and then uses the procedure below to order the
operations for potential execution. Note that if the set contains an operation
that is not yet marked as committed locally, `executeSome()` cannot
complete. Instead, it returns immediately, abandoning any work it has
done. Its next invocation restarts the algorithm from the beginning.
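
The first step of `executeSome()` might look like the sketch below, assuming hypothetical `Entry` and `Ref` types and status constants. How to treat holes (a slot whose later neighbor arrived first, e.g. via a COMMIT) is a design choice; this sketch conservatively aborts on them.

```go
package main

// Hypothetical status values for a 2-D log entry.
const (
	PreAccepted = iota
	Accepted
	Committed
	Executed
)

// Entry is a hypothetical log entry; only its status matters here.
type Entry struct {
	Status int
}

// Ref identifies a log slot.
type Ref struct{ Rep, Slot int }

// unexecuted gathers every log entry that has not yet been executed. If any
// of them is not committed locally it returns ok=false, and executeSome
// should abandon this attempt and start over on its next invocation.
func unexecuted(log [][]*Entry) (refs []Ref, ok bool) {
	for r, row := range log {
		for s, e := range row {
			switch {
			case e == nil:
				// Hole: a later slot of this replica arrived first. This
				// sketch conservatively treats it like an uncommitted entry.
				return nil, false
			case e.Status == Executed:
				// already applied to the KV store; skip
			case e.Status == Committed:
				refs = append(refs, Ref{r, s})
			default:
				return nil, false // only preaccepted or accepted so far
			}
		}
	}
	return refs, true
}
```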

`executeSome()` is run (1) in response to reads committing, and (2) at periodic intervals
delimited by a *ticker* (which should default to 2 milliseconds). `bigDispatcher` should
now read from either **bigChannel** *or* the ticker, using a `select` statement.
`executeSome()` should return an ordered array of SCCs, each consisting of an ordered
sequence of the SCC's instructions.

<img width=400 src=exec.png>

Note that **Instances** have two distinct types of ordering information: the *sequence*
number and the *deps*. The latter is used to find strongly connected components (SCCs). The
former is used to order instructions within SCCs, and to order SCCs.
You may use `github.com/looplab/tarjan` to create your SCCs.
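
If you would rather not depend on `github.com/looplab/tarjan`, the sketch below implements Tarjan's algorithm directly over hypothetical `Ref` and `Inst` types; it is a swapped-in, self-contained alternative, not that library's API. It orders SCCs only by Tarjan's emission order (dependencies first) and uses seq only within an SCC; it does not use seq to order otherwise-unrelated SCCs. The test input is test "two" from the grading section.

```go
package main

import "sort"

// Ref names one instance in the 2-D log (hypothetical).
type Ref struct{ Rep, Slot int }

// Inst holds what ordering needs: seq and per-replica deps (-1 = none).
type Inst struct {
	Seq  int32
	Deps []int32
}

// orderSCCs runs Tarjan's algorithm over the dependency graph (an edge from
// each instance to every instance it depends on) and returns SCCs in the
// order Tarjan emits them: an SCC is emitted only after every SCC reachable
// from it, so dependencies come first. Within each SCC, instances are sorted
// by seq, breaking ties by (rep, slot).
func orderSCCs(insts map[Ref]Inst) [][]Ref {
	index, low := map[Ref]int{}, map[Ref]int{}
	onStack := map[Ref]bool{}
	var stack []Ref
	var out [][]Ref
	next := 0

	var visit func(v Ref)
	visit = func(v Ref) {
		index[v], low[v] = next, next
		next++
		stack = append(stack, v)
		onStack[v] = true
		for r, s := range insts[v].Deps {
			w := Ref{r, int(s)}
			if _, ok := insts[w]; !ok {
				continue // no dependency (s == -1) or not in this set (e.g. executed)
			}
			if _, seen := index[w]; !seen {
				visit(w)
				if low[w] < low[v] {
					low[v] = low[w]
				}
			} else if onStack[w] && index[w] < low[v] {
				low[v] = index[w]
			}
		}
		if low[v] != index[v] {
			return // v is not the root of its SCC
		}
		var scc []Ref
		for {
			w := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			onStack[w] = false
			scc = append(scc, w)
			if w == v {
				break
			}
		}
		sort.Slice(scc, func(i, j int) bool {
			a, b := scc[i], scc[j]
			if insts[a].Seq != insts[b].Seq {
				return insts[a].Seq < insts[b].Seq
			}
			if a.Rep != b.Rep {
				return a.Rep < b.Rep
			}
			return a.Slot < b.Slot
		})
		out = append(out, scc)
	}

	// Start DFS from roots in (rep, slot) order so the output is deterministic.
	roots := make([]Ref, 0, len(insts))
	for v := range insts {
		roots = append(roots, v)
	}
	sort.Slice(roots, func(i, j int) bool {
		if roots[i].Rep != roots[j].Rep {
			return roots[i].Rep < roots[j].Rep
		}
		return roots[i].Slot < roots[j].Slot
	})
	for _, v := range roots {
		if _, seen := index[v]; !seen {
			visit(v)
		}
	}
	return out
}
```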


### Thriftiness
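With thrifty operation (`-t`), a leader sends PREACCEPT and ACCEPT messages only to the *next* replicas
needed to form a quorum, in ring order; COMMITs still go to every replica, as the logs below show. For
example, in a 5-replica system, replica 3's *next* two replicas are 4 and 0.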
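
A sketch of peer selection, assuming a hypothetical `peers` helper. The quorum-dependent count `k` is left as a parameter; the logs in the grading section show `k = 2` for `N = 5`, and the non-thrifty list matches the `reps:` field printed at startup.

```go
package main

// peers returns the replicas that replica me contacts for PREACCEPT and
// ACCEPT in an n-replica system. Non-thrifty: every other replica, in index
// order. Thrifty: only the next k replicas in ring order, wrapping from
// n-1 back to 0.
func peers(me, n, k int, thrifty bool) []int {
	var out []int
	if !thrifty {
		for i := 0; i < n; i++ {
			if i != me {
				out = append(out, i)
			}
		}
		return out
	}
	for i := 1; i <= k && i < n; i++ {
		out = append(out, (me+i)%n)
	}
	return out
}
```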

### Aggregation
I am not requiring you to implement aggregation.

## Tarjan test
The Tarjan test (`tarjan_test.go`) checks that your code correctly separates instances into ordered SCCs; the test file is correct as distributed. The core of the testing is
`TestTarjan()` through `TestTarjan3()`. Do not change these.

What you do need to change is `doTest()`, which must do three things:
- translate the tests (in the synthetic `Inst` instances) into your
  internal formats (possibly re-initializing and replacing the contents of your logs);
- call your internal routines to separate the instructions into
  distinct SCCs; and
- print out the resulting SCCs. The printout does not have to
  look exactly like mine, but the SCCs must be clearly delineated and
  ordered. Each instance inside should be identified by replica number, slot
  number, and sequence number.

The test arrays I give you give replica number, sequence number, and
dependencies for each instruction. The slot number can be inferred from the
ordering with respect to other instructions from the same replica.
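
A sketch of the slot inference, assuming a hypothetical `TestInst` mirror of the test entries (the real `Inst` type in `tarjan_test.go` may use different field names).

```go
package main

// TestInst is a hypothetical mirror of one entry of the synthetic test
// arrays: replica number, sequence number, and deps, but no slot.
type TestInst struct {
	Rep  int
	Seq  int32
	Deps []int32
}

// Placed is a TestInst together with its inferred slot.
type Placed struct {
	TestInst
	Slot int
}

// inferSlots gives each instruction the next free slot on its replica, so
// the k-th instruction from replica r (counting from 0) lands in slot k.
func inferSlots(tests []TestInst) []Placed {
	nextSlot := map[int]int{}
	out := make([]Placed, 0, len(tests))
	for _, t := range tests {
		out = append(out, Placed{TestInst: t, Slot: nextSlot[t.Rep]})
		nextSlot[t.Rep]++
	}
	return out
}
```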

Run the tests by typing `go test` in the directory with the tests (which is
usually the same directory as the replica).

## Notes

- Log slots are numbered from 0: the first entry in each replica's log is slot 0.

## Command-Line Arguments

Your identical replicas will take the following arguments:
- `-c <config-file>`: JSON description of port numbers and
addresses. DEFAULTS TO `config.json` IN TOP-LEVEL DIRECTORY.
- `-r <index>`: Index of the local replica in the address list read from the config file.
- `-N <count>`: How many replicas to use.
- `-d <level>`: Debugging level:
   - level 0: nothing printed except maybe a line at the beginning
     summarizing options.
   - level 1: print messages sent and received.
   - level 2: print messages with some information in them (match the video).
- `-p`: Delay sending COMMIT messages until 10 seconds after they become legal to send.
  For example, a replica might be run as: `go run replica.go -d 2 -r 0 -N 3 -p`.
- `-t`: Thrifty operation.
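
A sketch of parsing these flags with Go's standard `flag` package, assuming a hypothetical `ReplicaConfig` struct. The `../config.json` default assumes replicas are started from `replica/`, as `run.rb` does, and the default for `-N` is likewise an assumption. Note that the `flag` package treats `-d2` as an unknown flag named `d2`, so values must be written `-d 2` or `-d=2`.

```go
package main

import "flag"

// ReplicaConfig holds the parsed replica command line (hypothetical).
type ReplicaConfig struct {
	ConfigFile string
	Rep        int
	N          int
	Debug      int
	Pause      bool
	Thrifty    bool
}

// parseReplicaFlags parses args (normally os.Args[1:]). Using a FlagSet
// instead of the package-level flag functions makes it easy to test.
func parseReplicaFlags(args []string) (ReplicaConfig, error) {
	var c ReplicaConfig
	fs := flag.NewFlagSet("replica", flag.ContinueOnError)
	// Assumes the replica runs from replica/, so the top-level file is one level up.
	fs.StringVar(&c.ConfigFile, "c", "../config.json", "JSON file of replica addresses")
	fs.IntVar(&c.Rep, "r", 0, "index of this replica in the address list")
	fs.IntVar(&c.N, "N", 3, "number of replicas to use")
	fs.IntVar(&c.Debug, "d", 0, "debug level: 0, 1, or 2")
	fs.BoolVar(&c.Pause, "p", false, "delay COMMIT messages by 10 seconds")
	fs.BoolVar(&c.Thrifty, "t", false, "thrifty operation")
	err := fs.Parse(args)
	return c, err
}
```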

The client takes the following arguments. "<id>" is the index (0..)
of the replica to send the request to. All keys and values are
strings for this client, though you might want to use `byte` arrays in
messages to be more general (and prepare for project 5):
- `-c <config-file>`: JSON description of port numbers and
addresses. DEFAULTS TO `config.json` IN TOP-LEVEL DIRECTORY.
- `-o <operation>`: KV operation to be serialized. Operation should be of form:
   - `<id>,w,<key>,<value>`, all strings
   - `<id>,r,<key>`: synchronous read, blocks until read serialized via epaxos
   - `<id>,a,<key>`: asynchronous read, immediately satisfied by replica cache.
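
A sketch of parsing the `-o` argument, assuming a hypothetical `ClientOp` type. Splitting into at most four fields leaves any commas inside a written value intact.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ClientOp is a parsed -o argument (hypothetical).
type ClientOp struct {
	Replica int    // index of the replica to contact
	Kind    string // "w", "r", or "a"
	Key     string
	Value   string // only set for "w"
}

// parseOp parses "<id>,w,<key>,<value>", "<id>,r,<key>", or "<id>,a,<key>".
func parseOp(s string) (ClientOp, error) {
	parts := strings.SplitN(s, ",", 4)
	if len(parts) < 3 {
		return ClientOp{}, fmt.Errorf("bad operation %q", s)
	}
	id, err := strconv.Atoi(parts[0])
	if err != nil {
		return ClientOp{}, fmt.Errorf("bad replica id %q: %v", parts[0], err)
	}
	op := ClientOp{Replica: id, Kind: parts[1], Key: parts[2]}
	switch op.Kind {
	case "w":
		if len(parts) != 4 {
			return ClientOp{}, fmt.Errorf("write needs a value: %q", s)
		}
		op.Value = parts[3]
	case "r", "a":
		if len(parts) != 3 {
			return ClientOp{}, fmt.Errorf("read takes no value: %q", s)
		}
	default:
		return ClientOp{}, fmt.Errorf("unknown operation %q", op.Kind)
	}
	return op, nil
}
```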


### Testing and grading

The ground truth is that you need to make your system emit the same lines of text as mine
does in the videos: <a href="https://sedna.cs.umd.edu/818/movies/f19P4part1.mp4">old whole movie</a>, and <a href="https://sedna.cs.umd.edu/818/movies/f19P4part2.mp4">new versions of stores and aggregations</a>.

1) [5 pts] Non-thrifty interaction, testing that all replicas are involved, commits
   sent at correct time:
      
        CLIENT
        go run client.go -o 0,w,k,v
        go run client.go -o 3,w,k,v
        
        REPLICAS
        lagoon> 4:Ready... rep 4, port "9005", N=5, exec=true, pause=false, agg=false, reps:[0 1 2 3]
        3:Ready... rep 3, port "9004", N=5, exec=true, pause=false, agg=false, reps:[0 1 2 4]
        1:Ready... rep 1, port "9002", N=5, exec=true, pause=false, agg=false, reps:[0 2 3 4]
        2:Ready... rep 2, port "9003", N=5, exec=true, pause=false, agg=false, reps:[0 1 3 4]
        0:Ready... rep 0, port "9001", N=5, exec=true, pause=false, agg=false, reps:[1 2 3 4]
        0:Sending msg "PREACCEPT" to 4
        0:Sending msg "PREACCEPT" to 3
        0:Sending msg "PREACCEPT" to 2
        0:Sending msg "PREACCEPT" to 1
        4:Sending msg "PREACCEPTREPLY" to 0
        3:Sending msg "PREACCEPTREPLY" to 0
        2:Sending msg "PREACCEPTREPLY" to 0
        1:Sending msg "PREACCEPTREPLY" to 0
        0:Sending msg "COMMIT" to 4
        0:Sending msg "COMMIT" to 2
        0:Sending msg "COMMIT" to 1
        0:Sending msg "COMMIT" to 3
        3:Sending msg "PREACCEPT" to 4
        3:Sending msg "PREACCEPT" to 0
        3:Sending msg "PREACCEPT" to 2
        3:Sending msg "PREACCEPT" to 1
        0:Sending msg "PREACCEPTREPLY" to 3
        2:Sending msg "PREACCEPTREPLY" to 3
        4:Sending msg "PREACCEPTREPLY" to 3
        1:Sending msg "PREACCEPTREPLY" to 3
        3:Sending msg "COMMIT" to 4
        3:Sending msg "COMMIT" to 1
        3:Sending msg "COMMIT" to 0
        3:Sending msg "COMMIT" to 2

2) [5 pts] **Thrifty** interaction, testing that **correct** replicas are involved, commits
   sent at correct time:

      
        CLIENT
        go run client.go -o 0,w,k,v
        go run client.go -o 3,w,k,v
        
        REPLICAS
        hub:~/dss/epaxos/solution/replica> ./run.rb 5 -d 1 -t
        go run replica.go -N 5 -r 0 -d 1&
        go run replica.go -N 5 -r 1 -d 1&
        go run replica.go -N 5 -r 2 -d 1&
        go run replica.go -N 5 -r 3 -d 1&
        go run replica.go -N 5 -r 4 -d 1&
        hub:~/dss/epaxos/solution/replica> 2:Ready... rep 2, port "9003", N=5, exec=true, pause=false, agg=false, reps:[3 4]
        3:Ready... rep 3, port "9004", N=5, exec=true, pause=false, agg=false, reps:[4 0]
        1:Ready... rep 1, port "9002", N=5, exec=true, pause=false, agg=false, reps:[2 3]
        0:Ready... rep 0, port "9001", N=5, exec=true, pause=false, agg=false, reps:[1 2]
        4:Ready... rep 4, port "9005", N=5, exec=true, pause=false, agg=false, reps:[0 1]
        0:Sending msg "PREACCEPT" to 2
        0:Sending msg "PREACCEPT" to 1
        2:Sending msg "PREACCEPTREPLY" to 0
        1:Sending msg "PREACCEPTREPLY" to 0
        0:Sending msg "COMMIT" to 4
        0:Sending msg "COMMIT" to 1
        0:Sending msg "COMMIT" to 2
        0:Sending msg "COMMIT" to 3
        3:Sending msg "PREACCEPT" to 0
        3:Sending msg "PREACCEPT" to 4
        4:Sending msg "PREACCEPTREPLY" to 3
        0:Sending msg "PREACCEPTREPLY" to 3
        3:Sending msg "COMMIT" to 4
        3:Sending msg "COMMIT" to 0
        3:Sending msg "COMMIT" to 1
        3:Sending msg "COMMIT" to 2
        
3) [5 pts] As above, but I will turn on the
   replicas' **pause** flags (`-p`, described above). This delays commits by ten seconds, allowing me to
   guarantee that the slow path will be necessary (not the case above). Note
   that replica 3 must gain a majority quorum on an additional **accept**
   round after receiving PREACCEPT replies. This and all remaining tests will
   use thriftiness:
        
        CLIENT
        go run client.go -o 0,w,k,v
        go run client.go -o 3,w,k,v
        
        REPLICAS
        hub:~/dss/epaxos/solution/replica> ./run.rb 5 -d 1 -t -p
        go run replica.go -N 5 -r 0 -d 1 -p&
        go run replica.go -N 5 -r 1 -d 1 -p&
        go run replica.go -N 5 -r 2 -d 1 -p&
        go run replica.go -N 5 -r 3 -d 1 -p&
        go run replica.go -N 5 -r 4 -d 1 -p&
        hub:~/dss/epaxos/solution/replica> 0:Ready... rep 0, port "9001", N=5, exec=true, pause=true, agg=false, reps:[1 2]
        4:Ready... rep 4, port "9005", N=5, exec=true, pause=true, agg=false, reps:[0 1]
        1:Ready... rep 1, port "9002", N=5, exec=true, pause=true, agg=false, reps:[2 3]
        3:Ready... rep 3, port "9004", N=5, exec=true, pause=true, agg=false, reps:[4 0]
        2:Ready... rep 2, port "9003", N=5, exec=true, pause=true, agg=false, reps:[3 4]
        0:Sending msg "PREACCEPT" to 2
        0:Sending msg "PREACCEPT" to 1
        1:Sending msg "PREACCEPTREPLY" to 0
        2:Sending msg "PREACCEPTREPLY" to 0
        3:Sending msg "PREACCEPT" to 0
        3:Sending msg "PREACCEPT" to 4
        0:Sending msg "PREACCEPTREPLY" to 3
        4:Sending msg "PREACCEPTREPLY" to 3
        3:Sending msg "ACCEPT" to 0
        3:Sending msg "ACCEPT" to 4
        0:Sending msg "ACCEPTREPLY" to 3
        4:Sending msg "ACCEPTREPLY" to 3
        0:Sending msg "COMMIT" to 4
        0:Sending msg "COMMIT" to 2
        0:Sending msg "COMMIT" to 3
        0:Sending msg "COMMIT" to 1
        3:Sending msg "COMMIT" to 0
        3:Sending msg "COMMIT" to 4
        3:Sending msg "COMMIT" to 1
        3:Sending msg "COMMIT" to 2

4) [5 pts] As above, but I also want to see `Deps` and `seq` numbers on preaccept and accept messages.
        
        CLIENT
        go run client.go -o 0,w,k,v
        go run client.go -o 3,w,k,v
        
        REPLICAS
        hub:~/dss/epaxos/solution/replica> ./run.rb 5 -d 2 -t -p
        go run replica.go -N 5 -r 0 -d 2 -p&
        go run replica.go -N 5 -r 1 -d 2 -p&
        go run replica.go -N 5 -r 2 -d 2 -p&
        go run replica.go -N 5 -r 3 -d 2 -p&
        go run replica.go -N 5 -r 4 -d 2 -p&
        hub:~/dss/epaxos/solution/replica> 0:Ready... rep 0, port "9001", N=5, exec=true, pause=true, agg=false, reps:[1 2]
        4:Ready... rep 4, port "9005", N=5, exec=true, pause=true, agg=false, reps:[0 1]
        1:Ready... rep 1, port "9002", N=5, exec=true, pause=true, agg=false, reps:[2 3]
        2:Ready... rep 2, port "9003", N=5, exec=true, pause=true, agg=false, reps:[3 4]
        3:Ready... rep 3, port "9004", N=5, exec=true, pause=true, agg=false, reps:[4 0]
        0:preaccept deps: [-1 -1 -1 -1 -1], seq 1
        0:Sending msg "PREACCEPT" to 2
        0:Sending msg "PREACCEPT" to 1
        1:preacceptReply deps: [-1 -1 -1 -1 -1], seq 1
        2:preacceptReply deps: [-1 -1 -1 -1 -1], seq 1
        1:Sending msg "PREACCEPTREPLY" to 0
        2:Sending msg "PREACCEPTREPLY" to 0
        3:preaccept deps: [-1 -1 -1 -1 -1], seq 1
        3:Sending msg "PREACCEPT" to 0
        3:Sending msg "PREACCEPT" to 4
        0:preacceptReply deps: [0 -1 -1 -1 -1], seq 2
        0:Sending msg "PREACCEPTREPLY" to 3
        4:preacceptReply deps: [-1 -1 -1 -1 -1], seq 1
        4:Sending msg "PREACCEPTREPLY" to 3
        3:accept deps: [0 -1 -1 -1 -1], seq 2
        3:Sending msg "ACCEPT" to 0
        3:Sending msg "ACCEPT" to 4
        0:replyAccept deps: [0 -1 -1 -1 -1], seq 2
        0:Sending msg "ACCEPTREPLY" to 3
        4:replyAccept deps: [0 -1 -1 -1 -1], seq 2
        4:Sending msg "ACCEPTREPLY" to 3
        0:Sending msg "COMMIT" to 2
        0:Sending msg "COMMIT" to 3
        0:Sending msg "COMMIT" to 4
        0:Sending msg "COMMIT" to 1
        3:Sending msg "COMMIT" to 0
        3:Sending msg "COMMIT" to 4
        3:Sending msg "COMMIT" to 2
        3:Sending msg "COMMIT" to 1

5) [10 pts] Testing of your execute procedure. Three tests are defined in `tarjan_test.go`. I will
run them with `go test`, and the output should be:
        
        ubuntu-bionic:~/v/p4/solution/replica> go test
        0:
        one: 
        scc {
        	rep 0/sl 0/se 0/st 4 "text" deps [-1 -1 -1 -1 0]
        	rep 4/sl 0/se 0/st 4 "text" deps [0 -1 -1 -1 -1]
        }
        scc {
        	rep 0/sl 1/se 1/st 4 "text" deps [-1 -1 -1 -1 1]
        	rep 4/sl 1/se 1/st 4 "text" deps [1 -1 -1 -1 -1]
        }
        
        0:
        two: 
        scc {
        	rep 2/sl 0/se 1/st 4 "text" deps [-1 0 -1 -1 -1]
        	rep 1/sl 0/se 2/st 4 "text" deps [0 -1 -1 -1 -1]
        	rep 0/sl 0/se 3/st 4 "text" deps [-1 -1 0 -1 -1]
        }
        scc {
        	rep 3/sl 0/se 0/st 4 "text" deps [0 -1 -1 -1 -1]
        }
        
        0:
        three: 
        scc {
        	rep 1/sl 0/se 0/st 4 "text" deps [-1 -1 -1 -1 -1]
        }
        scc {
        	rep 0/sl 0/se 1/st 4 "text" deps [-1 0 -1 -1 -1]
        }
        scc {
        	rep 1/sl 1/se 2/st 4 "text" deps [0 0 -1 -1 -1]
        }
        scc {
        	rep 2/sl 0/se 0/st 4 "text" deps [-1 1 -1 -1 -1]
        }
        scc {
        	rep 1/sl 2/se 3/st 4 "text" deps [-1 1 0 -1 -1]
        }
        scc {
        	rep 2/sl 1/se 4/st 4 "text" deps [0 2 0 -1 -1]
        }
        
        PASS
        ok  	818f19/p4/solution/replica	0.006s
        
6) [5 pts] I will test your data store by writing into it, such as with `go run client.go -o 0,w,nice,55`,
followed by reading from a different replica (`go run client.go -o 2,r,nice`). I should see
the new value of "nice" as follows:
        
        ./run.rb 3 -d 1
        
           and
        
        client> go run client.go -o 0,w,nice,55
        wrote to 0: "nice" = "55"
        client> go run client.go -o 2,r,nice
        read from 2: "nice" is "55"

7) [5 pts] Same thing, but here the replica pauses on the commit, also delaying the read:
        
        ./run.rb 3 -d 1 -p
        
           and
        
        client> go run client.go -o 0,w,nice,55
        wrote to 0: "nice" = "55"
        client> go run client.go -o 2,r,nice
        read from 2: "nice" is "55"

8) [5 pts] Same thing, but now the read is asynchronous, and returns immediately with the wrong value BEFORE the write completes:
        
        ./run.rb 3 -d 1 -p
        
           and
        
        client> go run client.go -o 0,w,nice,55
        wrote to 0: "nice" = "55"
        client> go run client.go -o 2,a,nice
        read from 2: "nice" is "55"

## Video

Note that the read and write printouts in the video differ slightly from those above; I prefer that your output match the above.

![video](OUTp4complete.mp4)


