A Database, and Anti-Entropy

Due: Oct 10, 2021, 11:59:59 pm, v1

Recent Changes:

Overview

This project implements a relatively straightforward one-way sync of one server to another, but there are a few tricky aspects, and you will need to spend a bit of time learning gRPC. Our filesystem's state has two parts: the log and the file chunks. The log is easy to synchronize (for our purposes, at least), but we are going to synchronize the chunk database using Merkle trees exchanged over gRPC.

Links

  • p3.tgz: provided files. Your p3 directory in the repository should have the structure given in this tarball.
  • gRPC (go quick start)

Anti-Entropy and Merkle Trees

Merkle trees are a form of hash tree in which each leaf is the hash of a single chunk, and each non-leaf is the hash of the concatenation of the hashes of its child nodes. We are going to take severe liberties with the definition of a hash or Merkle tree in order to make the project tractable. For this project, we define the hash of a chunk or a string as the base32 encoding of the sha1 of that chunk or string.
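
In Go, that hashing convention might look like the following minimal sketch (the helper name hashString is illustrative, not part of the provided skeleton):

    import (
        "crypto/sha1"
        "encoding/base32"
    )

    // hashString returns the sig of a chunk or string: the base32
    // encoding of its sha1 digest.
    func hashString(data []byte) string {
        sum := sha1.Sum(data) // 20-byte sha1 digest
        return base32.StdEncoding.EncodeToString(sum[:])
    }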

Salient points:

  • The tree is a full balanced tree:
    • all leaves at same depth
    • all nodes except leaves have exactly 32 children
    • depth is an integer greater than 0
  • The tree's purpose is to summarize the set of all chunks locally present at a server.
  • Each node has a sig: a hash summarizing the data in the subtree rooted at that node.
  • We divide the space of chunk hashes into uniform-sized buckets whose width is determined solely by the tree's height.
  • Each leaf represents one bucket, and summarizes all chunks present at the server in that bucket.
  • The sig of a leaf represents the hash of the sigs of the chunks under the leaf, where the chunk sigs are sorted alphanumerically and concatenated. For example, given sig values "3B", "F3", "7A", and "K4", the hash would be computed as hash(3B|7A|F3|K4).
  • The sig of an empty bucket is the hash of an empty string.
  • The sig of a non-leaf is the hash of the concatenation of the sigs of all of its children, in child order (not sorted). See the sketch after this list.
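
The following hedged Go sketch shows how leaf and interior sigs could be computed under these rules. The helper names are illustrative, it reuses hashString from the sketch above, and it assumes the "|" in the example above is only a visual separator (i.e., the concatenation itself uses no separator):

    import (
        "sort"
        "strings"
    )

    // leafSig hashes the chunk sigs in a leaf's bucket, sorted
    // alphanumerically and concatenated; an empty bucket hashes the
    // empty string.
    func leafSig(chunkSigs []string) string {
        sorted := append([]string(nil), chunkSigs...)
        sort.Strings(sorted)
        return hashString([]byte(strings.Join(sorted, "")))
    }

    // interiorSig hashes the concatenation of the 32 child sigs, in
    // child order (not sorted).
    func interiorSig(childSigs []string) string {
        return hashString([]byte(strings.Join(childSigs, "")))
    }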

For this project we have defined the fan-out of the tree as 32, which is conveniently the base of the encoding we use to create signatures, allowing the tree to be visualized easily.

(Figure: a small portion of a depth 3 Merkle tree.)

This figure shows only a small portion of a depth (height) 3 tree. There are 32 interior nodes on the second level and 32² (1024) leaves on the third level.

The label of a leaf node is the hash of the concatenation of the hashes of the chunks in that leaf's bucket (buckets are not strictly relevant for interior nodes). Buckets get narrower as you go down the tree. If this were a depth 1 tree (consisting only of the root), the root's bucket would contain all possible hash values.

If this were a depth 2 tree, Node 0, Node 1, etc. would be leaves, and their buckets would contain chunks whose hashes start with "A", "B", etc. In the depth 3 tree shown, Node 0-0's bucket holds the chunks whose hashes start with "AA", Node 0-1's bucket those whose hashes start with "AB", etc.

Since this is a depth 3 tree, only the last level has "buckets" (the ability to hold chunks). The buckets at Node 0, Node 1, and the root are empty; they are shown only for explanatory purposes, and to show how chunks would be assigned to buckets in depth 1 or depth 2 trees.

The match between our fanout and the encoding means that it will be easy to determine the correct bucket for each chunk.
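
For example, a chunk's leaf bucket can be read directly from the leading base32 characters of its sig. A minimal sketch, assuming the standard base32 alphabet and a helper name of my own choosing:

    // bucketIndex maps a chunk sig to its leaf bucket index in a tree
    // of the given depth by treating the first depth-1 characters of
    // the sig as a base-32 number. With depth 1, every chunk lands in
    // bucket 0 (the root).
    func bucketIndex(sig string, depth int) int {
        const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" // base32 StdEncoding alphabet
        idx := 0
        for i := 0; i < depth-1; i++ {
            idx = idx*32 + strings.IndexByte(alphabet, sig[i])
        }
        return idx
    }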

Merkle Workflow

Note: most of the input and output messages for the gRPC interface are not yet implemented.

The following is an example anti-entropy session initiated from the command line. The exchange is started by the command-line client in the "cmd" subdirectory issuing a SyncChunks message to a server. The following tells the server at port 8000 to pull all chunks from the server at port 8001, using a depth 3 tree at both servers:

   go run main.go -L ":8000" -R ":8001" -h 3 -m

Call "localhost:8000" s1, and "localhost:8001" s2. As a result of the above:

  1. s1 sends a "TreeGet" msg to s2, specifying a tree height of 3.
  2. s2 locally creates a complete Merkle tree of height 3, placing signatures of all chunks at the appropriate leaf nodes of the fully balanced tree, and replies to s1 with the root. This struct defines the root's hash, as well as the hashes of its 32 children in a ChildSigs field. s2 caches its tree, as s1 might need to query it again.
  3. Meanwhile, s1 has created a tree describing its own local chunks, and compares its root hash with that returned by its request to s2. If the two are equal, the exchange is complete.
  4. If not, s1 determines which branch(es) under the root are different by comparing ChildSigs[i] on s1 and s2, for i in the range 0..31.
  5. For each difference found, s1 sends a "TreeGet" message to s2, specifying the desired node by supplying the sig. A sig of "" is interpreted as a request for the root. Otherwise, the tree is searched for the sig.
  6. When the above recursive procedure reaches a leaf, the TreeSig node returned has no ChildSigs defined, but may have defined ChunkSigs, representing chunks in that leaf's bucket.
  7. For each such chunk sig that is not already present locally at s1, s1 issues an ordinary "get" message to s2, bringing that chunk back and storing it in the local store.

In the absence of failure or other operations proceeding concurrently (both of which we ignore), this procedure will terminate with s1's set of chunks including all those at s2. In the absence of any other changes, a second identical sync will not need to retrieve any additional chunks.

Note: This does not, however, imply that such a second sync will not require multiple request-response pairs to establish this set inclusion. s1 might well have chunks that s2 does not, so their trees and hashes will continue to differ because our anti-entropy is uni-directional. In the absence of ongoing new data, however, syncing s2 to s1, followed by s1 to s2, should always result in identical chunk databases.
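
A rough Go sketch of the pulling side of this exchange, corresponding to steps 1-7 above. Every type and method used here (PeerClient, Store, TreeGet, ChunkGet, HasChunk, PutChunk, and so on) is an illustrative stand-in for whatever interface you define, not a prescribed one; Node is assumed to look like the struct sketched under Task 2 below:

    // syncFrom walks the local and remote Merkle trees in lockstep,
    // pulling only the remote nodes and chunks that differ.
    func syncFrom(local *Store, remote PeerClient, height int) error {
        rroot, err := remote.TreeGet("", height) // "" requests the remote root
        if err != nil {
            return err
        }
        return syncNode(local, remote, local.Root(height), rroot, height)
    }

    func syncNode(local *Store, remote PeerClient, lnode, rnode *Node, height int) error {
        if lnode.Sig == rnode.Sig {
            return nil // identical subtrees: nothing below can differ
        }
        if len(rnode.ChildSigs) == 0 {
            // Leaf: fetch any chunks in the remote bucket that we are missing.
            for _, cs := range rnode.ChunkSigs {
                if local.HasChunk(cs) {
                    continue
                }
                data, err := remote.ChunkGet(cs)
                if err != nil {
                    return err
                }
                local.PutChunk(cs, data)
            }
            return nil
        }
        // Interior node: recurse only into children whose sigs differ.
        for i, childSig := range rnode.ChildSigs {
            if childSig == lnode.ChildSigs[i] {
                continue
            }
            rchild, err := remote.TreeGet(childSig, height)
            if err != nil {
                return err
            }
            if err := syncNode(local, remote, lnode.Children[i], rchild, height); err != nil {
                return err
            }
        }
        return nil
    }

The important property is that equal sigs prune entire subtrees, which is what makes a repeated pull of an unchanged remote cheap.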

Your Tasks.

Task 1: Define the RPC Interface

I have given you the layout of your p3 directory in the repository. The pb directory once again holds protobuf definitions, but that same file also now defines the RPC interface between servers (TreeGet, ChunkGet, LogGet), and between servers and the command line tool (SyncChunks, SyncLog, Die).

I have not created all the service definitions, only the first. However, cmd/main.go should run unaltered with your code, giving you some help in figuring out what the service definitions should look like.

Task 2: Implement Merkle Trees in your Code

Synchronizing the sets of chunks present at different servers requires first describing those sets, which we do with Merkle trees. You must implement the Merkle trees in your own code; do not use third-party libraries for this.
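
One possible in-memory shape for these trees, as a hedged sketch: the field names simply mirror the ChildSigs/ChunkSigs terminology above, and it assumes the bucketIndex, leafSig, and interiorSig helpers sketched earlier:

    // Node is one Merkle tree node. Leaves carry ChunkSigs; interior
    // nodes carry ChildSigs and Children.
    type Node struct {
        Sig       string   // hash of the subtree rooted here
        ChildSigs []string // sigs of the 32 children (interior nodes only)
        ChunkSigs []string // sorted chunk sigs in this bucket (leaves only)
        Children  []*Node  // nil for leaves
    }

    // buildTree constructs a full, balanced tree of the given depth over
    // the chunk sigs currently in the local store.
    func buildTree(chunkSigs []string, depth int) *Node {
        nLeaves := 1
        for i := 1; i < depth; i++ {
            nLeaves *= 32
        }
        buckets := make([][]string, nLeaves)
        for _, cs := range chunkSigs { // assign each chunk to its leaf bucket
            b := bucketIndex(cs, depth)
            buckets[b] = append(buckets[b], cs)
        }
        level := make([]*Node, nLeaves) // build the leaves, then hash upward
        for i, b := range buckets {
            sort.Strings(b)
            level[i] = &Node{Sig: leafSig(b), ChunkSigs: b}
        }
        for len(level) > 1 {
            next := make([]*Node, 0, len(level)/32)
            for i := 0; i < len(level); i += 32 {
                kids := level[i : i+32]
                sigs := make([]string, 32)
                for j, k := range kids {
                    sigs[j] = k.Sig
                }
                next = append(next, &Node{Sig: interiorSig(sigs), ChildSigs: sigs, Children: kids})
            }
            level = next
        }
        return level[0]
    }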

Task 3: Implement the RPC Communication Routines

Both synchronizing servers must create Merkle trees. The RPC routines compare the two trees by requesting individual nodes from the remote server. Only remote nodes differing from the corresponding local nodes need be retrieved. Likewise, only remote chunks not present in the local database need be retrieved.

The remote log is copied in its entirety, and completely overwrites the local log. FUSE calls could be used to invalidate nodes of the local file system so that refreshes would change the view to the revised version; however, this is more trouble than it is worth for this project. Therefore, we assume that the local (pulling) server is always restarted before inspecting its new namespace.

Factoids:

  • A tree of height 1 consists solely of the root, which is a leaf, and has all chunks in its 'ChunkSigs' field.
  • Each TreeGet response from s2 contains only a single tree node, but that node may have multiple hashes in either ChildSigs or ChunkSigs.

Testing

I will run through the tests shown in the video, plus a few others. In particular, I will:

  • Create a single file on one server, synchronize it to the other server, verify chunk sizes and total data transferred.
  • Do the same with redis.
  • Verify that repeated pulls of identical systems only require a single node, and no chunks, to be transferred. In particular, appropriate data should be printed out by the tool in the cmd directory.
  • Verify that the above works for any height (though I will probably just check height 3 and 1).

Notes

  • The maximum size of a gRPC message is 4MB by default. We should not run up against this limit in my testing.
  • The _opt flavor of arguments is only supported on newer versions of the protobuf compiler. Should work fine on lagoon, where the compiler version is 3.6.1.
  • You do not need to worry about changes happening concurrently with synchronization, or simultaneous synchronization with multiple partners.

Grading

I will use the following rough guide to assign grades:

  • 60 pts: applying the -m, -l, and -D commands from the tool results in identical namespaces (after a restart)
  • 20 pts: minimal number of chunks and nodes are transferred
  • 20 pts: Merkle tree works with different depths (but still the same depth on both communicating servers).

Create a README.md file in your p3 directory describing what works, what doesn't, and any divergences from this document.

Extra credit

  • 20 pts: Get your system to immediately reflect changes without a restart.

Watch the video