Peter J. Keleher authored
- Anti-Entropy, and Containers
- Recent Changes:
- Overview
- Setup
- Anti-Entropy and Merkle Trees
- Merkle Workflow
- Your Tasks.
- Task 1: Meet the Merkles
- Summary of interface changes.
- Server command line args:
- Antientropy
- Task 2: Containerize using docker compose
- Task 3: Visualize w/ prometheus and docker (optional, bonus points)
- Notes
- Testing
- Notes
- Testing
- Grading
- Video walkthrough
- Submit
Anti-Entropy, and Containers
Due: Oct 29, 2023, 11:59:59 pm, v2
Recent Changes:
Overview
This project is a relatively straightforward one-way syncing of one server to another, but there are a few tricky aspects, and you will need to spend time getting better at gRPC, plus learning docker containerization. We are going to synchronize the blobs using Merkle trees over gRPC.
Setup
Download files here.
Anti-Entropy and Merkle Trees
Merkle Trees are forms of hash trees where each leaf is the hash of a single blob, and each non-leaf is the hash of the concatenation of the hashes of its child nodes. We are going to take severe liberties w/ the definitions of both hashes and merkle trees in order to make the project tractable. For this project, we define the hash of a blob or a string as the base32 encoding of the sha256 of that blob or string. Basically, the output of computeSig() (which has been modified to accommodate empty strings, see the distribution).
Salient points:
- The tree is a full balanced tree:
- all leaves at same depth
- all nodes except leaves have exactly 32 children
- depth is an integer greater than 0, and defined as the number of levels in the tree.
- The tree's purpose is to summarize the set of all blobs locally present at a server.
- Each node has a sig, consisting of the hash of the data rooted at the subtree below.
- We divide the space of blob hashes into uniform-sized buckets whose width is determined solely by the tree's height.
- Each leaf represents one bucket, and summarizes all blobs present at the server in that bucket.
- The sig of a leaf represents the hash of the sigs of the blobs under the leaf, where the blob sigs are sorted alphanumerically and concatenated. For example, given sig values "3B", "F3", "7A", and "K4", the hash is computed approximately as hash(3B|7A|F3|K4).
- The sig of an empty bucket is an empty string.
- The sig of a non-leaf is the hash of the concatenation of the hash values of all children of that node, not sorted. We also define the hash of an empty string as an empty string.
For this project we have defined the fan-out of the tree as 32, which is conveniently the base of the encoding we use to create signatures, allowing the tree to be visualized easily.
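To make the hashing rules concrete, here is a minimal sketch in Go of how these sigs might be computed. It is not the distribution's computeSig(); the helper names, the "sha256_32_" prefix, and whether sigs are joined with a separator are assumptions you should reconcile with the provided code. The fixed points from this handout are sha256, base32, the alphanumeric sort of blob sigs within a leaf, and the empty-string convention.
package merkle

import (
	"crypto/sha256"
	"encoding/base32"
	"sort"
	"strings"
)

// hashString returns the base32 encoding of the sha256 of a string, with the
// empty string mapping to the empty string. The "sha256_32_" prefix mirrors
// the sigs shown in this handout; adjust to match the distribution's computeSig().
func hashString(s string) string {
	if s == "" {
		return ""
	}
	sum := sha256.Sum256([]byte(s))
	return "sha256_32_" + base32.StdEncoding.EncodeToString(sum[:])
}

// leafSig hashes the alphanumerically sorted, concatenated blob sigs of one
// bucket; an empty bucket yields the empty string.
func leafSig(blobSigs []string) string {
	sorted := append([]string(nil), blobSigs...)
	sort.Strings(sorted)
	return hashString(strings.Join(sorted, ""))
}

// nodeSig hashes the concatenation of a node's children's sigs, in child
// order (not sorted); all-empty children again yield the empty string.
func nodeSig(childSigs []string) string {
	return hashString(strings.Join(childSigs, ""))
}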
This figure only shows a small portion of a depth (height) 3 tree. There are 32 interior nodes on the second level and 32² (1,024) leaves on the third level.
The labels of leaf nodes are the hash of the concatenation of the hashes of blobs in that leaf's bucket (buckets are not strictly relevant for interior nodes). Buckets get narrower as you go down the tree. If this were a depth 1 tree (consisting only of the root), the root's bucket would contain all possible hash values.
If this were a depth 2 tree, the tree would consist of the root and leaves. The first leaf would contain all local sigs starting w/ A (actually sha256_32_A). The match between our fanout and our base32 encoding means that it is easy to determine the correct bucket for each blob.
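Concretely, a blob's bucket can be read directly off the front of its sig once the prefix is stripped. A minimal sketch, where the "sha256_32_" prefix and the helper name are my own, taken from the examples in this handout:
package merkle

import "strings"

// bucketPath returns the path (one base32 character per level below the root)
// of the bucket holding blobSig in a tree of the given depth.
func bucketPath(blobSig string, depth int) string {
	hash := strings.TrimPrefix(blobSig, "sha256_32_")
	return hash[:depth-1] // e.g. depth 4 gives a 3-character path like "Z4A"
}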
Merkle Workflow
The distributed merkle tree comparison works through three RPCs: build, path, and get.
build tells the remote site to make a merkle tree summarizing all blobs. We are using base 32 encoding, i.e. 234567ABCDEFGHIJKLMNOPQRSTUVWXYZ, so we navigate from the root to a leaf via a string containing only the above characters. The path to the root is "". A path from the root to a leaf in a depth 4 tree (height 3 in the usual nomenclature) will be a string of length 3, e.g.: "Z4A".
The build RPC will return the blob tree hash (hash of top node in the merkle tree). A server saves all blob trees it has built in response to remote requests, never garbage-collecting. The build RPC should also return the total number of blob hashes in the tree.
A remotely built tree is persistent in the remote server's memory, so a client can then navigate the remote tree via the path RPC. Arguments are the blob tree hash and a path string. The return value will include an array of sigs that describe the children of the node defined by the blob tree hash and the path. The children are blob hashes (sigs) if the node is a leaf, or hashes of hashes otherwise.
The hash of a node is defined as the hash of the concatenation of the hashes of the node's children, whether those children are other nodes (for an interior node) or blobs (for a leaf). The one caveat to this is that debugging is easier if you define the hash of an empty string as an empty string.
In the absence of any other activity, if
Your Tasks.
Task 1: Meet the Merkles
I have given you the layout of your p4 directory in the repository. The pb directory once again holds protobuf definitions, but you should alter that file to include RPCs for build, path, and pull.
Synchronizing the sets of blobs present at different servers requires first describing those sets, which we do with Merkle trees. You must implement the merkle trees in your code. No using third-party libraries for this.
A server pulling from another server:
- creates a local blob tree
- tells the remote server to create a blob tree by sending it a build RPC.
- uses path and get RPCs to navigate the remote tree and download blobs, if needed (see the sketch below).
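The sketch below shows one way that traversal might look. The localStore and remoteTree interfaces, the Build/Path/Get method names, and the assumption that an interior node's children are ordered by the encoding characters in sorted order are all hypothetical stand-ins for your own store and the gRPC client you generate from your .proto file; the point is only the walk-where-sigs-differ logic.
package merkle

// Hypothetical interfaces standing in for the local blob store and the
// generated gRPC client; names and signatures are illustrative only.
type localStore interface {
	Has(sig string) bool
	Put(sig string, blob []byte)
	ChildSigs(path string) []string // sigs of this node's 32 children in the local tree
}

type remoteTree interface {
	Build() (treeSig string, count int, err error)
	Path(treeSig, path string) ([]string, error)
	Get(sig string) ([]byte, error)
}

const alphabet = "234567ABCDEFGHIJKLMNOPQRSTUVWXYZ"

// pullFrom asks the remote server to build a tree, then descends only into
// subtrees whose sigs differ from the local tree's, fetching missing blobs
// at the leaves.
func pullFrom(local localStore, remote remoteTree, depth int) error {
	treeSig, _, err := remote.Build()
	if err != nil {
		return err
	}
	var walk func(path string) error
	walk = func(path string) error {
		theirSigs, err := remote.Path(treeSig, path)
		if err != nil {
			return err
		}
		if len(path) == depth-1 { // at a leaf: these are blob sigs
			for _, sig := range theirSigs {
				if !local.Has(sig) {
					blob, err := remote.Get(sig)
					if err != nil {
						return err
					}
					local.Put(sig, blob)
				}
			}
			return nil
		}
		mine := local.ChildSigs(path) // interior: compare the 32 per-child sigs
		for i, c := range alphabet {
			if theirSigs[i] != mine[i] {
				if err := walk(path + string(c)); err != nil {
					return err
				}
			}
		}
		return nil
	}
	return walk("")
}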
Summary of interface changes.
Client commands:
- -s <s_1> build: Send request to s_1 instructing it to create a blob (merkle) tree, saving it at the remote site under the blob tree sig that is returned from this call. Return something like:
4619-sig tree on :5557: sha256_32_735PHA56VU35N75KLJIRP5WZJBYCVH6TPV7JPL53EQSUMUGASAPQ====
This says 4619 total blobs are indexed by the tree, w/ the blob tree (merkle tree) root's hash as above.
- -s <s_1> path <tree_sig> <path>: Send request to s_1 instructing it to return the sigs at that node: child sigs if an interior node, blob sigs if a leaf node.
io:~/p4> ./cli -s :5557 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/p4> ./cli -s localhost:5557 list | wc
96 97 6373
io:~/p4> ./cli -s :5557 build
95-sig tree on :5557: sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ====
io:~/p4> ./cli -s :5557 path sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ==== 244
sigs: 1
hash: sha256_32_23DQYGTDTBZTXT7YDIANEFWCNFERR4MZDHFQXM2M2GSN5IY233CA====
io:~/p4> ./cli -s :5557 path last 244
sigs: 1
hash: sha256_32_23DQYGTDTBZTXT7YDIANEFWCNFERR4MZDHFQXM2M2GSN5IY233CA====
io:~/p4> ./cli -s :5557 path last 24
sigs: 2
hash: sha256_32_64NGXT5VZ2WC56FIY2RAKU3MLHOWEYBNJARJ2G3L57FNGYSYGQVQ====
io:~/p4> ./cli -s :5557 path last 2
sigs: 3
hash: sha256_32_VZGPJQR5YRFZBCHDSL2OPUJ2SEYC74IDIQYYWPDCG3M44K5TE73A====
io:~/p4> ./cli -s :5557 path last ""
sigs: 95
hash: sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ====
- -s <s_1> pull <s_2>: Tells s_1 to pull from s_2 by sending a build to s_2 and then using the returned blob tree sig to navigate through the remote blob tree using path and get calls. Return the number of RPCs and total time from the perspective of the server that is doing anti-entropy from another server.
io:~/818/projects/p4/solution> cli -s :5557 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> cli -s :5558 pull localhost:5557
Pulled by :5558 from localhost:5557: 95 blobs using 215 RPCs, 0.081192458 secs
io:~/818/projects/p4/solution> cli -s :5558 pull localhost:5557
Pulled by :5558 from localhost:5557: 0 blobs using 1 RPCs, 0.044928375 secs
Note that this shows counts of blobs and RPCs used for each pull request, though the video does not. Yours should do this.
Server command line args:
- -b <dir>: Directory where the blob directory should be placed. CHANGES: your blob directory should be called <dir>/blob<port>, where the dir is specified like -b /tmp/, and the port is specified w/ -s <port>. For example: "/tmp/blob5001" (see the sketch after this list).
- -s <port>: Bare integer.
- -S serv1:port1,serv2:port2,...: Comma-separated list of servname:portno. If specified, this takes precedence over anything compose.yaml says. Intended for use in local debugging, whereas the server names from compose.yaml are used in the containerization test.
- -p <period>: Changes the anti-entropy period.
- -D <depth>: Changes the tree depth. Remember that a depth of N counts all levels, including root and leaves (i.e. height+1).
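A minimal sketch of the blob-directory naming, with a hypothetical helper name; flag parsing and directory creation are up to you:
package server

import (
	"path/filepath"
	"strings"
)

// blobDir derives the blob directory from the -b and -s values, e.g.
// blobDir("/tmp/", ":5001") == "/tmp/blob5001".
func blobDir(baseDir, port string) string {
	return filepath.Join(baseDir, "blob"+strings.TrimPrefix(port, ":"))
}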
Antientropy
Implemented just like the command-line pull command, except initiated by an anti-entropy thread spun off at server start that wakes up periodically. The duration between each anti-entropy should be a random interval of +/- 1 second from the value of the specified anti-entropy period.
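One possible shape for that thread, sketched with a hypothetical pull callback; only the randomized +/- 1 second sleep around the configured period is prescribed by this handout:
package server

import (
	"math/rand"
	"time"
)

// antiEntropyLoop periodically pulls from each peer. Each sleep is the
// configured period plus or minus up to one second. The pull callback is a
// stand-in for however your server initiates a pull from a peer.
func antiEntropyLoop(peers []string, period time.Duration, pull func(peer string)) {
	for {
		jitter := time.Duration(rand.Int63n(int64(2*time.Second))) - time.Second
		time.Sleep(period + jitter)
		for _, peer := range peers {
			pull(peer)
		}
	}
}
Spin it off at startup with something like go antiEntropyLoop(servers, period, pull).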
Task 2: Containerize using docker compose
There are lots of good tutorials on this online. I'm going to give you the compose file for free:
version: "3"
services:
  ubi1:
    container_name: ubi1
    hostname: ubi1
    build:
      context: .
      dockerfile: Dockerfile
    image: ulean:alpha
    ports:
      - 5001:5000
  ubi2:
    container_name: ubi2
    hostname: ubi2
    image: ulean:alpha
    ports:
      - 5002:5000
  ubi3:
    container_name: ubi3
    hostname: ubi3
    image: ulean:alpha
    ports:
      - 5003:5000
You can use this unmodified, as I will use it in testing. I will probably use more hosts, but they will be defined exactly like the above.
The same Dockerfile image, ulean:alpha, should be used for all containers, but this image can be built any way you wish.
I do recommend using a multi-stage Dockerfile to limit the size. The first stage has the entire build environment and you compile your server binary there. This might be 900 MB. In the second stage you copy your binary from the first stage and just have a minimal environment for running the binary. I'm currently using debian:bookworm-slim and it results in an image on the order of 95 MB.
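For reference, a multi-stage Dockerfile along these lines might look like the sketch below. The Go version, binary name, flags, and exposed port are assumptions; adjust them to your own project and argument handling.
# Stage 1: full build environment; compile the server binary here.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN go build -o /ubiserver .

# Stage 2: minimal runtime image containing only the binary.
FROM debian:bookworm-slim
COPY --from=build /ubiserver /usr/local/bin/ubiserver
EXPOSE 5000
# Adjust the flags to match your own argument handling.
CMD ["ubiserver", "-b", "/tmp/", "-s", "5000"]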
Task 3: Visualize w/ prometheus and docker (optional, bonus points)
In the real world you want to hook your containers up to a dashboard that visually shows you what is going on. Typical workflows use prometheus to scrape http ports on your servers, and then feed Grafana, which gives you an easy means of defining a dashboard.
There are many paywalled articles about doing this, such as this one.
I will give up to 20 bonus points for a good, clean docker compose solution w/ prometheus and grafana each in their own container, plus the ubi containers, all in one compose.yaml. Your solution must include a screen recording showing you building the dashboard, starting up the system, and explaining the whole thing well enough for me to duplicate what you have done.
As far as the metrics go, even a dashboard that shows a real-time bar graph of the total number of blobs at each server would be a nice start. Exposing this info should be straightforward, something along the lines of:
package store

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	myMetric = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Name: "my_metric",
			Help: "My custom metric",
		},
	)
)

// start up in its own goroutine, like "go serveMetrics()"
func serveMetrics() {
	prometheus.MustRegister(myMetric)
	http.Handle("/metrics", promhttp.Handler())
	go func() {
		for {
			// Update the metric value over time
			myMetric.Set(42) // Update this value with the number of local blobs
			// Sleep for some time
			time.Sleep(time.Second)
		}
	}()
	http.ListenAndServe(":8080", nil)
}
Notes
- server.go is now ubiserver.go.
- trees should default to depth 4, each level 32-wide.
The distro includes a simple bld script that rebuilds the protobuf interface and creates executables ubiserver, cli, and l.
Testing
I will run through the tests shown in the video, plus a few others. In particular, with anti-entropy set to a very large value I will:
- bring up two local servers as shown in the video, put data to one of them, use build and path commands to navigate them, and they should behave similarly to the video. Also, I will look in the .blobstore.
- I'll also do pulls like the videos, except that I expect to see counts of blobs pulled and RPCs as described above.
- I'll then load a big distro into a server, possibly something else into another, and let anti-entropy equalize.
- I will bring it all up w/ docker compose and repeat the above.
Notes
- The maximum size of a gRPC message is 4 MB by default. We should not be running up against this limit in my testing.
- async: NOT TESTED.
Testing
The following will be my testing procedure. Yours should look very similar, and accept the same command line arguments as below, or you will not get credit.
First thing we test is probing your merkle tree. To make this concrete, grab this file, and then untar it in /tmp/, so that you now have a directory /tmp/blob5001 that has 95 blobs in it.
Test 1 (10 pts): Then fire up an ubiserver, which should NOT wipe out the blobdir you just uncompressed, and instead should treat it as its own.
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 &
[1] 46194
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
96 97 6373
Test 2 (10 pts): The above showed that your ubiserver can find my blobs. Next, create and probe a merkle tree:
io:~/818/projects/p4/solution> go run cli.go -s :5001 build
95-sig tree on :5001: sha256_32_MEKYINRK35PJ67GPQZRFM2X4KUOTOU756ZJWNDCCY6U4LYEHN3YQ====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last E
sigs: 3
combined: sha256_32_2VAB4CK4VUVO36JU5HPLONSF7MGG64DPMN4TRFQURUC5ZOOSBYRA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last EM
sigs: 2
combined: sha256_32_CN757EJBFCJFXNORM5QSS76GGB5TC2QAKTE3GV5MAQZAQDSG3UQQ====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last EMW
sigs: 1
combined: sha256_32_B5XMMYDZ2WFBHXRRICT7WMRSYEMTYGINA6UBVICJAAU6GPJ2RSBA====
"sha256_32_EMWG6JUKVXSLYUE4UWUR6HJIITXFJFF4S5DGQLEJ5QDLWFW3CSLQ===="
Test 3 (10 pts): Now do the same w/ a tree of depth 2:
killall ubiserver
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 -D 2 &
[1] 47308
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
96 97 6373
io:~/818/projects/p4/solution> go run cli.go -s :5001 build
95-sig tree on :5001: sha256_32_75LWMMDVKOOBFSXIFSDYFSP6PELM3YUZ4YKTHYH4LD5HAIITLN4A====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last ""
sigs: 95
combined: sha256_32_75LWMMDVKOOBFSXIFSDYFSP6PELM3YUZ4YKTHYH4LD5HAIITLN4A====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last E
sigs: 3
combined: sha256_32_N3C5MBRDCCJLLX3BBPEOUU2TVWSOMOD7DLN5DYILHEG52EVBMSUA====
"sha256_32_EMORFEZQVOUTK5JX66U7KR3M4IEHQK6WKV2DY6FBZORBSHKDQF6Q===="
"sha256_32_EMWG6JUKVXSLYUE4UWUR6HJIITXFJFF4S5DGQLEJ5QDLWFW3CSLQ===="
"sha256_32_ETBRHI7Y53CYS6IRZW5GETFFK3MG4RBD66JYJEQZS3HBNZ7SY3QQ===="
Test 4 (10 pts): Now start up another ubiserver and see if we can sync:
io:~/818/projects/p4/solution> killall ubiserver
No matching processes belonging to you were found
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 &
[1] 47967
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5002 -S localhost:5001 -p 1000 &
[2] 47998
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5001]
serving gRPC on :5002, host "io.lan", servers [localhost:5001], /tmp/blob5002/
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
96 97 6373
io:~/818/projects/p4/solution> go run cli.go -s :5002 list | wc
1 2 7
io:~/818/projects/p4/solution> go run cli.go -s :5001 pull "localhost:5002"
Pulled by :5001 from localhost:5002: 0 blobs using 215 RPCs, 0.079132334 secs
io:~/818/projects/p4/solution> go run cli.go -s :5002 pull "localhost:5001"
Pulled by :5002 from localhost:5001: 95 blobs using 215 RPCs, 0.096492958 secs
io:~/818/projects/p4/solution> go run cli.go -s :5002 pull "localhost:5001"
Pulled by :5002 from localhost:5001: 0 blobs using 1 RPCs, 0.054424625 secs
io:~/818/projects/p4/solution> go run cli.go -s :5001 pull "localhost:5002"
Pulled by :5001 from localhost:5002: 0 blobs using 1 RPCs, 0.0569965 secs
We:
- verified that the blobs were only in 5001's blobstore
- tried to pull from 5002 (empty) to 5001. This took many RPCs even though it received nothing, as every level of the trees is different.
- pulled from 5001 to 5002 and saw many blobs moving, still w/ many RPCs
- repeated the above; this time it took only 1 RPC to see there was nothing needed from 5001
- pulling from 5002 to 5001 brought nothing, but only took a single RPC this time
Test 5 (15 pts): Last, we need to check anti-entropy. The above had anti-entropy turned off by setting a period of 20 minutes, so the first anti-entropy never happened.
io:~/818/projects/p4/solution> killall ubiserver
No matching processes belonging to you were found
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5002 -S localhost:5001 -p 10 &
[1] 49441
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5001]
serving gRPC on :5002, host "io.lan", servers [localhost:5001], /tmp/blob5002/
go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 10 &
[2] 49467
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
1 2 7
1 2 7
Both sides are empty. Now read sampledir into 5001:
io:~/818/projects/p4/solution> go run cli.go -a -s :5001 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
96 97 6373
82 83 5435
io:~/818/projects/p4/solution> pulled 95 blobs using 214 rpcs from localhost:5001
go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
96 97 6373
96 97 6373
Test 6 (15 pts): Now load a real big directory into 5002 and make sure that 5001 can grab it as well:
io:~/818/projects/p4/solution> go run cli.go -a -s :5002 put sampledis-7.2.1
sha256_32_UGPT64YNNMEPL3SVEJJHCR2M5QC3EYD5YKOEPOSFJVDJRVSTFYEA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
96 97 6373
4715 4716 315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
96 97 6373
4715 4716 315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
pulled 4619 blobs using 5361 rpcs from localhost:5002
4715 4716 315848
4715 4716 315848
Test 7 (30 pts):
I will then verify that the containerization works. With the Dockerfile and compose.yaml you have been given, I will bring up three hosts:
docker compose up
In another window I will load sampledir into one container and sampledis-7.2.1 into another:
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc; go run cli.go -s :5002 list | wc; go run cli.go -s :5003 list | wc
1 2 7
1 2 7
1 2 7
io:~/818/projects/p4/solution> go run cli.go -s :5001 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
96 97 6373
1 2 7
1 2 7
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
96 97 6373
96 97 6373
96 97 6373
io:~/818/projects/p4/solution>
io:~/818/projects/p4/solution> go run cli.go -s :5003 put sampledis-7.2.1
sha256_32_UGPT64YNNMEPL3SVEJJHCR2M5QC3EYD5YKOEPOSFJVDJRVSTFYEA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
96 97 6373
3209 3210 214946
4715 4716 315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
96 97 6373
4715 4716 315848
4715 4716 315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
4715 4716 315848
4715 4716 315848
4715 4716 315848
Done!
For your convenience, all these commands are condensed here.
Grading
I will use the following rough guide to assign grades:
- 70 pts: from the first six tests
- 30 pts: containerization
Bonus!
- prometheus and grafana
Video walkthrough
Submit
Create a README.md file in your p4 directory describing what works, what doesn't, and any divergences from this document. Commit to the repository.