
Anti-Entropy, and Containers

Due: Oct 29, 2023, 11:59:59 pm, v2

Recent Changes:

Overview

This project is a relatively straightforward one-way syncing of one server to another, but there are a few tricky aspects, and you will need to spend time getting better at gRPC and at Docker containerization. We are going to synchronize the blobs using Merkle trees over gRPC.

Setup

Download files here.

Anti-Entropy and Merkle Trees

Merkle Trees are a form of hash tree where each leaf is the hash of a single blob, and each non-leaf is the hash of the concatenation of the hashes of its child nodes. We are going to take severe liberties w/ the definition of both hashes and merkle tree in order to make the project tractable. For this project, we define the hash of a blob or a string as the base32 encoding of the sha256 of that blob or string. Basically, the output of computeSig() (which has been modified to accommodate empty strings, see the distribution).

Salient points:

  • The tree is a full balanced tree:
    • all leaves at same depth
    • all nodes except leaves have exactly 32 children
    • depth is an integer greater than 0, and defined as the number of levels in the tree.
  • The tree's purpose is to summarize the set of all blobs locally present at a server.
  • Each node has a sig, consisting of the hash of the data in the subtree rooted at that node.
  • We divide the space of blob hashes into uniform-sized buckets whose width is determined solely by the tree's height.
  • Each leaf represents one bucket, and summarizes all blobs present at the server in that bucket.
  • The sig of a leaf is the hash of the sigs of the blobs in the leaf's bucket, where the blob sigs are sorted alphanumerically and concatenated. For example, given sig values "3B", "F3", "7A", and "K4", the sig is computed as hash(3B|7A|F3|K4), where | denotes concatenation.
  • The sig of an empty bucket is an empty string.
  • The sig of a non-leaf is the hash of the concatenation of the hash values (sigs) of all children of that node, in child order (not sorted). We also define the hash of an empty string as an empty string.

For this project we have defined the fan-out of the tree as 32, which is conveniently the base of the encoding we use to create signatures, allowing the tree to be visualized easily.
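
To make the hashing concrete, here is a minimal sketch of the conventions above, written against a computeSig along the lines of the distribution's (base32 of sha256, with empty input hashing to the empty string). The sha256_32_ prefix matches the sigs shown in the transcripts below, but treat the exact spelling as an assumption and use the distribution's computeSig rather than this one.

package merkle

import (
    "crypto/sha256"
    "encoding/base32"
    "sort"
    "strings"
)

// computeSig: base32 of the sha256 of the input, with the hash of an
// empty string defined as the empty string. Sketch only -- prefer the
// version shipped in the distribution.
func computeSig(data string) string {
    if data == "" {
        return ""
    }
    sum := sha256.Sum256([]byte(data))
    return "sha256_32_" + base32.StdEncoding.EncodeToString(sum[:])
}

// leafSig: sort the bucket's blob sigs alphanumerically, concatenate
// them, and hash the result. An empty bucket yields an empty sig.
func leafSig(blobSigs []string) string {
    sorted := append([]string(nil), blobSigs...)
    sort.Strings(sorted)
    return computeSig(strings.Join(sorted, ""))
}

// nodeSig: concatenate the child sigs in child order (not sorted) and
// hash. A node whose children are all empty again gets an empty sig.
func nodeSig(childSigs []string) string {
    return computeSig(strings.Join(childSigs, ""))
}

With the toy values from the bullet list, leafSig([]string{"3B", "F3", "7A", "K4"}) hashes the concatenation "3B7AF3K4".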

[Figure: a small portion of a depth-3 merkle tree]

This figure shows only a small portion of a depth-3 tree. There are 32 interior nodes on the second level and 32² = 1024 leaves on the third level.

The label of a leaf node is the hash of the concatenation of the hashes of the blobs in that leaf's bucket (buckets are not strictly relevant for interior nodes). Buckets get narrower as you go down the tree. If this were a depth-1 tree (consisting only of the root), the root's bucket would contain all possible hash values.

If this were a depth 2 tree, the tree would consist of the root and leaves. The first leaf would contain all local sigs starting w/ A (actually sha256_32_A).

The match between our fanout and our base32 encoding means that it will be easy to determine the correct bucket for each blob.
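
Continuing the sketch above, routing a blob to its bucket is just a matter of slicing its sig; bucketPath is a name invented here for illustration, and the sha256_32_ prefix again follows the transcripts.

// bucketPath returns the path string (of length depth-1) of the leaf
// whose bucket holds blobSig: simply the first depth-1 characters of
// the base32 portion of the sig.
func bucketPath(blobSig string, depth int) string {
    encoded := strings.TrimPrefix(blobSig, "sha256_32_")
    return encoded[:depth-1]
}

In a depth-4 tree, a blob whose sig starts with sha256_32_EMW... lands in the bucket at path "EMW"; in a depth-2 tree the same blob lands at path "E", which is exactly how the path probes in the tests below behave.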

Merkle Workflow

The distributed merkle tree comparison works through three RPCs: build, path, and get. build tells the remote site to make a merkle tree summarizing all blobs. We are using base32 encoding, i.e. 234567ABCDEFGHIJKLMNOPQRSTUVWXYZ, so we navigate from the root to a leaf via a string containing only the above characters. The path to the root is "". A path from the root to a leaf in a depth-4 tree (height 3 in the usual nomenclature) will be a string of length 3, e.g. "Z4A".

The build RPC will return the blob tree hash (hash of top node in the merkle tree). A server saves all blob trees it has built in response to remote requests, never garbage-collecting. The build RPC should also return the total number of blob hashes in the tree.

A remotely built tree is persistent in the remote server's memory, so a client can then navigate the remote tree via the path RPC. Arguments are the blob tree hash and a path string. The return value will include an array of sigs that describe the children of the node identified by the blob tree hash and the path. The children are blob hashes (sigs) if the node is a leaf, or hashes of hashes otherwise.

The hash of a node is defined as the hash of the concatenation of the hashes of the node's children, whether those children are other nodes (for an interior node) or blobs (for a leaf). The one caveat to this is that debugging is easier if you define the hash of an empty string as an empty string.

In the absence of any other activity, if s_1 pulls from s_2, a subsequent pull will reveal that the remote and local blob stores are identical (the remote and local blob tree hashes will be the same).

Your Tasks.

Task 1: Meet the Merkles

I have given you the layout of your p4 directory in the repository. The pb directory once again holds the protobuf definitions; you should alter the .proto file there to include RPCs for build, path, and pull.

Synchronizing the sets of blobs present at different servers requires first describing those sets, which we do with Merkle trees. You must implement the merkle trees in your code. No using third-party libraries for this.

A server pulling from another server:

  1. creates a local blob tree
  2. tells the remote server to create a blob tree by sending it a build RPC.
  3. uses path and get RPCs to navigate the remote tree and download blobs, if needed. (A sketch of possible .proto additions follows.)
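
The exact message layout is up to you. Below is a minimal sketch of what the .proto additions might look like; the RPC names and the data they carry come from this spec, but the service, message, and field names are all assumptions.

service Ubi {  // or whatever your existing service is called
    // ...existing RPCs (put, get, list, ...) stay as they are...
    rpc Build(BuildRequest) returns (BuildReply);  // build a blob tree, return its root sig and blob count
    rpc Path(PathRequest) returns (PathReply);     // return the sigs below a saved tree at a given path
    rpc Pull(PullRequest) returns (PullReply);     // tell this server to pull from another server
}

message BuildRequest {}
message BuildReply {
    string tree_sig = 1;   // sig of the root node
    int64 num_blobs = 2;   // total number of blob hashes in the tree
}
message PathRequest {
    string tree_sig = 1;   // which saved blob tree to navigate
    string path = 2;       // "" for the root; e.g. "Z4A" for a leaf of a depth-4 tree
}
message PathReply {
    repeated string sigs = 1;  // child sigs for an interior node, blob sigs for a leaf
}
message PullRequest {
    string from = 1;       // address to pull from, e.g. "localhost:5557"
}
message PullReply {
    int64 num_blobs = 1;   // blobs actually transferred
    int64 num_rpcs = 2;    // RPCs used
    double secs = 3;       // elapsed time, from the pulling server's perspective
}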

Summary of interface changes.

Client commands:

  • -s <s_1> build : Send request to s_1 instructing it to create a blob (merkle) tree, saving it at the remote site under the blob tree sig that is returned from this call. Return something like: 4619-sig tree on :5557: sha256_32_735PHA56VU35N75KLJIRP5WZJBYCVH6TPV7JPL53EQSUMUGASAPQ==== This says 4619 total blobs indexed by the tree w/ the blob tree (merkle tree) root's hash as above.
  • -s <s_1> path : Send request to s_1 instructing it to return the sigs: child sigs if interior node, blob sigs if leaf node.
io:~/p4> ./cli -s :5557 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/p4> ./cli -s localhost:5557 list | wc
      96      97    6373
io:~/p4> ./cli -s :5557 build
95-sig tree on :5557: sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ====
io:~/p4> ./cli -s :5557 path sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ==== 244
sigs: 1
hash: sha256_32_23DQYGTDTBZTXT7YDIANEFWCNFERR4MZDHFQXM2M2GSN5IY233CA====
io:~/p4> ./cli -s :5557 path last 244
sigs: 1
hash: sha256_32_23DQYGTDTBZTXT7YDIANEFWCNFERR4MZDHFQXM2M2GSN5IY233CA====
io:~/p4> ./cli -s :5557 path last 24
sigs: 2
hash: sha256_32_64NGXT5VZ2WC56FIY2RAKU3MLHOWEYBNJARJ2G3L57FNGYSYGQVQ====
io:~/p4> ./cli -s :5557 path last 2
sigs: 3
hash: sha256_32_VZGPJQR5YRFZBCHDSL2OPUJ2SEYC74IDIQYYWPDCG3M44K5TE73A====
io:~/p4> ./cli -s :5557 path last ""
sigs: 95
hash: sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ====
  • -s <s_1> pull <s_2> : Tells s_1 to pull from s_2 by sending a build to s_2 and then using the returned blob tree sig to navigate through the remote blob tree using path and get calls. Return the number of RPCs and total time from the perspective of the server that is doing anti-entropy from another server.
io:~/818/projects/p4/solution> cli -s :5557 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> cli -s :5558 pull localhost:5557
Pulled by :5558 from localhost:5557: 95 blobs using 215 RPCs, 0.081192458 secs
io:~/818/projects/p4/solution> cli -s :5558 pull localhost:5557
Pulled by :5558 from localhost:5557: 0 blobs using 1 RPCs, 0.044928375 secs

Note that this shows the counts of blobs and RPCs used for each pull request, though the video does not; yours should show them. A sketch of the pull traversal itself follows.
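
The navigation behind pull is a recursion that only descends into subtrees whose sigs differ, which is why the second pull above costs a single RPC. A sketch of the idea, assuming the path RPC returns one sig per child at interior nodes (empty string for an empty child) as described earlier; Node, remoteChildren, and fetchBlob are placeholders for your own tree representation and RPC wrappers.

const base32Alphabet = "234567ABCDEFGHIJKLMNOPQRSTUVWXYZ"

// Node is a stand-in for however you represent your local tree.
type Node struct {
    Sig      string
    Children [32]*Node       // nil at the leaf level
    Blobs    map[string]bool // blob sigs present in this leaf's bucket
}

// syncNode walks the remote tree (identified by treeSig) alongside the
// local one, recursing only where sigs differ and fetching missing
// blobs at the leaves. remoteChildren wraps the path RPC and fetchBlob
// wraps get; both are placeholders.
func syncNode(local *Node, treeSig, path string, depth int,
    remoteChildren func(treeSig, path string) ([]string, error),
    fetchBlob func(sig string) error) error {

    remoteSigs, err := remoteChildren(treeSig, path) // one path RPC per visited node
    if err != nil {
        return err
    }
    if depth == 1 { // leaf: remoteSigs are blob sigs
        for _, sig := range remoteSigs {
            if !local.Blobs[sig] {
                if err := fetchBlob(sig); err != nil { // one get RPC per missing blob
                    return err
                }
            }
        }
        return nil
    }
    for i := 0; i < 32; i++ { // interior: skip empty or identical subtrees
        if remoteSigs[i] == "" || remoteSigs[i] == local.Children[i].Sig {
            continue
        }
        err := syncNode(local.Children[i], treeSig, path+string(base32Alphabet[i]),
            depth-1, remoteChildren, fetchBlob)
        if err != nil {
            return err
        }
    }
    return nil
}

The caller would first compare the local root sig against the one returned by build and skip the walk entirely when they match, which is where the 1-RPC pulls above come from.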

Server command line args:

  • -b : Directory where the blob directory should be placed. CHANGES: your blob directory should be called <dir>/blob<port>, where the dir is specified like -b /tmp/, and the port is specified w/ -s <port>. For example: "/tmp/blob5001".
  • -s : Bare integer.
  • -S serv1:port1,serv2:port2... : Comma-separated list of servname:portno. If specified, this takes precedence over anything compose.yaml says. Intended for use in local debugging, whereas the server names from compose.yaml are used in the containerization test.
  • -p : Changes the anti-entropy period.
  • -D : Changes the tree depth. Remember that a depth N counts all levels, including root and leaves (i.e. height+1).
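
A sketch of how these flags might be declared with the standard flag package. The variable names, the defaults (other than -D's depth of 4), and the assumption that -p is in seconds are all mine; exactly how you splice <dir> and <port> together is up to you as long as the resulting directory matches the rule above.

package main

import (
    "flag"
    "path/filepath"
    "strings"
)

var (
    blobRoot = flag.String("b", "/tmp/", "directory under which the blob directory is created")
    servPort = flag.String("s", ":5000", "port to serve gRPC on")
    peerList = flag.String("S", "", "comma-separated servname:portno list; overrides compose.yaml")
    aePeriod = flag.Int("p", 30, "anti-entropy period, assumed to be in seconds")
    depth    = flag.Int("D", 4, "merkle tree depth, counting root and leaves")
)

func main() {
    flag.Parse()
    // <dir>/blob<port>, per the -b description above, e.g. /tmp/blob5001.
    blobDir := filepath.Join(*blobRoot, "blob"+strings.TrimPrefix(*servPort, ":"))
    var peers []string
    if *peerList != "" {
        peers = strings.Split(*peerList, ",")
    }
    _, _, _, _ = blobDir, peers, *aePeriod, *depth // wire these into your server setup
}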

Anti-Entropy

Implemented just like the command-line pull command, except initiated by an anti-entropy thread that is spun off and wakes up periodically. The duration between anti-entropy rounds should be the specified anti-entropy period plus or minus a random interval of up to 1 second.
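
A sketch of that thread, assuming the period is in seconds and that each round pulls from one peer chosen at random from the -S list (whether you pull from one peer or all of them per round is not pinned down here); pullFrom stands in for the same logic the pull command uses.

package server

import (
    "log"
    "math/rand"
    "time"
)

// runAntiEntropy loops forever, sleeping for the configured period
// jittered by +/- 1 second, then pulling from a peer. Spin it off with
// "go runAntiEntropy(...)". period, peers, and pullFrom are placeholders.
func runAntiEntropy(period time.Duration, peers []string, pullFrom func(addr string) error) {
    for {
        jitter := time.Duration(rand.Int63n(int64(2*time.Second))) - time.Second
        time.Sleep(period + jitter)
        if len(peers) == 0 {
            continue
        }
        peer := peers[rand.Intn(len(peers))]
        if err := pullFrom(peer); err != nil {
            log.Printf("anti-entropy pull from %s failed: %v", peer, err)
        }
    }
}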

Task 2: Containerize using docker compose

There are lots of good tutorials on this online. I'm going to give you the compose file for free:

version: "3"
services:
  ubi1:
    container_name: ubi1
    hostname: ubi1
    build:
      context: .
      dockerfile: Dockerfile
    image: ulean:alpha
    ports:
      - 5001:5000
  ubi2:
    container_name: ubi2
    hostname: ubi2
    image: ulean:alpha
    ports:
      - 5002:5000
  ubi3:
    container_name: ubi3
    hostname: ubi3
    image: ulean:alpha
    ports:
      - 5003:5000

You can use this unmodified, as I will use it in testing. I will probably use more hosts, but they will be defined exactly like the above.

The same Docker image, ulean:alpha, should be used for all containers, but this image can be built any way you wish.

I do recommend using a multi-stage Dockerfile to limit the size. The first stage has the entire build environment, and you compile your server binary there. This might be 900 MB. In the second stage you copy your binary from the first stage and have just a minimal environment for running the binary. I'm currently using debian:bookworm-slim and it results in an image on the order of 95 MB.
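
For what it's worth, a sketch of that two-stage structure, assuming the server builds with a plain go build of ubiserver.go at the repo root; the Go base image tag, paths, and the (omitted) runtime flags are placeholders to adapt.

# Stage 1: full Go toolchain, used only to compile the server binary.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /ubiserver ubiserver.go

# Stage 2: minimal runtime image; only the binary comes along.
FROM debian:bookworm-slim
COPY --from=build /ubiserver /usr/local/bin/ubiserver
EXPOSE 5000
# Runtime flags (blob dir, port, peer list) are omitted; wire them up to
# match your compose setup.
CMD ["ubiserver"]

With the compose file above, docker compose build produces ulean:alpha once (via the ubi1 service) and the other services reuse that image.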

Task 3: Visualize w/ prometheus and docker (optional, bonus points)

In the real world you want to hook your containers up to a dashboard that visually shows you what is going on. Typical workflows use prometheus to scrape HTTP ports on your servers and then feed Grafana, which gives you an easy means of defining a dashboard. There are many paywalled articles about doing this, such as this one.

I will give up to 20 bonus points for a good, clean docker compose solution w/ prometheus and grafana each in its own container, plus the ubi containers, all in one compose.yaml. Your solution must include a screen recording showing you building the dashboard, starting up the system, and explaining the whole thing well enough for me to duplicate what you have done.

As for the metrics, even a dashboard that shows a real-time bar graph of the total number of blobs at each server would be a nice start. Exposing this info should be straightforward, something along the lines of:

package store

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    myMetric = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "my_metric",
            Help: "My custom metric",
        },
    )
)

// start up in its own goroutine, like "go serveMetrics()"
func serveMetrics() {
    prometheus.MustRegister(myMetric)

    http.Handle("/metrics", promhttp.Handler())
    go func() {
        for {
            // Update the metric value over time
            myMetric.Set(42) // Update this value with number of local blobs
            // Sleep for some time
            time.Sleep(time.Second)
        }
    }()

    http.ListenAndServe(":8080", nil)
}

Notes

  • server.go is now ubiserver.go.
  • trees should default to depth 4, each level 32-wide

The distro includes a simple bld script that rebuilds the protobuf interface and creates executables ubiserver, cli, and l.

Testing

I will run through the tests shown in the video, plus a few others. In particular, with anti-entropy set to a very large value I will:

  • bring up two local servers as shown in the video, put data to one of them, use build and path commands to navigate them, and they should behave similarly to the video. Also, I will look in the .blobstore.
  • I'll also do pulls like in the video, except that I expect to see counts of blobs pulled and RPCs as described above.
  • I'll then load a big distro into a server, possibly something else into another, and let anti-entropy equalize.
  • I will bring it all up w/ docker compose and repeat the above.

Notes

  • The maximum size of a gRPC message is 4 MB by default. We should not be running up against this limit in my testing.
  • async NOT TESTED.

Testing

The following will be my testing procedure. Yours should look very similar, and accept the same command line arguments as below, or you will not get credit.

First thing we test is probing your merkle tree. To make this concrete, grab this file, and then untar it in /tmp/, so that you now have a directory /tmp/blob5001 that has 95 blobs in it.

Test 1 (10 pts): Then fire up an ubiserver, which should NOT wipe out the blobdir you just uncompressed, and instead should treat it as its own.

io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 &
[1] 46194
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
      96      97    6373

Test 2 (10 pts): The above showed that your ubiserver can find my blobs. Next, create and probe a merkle tree:

io:~/818/projects/p4/solution> go run cli.go -s :5001 build
95-sig tree on :5001: sha256_32_MEKYINRK35PJ67GPQZRFM2X4KUOTOU756ZJWNDCCY6U4LYEHN3YQ====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last E
sigs: 3
combined: sha256_32_2VAB4CK4VUVO36JU5HPLONSF7MGG64DPMN4TRFQURUC5ZOOSBYRA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last EM
sigs: 2
combined: sha256_32_CN757EJBFCJFXNORM5QSS76GGB5TC2QAKTE3GV5MAQZAQDSG3UQQ====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last EMW
sigs: 1
combined: sha256_32_B5XMMYDZ2WFBHXRRICT7WMRSYEMTYGINA6UBVICJAAU6GPJ2RSBA====
	"sha256_32_EMWG6JUKVXSLYUE4UWUR6HJIITXFJFF4S5DGQLEJ5QDLWFW3CSLQ===="

Test 3 (10 pts): Now do the same w/ a tree of depth 2:

killall ubiserver
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 -D 2 &
[1] 47308
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
      96      97    6373
io:~/818/projects/p4/solution> go run cli.go -s :5001 build
95-sig tree on :5001: sha256_32_75LWMMDVKOOBFSXIFSDYFSP6PELM3YUZ4YKTHYH4LD5HAIITLN4A====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last ""
sigs: 95
combined: sha256_32_75LWMMDVKOOBFSXIFSDYFSP6PELM3YUZ4YKTHYH4LD5HAIITLN4A====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last E
sigs: 3
combined: sha256_32_N3C5MBRDCCJLLX3BBPEOUU2TVWSOMOD7DLN5DYILHEG52EVBMSUA====
	"sha256_32_EMORFEZQVOUTK5JX66U7KR3M4IEHQK6WKV2DY6FBZORBSHKDQF6Q===="
	"sha256_32_EMWG6JUKVXSLYUE4UWUR6HJIITXFJFF4S5DGQLEJ5QDLWFW3CSLQ===="
	"sha256_32_ETBRHI7Y53CYS6IRZW5GETFFK3MG4RBD66JYJEQZS3HBNZ7SY3QQ===="

Test 4 (10 pts): Now start up another ubiserver and see if we can sync:

io:~/818/projects/p4/solution> killall ubiserver
No matching processes belonging to you were found
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 &
[1] 47967
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5002 -S localhost:5001 -p 1000 &
[2] 47998
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5001]
serving gRPC on :5002, host "io.lan", servers [localhost:5001], /tmp/blob5002/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
      96      97    6373
io:~/818/projects/p4/solution> go run cli.go -s :5002 list | wc
       1       2       7
io:~/818/projects/p4/solution> go run cli.go -s :5001 pull "localhost:5002"
Pulled by :5001 from localhost:5002: 0 blobs using 215 RPCs, 0.079132334 secs
io:~/818/projects/p4/solution> go run cli.go -s :5002 pull "localhost:5001"
Pulled by :5002 from localhost:5001: 95 blobs using 215 RPCs, 0.096492958 secs
io:~/818/projects/p4/solution> go run cli.go -s :5002 pull "localhost:5001"
Pulled by :5002 from localhost:5001: 0 blobs using 1 RPCs, 0.054424625 secs
io:~/818/projects/p4/solution> go run cli.go -s :5001 pull "localhost:5002"
Pulled by :5001 from localhost:5002: 0 blobs using 1 RPCs, 0.0569965 secs

We:

  • verified that the blobs were only in 5001's blobstore
  • tried to pull from 5002 (empty) to 5001. This took many RPCs even though it received nothing, as every level of the two trees differs.
  • pulled from 5001 to 5002 and saw many blobs moving, still w/ many RPCs
  • repeated the above; this time it took only 1 RPC to see that nothing was needed from 5001
  • pulling from 5002 to 5001 brought nothing, but only took a single RPC this time

Test 5 (15 pts): Last, we need to check anti-entropy. The above had anti-entropy turned off by setting a period of 20 minutes, so the first anti-entropy never happened.

io:~/818/projects/p4/solution> killall ubiserver
No matching processes belonging to you were found
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5002 -S localhost:5001 -p 10 &
[1] 49441
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5001]
serving gRPC on :5002, host "io.lan", servers [localhost:5001], /tmp/blob5002/
                               go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 10 &
[2] 49467
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
       1       2       7
       1       2       7

Both sides are empty. Now read sampledir into 5001:

io:~/818/projects/p4/solution> go run cli.go -a -s :5001 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
      82      83    5435
io:~/818/projects/p4/solution> pulled 95 blobs using 214 rpcs from localhost:5001
go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
      96      97    6373

Test 6 (15 pts): Now load a really big directory into 5002 and make sure that 5001 can grab it as well:

io:~/818/projects/p4/solution> go run cli.go -a -s :5002 put sampledis-7.2.1
sha256_32_UGPT64YNNMEPL3SVEJJHCR2M5QC3EYD5YKOEPOSFJVDJRVSTFYEA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
pulled 4619 blobs using 5361 rpcs from localhost:5002
    4715    4716  315848
    4715    4716  315848

Test 7 (30 pts):

I will then verify that the containerization works. With the Dockerfile and compose.yaml you have been given, I will bring up three hosts:

  docker compose up

In another window I will load sampledir into one container and sampledis-7.2.1 into another:

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc; go run cli.go -s :5002 list | wc; go run cli.go -s :5003 list | wc
       1       2       7
       1       2       7
       1       2       7
io:~/818/projects/p4/solution> go run cli.go -s :5001 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
       1       2       7
       1       2       7
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
      96      97    6373
      96      97    6373
io:~/818/projects/p4/solution>
io:~/818/projects/p4/solution> go run cli.go -s :5003 put sampledis-7.2.1
sha256_32_UGPT64YNNMEPL3SVEJJHCR2M5QC3EYD5YKOEPOSFJVDJRVSTFYEA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
    3209    3210  214946
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
    4715    4716  315848
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
    4715    4716  315848
    4715    4716  315848
    4715    4716  315848

Done!

For your convenience, all these commands are condensed here.

Grading

I will use the following rough guide to assign grades:

  • 70 pts: from the first six tests
  • 30 pts: containerization

Bonus!

  • prometheus and grafana

Video walkthrough

p4 walkthrough

Submit

Create a README.md file in your p4 directory describing what works, what doesn't, and any divergences from this document.

Commit to the repository.