# Anti-Entropy, and Containers
**Due: Oct 29, 2023, 11:59:59 pm, v2**

### Recent Changes:

## Overview
This project is a relatively straightforward one-way *syncing* of one
server to another, but there are a few tricky aspects, and you will
need to spend time getting better at gRPC, plus docker
containerization.  We are going to synchronize the blobs using Merkle
trees over gRPC.

## Setup
Download files [here](https://sedna.cs.umd.edu/818/projects/p4.tgz).

## Anti-Entropy and Merkle Trees

Merkle trees are hash trees in which each leaf is the hash of a
single blob, and each non-leaf is the hash of the concatenation of
the hashes of its child nodes. We are going to take severe liberties w/
the definitions of both hashes and merkle trees in order to make the project
tractable. For this project, we
**define the *hash* of a blob or a string as the `base32` encoding of the `sha256`
of that blob or string**. Basically, the output of `computeSig()`
(which has been modified to accommodate empty strings, see the distribution).

Salient points:
- The tree is a full balanced tree:
  - all leaves at same depth
  - all nodes except leaves have exactly 32 children
  - depth is an integer greater than 0, and defined as the number of
    levels in the tree.
- The tree's purpose is to summarize the set of all blobs locally
  present at a server.
- Each node has a *sig*, consisting of the hash of the data rooted at the subtree below.
- We divide the space of blob hashes into uniform-sized buckets
  whose width is determined solely by the tree's height. 
- Each leaf represents one bucket, and summarizes all blobs present at
  the server in that bucket.
- The sig of a leaf represents the hash of the sigs of the blobs
  under the leaf, where the blob sigs are sorted alphanumerically and concatenated.
  For example, given sig values "3B", "F3", "7A", and "K4", the hash
  is computed approximately as *hash(3B|7A|F3|K4)*.
- The sig of an empty bucket is *an empty string*. 
- The sig of a non-leaf is the *hash* of the concatenation of the hash
  values of all children of that node, *not sorted*. We also define the
  hash of an empty string as an empty string. (These rules are sketched
  in code just below this list.)
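
To make the sig rules concrete, here is a minimal sketch in Go. It
assumes a `computeSig`-style helper (base32 of sha256, prefixed with
`sha256_32_` as in the transcripts below, and mapping the empty string
to the empty string); the distribution's `computeSig()` is
authoritative, and the function names here are only illustrative:
```
package merkle

import (
	"crypto/sha256"
	"encoding/base32"
	"sort"
	"strings"
)

// computeSig returns the base32 encoding of the sha256 of s, with the
// special case that the sig of "" is "". The "sha256_32_" prefix here
// mirrors the sigs shown in the examples; defer to the distribution's
// computeSig() for the exact format.
func computeSig(s string) string {
	if s == "" {
		return ""
	}
	sum := sha256.Sum256([]byte(s))
	return "sha256_32_" + base32.StdEncoding.EncodeToString(sum[:])
}

// leafSig hashes the blob sigs in a bucket, sorted and concatenated.
// An empty bucket yields "".
func leafSig(blobSigs []string) string {
	sorted := append([]string(nil), blobSigs...)
	sort.Strings(sorted)
	return computeSig(strings.Join(sorted, ""))
}

// interiorSig hashes the concatenation of the children's sigs,
// in child order (not sorted).
func interiorSig(childSigs []string) string {
	return computeSig(strings.Join(childSigs, ""))
}
```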

For this project we have defined the fan-out of the tree as 32, which
is conveniently the base of the encoding we use to create
signatures, allowing the tree to be visualized easily.

![merkle](merkleP3818F20.svg)

This figure shows only a small portion of a depth-3
tree. There are 32 interior nodes on the second level and
32<sup>2</sup> leaves on the third level. 

The labels of leaf nodes are the hash of the concatenation of the
hashes of blobs in that leaf's bucket (buckets are not strictly
relevant for interior nodes).
Buckets get less wide as you go down the tree.
If this were a depth 1 tree (consisting only of the root), the root's
bucket would contain all possible hash values.

If this were a depth 2 tree, the tree would consist of the root and
leaves. The first leaf would contain all local `sigs` starting w/ *A*
(actually *sha256_32_**A***).

The match between our fanout and our `base32` encoding means that it will be
easy to determine the correct bucket for each blob.
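
For example, under the assumption that sigs look like
`sha256_32_<base32 hash>` (as in the transcripts below), the leaf
bucket for a blob in a depth-*d* tree can be read straight off the
first *d-1* characters of the encoded hash. A tiny sketch of that
mapping (the helper name is made up):
```
package merkle

import "strings"

// leafPath maps a blob sig to the path of the leaf bucket it falls
// into, for a tree of the given depth. With the default depth of 4,
// the first three base32 characters after the "sha256_32_" prefix
// select the bucket, e.g. "sha256_32_EMWG..." -> "EMW". A depth-1
// tree maps every sig to the root path "".
func leafPath(sig string, depth int) string {
	hashPart := strings.TrimPrefix(sig, "sha256_32_")
	return hashPart[:depth-1]
}
```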

## Merkle Workflow

The distributed merkle tree comparison works through three RPCs: *build*, *path*, and *get*.
*build* tells the remote site to make a merkle tree summarizing all
of its blobs. We are using base32 encoding, i.e. `234567ABCDEFGHIJKLMNOPQRSTUVWXYZ`, so we navigate from
the root to a leaf via a string containing only the above
characters. The path to the root is "". A path from the
root to a leaf in a depth-4 tree (height 3 in the usual nomenclature)
will be a string of length 3, e.g.: "Z4A". 

The *build* RPC will return the *blob tree hash* (hash of top node in
the merkle tree). A server saves all blob trees it has built in
response to remote requests, never garbage-collecting. The build RPC should also return
the total number of blob hashes in the tree.

A remotely built tree is persistent in the remote server's memory, so a client can then navigate the remote tree
via the *path* RPC. Arguments are the blob tree hash and a path
string. The return value will include an array of *sigs* that describe
the children of the node defined by the blob tree hash and the
path. The children are blob hashes (*sigs*) if the node is a leaf, or hashes
of hashes otherwise. 

*The hash of a node is defined as the hash of the concatenation of the
hashes of the node's children*, whether those children are other
nodes (for an interior node) or blobs (for a leaf). The one caveat to this is that debugging is easier
if you **define the hash of an empty string as an empty string**.

In the absence of any other activity, if $s_1$ pulls from $s_2$, a
subsequent pull will reveal that the remote and local blob stores are
identical (the remote and local blob tree hashes will be the same).


## Your Tasks.

### Task 1: Meet the Merkles
I have given you the layout of your p4 directory in the repository. The
*pb* directory once again holds protobuf definitions, but you should
alter that file to include RPCs for *build*, *path*, and *pull*.

Synchronizing the sets of blobs present at different servers requires
first describing those sets, which we do with Merkle trees. You must
implement the merkle trees in your code. No using third-party
libraries for this.

A server *pulling* from another server:
1. creates a local blob tree
2. tells the remote server to create a blob tree by sending it a
*build* RPC. 
3. uses *path* and *get* RPCs to navigate the remote tree and download
blobs, if needed. 
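
Here is one possible shape for step 3, as a rough Go sketch. The
`Remote`, `BlobStore`, and `LocalTree` interfaces below stand in for
your generated gRPC client, your blob store, and your local merkle
tree; the sketch also assumes the *path* RPC returns one entry per
child slot (empty string for an empty bucket), so adjust the indexing
if yours returns only non-empty sigs:
```
package pull

// Placeholder interfaces; only the traversal logic is the point here.
type Remote interface {
	Path(treeSig, path string) ([]string, error) // child sigs at a node
	Get(blobSig string) ([]byte, error)          // fetch one blob
}

type BlobStore interface {
	Has(sig string) bool
	Put(sig string, data []byte) error
}

type LocalTree interface {
	ChildSigs(path string) []string // one entry per child, "" if empty
}

const alphabet = "234567ABCDEFGHIJKLMNOPQRSTUVWXYZ"

// sync walks the remote tree rooted at treeSig starting at path,
// descending only into subtrees whose sig differs from the local one,
// and fetching blobs that are missing locally. depth is the total
// tree depth (the root counts as level 1).
func sync(r Remote, store BlobStore, local LocalTree, treeSig, path string, depth int) error {
	remoteSigs, err := r.Path(treeSig, path)
	if err != nil {
		return err
	}
	if len(path) == depth-1 { // leaf: remoteSigs are blob sigs
		for _, sig := range remoteSigs {
			if sig == "" || store.Has(sig) {
				continue
			}
			data, err := r.Get(sig)
			if err != nil {
				return err
			}
			if err := store.Put(sig, data); err != nil {
				return err
			}
		}
		return nil
	}
	localSigs := local.ChildSigs(path)
	for i, rsig := range remoteSigs {
		if rsig == "" || rsig == localSigs[i] {
			continue // empty or identical subtree: nothing to pull
		}
		if err := sync(r, store, local, treeSig, path+string(alphabet[i]), depth); err != nil {
			return err
		}
	}
	return nil
}
```
A pull then amounts to a remote *build* followed by something like
`sync(r, store, local, treeSig, "", depth)`; counting RPCs and timing
the whole traversal gives the numbers the *pull* command reports.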

### Summary of interface changes.

Client commands:
- **-s <$s_1$> build**: Send request to $s_1$ instructing it to
		create a blob (merkle) tree, saving it at the remote site
		under the *blob tree sig* that is returned from this call. Return something like:
		```
		4619-sig tree on :5557: sha256_32_735PHA56VU35N75KLJIRP5WZJBYCVH6TPV7JPL53EQSUMUGASAPQ====
		```
		This says 4619 total blobs are indexed by the tree, w/ the blob tree (merkle tree) root's hash as shown.
- **-s <$s_1$> path <blob tree sig> <path from root>**: Send request to
		$s_1$ instructing it to return the sigs: child sigs if
		interior node, blob sigs if leaf node.

```
io:~/p4> ./cli -s :5557 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/p4> ./cli -s localhost:5557 list | wc
      96      97    6373
io:~/p4> ./cli -s :5557 build
95-sig tree on :5557: sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ====
io:~/p4> ./cli -s :5557 path sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ==== 244
sigs: 1
hash: sha256_32_23DQYGTDTBZTXT7YDIANEFWCNFERR4MZDHFQXM2M2GSN5IY233CA====
io:~/p4> ./cli -s :5557 path last 244
sigs: 1
hash: sha256_32_23DQYGTDTBZTXT7YDIANEFWCNFERR4MZDHFQXM2M2GSN5IY233CA====
io:~/p4> ./cli -s :5557 path last 24
sigs: 2
hash: sha256_32_64NGXT5VZ2WC56FIY2RAKU3MLHOWEYBNJARJ2G3L57FNGYSYGQVQ====
io:~/p4> ./cli -s :5557 path last 2
sigs: 3
hash: sha256_32_VZGPJQR5YRFZBCHDSL2OPUJ2SEYC74IDIQYYWPDCG3M44K5TE73A====
io:~/p4> ./cli -s :5557 path last ""
sigs: 95
hash: sha256_32_F7VND2B6WVI4XQTZN2FQI3UMIASI6VYKKDJ3DKCU5CLXWHF346AQ====
```
- **-s <$s_1$> pull <$s_2$>**: Tells $s_1$ to pull from
$s_2$ by sending a `build` to $s_2$ and then using the
	returned blob tree sig to navigate through the remote blob tree using
	`path` and `get` calls. Return the number of RPCs and total time
	from the perspective of the server that is doing anti-entropy from
	another server.
```
io:~/818/projects/p4/solution> cli -s :5557 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> cli -s :5558 pull localhost:5557
Pulled by :5558 from localhost:5557: 95 blobs using 215 RPCs, 0.081192458 secs
io:~/818/projects/p4/solution> cli -s :5558 pull localhost:5557
Pulled by :5558 from localhost:5557: 0 blobs using 1 RPCs, 0.044928375 secs
```
Note that this shows counts of blobs and RPCs used for each pull
request, though the video does not. Yours should do this.


### Server command line args:
- **-b <dir>** : Directory where the blob directory should be
  placed. CHANGES: your blob directory should be called
  `<dir>/blob<port>`, where the dir is specified like `-b /tmp/`, and
  the port is specified w/ `-s <port>`. For example: "/tmp/blob5001".
- **-s <serverport>** : Bare integer.
- **-S <serv1:port1,serv2:port2...>** : Comma-separated list of
		<servname:portno>. If specified, this takes precedence over
		anything `compose.yaml` says. Intended for use in local
		debugging, whereas the server names from `compose.yaml` are
		used in the containerization test.
- **-p <period>**: Changes the anti-entropy period.
- **-D <treedepth>**: Changes the tree depth. Remember that a depth of
  *N* counts all levels, including root and leaves (i.e. height+1).



### Anti-entropy
Implemented just like the command-line *pull* command, except
initiated by an anti-entropy thread that is spun off and wakes up
periodically. The duration between anti-entropy rounds should be a
random interval of +/- 1 second around the specified
anti-entropy period. 
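
A rough sketch of that loop follows. Whether each round pulls from one
random peer or from every peer in `-S` is left to your design; this
version picks one at random, and `pull` is a placeholder for whatever
function implements the same logic as the *pull* command:
```
package main

import (
	"log"
	"math/rand"
	"time"
)

// antiEntropyLoop wakes up roughly every `period`, jittered by a
// random offset in [-1s, +1s), and pulls from a randomly chosen peer.
// pull is a placeholder for the function that implements the same
// logic as the cli "pull" command.
func antiEntropyLoop(period time.Duration, peers []string, pull func(peer string) error) {
	for {
		jitter := time.Duration(rand.Int63n(int64(2*time.Second))) - time.Second
		time.Sleep(period + jitter)
		if len(peers) == 0 {
			continue
		}
		peer := peers[rand.Intn(len(peers))]
		if err := pull(peer); err != nil {
			log.Printf("anti-entropy pull from %s failed: %v", peer, err)
		}
	}
}
```
The server would start this once at boot, e.g. with
`go antiEntropyLoop(period, peers, pullFrom)`.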


## Task 2: Containerize using docker compose
There are lots of good tutorials of this online. I'm going to give you
the compose file for free:
```
version: "3"
services:
  ubi1:
    container_name: ubi1
    hostname: ubi1
    build:
      context: .
      dockerfile: Dockerfile
    image: ulean:alpha
    ports:
      - 5001:5000
  ubi2:
    container_name: ubi2
    hostname: ubi2
    image: ulean:alpha
    ports:
      - 5002:5000
  ubi3:
    container_name: ubi3
    hostname: ubi3
    image: ulean:alpha
    ports:
      - 5003:5000
```
You can use this unmodified, as I will use it in testing. I will
probably use more hosts, but they will be defined exactly like the above.

The same `Dockerfile` image, `ulean:alpha`, should be used for all
containers, but this image can be built any way you wish. 

I do recommend using a multi-stage dockerfile to limit the size. The first stage has the
entire build environment, and you compile your server binary
there; this might be 900 MB. In the second stage you copy the binary
from the first stage and keep just a minimal environment for running
it. I'm currently using `debian:bookworm-slim`, and it results
in an image on the order of 95 MB.
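
A multi-stage `Dockerfile` along those lines might look like the
sketch below. The Go base image, the build command, and the `CMD`
flags are all assumptions about your layout (in particular, about how
your `bld` script builds `ubiserver`), so treat this as a starting
point rather than the required file:
```
# Stage 1: full Go toolchain, used only to compile the server binary.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# Adjust the build command to match your module layout / bld script.
RUN go build -o /ubiserver ./ubiserver.go

# Stage 2: minimal runtime image containing just the binary.
FROM debian:bookworm-slim
COPY --from=build /ubiserver /usr/local/bin/ubiserver
EXPOSE 5000
# Flags here are illustrative; peer discovery inside the containers is up to you.
CMD ["/usr/local/bin/ubiserver", "-b", "/tmp/", "-s", "5000"]
```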


## Task 3: Visualize w/ prometheus and docker (optional, bonus points)
In the real world you want to hook your containers up to a dashboard
that will visually show you what is going on. Typical workflows use
[prometheus](https://prometheus.io/) to scrape `http` ports on your
servers, and to then feed [Grafana](https://grafana.com/), which gives you an easy means of
defining a dashboard.
There are many paywalled articles about doing this, such as [this
one](https://hackernoon.com/how-to-set-up-docker-compose-for-prometheus-grafana). 

I will give up to 20 bonus points for a good, clean docker compose
solution w/ prometheus and grafana each in their own containers, plus the ubi containers, all in one
`compose.yaml`. Your solution must include a screen recording showing you
building the dashboard, starting up the system, and explaining the
whole thing well enough for me to duplicate what you have done.

As for the metrics, even a dashboard that shows a real-time bar graph of the
total number of blobs at each server would be a nice start. Exposing this
info should be
straightforward, something along the lines of:

```
package store

import (
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    myMetric = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "my_metric",
            Help: "My custom metric",
        },
    )
)

// start up in its own goroutine, like "go serveMetrics()"
func serveMetrics() {
    prometheus.MustRegister(myMetric)

    http.Handle("/metrics", promhttp.Handler())
    go func() {
        for {
            // Update the metric value over time
            myMetric.Set(42) // Update this value with number of local blobs
            // Sleep for some time
            time.Sleep(time.Second)
        }
    }()

    http.ListenAndServe(":8080", nil)
}
```
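
Prometheus then needs to be told where to scrape. A minimal
`prometheus.yml` along these lines, mounted into the prometheus
container, would be enough to get the metric flowing (the `ubi*`
hostnames come from the compose file above, and `:8080` matches the
metrics port in the sketch):
```
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "ubiservers"
    static_configs:
      - targets: ["ubi1:8080", "ubi2:8080", "ubi3:8080"]
```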

## Notes
- `server.go` is now `ubiserver.go`.
- trees should default to depth 4, each level 32-wide

The distro includes a simple `bld` script that rebuilds the protobuf
interface and creates executables `ubiserver`, `cli`, and `l`.


## Testing
I will run through the tests shown in the video, plus a few others. In
particular, with anti-entropy set to a very large value I will:
- bring up two local servers as shown in the video, *put* data to one
of them, and use *build* and *path* commands to navigate them; they
should behave similarly to the video. Also, I will look in the
`.blobstore`.
- I'll also do *pulls* like in the video, except that I expect to see
counts of blobs pulled and RPCs as described above.
- I'll then load a big distro into a server, possibly something else
  into another, and let anti-entropy equalize.
- I will bring it all up w/ docker compose and repeat the above.

## Notes
- The maximum size of a gRPC message is 4 MB by default. We should not be
  running up against this limit in my testing.
- `async` NOT TESTED.

## Testing procedure

The following will be my testing procedure. Yours should look very
similar, and accept the same command line arguments as below, or you
will not get credit.


First thing we test is probing your merkle tree. To make this concrete, grab [this file](blob5001.tgz), and then untar it in `/tmp/`, so that you now have a directory `/tmp/blob5001` that has 95 blobs in it.

**Test 1 (10 pts):** Then fire up an ubiserver, which should NOT wipe out the blobdir you just uncompressed, and instead should treat it as its own.
```
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 &
[1] 46194
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
      96      97    6373
```

**Test 2 (10 pts):** The above showed that your ubiserver can find my blobs. Next, create and probe a merkle tree:
```
io:~/818/projects/p4/solution> go run cli.go -s :5001 build
95-sig tree on :5001: sha256_32_MEKYINRK35PJ67GPQZRFM2X4KUOTOU756ZJWNDCCY6U4LYEHN3YQ====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last E
sigs: 3
combined: sha256_32_2VAB4CK4VUVO36JU5HPLONSF7MGG64DPMN4TRFQURUC5ZOOSBYRA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last EM
sigs: 2
combined: sha256_32_CN757EJBFCJFXNORM5QSS76GGB5TC2QAKTE3GV5MAQZAQDSG3UQQ====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last EMW
sigs: 1
combined: sha256_32_B5XMMYDZ2WFBHXRRICT7WMRSYEMTYGINA6UBVICJAAU6GPJ2RSBA====
	"sha256_32_EMWG6JUKVXSLYUE4UWUR6HJIITXFJFF4S5DGQLEJ5QDLWFW3CSLQ===="
```

**Test 3 (10 pts):** Now do the same w/ a tree of *depth* 2:
```
killall ubiserver
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 -D 2 &
[1] 47308
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
      96      97    6373
io:~/818/projects/p4/solution> go run cli.go -s :5001 build
95-sig tree on :5001: sha256_32_75LWMMDVKOOBFSXIFSDYFSP6PELM3YUZ4YKTHYH4LD5HAIITLN4A====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last ""
sigs: 95
combined: sha256_32_75LWMMDVKOOBFSXIFSDYFSP6PELM3YUZ4YKTHYH4LD5HAIITLN4A====
io:~/818/projects/p4/solution> go run cli.go -s :5001 path last E
sigs: 3
combined: sha256_32_N3C5MBRDCCJLLX3BBPEOUU2TVWSOMOD7DLN5DYILHEG52EVBMSUA====
	"sha256_32_EMORFEZQVOUTK5JX66U7KR3M4IEHQK6WKV2DY6FBZORBSHKDQF6Q===="
	"sha256_32_EMWG6JUKVXSLYUE4UWUR6HJIITXFJFF4S5DGQLEJ5QDLWFW3CSLQ===="
	"sha256_32_ETBRHI7Y53CYS6IRZW5GETFFK3MG4RBD66JYJEQZS3HBNZ7SY3QQ===="
```

**Test 4 (10 pts):** Now start up another ubiserver and see if we can sync:
```
io:~/818/projects/p4/solution> killall ubiserver
No matching processes belonging to you were found
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 1000 &
[1] 47967
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5002 -S localhost:5001 -p 1000 &
[2] 47998
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5001]
serving gRPC on :5002, host "io.lan", servers [localhost:5001], /tmp/blob5002/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc
      96      97    6373
io:~/818/projects/p4/solution> go run cli.go -s :5002 list | wc
       1       2       7
io:~/818/projects/p4/solution> go run cli.go -s :5001 pull "localhost:5002"
Pulled by :5001 from localhost:5002: 0 blobs using 215 RPCs, 0.079132334 secs
io:~/818/projects/p4/solution> go run cli.go -s :5002 pull "localhost:5001"
Pulled by :5002 from localhost:5001: 95 blobs using 215 RPCs, 0.096492958 secs
io:~/818/projects/p4/solution> go run cli.go -s :5002 pull "localhost:5001"
Pulled by :5002 from localhost:5001: 0 blobs using 1 RPCs, 0.054424625 secs
io:~/818/projects/p4/solution> go run cli.go -s :5001 pull "localhost:5002"
Pulled by :5001 from localhost:5002: 0 blobs using 1 RPCs, 0.0569965 secs
```
We:
- verified that the blobs were only in 5001's blobstore
- tried to pull from 5002 (empty) to 5001. This took many RPCs even
  though it received nothing, as every level of the trees is
  different.
- pulled from 5001 to 5002 and saw many blobs moving, still w/ many
  RPCs
- repeated the above; this time it took only 1 RPC to see there was
  nothing needed from 5001
- pulling from 5002 to 5001 brought nothing, but only took a single
  RPC this time


**Test 5 (15 pts):** Last, we need to check anti-entropy. The runs above had
anti-entropy effectively turned off by setting a very long period
(`-p 1000`), so the first anti-entropy never happened. 
```
io:~/818/projects/p4/solution> killall ubiserver
No matching processes belonging to you were found
io:~/818/projects/p4/solution> go run ubiserver.go -b /tmp/blob -s :5002 -S localhost:5001 -p 10 &
[1] 49441
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5001]
serving gRPC on :5002, host "io.lan", servers [localhost:5001], /tmp/blob5002/
                               go run ubiserver.go -b /tmp/blob -s :5001 -S localhost:5002 -p 10 &
[2] 49467
io:~/818/projects/p4/solution> And we are up..."io.lan"....[localhost:5002]
serving gRPC on :5001, host "io.lan", servers [localhost:5002], /tmp/blob5001/

io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
       1       2       7
       1       2       7
```

Both sides are empty. Now read sampledir into 5001:
```
io:~/818/projects/p4/solution> go run cli.go -a -s :5001 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
      82      83    5435
io:~/818/projects/p4/solution> pulled 95 blobs using 214 rpcs from localhost:5001
go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
      96      97    6373
```

**Test 6 (15 pts):** Now load a really big directory into 5002 and make sure that 5001
can grab it as well:
```
io:~/818/projects/p4/solution> go run cli.go -a -s :5002 put sampledis-7.2.1
sha256_32_UGPT64YNNMEPL3SVEJJHCR2M5QC3EYD5YKOEPOSFJVDJRVSTFYEA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
      96      97    6373
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc
pulled 4619 blobs using 5361 rpcs from localhost:5002
    4715    4716  315848
    4715    4716  315848
```

**Test 7 (30 pts):**

I will then verify that the containerization works. With the
`Dockerfile` and `compose.yaml` you have been given, I will bring up
three hosts:
```
  docker compose up
```

In another window I will load `sampledir` into one container and
`sampledis-7.2.1` into another:
```
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc; go run cli.go -s :5002 list | wc; go run cli.go -s :5003 list | wc
       1       2       7
       1       2       7
       1       2       7
io:~/818/projects/p4/solution> go run cli.go -s :5001 put sampledir
sha256_32_ITNRHUSBBLCJFBHKUB2ESG24XNUUPF3KH34CRO22JWOIZPZRC5OA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
       1       2       7
       1       2       7
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
      96      97    6373
      96      97    6373
io:~/818/projects/p4/solution>
io:~/818/projects/p4/solution> go run cli.go -s :5003 put sampledis-7.2.1
sha256_32_UGPT64YNNMEPL3SVEJJHCR2M5QC3EYD5YKOEPOSFJVDJRVSTFYEA====
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
    3209    3210  214946
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
      96      97    6373
    4715    4716  315848
    4715    4716  315848
io:~/818/projects/p4/solution> go run cli.go -s :5001 list | wc ; go run cli.go -s :5002 list | wc ; go run cli.go -s :5003 list | wc
    4715    4716  315848
    4715    4716  315848
    4715    4716  315848
```
Done!

For your convenience, all these commands are condensed [here](p4cmds.txt).


## Grading
I will use the following rough guide to assign grades:
- 70 pts: from the first six tests
- 30 pts: containerization

Bonus!
- prometheus and grafana

## Video walkthrough
![p4 walkthrough](p4.mp4)

## Submit
Create a `README.md` file in your `p4` directory describing
what works, what doesn't, and any divergences from this
document. 

Commit to the repository.