From 44491ae0a4e2fa90b5021eae9ae14c08bd0a1d08 Mon Sep 17 00:00:00 2001 From: "Peter J. Keleher" <keleher@cs.umd.edu> Date: Sun, 5 Nov 2023 01:44:56 -0400 Subject: [PATCH] auto --- p5.md | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 158 insertions(+) create mode 100644 p5.md diff --git a/p5.md b/p5.md new file mode 100644 index 0000000..e742963 --- /dev/null +++ b/p5.md @@ -0,0 +1,158 @@ +# RAFT +**v1.0** +## Due: Nov 19, 2023, 11:59:59 pm + +## Setup + +Download startup files [here](https://sedna.cs.umd.edu:818/projects/p5.tgz). + +## Part I: Failure-free operation +Implement raft as described in [Figure 2](https://sedna.cs.umd.edu/papers/raftFigure.pdf) of the +[paper](https://sedna.cs.umd.edu/papers/raft.pdf). + +In particular: +- implement the log +- implement leader election + - election timeouts + - election restriction +- appendentries + - heartbeat + - "catching up" a lagging follower + - committing + - informing the client +- failure resilience + + +## Failures you need to handle +- **Failure of a client:** Client failures should not affect anything. + +- **Failure of the leader:** Your system should detect failures, elect a +new leader (with *server selection*), and continue processing commands +from clients while a quorum is still available. For example: a 5-node +system should be able to recover from leader failure and continue +committing client operations. + +- **Slow followers:** i.e. followers that are behind and need to be +"caught up" to the leader's log state. We will implement slow +followers by "pausing" containers use the command `docker compose +pause <hostname>`, and later un-pausing them. + + +## Command Syntax and Observable Output +Your server must be implemented in a single file at the top level named +`raft.go`. Your client must be at the same level and named +`client.go`. The only other files you need are `Dockerfile` and +`compose.yaml`. You will also need the `/pb` subdirectory. + +Your servers and clients are called as follows: +- `raft.go -s <listen addr this server>,<server 0>,<server 1>,.... [-d]`: + For example, we might start a server with `go run raft.go -s localhost:8002,localhost:8000,localhost:8001,localhost:8002 -d`. + This will be a single raft server instance that listens at address + `:8002`, in a system consisting of three servers listening at + `localhost:8000` (id 0), `localhost:8001` (id 1), and `localhost:8002` + (id 2). The `-s` argument is mandatory, but the `-d` flag is not. +- `client.go <leader> <command>`: The "leader" is the presumed leader + of the system and the command is a simple uninterpreted string. + For example: `go run client.go + localhost:8002 one` submits the command "one" to + `localhost:8002`. If this host is not the leader it returns *false* + to the client, which prints this out. The RPC to the server will not + return until the command is committed, at which point the client + will print out "true". + +### Raft Server Output +Your server may print anything with the `-d` flag. Without this flag, +however, it should print at most one line acknowledging when startup +is complete, and then only the following output: + +--- +At each heartbeat the leader should print: +``` +AE [<match index array>] ci:<commit index>, <log len>[<term>-"<cmd>",....] +``` +For example, your leader might print out: +``` +AE [-1 -1] ci:4, 5[8-"L0",8-"one",8-"two",9-"L1",10-"L1"] +``` +The matchindex is an integer slice described as in Figure 2. A +null/nil or "no match" is signified w/ a -1. `ci` is the current +leader commit index. The number before the left bracket ('[') is the +current log length. The entire log is then printed, surrounded by +brackets. Entries have format `<term>-"<cmd>"`, and are separated by +commas. + +--- +Followers should print the same, but use " " instead of "AE". + +--- +When a follower converts to candidacy it should print out +"election" alone on a line. + +--- +During catchup, a server should print a single line for each time it +goes backwards: +``` +lastLogIndex <index> <(address:port) +``` + +--- + +There should be **no other output**, and this output must be followed exactly. + + + +## Details + +- Use whatever timeouts you want during debugging, but you should use those in the startup file when submitting. + + +## Testing. + +**Test 1 (10 pts)** +Two servers. +- Two-server election, both have to agree. +- Submit a command and verify that the client does not return until + the command has been committed and applied. + +**Test 2 (10 pts)** +Continuing the above... +- Kill the leader and immediately restart; the follower must + become the new leader because it's log is more up-to-date. +- Commit another command and power-cycle the leader again, again verifying + that the follower, with it's more up-to-date log, become leader. + +**Test 3 (10 pts)** +Three servers. + +**Test 4 (10 pts)** +Compose w/ three servers. + +**Test 5 (10 pts)** +Compose w/ three servers: +- pause a follower, verify that the commands can still be committed +- unpause the follower, verify that the follower catches up + +**Test 6 (10 pts)** +Compose w/ three servers: +- pause the leader, verify that the commands can still be committed + after new election +- unpause the previous leader, verify that it rejoins, reverts to a follower, catches up + +**Test 7 (10 pts)** +Compose w/ five servers. + + + +``` +> cli localhost:5501 put one +redirect localhost:5502 +> cli localhost:5502 put one +term 1, index 0: commit "one" +``` +Leader fails: +``` +> cli localhost:5502 put two +redirect localhost:5501 +> cli localhost:5502 put two +term 2, index 1: commit "two" +``` -- GitLab