Skip to content
Snippets Groups Projects
Commit 44491ae0 authored by Peter J. Keleher's avatar Peter J. Keleher
Browse files

auto

parent d0faf49d
No related branches found
No related tags found
No related merge requests found
p5.md 0 → 100644
# RAFT
**v1.0**
## Due: Nov 19, 2023, 11:59:59 pm
## Setup
Download startup files [here](https://sedna.cs.umd.edu:818/projects/p5.tgz).
## Part I: Failure-free operation
Implement raft as described in [Figure 2](https://sedna.cs.umd.edu/papers/raftFigure.pdf) of the
[paper](https://sedna.cs.umd.edu/papers/raft.pdf).
In particular:
- implement the log
- implement leader election
- election timeouts
- election restriction
- appendentries
- heartbeat
- "catching up" a lagging follower
- committing
- informing the client
- failure resilience
## Failures you need to handle
- **Failure of a client:** Client failures should not affect anything.
- **Failure of the leader:** Your system should detect failures, elect a
new leader (with *server selection*), and continue processing commands
from clients while a quorum is still available. For example: a 5-node
system should be able to recover from leader failure and continue
committing client operations.
- **Slow followers:** i.e. followers that are behind and need to be
"caught up" to the leader's log state. We will implement slow
followers by "pausing" containers use the command `docker compose
pause <hostname>`, and later un-pausing them.
## Command Syntax and Observable Output
Your server must be implemented in a single file at the top level named
`raft.go`. Your client must be at the same level and named
`client.go`. The only other files you need are `Dockerfile` and
`compose.yaml`. You will also need the `/pb` subdirectory.
Your servers and clients are called as follows:
- `raft.go -s <listen addr this server>,<server 0>,<server 1>,.... [-d]`:
For example, we might start a server with `go run raft.go -s localhost:8002,localhost:8000,localhost:8001,localhost:8002 -d`.
This will be a single raft server instance that listens at address
`:8002`, in a system consisting of three servers listening at
`localhost:8000` (id 0), `localhost:8001` (id 1), and `localhost:8002`
(id 2). The `-s` argument is mandatory, but the `-d` flag is not.
- `client.go <leader> <command>`: The "leader" is the presumed leader
of the system and the command is a simple uninterpreted string.
For example: `go run client.go
localhost:8002 one` submits the command "one" to
`localhost:8002`. If this host is not the leader it returns *false*
to the client, which prints this out. The RPC to the server will not
return until the command is committed, at which point the client
will print out "true".
### Raft Server Output
Your server may print anything with the `-d` flag. Without this flag,
however, it should print at most one line acknowledging when startup
is complete, and then only the following output:
---
At each heartbeat the leader should print:
```
AE [<match index array>] ci:<commit index>, <log len>[<term>-"<cmd>",....]
```
For example, your leader might print out:
```
AE [-1 -1] ci:4, 5[8-"L0",8-"one",8-"two",9-"L1",10-"L1"]
```
The matchindex is an integer slice described as in Figure 2. A
null/nil or "no match" is signified w/ a -1. `ci` is the current
leader commit index. The number before the left bracket ('[') is the
current log length. The entire log is then printed, surrounded by
brackets. Entries have format `<term>-"<cmd>"`, and are separated by
commas.
---
Followers should print the same, but use " " instead of "AE".
---
When a follower converts to candidacy it should print out
"election" alone on a line.
---
During catchup, a server should print a single line for each time it
goes backwards:
```
lastLogIndex <index> <(address:port)
```
---
There should be **no other output**, and this output must be followed exactly.
## Details
- Use whatever timeouts you want during debugging, but you should use those in the startup file when submitting.
## Testing.
**Test 1 (10 pts)**
Two servers.
- Two-server election, both have to agree.
- Submit a command and verify that the client does not return until
the command has been committed and applied.
**Test 2 (10 pts)**
Continuing the above...
- Kill the leader and immediately restart; the follower must
become the new leader because it's log is more up-to-date.
- Commit another command and power-cycle the leader again, again verifying
that the follower, with it's more up-to-date log, become leader.
**Test 3 (10 pts)**
Three servers.
**Test 4 (10 pts)**
Compose w/ three servers.
**Test 5 (10 pts)**
Compose w/ three servers:
- pause a follower, verify that the commands can still be committed
- unpause the follower, verify that the follower catches up
**Test 6 (10 pts)**
Compose w/ three servers:
- pause the leader, verify that the commands can still be committed
after new election
- unpause the previous leader, verify that it rejoins, reverts to a follower, catches up
**Test 7 (10 pts)**
Compose w/ five servers.
```
> cli localhost:5501 put one
redirect localhost:5502
> cli localhost:5502 put one
term 1, index 0: commit "one"
```
Leader fails:
```
> cli localhost:5502 put two
redirect localhost:5501
> cli localhost:5502 put two
term 2, index 1: commit "two"
```
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment