Skip to content
Snippets Groups Projects
Code owners
Assign users and groups as approvers for specific file changes. Learn more.

RAFT

v1.0

Due: Nov 19, 2023, 11:59:59 pm

Setup

Download startup files here.

Part I: Failure-free operation

Implement raft as described in Figure 2 of the paper.

In particular:

  • implement the log
  • implement leader election
    • election timeouts
    • election restriction
  • appendentries
    • heartbeat
    • "catching up" a lagging follower
    • committing
    • informing the client
  • failure resilience

Failures you need to handle

  • Failure of a client: Client failures should not affect anything.

  • Failure of the leader: Your system should detect failures, elect a new leader (with server selection), and continue processing commands from clients while a quorum is still available. For example: a 5-node system should be able to recover from leader failure and continue committing client operations.

  • Slow followers: i.e. followers that are behind and need to be "caught up" to the leader's log state. We will implement slow followers by "pausing" containers use the command docker compose pause <hostname>, and later un-pausing them.

Command Syntax and Observable Output

Your server must be implemented in a single file at the top level named raft.go. Your client must be at the same level and named client.go. The only other files you need are Dockerfile and compose.yaml. You will also need the /pb subdirectory.

Your servers and clients are called as follows:

  • raft.go -s <listen addr this server>,<server 0>,<server 1>,.... [-d]: For example, we might start a server with go run raft.go -s localhost:8002,localhost:8000,localhost:8001,localhost:8002 -d. This will be a single raft server instance that listens at address :8002, in a system consisting of three servers listening at localhost:8000 (id 0), localhost:8001 (id 1), and localhost:8002 (id 2). The -s argument is mandatory, but the -d flag is not.
  • client.go <leader> <command>: The "leader" is the presumed leader of the system and the command is a simple uninterpreted string. For example: go run client.go localhost:8002 one submits the command "one" to localhost:8002. If this host is not the leader it returns false to the client, which prints this out. The RPC to the server will not return until the command is committed, at which point the client will print out "true".

Raft Server Output

Your server may print anything with the -d flag. Without this flag, however, it should print at most one line acknowledging when startup is complete, and then only the following output:


At each heartbeat the leader should print:

AE [<match index array>] ci:<commit index>, <log len>[<term>-"<cmd>",....]

For example, your leader might print out:

AE   [-1 -1] ci:4, 5[8-"L0",8-"one",8-"two",9-"L1",10-"L1"]

The matchindex is an integer slice described as in Figure 2. A null/nil or "no match" is signified w/ a -1. ci is the current leader commit index. The number before the left bracket ('[') is the current log length. The entire log is then printed, surrounded by brackets. Entries have format <term>-"<cmd>", and are separated by commas.


Followers should print the same, but use " " instead of "AE".


When a follower converts to candidacy it should print out "election" alone on a line.


During catchup, a server should print a single line for each time it goes backwards:

lastLogIndex <index> <(address:port)

There should be no other output, and this output must be followed exactly.

Details

  • Use whatever timeouts you want during debugging, but you should use those in the startup file when submitting.

Testing.

Test 1 (10 pts) Two servers.

  • Two-server election, both have to agree.
  • Submit a command and verify that the client does not return until the command has been committed and applied.

Test 2 (10 pts) Continuing the above...

  • Kill the leader and immediately restart; the follower must become the new leader because it's log is more up-to-date.
  • Commit another command and power-cycle the leader again, again verifying that the follower, with it's more up-to-date log, become leader.

Test 3 (10 pts) Three servers.

Test 4 (10 pts) Compose w/ three servers.

Test 5 (10 pts) Compose w/ three servers:

  • pause a follower, verify that the commands can still be committed
  • unpause the follower, verify that the follower catches up

Test 6 (10 pts) Compose w/ three servers:

  • pause the leader, verify that the commands can still be committed after new election
  • unpause the previous leader, verify that it rejoins, reverts to a follower, catches up

Test 7 (10 pts) Compose w/ five servers.

> cli localhost:5501 put one
redirect localhost:5502
> cli localhost:5502 put one
term 1, index 0: commit "one" 

Leader fails:

> cli localhost:5502 put two
redirect localhost:5501
> cli localhost:5502 put two
term 2, index 1: commit "two"