# Raft

([visualization](http://thesecretlivesofdata.com/raft/))



![raft protocol](raftProtocol.png)

## Coolness
- Strong leader (log entries flow only from the leader to followers)
- Leadership changes via randomized timeouts
- Joint consensus

## Raft guarantees

- **Election Safety:** at most one leader can be elected in a given term. §5.2
- **Leader Append-Only:** a leader never overwrites or deletes entries in its log; it only appends new entries. §5.3
- **Log Matching:** if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. §5.3
- **Leader Completeness:** if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms. §5.4
- **State Machine Safety:** if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index. §5.4.3



## Operation Commits
![commits](raftCommitted.png)
- ops are executed only once committed
- followers generally learn of a commit from the following `AppendEntries`
- an op is **committed** once the leader that created it has stored it on a majority of logs (after a leader change, an entry from an earlier term is not committed by replica count alone)
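
The commit rule can be sketched as a function the leader runs after each round of replication. This is a minimal sketch, not a full implementation: the names `advance_commit_index`, `log_terms`, and `match_index` are assumptions chosen for illustration, and log indices are 1-based.

```python
from typing import List


def advance_commit_index(log_terms: List[int], match_index: List[int],
                         commit_index: int, current_term: int) -> int:
    """Return the leader's new commit index.

    log_terms[i - 1] is the term of the entry at 1-based log index i.
    match_index holds, for every server (leader included), the highest
    log index known to be stored on that server.
    """
    majority = len(match_index) // 2 + 1
    # Try the highest index first so we commit as far as possible.
    for n in range(len(log_terms), commit_index, -1):
        replicas = sum(1 for m in match_index if m >= n)
        # Counting replicas only commits entries from the leader's own term;
        # earlier entries become committed implicitly along with it.
        if replicas >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index
```

With five servers, an entry from the current term stored on three of them commits, while an entry from an older term stored on three of them does not.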

## Log weirdness
![logs](raftNewLeader.png)
- **a,b:** missing entries
- **c,d:** extra uncommitted entries
- **e,f:** or both
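
A follower's log is repaired by the `AppendEntries` consistency check. The following is a minimal sketch of the follower side, assuming a hypothetical `Entry` type and an `append_entries` helper that ignores terms, commit indices, and persistence; indices are 1-based.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    term: int
    op: str


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Apply one AppendEntries request to a follower's log (mutated in place).

    Returns False if the follower does not hold the entry that precedes
    the new ones, in which case the leader must retry further back.
    """
    if prev_index > len(log):
        return False  # follower is missing entries
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False  # follower holds a different entry at prev_index
    for offset, entry in enumerate(entries):
        idx = prev_index + 1 + offset
        if idx <= len(log) and log[idx - 1].term != entry.term:
            del log[idx - 1:]  # drop the conflicting entry and all after it
        if idx > len(log):
            log.append(entry)
    return True
```

Entries that already match are left alone, so a duplicated or delayed request does not damage the log.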


## No committing entries from previous term, except implicitly
![confusion](raftConfusion.png)

The leader maintains `nextIndex[]`, which holds, for each follower, the index of the next log entry to send to that follower.
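
How `nextIndex` converges can be sketched with a small simulation of the leader's retry loop for one follower. This is an illustrative sketch only: `replicate_to` is a hypothetical helper, logs are reduced to lists of terms, and each loop iteration stands in for one `AppendEntries` round trip.

```python
def replicate_to(leader_terms, follower_terms, next_index):
    """Simulate the leader backing off next_index until the logs agree.

    Logs are lists of terms; list position i holds 1-based log index i + 1.
    Returns (final next_index, repaired follower log, number of RPCs).
    """
    follower = list(follower_terms)
    rpcs = 0
    while True:
        rpcs += 1
        prev_index = next_index - 1
        prev_term = leader_terms[prev_index - 1] if prev_index > 0 else 0
        # Follower's consistency check on the entry before next_index.
        ok = prev_index <= len(follower) and (
            prev_index == 0 or follower[prev_index - 1] == prev_term)
        if not ok:
            next_index -= 1  # rejected: retry one slot earlier
            continue
        for offset, term in enumerate(leader_terms[prev_index:]):
            pos = prev_index + offset
            if pos < len(follower) and follower[pos] != term:
                del follower[pos:]  # remove conflicting suffix
            if pos >= len(follower):
                follower.append(term)
        return len(leader_terms) + 1, follower, rpcs
```

A new leader would start with `next_index` set to one past its own last entry, for example `replicate_to(leader, follower, len(leader) + 1)`.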

## Elections
- election timeout goes off
- increment term, become candidate, vote for self
- send `RequestVote` RPCs to all other servers
- wait for votes from a majority
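
The candidate's side of these steps can be sketched as follows. This is a minimal sketch under stated assumptions: the `Server` type and the helper names are hypothetical, networking is omitted, and the 150 to 300 ms timeout range is only an example value.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    id: str
    term: int = 0
    voted_for: Optional[str] = None
    role: str = "follower"


def election_timeout(rng: random.Random, low_ms: float = 150.0,
                     high_ms: float = 300.0) -> float:
    """Pick a randomized timeout so that servers rarely time out together."""
    return rng.uniform(low_ms, high_ms)


def start_election(s: Server) -> int:
    """Run when the election timeout fires; returns the votes held so far."""
    s.term += 1
    s.role = "candidate"
    s.voted_for = s.id  # a candidate votes for itself
    return 1


def has_won(votes: int, cluster_size: int) -> bool:
    """A candidate wins once it holds votes from a strict majority."""
    return votes > cluster_size // 2
```

Because a majority is required, at most one candidate can win in a given term, which is the Election Safety guarantee.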

*Followers* respond yes if:
- the *candidate's log is at least as up-to-date* as their own (compare last entries: higher term wins, then longer log)
- they have not voted for another candidate in the same term
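
The two voting conditions can be sketched as a pure function. This is a sketch under assumptions: `grant_vote` and its parameter names are hypothetical, and it returns the voter's updated term and vote alongside the decision rather than mutating state.

```python
from typing import Optional, Tuple


def grant_vote(current_term: int, voted_for: Optional[str],
               my_last_term: int, my_last_index: int,
               cand_term: int, cand_id: str,
               cand_last_term: int, cand_last_index: int
               ) -> Tuple[bool, int, Optional[str]]:
    """Decide one vote request; returns (granted, new_term, new_voted_for)."""
    if cand_term < current_term:
        return False, current_term, voted_for  # stale candidate
    if cand_term > current_term:
        # A newer term resets the vote cast in the older term.
        current_term, voted_for = cand_term, None
    # Compare last entries: higher term wins, then the longer log.
    up_to_date = (cand_last_term, cand_last_index) >= (my_last_term, my_last_index)
    if voted_for in (None, cand_id) and up_to_date:
        return True, current_term, cand_id
    return False, current_term, voted_for
```

Note that a candidate with a long log still loses the comparison if its last entry has a lower term than the voter's last entry.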


## Joint Consensus
(changing system size)
![joint](raftJointConsensus.png)
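
During joint consensus, agreement (for elections and for commits) needs separate majorities from both the old and the new configuration. A minimal sketch of that quorum test, with `joint_quorum` as a hypothetical helper name and server ids as plain strings:

```python
from typing import Set


def joint_quorum(acks: Set[str], old: Set[str], new: Set[str]) -> bool:
    """True if acks contains a majority of old AND a majority of new."""
    def majority(config: Set[str]) -> bool:
        return len(acks & config) > len(config) // 2

    return majority(old) and majority(new)
```

For example, with `old = {"A", "B", "C"}` and `new = {"C", "D", "E"}`, acknowledgements from all of the old servers are not enough on their own, because only one of them belongs to the new configuration.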

## Snapshots 
Used to GC prefixes of the log. Each replica decides independently when to
snapshot. The leader might have to send its own snapshot (read from disk) to
bring a new or lagging replica up to date.
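
Discarding a log prefix means the replica must remember the index and term of the last entry the snapshot covers, so later consistency checks still work. The sketch below assumes a hypothetical `CompactedLog` type that stores only entry terms and leaves out the state-machine snapshot itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompactedLog:
    last_included_index: int = 0   # last log index covered by the snapshot
    last_included_term: int = 0    # term of that entry
    terms: List[int] = field(default_factory=list)  # entries after the snapshot

    def last_index(self) -> int:
        return self.last_included_index + len(self.terms)

    def term_at(self, index: int) -> int:
        if index < self.last_included_index or index > self.last_index():
            raise IndexError(index)
        if index == self.last_included_index:
            return self.last_included_term
        return self.terms[index - self.last_included_index - 1]

    def compact(self, upto: int) -> None:
        """Discard entries up to and including upto (must already be applied)."""
        if upto <= self.last_included_index:
            return
        term = self.term_at(upto)
        del self.terms[: upto - self.last_included_index]
        self.last_included_index, self.last_included_term = upto, term

    def needs_snapshot(self, next_index: int) -> bool:
        """True if the entry a follower needs next was already discarded."""
        return next_index <= self.last_included_index
```

When `needs_snapshot` is true for a follower, the leader can no longer catch it up from log entries alone and has to ship the snapshot instead.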



### Questions

- Aaron: how to tell if non-conflicting ops?

- synchronous disk writes
- Reads can occur anywhere, correct? Nodes just do not provide a “to write” value until the leader confirms the write, correct? (zach)
- Why hasn’t Raft won out over Paxos? (zack)
- "increase the lag between the write operation and availability of the data
  on the leader node thus causing higher inconsistent instances. I would think
  for most practical applications, the cost of consistency would outweigh the
  benefit of having consensus" (guowei)
- but can a follower act in a byzantine way and always make itself leader? (yusuf)
- Why is it that clients should only interact with the leader? (yusuf)
- if there is a major partition with two leaders, one on each side of the partition, won't the system lose data when the partition is healed? (greg)
- What happens with 2 or more network partitions where no partition has a clear majority? Whose writes are treated as the majority's then? (sankha)



# Raft project

Implement:
1. raft package
   1. no system membership changes (no joint consensus)
   1. no snapshots
   1. API:
      1. `put()`
1. lock package (based on project 3 approach), w/ API:
   1. `acquire()`: get the lock
   1. `theftHandler()`: respond to lock manager requesting lock back

Need to hold the lock:
1. for all reads and writes
1. for as long as there is any dirty, unflushed data

