# Byzantine Consensus and PBFT

## Simple example: Two Generals

![](byzTwoG.png)

- one general is the decider
- both need to attack at the same time
- need to agree on:
  - time (easy: a msg and an ack)
  - agreement to attack (hard)

&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>

**Example:**
A sends B "attack at 10".
But did B get it? A can't go unless sure.
B sends an ack,
but did A get the ack? Now B can't go unless sure.

&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>

### Impossibility

Look at the sequence msg-ack-ack-...

Assume there is some sequence of *i* msgs that constitutes a proof, so
that both would attack once all *i* are delivered.

However, what if the last msg is not delivered?
  - the receiver has seen only *i-1* msgs, so presumably would not attack
  - the sender, though, sees the same msgs as in the full *i*-sequence, and so attacks....
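
The lost-last-message argument can be checked mechanically. This is only a sketch: the `run` helper and its view format are made up for illustration, and it assumes msgs strictly alternate A->B, B->A, starting with A.

```python
def run(i, drop_last=False):
    """Simulate i alternating msgs (A->B, B->A, ...); return each side's local view."""
    view = {"A": [], "B": []}
    for k in range(i):
        sender, receiver = ("A", "B") if k % 2 == 0 else ("B", "A")
        view[sender].append(("sent", k))
        # the last msg is lost when drop_last is set; all others arrive
        if not (drop_last and k == i - 1):
            view[receiver].append(("recv", k))
    return view

full = run(3)                    # msg, ack, ack-of-ack all delivered
lossy = run(3, drop_last=True)   # final msg (sent by A) is lost

sender_cannot_tell = full["A"] == lossy["A"]   # A's view is identical in both runs
receiver_can_tell = full["B"] != lossy["B"]    # B's view is missing the last msg
```

Whatever A decides in the full run it must also decide in the lossy run, since it cannot tell them apart, while B is free to decide differently.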

&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>

### Fix?
- A sends a whole bunch of copies, assuming one gets through
- A and B send and ack for a while

However, *provable agreement between even two parties in an asynchronous
environment with lossy links is not possible*.

&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>

## Two Lieutenants Problem

*Safety*:
- all loyal lieutenants make the same decision
- if the general is loyal, all loyal lieutenants follow the general's order

&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>

![](byzL.png)


&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>

![](byzL3.png)


&nbsp;<p>
&nbsp;<p>
&nbsp;<p>
&nbsp;<p>


![](byzAlb.png)




Byzantine faults call for Byzantine fault-tolerant protocols. At least *3f+1* nodes are needed to tolerate *f*
faults.
- proved that *3* nodes cannot tolerate a single fault
- showed by contradiction that no protocol with *3f* nodes can be correct:
  - Assume a solution with *3f* nodes, *f* of them faulty
  - Use *3* Byzantine generals to simulate it: each Byzantine general simulates *f* Albanian generals
  - if a solution for *3f* exists, this simulation will be correct
  - however, there are **really** only *3* generals, meaning we have solved for *3*
    nodes w/ one faulty
  - contradiction!
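
The *3f+1* bound as arithmetic. A tiny sketch; the function names are made up for illustration.

```python
def min_nodes(f):
    """Smallest system size that can tolerate f Byzantine faults."""
    return 3 * f + 1

def max_faults(n):
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3

sizes = {n: max_faults(n) for n in (3, 4, 6, 7, 10)}
```

Three nodes tolerate zero faults and four tolerate one; going from 4 to 6 nodes buys nothing, since the next fault needs 7.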


# Consensus
The problem is for processes to agree on a value that one or more of them has proposed.

## Asynchronous vs synchronous systems:

In synchronous settings, msgs arrive within a fixed, known amount of time:

- any longer delay means failure

Asynchronous:

- msgs arbitrarily slow (*communication asynchrony*)
- machines arbitrarily slow (*process asynchrony*)
- re-ordered messages (*message order asynchrony*)

Distributed systems must be assumed `concurrent`, `asynchronous`, and
`subject to failure`.

> Fischer et al. ('85) showed that no deterministic protocol can guarantee distributed agreement in an async system if even one process may crash, but:

Despite real systems being async:

- they work anyway.....
- can reach agreement with "high probability"
   - faults can be masked (flash storage)
   - "perfect" failure detectors
      - not really, but all agree to abide
   - use randomization
      - a malicious adversary manipulates procs and comm so msgs arrive at just the wrong time
      - random delays make this tougher

# Fault Tolerance

Faults may be:

- transient
- intermittent
- permanent

Failures:

- **fail-stop failures** (also "fail-silent"): processors fail by ceasing to
communicate. No incorrect communication is sent.
- **byzantine failures**: a faulty process continues
  communicating, but may send arbitrary messages, maybe collaborate
  with other faulty processes, etc.

Approaches to faults:

- *avoidance* - Formal validations, code inspection, etc.
- *fault removal* - encounter bugs, fix, re-run.
- *fault tolerance* - correct operation in the presence of faults.
  - information redundancy : replication, coding
  - time redundancy : retries (no good for permanent failures)
  - physical redundancy : backups, master-slaves, RAID disks

Primary backup approach:

- primary does all the work
- backups detect primary failure w/ heartbeat messages
  - cold failover - need to restart everything, loses work
  - warm failover - primary informs backups of changes (checkpoints,
    or msgs), loses little or nothing
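
Heartbeat-based detection on the backup side might look roughly like the following. This is a sketch, not a real failover implementation: `HeartbeatMonitor` and its timeout are hypothetical, and timestamps are passed in explicitly instead of read from a clock so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Backup-side detector: suspect the primary after `timeout` seconds of silence."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_seen = now   # time of the most recent heartbeat

    def heartbeat(self, now):
        """Record a heartbeat msg from the primary."""
        self.last_seen = now

    def primary_suspected(self, now):
        """True once the primary has been silent for longer than the timeout."""
        return now - self.last_seen > self.timeout

monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat(now=1.0)
still_ok = monitor.primary_suspected(now=3.5)    # 2.5s of silence
suspected = monitor.primary_suspected(now=4.5)   # 3.5s of silence
```

The timeout only makes sense under a synchrony assumption: in an async system a slow primary and a dead primary look the same to the backup.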




# Consensus System Model  (sync)
### Have:

- a collection of procs p_i (1 .. N) w/ msg-passing
- assume reliable communication, but processes might "fail" (fail-stop)
- maybe *digital signatures*, otherwise process can *lie*

### Problem:

- each proc begins "undecided", and proposes a value v_i
- procs communicate w/ each other
- each "decides" a value d_i

### Solution requirements :

- *termination*: each correct proc eventually sets its d_i
- *agreement*: decision values of all correct procs are the same
- *integrity*: if correct processes all proposed the same value, then any correct process has to decide this value


### How to solve consensus w/ no failures?

- each proc chooses a value
- reliably multicasts it to everyone else
- every proc applies the same deterministic rule to the values received: majority (or min, or max.....)
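
A sketch of the failure-free case, assuming reliable multicast hands every proc the same collection of values. The names `decide` and `consensus_no_failures` are made up, and the tie-break (smallest value among the most common) is one arbitrary deterministic choice.

```python
from collections import Counter

def decide(values):
    """Majority value; ties broken by taking the smallest, so every proc agrees."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)

def consensus_no_failures(proposals):
    """proposals: {proc: proposed value}. Returns {proc: decided value}."""
    # reliable multicast: every proc ends up holding every proposal
    received = {p: list(proposals.values()) for p in proposals}
    return {p: decide(vals) for p, vals in received.items()}

decisions = consensus_no_failures({"p1": "attack", "p2": "retreat", "p3": "attack"})
tied = consensus_no_failures({"p1": "attack", "p2": "retreat"})
```

Since all procs see the same inputs and run the same rule, agreement is immediate; the hard part is what happens once procs or msgs can fail.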


# Generalization: the Byzantine generals
### (or General and the lieutenants)

- a general decides, tells everyone  
   - (this differs from consensus, where every proc proposes)
- lieutenants talk amongst themselves about what they have been told by the general and by other lieutenants

### Solution requires:

- (*termination*): finishes
- (*agreement*):  all correct lieutenants decide the same
- (*integrity*): if the general is correct, all correct lieutenants decide what the general ordered

Note
: easy to tell that lying is happening; *hard to tell who*


# First, let's show it's impossible with three:

- A tells B go, C nogo. 
- B and C tell each other what A said. 
- Both B and C know someone is lying, but can't tell who.

![3 generals](generals3-2.png)
 
B sees the same msgs whether A or C is the liar, so no "agreement" or "integrity" can be guaranteed.
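
The "can't tell who" step can be made concrete by comparing what B sees in two scenarios. A sketch only: `view_of_b` and `flip` are illustrative helpers, and the orders are limited to "go"/"nogo".

```python
def flip(order):
    """The opposite order."""
    return "nogo" if order == "go" else "go"

def view_of_b(a_to_b, a_to_c, c_lies):
    """What B knows: A's order to B, plus C's report of A's order to C."""
    c_report = flip(a_to_c) if c_lies else a_to_c
    return (a_to_b, c_report)

a_faulty = view_of_b("go", "nogo", c_lies=False)  # A equivocates, C reports honestly
c_faulty = view_of_b("go", "go", c_lies=True)     # A is honest, C misreports

indistinguishable = a_faulty == c_faulty
```

If A is honest, integrity forces B to decide "go"; B must then also decide "go" when A is the liar, while C (by the symmetric argument) decides "nogo", breaking agreement.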

# If 3 doesn't work, neither does 3f!
Sketch of why N must be at least 3f+1:

Relies on the impossibility result for 3 total with one faulty.

Assume we have an algorithm that works for N = 3f with f faulty, where f > 1. 

Then:

- let's have three procs, two correct, one not
- each internally simulates f generals
   - the two correct procs each simulate f correct generals, the bad simulates f bad ones
 
Assumption means:

- consensus must be reached
- and all correct generals (i.e. those in the correct procs) reach the right solution

But:

- we really only have three procs
- could construct from this a way to solve for three, by:
   - each proc "decides" by majority of its simulated generals
   - whole system "decides" by majority of procs.

But now we have solved the problem for three procs! Contradiction....

# But 4 works (oral message solution)

- Oral implies that a msg is delivered exactly as sent and its origin is always known; a faulty node can still lie about what it was told.
- Multiple rounds, recursively telling each other what they've heard
from other nodes.
- works if fewer than 1/3 are faulty (system size must be at least 3f+1)

![4 generals](generals4-2.png)

Note that this is a rough approximation of the real algorithm, which
requires f+1 rounds to tolerate f faults (so, in the worst case, rounds proportional to the number of participants).
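
A compact sketch of the recursive oral-message algorithm, usually written OM(m). Assumptions: node ids are integers, a traitor follows one fixed, made-up strategy (tell even-numbered receivers "attack", odd ones "retreat"), and "retreat" is the default when there is no strict majority.

```python
from collections import Counter

DEFAULT = "retreat"

def send(sender, receiver, value, traitors):
    """Loyal senders relay `value`; a traitor (hypothetical strategy) splits by receiver id."""
    if sender in traitors:
        return "attack" if receiver % 2 == 0 else "retreat"
    return value

def majority(values):
    """Strict majority, falling back to DEFAULT when there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT

def om(m, commander, lieutenants, value, traitors):
    """Return {lieutenant: value it settles on} for one OM(m) instance."""
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received
    heard = {lt: [received[lt]] for lt in lieutenants}
    for j in lieutenants:
        others = [lt for lt in lieutenants if lt != j]
        # j acts as commander in OM(m-1), relaying what it was told
        for i, v in om(m - 1, j, others, received[j], traitors).items():
            heard[i].append(v)
    return {lt: majority(heard[lt]) for lt in lieutenants}

four = om(1, 0, [1, 2, 3], "attack", traitors={3})           # traitorous lieutenant
bad_commander = om(1, 0, [1, 2, 3], "attack", traitors={0})  # traitorous commander
three = om(1, 0, [1, 2], "attack", traitors={2})             # only 3 nodes
```

With four nodes the loyal lieutenants agree in both runs (and follow a loyal commander); with three, loyal lieutenant 1 hears one "attack" and one "retreat" and falls back to the default, so integrity fails. The returned dict also contains entries for traitors, which carry no meaning.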

# What if signatures?

A solution for 3 exists if we are just worrying about equivocation:

- each lieutenant forwards the signed msgs it received to the others.
- compare the signed msgs from the commander to see if the commander is faulty.
- relies on a default choice ("no", or "retreat")
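
A sketch of the three-node signed case. Signatures are not implemented here: the model simply assumes only the commander can produce a valid signed order, so a faulty lieutenant can withhold an order but not alter it. The `run` and `decide` helpers and the lieutenant names B and C are illustrative.

```python
DEFAULT = "retreat"

def decide(valid_orders):
    """Follow the commander if exactly one signed order was seen, else the default."""
    return next(iter(valid_orders)) if len(valid_orders) == 1 else DEFAULT

def run(order_to_b, order_to_c, b_forwards=True, c_forwards=True):
    """Return (B's decision, C's decision) after exchanging signed orders."""
    seen_b, seen_c = {order_to_b}, {order_to_c}
    if c_forwards:
        seen_b.add(order_to_c)   # C forwards the commander's signed order to B
    if b_forwards:
        seen_c.add(order_to_b)   # B forwards the commander's signed order to C
    return decide(seen_b), decide(seen_c)

honest = run("attack", "attack")
equivocating = run("attack", "retreat")               # commander signs two different orders
silent_c = run("attack", "attack", c_forwards=False)  # faulty C withholds its copy
```

Two conflicting signed orders are proof that the commander is the faulty one, so both lieutenants fall back to the default together; a silent lieutenant cannot stop the other from following a loyal commander.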

