
# FAWN: A Fast Array of Wimpy Nodes

Motivation: **power**
- power up to 50% of 3-yr cost
- assert: datacenter density limited by cooling
- CPUs outstripping disks
- CPU power needs super-linear w/ speed
- dynamic CPU power scaling no good
- "*systems remain most energy-efficient when operating at peak utilization*"

Approach:
- per-node log-structured FAWN-DS datastore
- strong consistency using chain replication
- distributed by consistent hashing

# FAWN-DS
FAWN fact(s): *shockingly wimpy*
- 500 MHz CPU, 1300 queries per second, under 5W.
- 256 MB DRAM
- 4GB flash
![DS](fawnDS.png)


## Lookup
![lookup](fawnLookup.png)


## Points
- V key ranges
  - each with distinct file
    - appending to small set of files (*semi-random writes*) nearly as fast as to a single file ??

## Maintenance
- **split:** split datastore in two to accommodate new VID.
  - Scan log copying relevant data to new log.
  - *many orphaned values*
- **merge:** appends values of one log to another
- **compact:** cleans log
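
The three maintenance operations can be illustrated with a toy append-only store. This is a minimal sketch, not FAWN-DS itself: the `LogStore` class, its in-memory list standing in for the flash log, and the `belongs_to_new` predicate are all assumptions made for illustration.

```python
class LogStore:
    """Toy append-only key-value log with an in-memory index (illustrative only)."""

    def __init__(self):
        self.log = []    # list of (key, value) entries; stands in for the on-flash log
        self.index = {}  # key -> offset of the newest entry for that key

    def put(self, key, value):
        # Writes only ever append; older entries for the key become orphans.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def _live_entries(self):
        # An entry is live only if the index still points at its offset.
        return [(k, v) for off, (k, v) in enumerate(self.log)
                if self.index.get(k) == off]

    def split(self, belongs_to_new):
        """Scan the log, copying live entries chosen by belongs_to_new to a new store."""
        new_store = LogStore()
        for key, value in self._live_entries():
            if belongs_to_new(key):
                new_store.put(key, value)
                del self.index[key]  # the old copy is now an orphan in this log
        return new_store

    def merge(self, other):
        """Append the live entries of another store to this one."""
        for key, value in other._live_entries():
            self.put(key, value)

    def compact(self):
        """Rewrite the log so that it contains only live entries."""
        live = self._live_entries()
        self.log, self.index = [], {}
        for key, value in live:
            self.put(key, value)


store = LogStore()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)                         # the first "a" entry is now orphaned
moved = store.split(lambda k: k == "b")   # "b" moves to the new store
store.compact()                           # drops the orphaned entries
```

Note that `split` leaves the moved entries behind in the old log as orphans, which is why a later `compact` is needed.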


# FAWN-KV
![fawn-kv](fawnChord.png)

<p>

![fawn-kv](fawnKV.png)

## Front ends
- maintain the entire node membership list and directly forward queries to appropriate back-end node
- keyspace divided across multiple front ends
  - no cache consistency because single front end for each range
  - clients should approximately know which front end to use for each key
- management node maps ranges to front ends
  - currently a single node
  - should be replicated with PAXOS
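
How a front end might map a key to a back-end node can be sketched with a small consistent-hashing ring. This is a generic illustration under assumptions, not the paper's code: the `Ring` class, the `ring_hash` helper, the SHA-1 hash, and the `vids_per_node` parameter (several virtual IDs per physical node) are all hypothetical names and choices.

```python
import bisect
import hashlib


def ring_hash(text: str) -> int:
    """Map a string to a 64-bit position on the ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hashing ring in which each node owns several virtual IDs."""

    def __init__(self, nodes, vids_per_node=4):
        # Each (position, node) pair is one virtual ID placed on the ring.
        self.points = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vids_per_node)
        )
        self.positions = [pos for pos, _ in self.points]

    def owner(self, key: str) -> str:
        """Return the node whose virtual ID is the first at or after the key's position."""
        i = bisect.bisect_left(self.positions, ring_hash(key))
        return self.points[i % len(self.points)][1]  # wrap around past the last ID


ring = Ring(["node1", "node2", "node3"])
target = ring.owner("some-key")  # the back end a front end would forward to
```

A useful property of this scheme is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes.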


## Caching
- read from SSD at 1300/sec (per node)
- 85,000/sec if caught by buffer cache (OS)
- tiny in-memory cache (like memcache)
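
A rough way to see how much the buffer cache matters is to blend the two rates above by hit rate. The model below is an assumption, not something from the paper: it treats each query as taking either the cached or the SSD service time on a single serial server, and the function name `effective_qps` is made up for this sketch.

```python
SSD_QPS = 1300.0     # per-node rate when reads go to flash (from the notes above)
CACHE_QPS = 85000.0  # per-node rate when reads hit the OS buffer cache


def effective_qps(hit_rate: float) -> float:
    """Estimate throughput for a given cache hit rate (simple serial-service model)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Average time per query is the hit-rate-weighted mix of the two service times.
    avg_seconds = hit_rate / CACHE_QPS + (1.0 - hit_rate) / SSD_QPS
    return 1.0 / avg_seconds


for rate in (0.0, 0.5, 0.9, 1.0):
    print(f"hit rate {rate:.1f}: about {effective_qps(rate):.0f} queries/sec")
```

Under this model the slow path dominates: throughput stays close to the SSD rate until the hit rate is very high.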

## Chain Replication

![fawnChains](fawnChains.png)

- **consistent:** values read only after being replicated along chain
- high-throughput, not low write-latency
- fragile
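
The consistency property can be seen in a minimal chain replication sketch: writes enter at the head and pass through every node, and reads are served only by the tail. This is a generic illustration with no failure handling; the `ChainNode` class and `build_chain` helper are assumptions for the example, not FAWN-KV code.

```python
class ChainNode:
    """One replica in a chain; forwards writes to its successor."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None  # successor in the chain, None for the tail

    def write(self, key, value):
        # Apply locally, then pass the write down the chain.
        self.store[key] = value
        if self.next is not None:
            return self.next.write(key, value)
        return self.name  # the tail acknowledges once every replica has the value

    def read(self, key):
        # Intended to be called on the tail only.
        return self.store.get(key)


def build_chain(names):
    """Link nodes in order and return (head, tail)."""
    nodes = [ChainNode(name) for name in names]
    for node, successor in zip(nodes, nodes[1:]):
        node.next = successor
    return nodes[0], nodes[-1]


head, tail = build_chain(["A", "B", "C"])
acked_by = head.write("k", "v")  # travels A -> B -> C before the ack
value = tail.read("k")           # a tail read sees only fully replicated values
```

The write latency grows with chain length because each write visits every node, which matches the "high-throughput, not low write-latency" point above.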











## Eval

![fawnEval](fawnTCO.png)


## Cool stuff
- SSDs
- log-structured
- chain replication
- consistent hashing

## Comments:
- (andrew) not clear to me why the non-contiguous nature of the virtual nodes helps in load-balancing and reducing failover times
- (nao) valid concern that we want to save electricity, but I did not imagine it can be published as a paper.

- power becoming more of an issue as companies go green

# Scatter

Two-phase commit across multiple PAXOS groups.

1. The coordinator group replicates the decision to initiate the transaction.
2. The coordinator group broadcasts a transaction prepare message to the nodes of the participant groups.
3. Upon receiving the prepare message, a participant group decides whether or not to commit the proposed transaction and replicates its vote.
4. A participant group broadcasts a commit or abort message to the nodes of the coordinator group.
5. When the votes of all participant groups are known, the coordinator group replicates whether or not the transaction was committed.
6. The coordinator group broadcasts the outcome of the transaction to all participant groups.
7. Participant groups replicate the transaction outcome.
8. When a group learns that a transaction has been committed, it executes the steps of the proposed transaction, the particulars of which depend on the multi-group operation.
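
The eight steps above can be sketched in a few lines. This is a simplified, single-process illustration under assumptions: each `Group` stands for a whole PAXOS group, its `log` list stands in for state replicated within the group, and message broadcasts are plain method calls. The names `Group`, `will_commit`, and `run_transaction` are made up for the example.

```python
class Group:
    """Stand-in for one replica group; `log` plays the role of its replicated state."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit  # hypothetical knob to force an abort vote
        self.log = []
        self.applied = []

    def replicate(self, entry):
        # In a real system this would be a consensus round within the group.
        self.log.append(entry)

    def prepare(self, txn):
        # Step 3: decide on a vote and replicate it before replying.
        vote = "commit" if self.will_commit else "abort"
        self.replicate(("vote", txn, vote))
        return vote  # step 4: the vote is sent back to the coordinator group

    def finish(self, txn, outcome):
        # Step 7: replicate the outcome; step 8: execute only if committed.
        self.replicate(("outcome", txn, outcome))
        if outcome == "commit":
            self.applied.append(txn)


def run_transaction(coordinator, participants, txn):
    coordinator.replicate(("begin", txn))             # step 1
    votes = [p.prepare(txn) for p in participants]    # steps 2-4
    outcome = "commit" if all(v == "commit" for v in votes) else "abort"
    coordinator.replicate(("outcome", txn, outcome))  # step 5
    for p in participants:                            # steps 6-7
        p.finish(txn, outcome)
    if outcome == "commit":                           # step 8 on the coordinator
        coordinator.applied.append(txn)
    return outcome


coord = Group("coordinator")
result = run_transaction(coord, [Group("g1"), Group("g2")], "txn-1")
```

Every decision is replicated before the next message is sent, so a group's log takes the place of the stable-storage write in classic 2PC.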



- Two-phase commit across PAXOS groups
![two-phase](scatterTwoPhase.png)

![two-phase](scatterConflict.png)


### Notes
- distributed replication plays the role of write-ahead logging to stable storage in the classic 2PC protocol

![two-phase](scatterKeys.png)

![two-phase](scatterPrimaries.png)

# Comments

OpenDHT is based on Chord, and is well-known to be *horrible*, both in performance and in correctness.

- (nao) why not 1pc since already reliable?

- disappointing to see them punt on security concerns, robustness concerns, and linearizability with multi-key transactions - allen

- (katura) policy! Yes, much impact on performance

- (patrick) no byzantine, where would it be?  Would need permissions as well.



