
# FAWN: A Fast Array of Wimpy Nodes

Motivation: **power**
- power up to 50% of 3-yr cost
- assert: datacenter density limited by cooling
- CPUs outstripping disks
- CPU power needs super-linear w/ speed
- dynamic CPU power only semi-efficient
- "*systems remain most energy-efficient when operating at peak utilization*" (cost / op)

Approach:
- per-node log-structured FAWN-DS datastore
- strong consistency using chain replication
- distributed by consistent hashing
- **flash storage**
  - fast random reads
  - power-efficient even under load
  - slow writes


# FAWN-KV
![fawn-kv](fawnChord.png)

<p>

![fawn-kv](fawnKV.png)

## Front ends
- maintain the entire node membership list and directly forward queries to the appropriate back-end node
- keyspace divided across multiple front ends
  - no cache consistency because single front end for each range
  - clients should approximately know which front end to use for each key
- management node maps ranges to front ends
  - currently a single node
  - should be replicated using Paxos
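
A minimal sketch of how a front end might route a key using its membership list and consistent hashing. The `Ring` class, the SHA-1 ring space, and the `node#v` naming of virtual IDs are assumptions for illustration, not the paper's code.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string onto the ring (160-bit SHA-1 space, an assumption)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")


class Ring:
    """Hypothetical membership list as kept by a front end."""

    def __init__(self, nodes, vids_per_node=2):
        # Each physical node owns several virtual IDs (VIDs) on the ring.
        self._points = sorted(
            (ring_hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vids_per_node)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        # The owner is the first VID at or after the key's hash,
        # wrapping around to the start of the ring.
        i = bisect.bisect_left(self._hashes, ring_hash(key))
        return self._points[i % len(self._points)][1]


ring = Ring(["node-a", "node-b", "node-c"])
backend = ring.owner("user:42")  # the back end this query is forwarded to
```

Adding a node only inserts new points on the ring, so the only keys that change owner are the ones that move to the new node.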

# FAWN-DS
FAWN fact(s):      *shockingly wimpy*
- 500 MHz CPU, 1300 queries per second, under 5 W
- 256 MB DRAM
- 4 GB flash

## Lookup
<img src=fawnLookup.png width=600>

Index periodically checkpointed.
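
A rough sketch of the lookup path: an in-memory index points at the latest entry for each key in an append-only log. Here a plain Python `dict` stands in for the compact in-memory hash index, and an `io.BytesIO` buffer stands in for the log on flash. Both, along with the entry layout, are assumptions for illustration.

```python
import io
import struct

HEADER = struct.Struct("<II")  # key length, value length (assumed layout)


class LogStore:
    def __init__(self):
        self.log = io.BytesIO()  # stands in for the on-flash data log
        self.index = {}          # key -> offset of the latest entry

    def put(self, key: bytes, value: bytes) -> None:
        # Writes only ever append; old versions stay in the log.
        offset = self.log.seek(0, io.SEEK_END)
        self.log.write(HEADER.pack(len(key), len(value)) + key + value)
        self.index[key] = offset

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        # One seek into the log, then read the entry found there.
        self.log.seek(offset)
        klen, vlen = HEADER.unpack(self.log.read(HEADER.size))
        stored_key = self.log.read(klen)
        if stored_key != key:
            return None
        return self.log.read(vlen)


store = LogStore()
store.put(b"k", b"v1")
store.put(b"k", b"v2")  # appended; the index now points at the new entry
```

Because the index lives only in memory, checkpointing it periodically avoids rescanning the whole log after a restart.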

## Points
- Each storage node has V key ranges mapped to it
  - each with distinct file
    - appending to small set of files (*semi-random writes*) nearly as fast as sequential writes

## Maintenance
- **split:** split datastore in two to accommodate new VID.
  - Scan log copying relevant data to new log.
  - *many orphaned values*
- **merge:** appends values of one log to another
- **compact:** cleans log
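
The three operations can be sketched on a toy log modeled as a list of `(key, value)` entries. The function signatures and the predicate arguments are assumptions for illustration.

```python
def split(log, moves_to_new):
    """Scan the log, copying entries for the new range into a new log.

    The original log is left untouched, so the copied entries remain
    there as orphans until it is compacted.
    """
    return [(k, v) for k, v in log if moves_to_new(k)]


def merge(dst, src):
    """Append every entry of one log to another."""
    dst.extend(src)


def compact(log, still_owned):
    """Rewrite the log, keeping only the latest value of each owned key."""
    latest = {}
    for k, v in log:
        if still_owned(k):
            latest[k] = v  # later entries overwrite earlier ones
    return list(latest.items())


log = [("a", 1), ("m", 2), ("a", 3), ("z", 4)]
new_log = split(log, lambda k: k >= "m")   # entries handed to the new VID
old_log = compact(log, lambda k: k < "m")  # drops orphans and stale values
```

All three are sequential scans and appends, which suits flash's slow random writes.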


## Caching
- read from SSD at 1300/sec (per node)
- 85,000/sec if caught by buffer cache (OS)
- tiny in-memory cache (like memcache)
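
A back-of-the-envelope model of how the two rates above combine. It assumes a node serves requests one at a time, so average service time is a hit-rate-weighted mix of the two per-query times. This is a simplification, not a measurement from the paper.

```python
SSD_QPS = 1_300      # reads served from SSD (from the notes above)
CACHE_QPS = 85_000   # reads served from the OS buffer cache


def effective_qps(hit_rate: float) -> float:
    """Per-node throughput for a given buffer-cache hit rate (0.0 to 1.0)."""
    avg_seconds_per_query = hit_rate / CACHE_QPS + (1.0 - hit_rate) / SSD_QPS
    return 1.0 / avg_seconds_per_query


for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {h:.2f}: about {effective_qps(h):,.0f} queries/sec")
```

Under this model the slow path dominates: throughput stays near the SSD rate until the hit rate is very high.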

## Chain Replication

![fawnChains](fawnChains.png)

- **consistent:** values read only after being replicated along chain
- high-throughput, not low write-latency
- fragile
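
A toy sketch of the chain: writes enter at the head and are passed down, the tail acknowledges, and reads go to the tail so only fully replicated values are visible. The `Replica` class and `build_chain` helper are hypothetical names, and failure handling is omitted.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.successor = None

    def put(self, key, value):
        # Apply locally, then pass the write down the chain.
        self.store[key] = value
        if self.successor is not None:
            return self.successor.put(key, value)
        return self.name  # the tail acknowledges the write

    def get(self, key):
        return self.store.get(key)


def build_chain(names):
    """Link replicas head to tail; returns (head, tail)."""
    replicas = [Replica(n) for n in names]
    for a, b in zip(replicas, replicas[1:]):
        a.successor = b
    return replicas[0], replicas[-1]


head, tail = build_chain(["A", "B", "C"])
acked_by = head.put("k", "v")  # travels A -> B -> C before the ack
value = tail.get("k")          # reads are served by the tail
```

Every write crosses the whole chain before it is acknowledged, which is why latency grows with chain length, and why any stalled replica blocks writes until the chain is repaired.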











## Eval

![fawnEval](fawnTCO.png)

- Low query rates, large size -> FAWN + Disk
- High query rates, small size -> FAWN + DRAM


## Cool stuff
- SSDs
- log-structured
- chain replication
- consistent hashing

## Comments:
- Does mixing specialized hardware and custom code weaken the conclusion?

- Would a Bloom filter work instead of the hash index?

- What changes would need to be made for this system to be able to handle the high throughput and large data of a production data storage system?

- How is 4 GB (+256 MB RAM) enough storage for a node, considering a significant portion of
that space would be dedicated to storing the log + index?


- How much of their advantage comes from the SSDs?

- power becoming more of an issue as companies go green
