# Bayou

![](bayouArch.png)


### Assumptions:
- read/write any replica
- full replication
- disconnected operation
- piecewise determinism

### Provides
- eventual consistency
- app-specific conflict resolution


### Methods
- optimism
- anti-entropy sessions
- session semantics
- per-update **dependency checks** and **merge procedures**
- committed vs tentative updates


### Apps:
- scheduler
- bib database

### Dependency checks:
- better than version vectors because
    - read/write conflicts
    - arbitrary, multi-item constraints
        - scheduler
        - bank drafts
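
As a rough illustration of the scheduler case, here is a minimal sketch of a write that carries its own dependency check and merge procedure. The names `BayouWrite`, `apply_write`, and `reserve`, and the dict-based calendar, are assumptions made for this example, not Bayou's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory "database": hour slot -> owner of the reservation.
Calendar = Dict[int, str]

@dataclass
class BayouWrite:
    update: Callable[[Calendar], None]      # the intended change
    dep_check: Callable[[Calendar], bool]   # True if the update is still safe
    merge_proc: Callable[[Calendar], None]  # runs instead when the check fails

def apply_write(db: Calendar, w: BayouWrite) -> None:
    """Run the update if its dependency check passes, otherwise merge."""
    if w.dep_check(db):
        w.update(db)
    else:
        w.merge_proc(db)

def reserve(who: str, preferred: int, alternates: List[int]) -> BayouWrite:
    """Build a reservation write with fallback slots (illustrative only)."""
    def update(db: Calendar) -> None:
        db[preferred] = who

    def dep_check(db: Calendar) -> bool:
        return preferred not in db

    def merge_proc(db: Calendar) -> None:
        for slot in alternates:
            if slot not in db:
                db[slot] = who
                return
        # No free slot: a real app would record the conflict for the user.

    return BayouWrite(update, dep_check, merge_proc)

db: Calendar = {}
apply_write(db, reserve("alice", 10, [11]))
apply_write(db, reserve("bob", 10, [11, 12]))  # conflicts, falls back to 11
```

Because the check and the merge travel with the write, every replica that replays the same writes in the same order reaches the same calendar.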

![](bayouWrite.png)

## Consistency
### Log Vectors
- "O"
  - TS’s of last tossed (‘omitted’) writes
  - works because writes tossed in order
  - writes from any server are propagated and committed in TS order
- “C” : max TS’s of committed writes
- “F” : max TS’s of tentative writes
- used for anti-entropy, not conflict detection
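
A minimal sketch of how such a vector can drive anti-entropy, assuming only tentative writes, timestamps starting at 1, and per-server in-order propagation (so one max timestamp per server summarizes everything seen from it). The function names and the payload-free `Write` tuple are assumptions for this example.

```python
from typing import Dict, List, Tuple

# Accept-stamp of a write: (timestamp, server_id). Payload omitted for brevity.
Write = Tuple[int, str]

def writes_to_send(sender_log: List[Write], receiver_f: Dict[str, int]) -> List[Write]:
    """Return the sender's writes the receiver has not seen, in log order."""
    return [(ts, sid) for ts, sid in sender_log if ts > receiver_f.get(sid, 0)]

def receive(receiver_log: List[Write], receiver_f: Dict[str, int],
            incoming: List[Write]) -> None:
    """Add incoming writes to the receiver's log and advance its vector."""
    for ts, sid in incoming:
        receiver_log.append((ts, sid))
        receiver_f[sid] = max(receiver_f.get(sid, 0), ts)
    receiver_log.sort()  # tentative order: by (timestamp, server_id)

sender_log = [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (4, "B")]
receiver_log = [(1, "A"), (2, "A")]
receiver_f = {"A": 2}  # receiver's "F" vector: max TS seen per server

missing = writes_to_send(sender_log, receiver_f)
receive(receiver_log, receiver_f, missing)
```

The vector only says which writes to ship; it plays no part in deciding whether two writes conflict.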

![](bayouDB.png)

- new writes *tentative*
    - ordered by local server timestamp
    - TS mono-increasing: <TS, server ID>
    - immediately applied!
        - must have undo
        - must have redo
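
To see why undo and redo are both needed, here is a minimal sketch of a replica that applies tentative writes immediately and rolls back when a write with an earlier timestamp arrives late. The `TentativeLog` class and the simple key/value overwrite writes are assumptions for this example.

```python
from typing import Any, Dict, List, Tuple

Write = Tuple[int, str, str, Any]  # (timestamp, server_id, key, value)
_MISSING = object()                # marks "key did not exist before"

class TentativeLog:
    def __init__(self) -> None:
        self.db: Dict[str, Any] = {}
        self.log: List[Write] = []   # kept sorted by (timestamp, server_id)
        self.undo: List[Any] = []    # previous value for each applied write

    def _apply(self, w: Write) -> Any:
        """Apply a write and return the value it overwrote."""
        _, _, key, value = w
        old = self.db.get(key, _MISSING)
        self.db[key] = value
        return old

    def _undo(self, w: Write, old: Any) -> None:
        """Restore the value that the write overwrote."""
        key = w[2]
        if old is _MISSING:
            del self.db[key]
        else:
            self.db[key] = old

    def receive(self, w: Write) -> None:
        # Position of the new write in <TS, server ID> order.
        pos = sum(1 for x in self.log if x[:2] < w[:2])
        later = self.log[pos:]
        # Undo every later write, newest first.
        for lw, old in zip(reversed(later), reversed(self.undo[pos:])):
            self._undo(lw, old)
        del self.log[pos:]
        del self.undo[pos:]
        # Apply the new write, then redo the ones that were rolled back.
        for nw in [w] + later:
            self.undo.append(self._apply(nw))
            self.log.append(nw)

r = TentativeLog()
r.receive((2, "B", "x", "b2"))
r.receive((1, "A", "x", "a1"))  # arrives late, sorts before (2, "B")
r.receive((3, "A", "y", "a3"))
```

After the late write is slotted in, `x` still holds `"b2"` because `(2, "B")` is redone on top of it.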

### write stability
- matrix clocks, or
- timestamps, or
- **cheat**

### primary commit
- one server responsible for final ordering of all updates
- ordering?
    - not clear
    - hopefully consistent w/ timestamp order, but maybe not if some servers disconnected
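
A minimal sketch of why commit order can differ from timestamp order, assuming the primary hands out commit sequence numbers in the order writes reach it and that committed writes sort ahead of tentative ones. `W`, `Primary`, and `log_order` are names invented for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class W:
    ts: int
    server: str
    csn: Optional[int] = None  # commit sequence number; None while tentative

class Primary:
    """Assigns commit sequence numbers in arrival order (an assumption)."""
    def __init__(self) -> None:
        self.next_csn = 0

    def commit(self, w: W) -> W:
        if w.csn is None:
            w.csn = self.next_csn
            self.next_csn += 1
        return w

def log_order(w: W):
    """Committed writes first (by CSN), then tentative ones by <TS, server>."""
    return (w.csn if w.csn is not None else math.inf, w.ts, w.server)

a = W(ts=5, server="A")
b = W(ts=3, server="B")   # older timestamp, but B was disconnected

primary = Primary()
primary.commit(a)          # a reaches the primary first
order = [(w.ts, w.server) for w in sorted([b, a], key=log_order)]
primary.commit(b)          # b arrives later and commits after a
```

Here `a` ends up ahead of `b` in the final order even though `b` has the smaller timestamp.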

![](bayouReceiveWrite.png)
	


![](bayouAntientropy.png)

![](bayouCommits.png)

![](bayouTruncation.png)

### Transportable media!
- parameters CSN and V define the minimum state a receiver must already have in order to use the media


### Session guarantees
- writes must be causally ordered
- A precedes B iff A was already known to the server that received B from a client
- scalar logical clock
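
A minimal sketch of a scalar logical (Lamport) clock used for accept-stamps: a server bumps its clock past any timestamp it has seen, so a write it accepts later always sorts after writes it already knew about. The `LamportClock` class and its method names are assumptions for this example.

```python
from typing import Tuple

class LamportClock:
    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.time = 0

    def stamp(self) -> Tuple[int, str]:
        """Timestamp a newly accepted write as <TS, server ID>."""
        self.time += 1
        return (self.time, self.server_id)

    def observe(self, ts: int) -> None:
        """Advance past a timestamp learned from another server."""
        self.time = max(self.time, ts)

a = LamportClock("A")
b = LamportClock("B")

w1 = a.stamp()        # A accepts write W1
b.observe(w1[0])      # B learns of W1 during anti-entropy
w2 = b.stamp()        # B then accepts W2, so W1 precedes W2
```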

### Fun things: Policy Choices

- when to reconcile
  - periodically, manual, system trigger
- who to reconcile w/
  - network characteristics, up-to-dateness of replicas, truncations
- how aggressively to truncate write log
  - eh.
- who to create new server from
  - up-to-dateness, identifier length?

### Creation and Retirement Writes
- Bayou server Si creates itself by sending creation write to another server Sk
  - gives Si name <Tk,I, Sk>
  - tells others of <Tk,I, Sk>’s existence
- Disappearing servers issue “retirement writes”
- What do we know if target of anti-entropy, SL, doesn’t have an entry for some server <Tk,I, Sk> ?
  - either SL hasn’t heard of <Tk,I, Sk>, or knows that it is gone
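  - tell by comparing SL.V[Sk] (version vector) with Tk,I: if SL.V[Sk] >= Tk,I, SL has seen the creation write, so the missing entry means the server has retired; otherwise SL has not heard of it yet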
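
The missing-entry test can be sketched as below, assuming server IDs are `(creation_ts, creator_id)` tuples and that the creator itself still has an entry in the vector. The function name `status` and the returned labels are invented for this example.

```python
from typing import Dict, Hashable, Tuple

ServerId = Tuple[int, Hashable]  # <Tk,I, Sk>: creation timestamp, creator

def status(v: Dict[Hashable, int], server: ServerId) -> str:
    """Classify a server from the point of view of version vector v."""
    create_ts, creator = server
    if server in v:
        return "known"
    # Assumption: the creator has an entry; -1 stands for "nothing seen".
    if v.get(creator, -1) >= create_ts:
        # Creation write was seen, so the entry was dropped on retirement.
        return "retired"
    return "unknown"  # creation write has not arrived yet

si = (5, "Sk")
seen_and_gone = status({"Sk": 7}, si)
never_heard = status({"Sk": 3}, si)
still_there = status({"Sk": 7, si: 2}, si)
```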

### Security
- potentially disconnected
  - still need to make progress
- public-key crypto
- both clients/servers have signed certificates for rights 
- delegation certificates
- revocation

## Conclusions
- non-transparency
- app-specific conflict detection
- per-write resolvers
- partial- and multi-object updates
- tentative vs stable writes

## Your Thoughts
- punt to human(s), humans not deterministic?
- must be deterministic, even for failures caused by resource limits
- long partitions?
- reading from tentative storage
  - CAP
- why is anti-entropy unidirectional (is it?)
- tradeoff on truncation vs entire DB
- new replica creation
- What is the point of having multiple servers, if my updates on data can be lost? (session consistency)

-------------
# Session guarantees
- abstractions of reads and writes
- not atomic
- allows abstraction of single, centralized server
- can think of a sess. guarantee as subsetting acceptable servers

### Terms
- DB(S,t) : ordered seq. of writes seen by S at t
- Weak consistency: DB(S1,t) may differ from DB(S2,t)
- Eventual consistency: DB(S1,t<sub>inf</sub>) = DB(S2,t<sub>inf</sub>)
- session semantics “guaranteed”
  - they either hold, or app informed  (WTF?)

*Non-commutative writes need to be applied in same order everywhere*

### Supporting Session Guarantees
- Responsibility of “session manager”, not servers!
- WriteOrder(W1,W2) Boolean predicate (W1 before W2)
- Two sets:
  - read-set: set of writes that are relevant to session reads
  - write-set: set of writes performed in session
- Causal ordering of writes
  - Use scalar logical (Lamport) clocks

### Read Your Writes
- If Read R follows Write W in a session, and R is performed at server S at time t, then W is included in DB(S,t)

### Monotonic Reads

- Successive reads reflect a non-decreasing set of writes
- A write set WS is complete for Read R and DB(S,t) if R returns the same result whether applied to WS or to DB(S,t)
- RelevantWrites(S,t,R) 
  - returns the smallest set of Writes that is complete for Read R and DB(S,t)
- MR-guarantee: 
  - If Read R1 occurs before R2 in a session and R1 accesses server S1 at time t1 and R2 accesses server S2 at time t2, then RelevantWrites(S1,t1,R1) is a subset of DB(S2,t2)
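
A minimal sketch of RelevantWrites and the MR check for a key/value store, assuming writes are blind overwrites, so the last write to a key is the smallest complete set for reading it. The tuple layout and the names `relevant_writes` and `mr_ok` are assumptions for this example.

```python
from typing import Any, List, Set, Tuple

Write = Tuple[str, str, Any]  # (write_id, key, value)

def relevant_writes(db: List[Write], key: str) -> Set[str]:
    """Smallest set of writes that yields the same result for reading key."""
    for wid, k, _ in reversed(db):
        if k == key:
            return {wid}
    return set()

def mr_ok(read_set: Set[str], db2: List[Write]) -> bool:
    """MR holds at a server if it has every write earlier reads depended on."""
    return read_set <= {wid for wid, _, _ in db2}

db_s1 = [("w1", "x", 1), ("w2", "y", 2), ("w3", "x", 3)]
db_s2 = [("w1", "x", 1)]  # a stale replica

read_set = relevant_writes(db_s1, "x")   # session reads x at S1
stale_ok = mr_ok(read_set, db_s2)        # may the next read go to S2?
fresh_ok = mr_ok(read_set, db_s1)
```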

### Writes Follow Reads
- Writes are propagated after reads on which they depend
- WFR-guarantee: 
  - If Read R1 precedes Write W2 in a session and R1 is performed at server S1 at time t1, then, for any server S2, if W2 is in DB(S2) then
    - any W1 in RelevantWrites(S1,t1,R1) is also in DB(S2) and 
    - WriteOrder(W1,W2).

### Monotonic Writes
- New writes propagated after prior session writes
- MW-guarantee: 
  - If Write W1 precedes Write W2 in a session, then, for any server S2, if W2 in DB(S2) then W1 is also in DB(S2) and WriteOrder(W1,W2).
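
Taken together, the four guarantees reduce to subset checks against a session's read-set and write-set before picking a server. Below is a minimal sketch, assuming a server's database is visible as a set of write IDs and that a server orders each new write after everything it already holds (which covers the WriteOrder clauses). The `Session` class and its method names are invented for this example.

```python
from typing import Iterable, Set

class Session:
    """Tracks a session's read-set and write-set (sketch, not Bayou's API)."""
    def __init__(self, guarantees: Iterable[str]) -> None:
        self.g = set(guarantees)        # subset of {"RYW", "MR", "WFR", "MW"}
        self.read_set: Set[str] = set()
        self.write_set: Set[str] = set()

    def can_read(self, server_db: Set[str]) -> bool:
        need: Set[str] = set()
        if "RYW" in self.g:
            need |= self.write_set      # server must hold our own writes
        if "MR" in self.g:
            need |= self.read_set       # server must hold what we read from
        return need <= server_db

    def can_write(self, server_db: Set[str]) -> bool:
        need: Set[str] = set()
        if "WFR" in self.g:
            need |= self.read_set       # writes follow the reads they depend on
        if "MW" in self.g:
            need |= self.write_set      # writes follow earlier session writes
        return need <= server_db

    def did_read(self, relevant: Set[str]) -> None:
        self.read_set |= relevant

    def did_write(self, write_id: str) -> None:
        self.write_set.add(write_id)

s = Session({"RYW", "MR"})
s.did_write("w1")
ok_fresh = s.can_read({"w0", "w1"})
ok_stale = s.can_read({"w0"})       # fails RYW: w1 is missing
ok_write = s.can_write({"w0"})      # allowed: MW and WFR not requested
```

This matches the view of a guarantee as subsetting the acceptable servers: if no reachable server passes the check, the application is told the guarantee cannot be met.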



# Elephant

### Goals:
- undo vs long-term history

### Files:
- read-only
- derived
- cached
- temporary
- user-modified  <-- need versioning

### Policies:
- keep one  (browser cache, core, /tmp)
- keep all  
- keep safe   (just undo)
- keep landmarks <--
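
The four policies can be read as retention functions over a file's version timestamps. The sketch below is illustrative only: the landmark rule used here (keep everything recent, and for older history keep the last version of each burst of edits) and the parameters `undo_window`, `recent_window`, and `gap` are assumptions, not taken from these notes.

```python
from typing import List

def keep_one(versions: List[int]) -> List[int]:
    """Only the current version survives."""
    return versions[-1:]

def keep_all(versions: List[int]) -> List[int]:
    """Every version survives."""
    return list(versions)

def keep_safe(versions: List[int], now: int, undo_window: int) -> List[int]:
    """Current version plus older ones still inside the undo window."""
    old = [v for v in versions[:-1] if now - v <= undo_window]
    return old + versions[-1:]

def keep_landmarks(versions: List[int], now: int,
                   recent_window: int, gap: int) -> List[int]:
    """Keep recent versions; for older ones keep the end of each edit burst."""
    kept = []
    for i, t in enumerate(versions):
        is_last = i == len(versions) - 1
        recent = now - t <= recent_window
        ends_burst = is_last or versions[i + 1] - t > gap
        if recent or ends_burst:
            kept.append(t)
    return kept

versions = [1, 2, 3, 50, 51, 100, 101, 102]  # sorted version timestamps
landmarks = keep_landmarks(versions, now=102, recent_window=5, gap=10)
safe = keep_safe(versions, now=102, undo_window=5)
```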

### Metadata:
- imap
- ptr into
    - inode log (versioned per-file) or 
    - multiple non-versioned file inodes
- temperature
- policies

![imap](imap2.png)

### Directories
- ordinary file
    - versioning inside file, not outside
- basically a log of changes:
    - create
    - mutate
    - delete
- history partition
    - move old deletes into a second inode so as to not slow current operations	
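
A minimal sketch of a directory kept as a log of timestamped changes, which is what makes a lookup "as of" some date possible. The `DirEntry` and `Directory` classes are assumptions for this example and ignore mutation records and the history partition.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DirEntry:
    name: str
    inode: int
    created: int
    deleted: Optional[int] = None  # None while the name is still live

@dataclass
class Directory:
    entries: List[DirEntry] = field(default_factory=list)

    def create(self, name: str, inode: int, t: int) -> None:
        self.entries.append(DirEntry(name, inode, created=t))

    def delete(self, name: str, t: int) -> None:
        # Deletion only stamps the entry; the record itself is kept.
        for e in self.entries:
            if e.name == name and e.deleted is None:
                e.deleted = t
                return

    def list_at(self, t: int) -> List[str]:
        """Names visible at time t."""
        return [e.name for e in self.entries
                if e.created <= t and (e.deleted is None or t < e.deleted)]

d = Directory()
d.create("a", inode=1, t=10)
d.create("b", inode=2, t=20)
d.delete("a", t=30)
before_delete = d.list_at(25)
after_delete = d.list_at(35)
```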

![dirs](dirs.png)

### Useful:

    cd foo/@12-nov-1999:11:30
    tls      'ls @v'
    tgrep


### Application policies:
- user-level process called when the cleaner comes across high-temp file

### Downsides:
- less locality in inodes, data blocks
- pressure on buffer cache

