# Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

## Contributions:

- causal+
- linearizable within cluster
  - replicated keys using `chain replication`
- causal across clusters
  - bolt-on-ish
- use of explicit dependencies (before bolt-on)
- get (read) transactions


![arch](copsArch.png)

## COPS

- individual accesses
- explicit tracking of dependences
  - *nearest*
  - contexts (threads, or explicit)
  - Lamport Clocks
    - version is CLOCK.nodeID
      - convergent, not necessarily eventual

![deps](copsDeps.png)

- "context" implemented incrementally as set of nearest neighbors
  - read: 
    - return value added to context
  - write: 
    - context added to write
    - context zero'd out
	- context set to new write
- propagated with writes to remote clusters
- write installed at the remote cluster by (see the sketch below):
  - `depCheck()`s to owner of each write in "nearest"
  - `depCheck()` blocks until true
  - `depCheck()` re-issued on time-out
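
A minimal sketch of these mechanics, assuming illustrative names (`Context`, `new_version`, `install_remote_write`, `dep_check`) rather than COPS's actual interfaces:

```python
# Illustrative sketch, not COPS's code: nearest-dependency tracking plus
# remote write installation gated on dep checks.

class Context:
    """Per-client context: the 'nearest' dependencies of the next write."""
    def __init__(self):
        self.nearest = set()                 # set of (key, version) pairs

    def on_read(self, key, version):
        self.nearest.add((key, version))     # values read join the context

    def on_write(self, key, version):
        deps = set(self.nearest)             # becomes the write's metadata
        self.nearest = {(key, version)}      # reset to just the new write
        return deps

def new_version(lamport_clock, node_id):
    # Versions are CLOCK.nodeID: compare clocks first, break ties by node,
    # so all clusters converge on the same last-writer-wins outcome.
    return (lamport_clock, node_id)

def install_remote_write(store, key, version, value, deps, dep_check):
    """Commit a replicated write only after its dependencies are visible.
    dep_check(key, version) asks the local owner of key whether it has
    committed version; assumed to block until true or time out (False)."""
    for dkey, dversion in deps:
        while not dep_check(dkey, dversion):
            pass                             # re-issue on time-out
    if key not in store or store[key][0] < version:
        store[key] = (version, value)        # last-writer-wins by version
```

Because dep checks only consult the *nearest* dependencies, transitively-covered dependencies never need to be re-checked.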

## COPS-GT

*causal not always enough*

### Straightforward problems
Alice
```
s0: ACL == "world"
p0:"all is good"

s1: change ACL to exclude Boss
p2: "hate my job"
```
Assume reads are implemented as an ACL check followed by a read of the post.

Boss:
- check security: see s0, which includes Boss
- read returns p2
- Alice is fired

*TOCTOU* vulnerability (time-of-check-to-time-of-use)

### Another implementation
One might consider re-implementing the library to issue GETs in reverse order, checking "nearest" as you go:
- works for above example (p2 depends on s1)

But if Alice adds a few more actions:
```
p3: delete last post
s2: add Boss back to ACL
```

...the reverse order doesn't help
- could return p2, s2 (each possibly the latest version at the time of its read, and each read individually causally consistent)


### A causally-consistent set of reads
- s0, p0:  "world" / "all is good"
- s1, p0:  "no-boss" / "all is good"
- s1, p2:  "no-boss" / "hate job"
- s1, p3:  "no-boss" / "all is good"
- s2, p3:  "world" / "all is good"


## GT implementation
- "all dependences"
  - really means one-per-key that have been locally modified
  - "nearest" still big win over this because there might be *many* keys
- individual writes not coordinated
- by keeping **all** dependences, not just nearest
- clearly need GC
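
A paraphrase of the paper's two-round get-transaction algorithm, assuming a `get_by_version(key, version)` call that returns a value, its version, and its stored dependency list:

```python
# Sketch of COPS-GT's get transaction: at most two rounds, no locks,
# no coordination with concurrent writers.

def get_trans(keys, get_by_version):
    # Round 1: optimistically fetch the latest version of every key.
    results = {k: get_by_version(k, "latest") for k in keys}

    # Compute the causally-correct version (ccv) each key must satisfy:
    # the newest version of that key appearing in any result's deps.
    ccv = {}
    for value, version, deps in results.values():
        for dkey, dversion in deps:
            if dkey in keys:
                ccv[dkey] = max(ccv.get(dkey, dversion), dversion)

    # Round 2: re-fetch only keys whose round-1 version was too old.
    for k in keys:
        _, version, _ = results[k]
        if k in ccv and version < ccv[k]:
            results[k] = get_by_version(k, ccv[k])

    return {k: v for k, (v, _, _) in results.items()}
```

This is exactly why COPS-GT keeps **all** dependencies: round 2 must be able to name a concrete version of each stale key.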

![deps](copsGT.png)


## Notes
- Kaitlyn - just measured against themselves
- Elliot - ALPS
- Huayang - CAP, what does paxos give up?
- Nikhil - when not convergent?
# Bolt-On Consistency

### Fun scenario

- Sally posts "I think Billy is missing!"
- Seconds later, Billy calls mom, says "Safe!"; Sally posts "False alarm!"
- James posts "What a relief!"

Henry sees JUST the original post and James's reply, and is confused.

**Shim layer to support causal consistency over weaker (eventually-consistent) systems.**

![arch](boltonArch.png)

### Design and Implementation

Write path:
- Client write performed in the **local store**, then sent to the ECDS w/ metadata.

Read path (causal):
- Client read satisfied by the local store

Read path (pessimistic):
- synchronously check the data and its dependencies in the ECDS, rather than
relying on the async *resolve* process to do this for us.
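
A minimal sketch of these two paths, assuming a hypothetical `ecds` object with a `put` method and a toy `make_version` helper (neither is the paper's actual interface); the `after` parameter matches the `put` operation listed later:

```python
import itertools

# Hedged sketch of the shim's client-facing paths; `ecds` and
# `make_version` are illustrative stand-ins, not the paper's API.

_clock = itertools.count(1)

def make_version(key):
    # Hypothetical versioning: a unique, totally ordered id per write.
    return (next(_clock), key)

class Shim:
    def __init__(self, ecds):
        self.local = {}     # causally-consistent local store
        self.ecds = ecds    # underlying eventually-consistent store

    def put(self, key, value, after):
        # `after`: a set of previously returned write ids (dependencies).
        version = make_version(key)
        write = (version, value, frozenset(after))
        self.local[key] = write    # perform the write locally first...
        self.ecds.put(key, write)  # ...then send to the ECDS w/ metadata
        return version

    def get(self, key):
        # Causal reads are satisfied purely from the local store.
        w = self.local.get(key)
        return w[1] if w else None
```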

### Motivation

- Several NoSQL stores have added stronger consistency, implying need:
[Vogels: Choosing consistency.](http://www.allthingsdistributed.com/2010/02/).
- HAT work shows causal consistency is achievable even in the face of partitions
- allows purely local reads  (so does eventual!)
- no production-ready store provides it

### Challenges

- handling overwritten histories:
  - eventually-consistent stores often choose
    a single winning write per key, so existing causal-consistency algorithms
    fail: the shim might never see all writes
- maintaining availability

### Innovations

- an algorithm that buffers incoming writes until the local store satisfies a *causal cut*.

### Causality

Differentiate between:

-  **potential causality**: what we usually think of, and *which they do
not consider further*, and
- **explicit causality**, i.e. tags provided by app

### Bolt-on *separation of concerns*

- safety properties (causal *shim*) from
- liveness (eventually consistent data store (ECDS))

### Bolt-on

Operations:

- **get(key)**
- **put(key, value, after)**, where *after* is any set of previously
  returned writes (identifiers)

### Consistency cases

Let's assume x1 -> y1 (x1 existed in store when/where y1 was
created). So any replica reading y1 should have (or have had) x1 as
well. Assume x1, y1 created by different processes, and discuss 
what happens at P when it wants to read *y*:

1. P has not received any writes
2. P has received x1, but not y1
3. P has received y1, but not x1
4. P has received both


Answers:
1. P has not received any writes, returns y=null
2. P has received x1, but not y1. x1 put in local store, returns y=null.
3. P has received y1, but not x1. P buffers y1,  returns y=null.
4. P has received both, both put in local store, returns y=y1.

The fun case is 3.  We could just go ahead and let P read *y1*, but
we'd have to remember that, and stall the process if it subsequently
reads *x*.
- this means we give up availability (consider a network partition during which x1 never arrives)
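
A sketch of the safe alternative the shim takes instead (per the design above): buffer an arriving write until its dependencies are covered locally, so the local store always remains a causal cut. Data layouts here are illustrative, matching the toy `Shim` above:

```python
# Illustrative resolver loop: drain buffered writes whose dependencies
# are already covered by the local store. Simplification: one buffered
# write per key, and "covered" ignores concurrent versions.

def covered(local, dep):
    # A dependency (key, version) is covered if the local store holds
    # that version of the key, or a newer one.
    dkey, dversion = dep
    return dkey in local and local[dkey][0] >= dversion

def resolve(local, buffered):
    progress = True
    while progress:                   # keep going while anything drains
        progress = False
        for key, (version, value, deps) in list(buffered.items()):
            if all(covered(local, d) for d in deps):
                local[key] = (version, value, deps)  # now safe to apply
                del buffered[key]
                progress = True
```

In case 3 above, y1 sits in `buffered` (so reads of *y* still return null) until x1 arrives and `covered` becomes true.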


### Overwritten Histories

The following is the motivation to NOT just use the underlying store
to hold incoming writes directly.

Example (concurrent):

- P1 writes x1 -> y1 -> z1
- P2 writes y2

Assume the shim stores only immediate dependencies. Assume *P* has received
`z1`, and then `y2`. From `z1`'s metadata, *P* did know that it needed `y1`;
but `y2` overwrites `y1` in the ECDS, so *P* never sees `y1`'s metadata and
never learns about `x1`. The overwrite loses the dependency on `x1`, and the
transitive dependency `x1 -> z1` is lost with it.

### Causal Cuts

Causal cut (paraphrase)
: A set of key-versions, at most one per key, such that all
dependencies of any key are either part of the cut, or concurrent with the
cut's version of that key.
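
A toy checker for this definition, assuming each version carries its full dependency set (`deps_of`) and a `concurrent` predicate; both are stand-ins for metadata the shim would maintain:

```python
# Literal reading of the definition: every dependency of every member is
# either in the cut itself or concurrent with the cut's version of its key.

def is_causal_cut(cut, deps_of, concurrent):
    # cut: {key: version}; deps_of(key, version) -> {(key, version), ...};
    # concurrent(v1, v2) -> True iff neither version causally precedes the other.
    for key, version in cut.items():
        for dkey, dversion in deps_of(key, version):
            if dkey not in cut:
                return False          # dependency's key missing entirely
            if cut[dkey] != dversion and not concurrent(cut[dkey], dversion):
                return False          # cut holds a causally-older version
    return True
```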

Example:
![arch](boltonCut.png)

Some causal cuts:

- w1, x1, y1
  - yes
- x1, w2, y1
  - yes
- w1, x1, y1, z1
  - yes
- x1, z1
  - NO: need y1, w1 or w2
- w2, x1, y1, z1
  - yes
- w1
  - yes
- w1, z1
  - NO: need x1, y1
- w2
  - yes
- w1, x1
  - yes
- y1, z1
  - NO: need x1, w1 or w2



### Metadata

Expected size of the dependency set is the number of unique keys in the
causal history.

### Errata

- *Callback on convergence* allows key-versions to be dropped from
histories once seen everywhere (GC).
- store pruning: drop keys no longer needed from the local store; might require
dropping other writes that depend on them

## Notes/comments
- Nikhil: first read is null; lots of action, no reads.
- Anubhav - assume stickiness which can slow things down under load
- Anubhav - long-range dependencies can add up

- does the shim give eventual consistency?
- Is it possible for one to make a system that will never converge, but still
  allow me to lazily claim eventual consistency since it's impossible to prove
  otherwise? 
- has it caught on? 
- surprised that it's faster than eventual
- fate-sharing, how does rebooting client know what happened?
- scale-out: assume clients scale w/ replicas (especially since co-located)

# Coordination Avoidance in Database Systems

When is coordination strictly necessary to maintain application-level
consistency? Ask programmers.

"Coordination can only be avoided if all local commit decisions are globally valid"


Factoids:

- invariants can be expressed as part of schema DDL
- replicas eventually consistent

Issues:

- sometimes programmers are bad
- black-and-white model: some communication might be cheap, some expensive

### Invariant confluence

A replica is *I-valid* iff invariant I holds on its state.

A *globally I-valid system* can execute a set of transactions T w/
coordination-freedom, transactional availability, and convergence iff T is
I-confluent wrt I.

A system provides coordination-free execution for a set of
transactions T iff the progress of executing each *t in T* is only
dependent on the versions of the items t reads (i.e., t's replica
state).

## Example

**PRIMARY KEY/UNIQUE constraints**. Assume userids should be
unique. {Stan: 5} and {Mary: 5} are both I-T-reachable, but {Stan: 5,
Mary: 5} is not I-valid. Therefore, uniqueness is not I-confluent for
inserts of unique values.
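
A tiny runnable illustration of that counterexample (the dict-as-replica-state encoding is just for demonstration):

```python
# Each replica state satisfies uniqueness on its own; their merge does
# not, so no coordination-free merge rule can preserve the invariant.

def i_valid(state):
    # Invariant I: user ids are unique across users.
    ids = list(state.values())
    return len(ids) == len(set(ids))

r1 = {"Stan": 5}             # I-valid at replica 1
r2 = {"Mary": 5}             # I-valid at replica 2
merged = {**r1, **r2}        # converged state: {'Stan': 5, 'Mary': 5}

assert i_valid(r1) and i_valid(r2)
assert not i_valid(merged)   # merge violates I => not I-confluent
```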

**HOWEVER:** *reads* and *deletes* are both I-confluent under uniqueness.

**AND FURTHER:** one could add a function to generate unique IDs s.t. IDs from
distinct replicas don't overlap, and then uniqueness would be
I-confluent.
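
One classic such scheme, sketched here as an assumption (the paper doesn't prescribe a particular generator): replica r of n hands out ids congruent to r mod n, so distinct replicas can never collide.

```python
import itertools

def id_generator(replica_id, num_replicas):
    # Coordination-free unique ids: replica r yields r, r+n, r+2n, ...
    return (replica_id + i * num_replicas for i in itertools.count())

gen_a = id_generator(0, 3)   # replica 0 of 3 -> 0, 3, 6, ...
gen_b = id_generator(1, 3)   # replica 1 of 3 -> 1, 4, 7, ...

assert next(gen_a) == 0 and next(gen_b) == 1   # disjoint by construction
```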