# Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
## Contributions:
- causal+
  - linearizable within cluster
  - replicated keys using `chain replication`
  - causal across clusters
- bolt-on-ish
  - use of explicit dependencies (before bolt-on)
- get (read) transactions

## COPS
- individual accesses
- explicit tracking of dependences
  - *nearest*
- contexts (threads, or explicit)
- Lamport Clocks
  - version is CLOCK.nodeID
  - convergent, not necessarily eventual

- "context" implemented incrementally as set of nearest neighbors
- read:
  - return value added to context
- write:
  - context added to write
  - context zero'd out
  - context set to new write
- propagated with writes to remote clusters
- write installed at new cluster (sketched below) by:
  - `depCheck()`s to owner of each write in "nearest"
  - `depCheck()` blocks until true
  - `depCheck()` re-issued on time-out
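A minimal sketch (Python) of this read/write path, assuming a single node stands in for a whole cluster and that `depCheck` is answered against the local store rather than by the owning node; the class and method names are illustrative, not COPS's actual API:

```
import time

class Context:
    """Per-client context: the 'nearest' dependencies accumulated so far."""
    def __init__(self):
        self.nearest = set()                 # set of (key, version) pairs

class ClusterNode:
    """One node standing in for a whole COPS cluster in this sketch."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0                       # Lamport clock
        self.store = {}                      # key -> (version, value, deps)

    def get(self, ctx, key):
        version, value, _ = self.store[key]  # assume the key exists in this sketch
        ctx.nearest.add((key, version))      # return value added to context
        return value

    def put(self, ctx, key, value):
        self.clock += 1
        version = (self.clock, self.node_id)       # version is CLOCK.nodeID
        deps = set(ctx.nearest)                     # context added to write
        self.store[key] = (version, value, deps)
        ctx.nearest = {(key, version)}              # context zero'd out, set to new write
        return key, value, version, deps            # propagated to remote clusters

    def dep_check(self, key, version):
        # In COPS this query goes to the owner of `key`; here we just check locally.
        entry = self.store.get(key)
        return entry is not None and entry[0] >= version

    def apply_remote(self, key, value, version, deps):
        # Install the replicated write only after every "nearest" dependency passes
        # dep_check; a real implementation blocks and re-issues on time-out.
        while not all(self.dep_check(k, v) for k, v in deps):
            time.sleep(0.01)                        # crude stand-in for block/retry
        self.store[key] = (version, value, deps)
        self.clock = max(self.clock, version[0])    # Lamport clock catches up
```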
## COPS-GT
*causal not always enough*
### Straightforward problems
Alice:
```
s0: ACL == "world"
p0:"all is good"
s1: change ACL to exclude Boss
p2: "hate my job"
```
Assume reads are implemented by an ACL check followed by the read.

Boss:
- checks security: sees s0, whose ACL includes Boss
- read returns p2
- Alice is fired

A *TOCTOU* (time-of-check-to-time-of-use) vulnerability.
### Another implementation
Might consider re-implementing the library to issue GETs in reverse order, checking "nearest" as you go:
- works for the above example (p2 depends on s1)
But if Alice adds a few more actions:
```
p3: delete last post
s2: add Boss back to ACL
```
...the reverse order doesn't help:
- could return p2, s2 (both possibly the latest version, each causally consistent individually when issued)
### A causally-consistent set of reads
- s0, p0: "world" / "all is good"
- s1, p0: "no-boss" / "all is good"
- s1, p2: "no-boss" / "hate job"
- s1, p3: "no-boss" / "all is good"
- s2, p3: "world" / "all is good"
## GT implementation
- "all dependences"
  - really means one per key, for keys that have been locally modified
  - "nearest" is still a big win over this, because there might be *many* keys
- individual writes not coordinated
  - by keeping **all** dependences, not just nearest
- clearly need GC

A sketch of the get-transaction these full dependency lists enable follows.
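A hedged sketch of the two-round get-transaction, assuming each stored entry carries (version, value, deps), where deps holds the one-per-key dependency versions; `store.get` and `store.get_by_version` are hypothetical calls, not COPS-GT's actual interface:

```
def get_trans(store, keys):
    # Round 1: optimistic reads; each entry is (version, value, deps).
    results = {k: store.get(k) for k in keys}

    # For every requested key, find the newest version any returned value depends on.
    ccv = {k: results[k][0] for k in keys}                # causally-correct versions
    for version, value, deps in results.values():
        for dkey, dver in deps:
            if dkey in ccv and dver > ccv[dkey]:
                ccv[dkey] = dver

    # Round 2: re-fetch only the keys whose first-round version is too old.
    for k in keys:
        if ccv[k] > results[k][0]:
            results[k] = store.get_by_version(k, ccv[k])  # hypothetical call

    return {k: results[k][1] for k in keys}               # key -> value
```

The second round only re-fetches keys whose first-round value is older than some other returned value's dependency, so the result set is causally consistent without coordinating the writes themselves.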

## Notes
- Kaitlyn - just measured against themselves
- Elliot - ALPS
- Huayang - CAP, what does paxos give up?
- Nikhil - when not convergent?
# Bolt-On Consistency
### Fun scenario
- Sally posts "I think Billy is missing!"
- Seconds later, Billy calls mom, says "Safe!", and Sally posts "False alarm!"
- James posts "What a relief!"

Henry sees JUST the original post and James's, and is confused.

**Shim layer to support causal consistency over weaker (eventually-consistent) systems.**

### Design and Implementation
Write path (sketched below):
- client write performed in the **local store**, then sent to the ECDS with metadata

Read path (causal):
- client read satisfied by the local store

Read path (pessimistic):
- synchronous checking of data, and of dependencies, in the ECDS, rather than relying on the async *resolve* process to do this for us
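A minimal sketch of the write path and causal read path, assuming a dict-backed local store and an ECDS client exposing put/get; `Shim` and its methods are illustrative names, not the paper's API:

```
import itertools

class Shim:
    """Causal shim over an eventually-consistent store; the local store stays causal."""
    _versions = itertools.count(1)

    def __init__(self, ecds):
        self.local = {}      # key -> (version, value, deps)
        self.ecds = ecds     # assumed client exposing put(key, record) / get(key)

    def put(self, key, value, after):
        # Write path: perform the write in the local store, then send it to the
        # ECDS together with its dependency metadata.
        version = next(Shim._versions)
        deps = set(after)                          # 'after' = previously returned write ids
        self.local[key] = (version, value, deps)
        self.ecds.put(key, (version, value, deps))
        return (key, version)                      # id the app may pass in a later 'after'

    def get(self, key):
        # Causal read path: satisfied purely from the local store.
        entry = self.local.get(key)
        return entry[1] if entry else None
```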
### Motivation
- Several NoSQL stores have added stronger consistency, implying need: [Vogels: Choosing consistency.](http://www.allthingsdistributed.com/2010/02/)
- HAT work shows causal consistency is achievable even in the face of partitions
- allows purely local reads (so does eventual!)
- no production-ready store provides it
### Challenges
- handling overwritten histories:
  - eventually-consistent stores often choose a single winning write; existing cc algorithms fail because the cc layer might not see all writes
- maintaining availability
### Innovations
- algorithm that buffers writes and ensures visible writes satisfy a *causal cut*
### Causality
Differentiate between:
- **potential causality**: what we usually think of, and *which they do not consider further*, and
- **explicit causality**, i.e. tags provided by the app
### Bolt-on *separation of concerns*
- safety properties (the causal *shim*), from
- liveness (the eventually-consistent data store (ECDS))
### Bolt-on
Operations:
- **get(key)**
- **put(key, value, after)**, where *after* is any set of previously returned writes (identifiers)
### Consistency cases
Let's assume x1 -> y1 (x1 existed in the store when/where y1 was created). So any replica reading y1 should have (or have had) x1 as well. Assume x1 and y1 were created by different processes, and discuss what happens at P when it wants to read *y*:
1. P has not received any writes
2. P has received x1, but not y1
3. P has received y1, but not x1
4. P has received both
Answers:
1. P has not received any writes; returns y=null.
2. P has received x1, but not y1. x1 put in local store; returns y=null.
3. P has received y1, but not x1. P buffers y1; returns y=null.
4. P has received both; both put in local store; returns y=y1.
The fun case is 3. We could just go ahead and let P read *y1*, but we'd have to remember that, and stall the process if it subsequently reads *x*.
- this means we give up availability (consider a network partition)

A sketch of the case-3 buffering appears below.
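A hedged sketch of the buffering in case 3, under the same assumptions as the shim sketch above; `covered`, `resolve`, and `receive_from_ecds` are illustrative names:

```
def covered(local, deps):
    """True if every dependency (key, version) is already in the local store."""
    return all(key in local and local[key][0] >= ver for key, ver in deps)

def resolve(local, buffered):
    """Flush any buffered write whose dependencies are now covered; repeat to a fixpoint."""
    progress = True
    while progress:
        progress = False
        for key, (ver, value, deps) in list(buffered.items()):
            if covered(local, deps):
                local[key] = (ver, value, deps)    # now safe to make visible (case 4)
                del buffered[key]
                progress = True

def receive_from_ecds(local, buffered, key, ver, value, deps):
    if covered(local, deps):
        local[key] = (ver, value, deps)            # dependencies present: apply directly
    else:
        buffered[key] = (ver, value, deps)         # case 3: buffer; reads still return null
    resolve(local, buffered)
```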
### Overwritten Histories
The following is the motivation to NOT just use the underlying store to hold incoming writes directly.

Example (concurrent):
- P1 writes x1 -> y1 -> z1
- P2 writes y2

Assume the shim stores immediate dependencies, and assume *P* has received `z1`, and then `y2`. *P* did know that it needed `y1`, but doesn't know about `x1`. The overwrite by `y2` loses the dependency on `x1`, and the transitive dependency `x1 -> z1` is also lost.
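A small sketch of why this happens if the shim wrote metadata straight into a last-writer-wins store; the dict-based store and integer versions are assumptions for illustration:

```
ecds = {}                                            # key -> (version, value, deps)

def lww_put(key, version, value, deps):
    """Last-writer-wins: keep only the highest-versioned write (and its metadata)."""
    cur = ecds.get(key)
    if cur is None or version > cur[0]:
        ecds[key] = (version, value, deps)

# P1's chain x1 -> y1 -> z1, shipped with immediate dependencies only:
lww_put("x", 1, "x1", set())
lww_put("y", 1, "y1", {("x", 1)})
lww_put("z", 1, "z1", {("y", 1)})

# Concurrent y2 from P2 overwrites y1; y1's pointer to x1 vanishes with it:
lww_put("y", 2, "y2", set())

print(ecds["z"][2])   # {('y', 1)} -- but version 1 of y, and its deps, are gone
print(ecds["y"][2])   # set()      -- the transitive dependency x1 -> z1 is unrecoverable
```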
### Causal Cuts
Causal cut (paraphrase): a set of key-versions, at most one per key, such that all dependencies of any key are either part of the cut, or concurrent with the cut's version of that key. (A small check for this condition is sketched in code after the examples below.)
Example:

Some causal cuts:
- w1, x1, y1
  - yes
- x1, w2, y1
  - yes
- w1, x1, y1, z1
  - yes
- x1, z1
  - NO: need y1, w1 or w2
- w2, x1, y1, z1
  - yes
- w1
  - yes
- w1, z1
  - NO: need x1, y1
- w2
  - yes
- w1, x1
  - yes
- y1, z1
  - NO: need x1, w1 or w2
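A hedged sketch of checking this condition over a candidate cut; how versions, dependencies, and concurrency are represented here is an assumption, not the paper's formalism:

```
def is_causal_cut(cut, deps, concurrent):
    """
    cut:        dict key -> version (at most one version per key)
    deps:       dict (key, version) -> set of (key, version) dependencies
    concurrent: predicate telling whether two versions of the same key are concurrent
    """
    for key, version in cut.items():
        for dkey, dver in deps.get((key, version), set()):
            if dkey not in cut:
                return False                 # dependency's key absent from the cut
            cut_ver = cut[dkey]
            if cut_ver != dver and not concurrent(cut_ver, dver):
                return False                 # cut's version is neither the dependency nor concurrent with it
    return True
```

Under this check, a cut may contain w2 even though some member depends on w1, as long as w1 and w2 are concurrent versions of the same key.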
### Metadata
Expected size of the dependency set is the number of unique keys in the causal history.
### Errata
- *Callback on convergence* allows key-versions to be dropped from histories once seen everywhere (GC).
- store pruning: drop keys no longer needed from the local store; might require dropping other writes that depend on them
## Notes/comments
- Nikhil: first read is null; lots of action, no reads.
- Anubhav - assumes stickiness, which can slow things down under load
- Anubhav - long-range dependencies can add up
- does the shim give eventual consistency?
- is it possible to make a system that will never converge, but still lazily claim eventual consistency, since it's impossible to prove otherwise?
- has it caught on?
- surprised that it's faster than eventual
- fate-sharing: how does a rebooting client know what happened?
- scale-out: assumes clients scale w/ replicas (especially since co-located)
# Coordination Avoidance in Database Systems
When is coordination strictly necessary to maintain application-level consistency? Ask programmers.

"Coordination can only be avoided if all local commit decisions are globally valid"

Factoids:
- invariants can be expressed as part of schema DDL
- replicas eventually consistent
Issues:
- sometimes programmers are bad
- black-and-white model: some communication might be cheap, some expensive
### Invariant confluence
A replica is *I-valid* iff invariant I is true there.

A *globally I-valid system* can execute a set of transactions T with coordination freedom, transactional availability, and convergence iff T is I-confluent with respect to I.

A system provides coordination-free execution for a set of transactions T iff the progress of executing each *t in T* depends only on the versions of the items t reads (i.e., t's replica state).
## Example
**PRIMARY KEY/UNIQUE constraints**. Assume userids should be unique. {Stan:5} and {Mary:5} are both I-T-reachable, but {Stan:5, Mary:5} is not I-valid. Therefore, uniqueness is not I-confluent for inserts of unique values.

**HOWEVER:** *reads* and *deletes* are both I-confluent under uniqueness.

**AND FURTHER:** one could add a function to generate unique IDs such that IDs from distinct replicas don't overlap, and then uniqueness would be I-confluent.
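A small sketch of that last point, assuming replicas interleave IDs by index; the scheme is an illustrative choice, not the paper's prescription:

```
class ReplicaIdGen:
    """Each replica draws from a disjoint ID space: counter * num_replicas + replica_index."""
    def __init__(self, replica_index, num_replicas):
        self.replica_index = replica_index
        self.num_replicas = num_replicas
        self.counter = 0

    def next_id(self):
        uid = self.counter * self.num_replicas + self.replica_index
        self.counter += 1
        return uid

# Two replicas generating IDs independently can never collide, so every local
# insert decision is globally valid and uniqueness becomes I-confluent:
r0, r1 = ReplicaIdGen(0, 2), ReplicaIdGen(1, 2)
assert {r0.next_id(), r0.next_id()} == {0, 2}
assert {r1.next_id(), r1.next_id()} == {1, 3}
```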