From 380899916d50095e986ea5451c6b5122cbe5ab98 Mon Sep 17 00:00:00 2001 From: "Peter J. Keleher" <keleher@cs.umd.edu> Date: Thu, 5 Oct 2023 13:56:07 -0400 Subject: [PATCH] auto --- notes/cb.md | 354 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 354 insertions(+) create mode 100644 notes/cb.md diff --git a/notes/cb.md b/notes/cb.md new file mode 100644 index 0000000..cc090ec --- /dev/null +++ b/notes/cb.md @@ -0,0 +1,354 @@ +# Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS + +## Contributions: + +- causal+ +- linearizable within cluster + - replicated keys using `chain replication` +- causal across clusters + - bolt-on-ish +- use of explicit dependencies (before bolt-on) +- get (read) transactions + + + + +## COPS + +- individual accesses +- explicit tracking of dependences + - *nearest* + - contexts (threads, or explicit) + - Lamport Clocks + - version is CLOCK.nodeID + - convergent, not necessarily eventual + + + +- "context" implemented incrementally as set of nearest neighbors + - read: + - return value added to context + - write: + - context added to write + - context zero'd out + - context set to new write +- propagated with writes to remote clusters +- write installed at new cluster by + - `depCheck()`s to owner of each write in "nearest" + - `depCheck()` blocks until true + - `depCheck()` re-issued on time-out + +## COPS-GT + +*causal not always enough* + +### Straightforward problems +Alice +``` +s0: ACL == "world" +p0:"all is good" + +s1: change ACL to exclude Boss +p2: "hate my job" +``` +Assume reads implemented by ACL check followed by read + +Boss: +- check security: see s0, which includes Boss +- read returns p2 +- Alice is fired + +*TOCTOU* vulnerability (time-of-check-to-time-of-use) + +### Another implementation +Might consider re-implementing library to issue GETs in reverse order, checking "nearest" as you go +- works for above example (p2 depends on s1) + +But if Alice adds a few more actions: +``` +p3: delete last post +s2: add Boss back to ACL +``` + +...the reverse order doesn't help +- could return p2, s2 (both possibly latest version, causally consistent individually when issued) + + +### A causally-consistent set of reads +s0, p0: "world" / "all is good" +s1, p0: "no-boss" / "all is good" +s1, p2: "no-boss" / "hate job" +s1, p3: "no-boss" / "all is good" +s2, p3: "world" / "all is good" + + +## GT implementation +- "all dependences" + - really means one-per-key that have been locally modified + - "nearest" still big win over this because there might be *many* keys +- individual writes not coordinated +- by keeping **all** dependences, not just nearest +- clearly need GC + + + + +## Notes +- Kaitlyn - just measured against themselves +- Elliot - ALPS +- Huayang - CAP, what does paxos give up? +- Nikhil - when not convergent? +# Bolt-On Consistency + +### Fun scenario + +- Sally posts "I think Billy is missing!" +- Seconds later, Billy calls mom, says "Safe!", Sally posts "False +alarm!" +- James posts "What a relief!" + +Henry sees JUST original post and James', is confused. + +**Shim layer to support causal consistency over weaker (eventually-consistent) systems.** + + + +### Design and Implementation + +Write path: +- Client write performed in **local store**, send to ECDS w/ metadata. + +Read path (causal): +- Client read satistfied by local store + +Read path (pessimistic): +- synchronous checking on data, and dependencies, in the ECDS, rather than relying +on async *resolve* process to do this for us. + +### Motivation + +- Several NoSQL stores have added stronger consistency, implying need: +[Vogels: Choosing consistency.](http://www.allthingsdistributed.com/2010/02/). +- HAT work shows causal even in the face of partitions +- allows purely local reads (so does eventual!) +- no production-ready store provides it + +### Challenges + +- handling overwritten histories: + - eventually-consistent stores often choose + a single winning write, existing cc algs fail because + the cc layer might not see all writes +- maintaining availability + +### Innovations + +- alg that buffers writes and ensures writes satisfy *causal cut*. + +### Causality + +Differentiate between: + +- **potential causality**: what we usually think of, and *which they do +not consider further*, and +- **explicit causality**, i.e. tags provided by app + +### Bolt-on *separation of concerns* + +- safety properties (causal *shim*) from +- liveness (eventually consistent data store (ECDS)) + +### Bolt-on + +Operations: + +- **get(key)** +- **put(key, value, after)**, where *after* is any set of previously + returned writes (identifiers) + +### Consistency cases + +Let's assume x1 -> y1 (x1 existed in store when/where y1 was +created). So any replica reading y1 should have (or have had) x1 as +well. Assume x1, y1 created by different processes, and discuss +what happens at P when it wants to read *y*: + +1. P has not received any writes +2. P has received x1, but not y1 +3. P has received y1, but not x1 +4. P has received both + <p> + <p> + <p> + <p> + <p> + <p> + <p> + <p> + + +Answers: +1. P has not received and writes, returns y=null +2. P has received x1, but not y1. x1 put in local store, returns y=null. +3. P has received y1, but not x1. P buffers y1, returns y=null. +4. P has received both, both put in local store, returns y=y1. + +The fun case is 3. We could just go ahead and let P read *y1*, but +we'd have to remember that, and stall the process if it subsequently +reads *x*. +- this means we give up availability (consider network partition) + + +### Overwritten Histories + +The following is the motivation to NOT just use the underlying store +to hold incoming writes directly. + +Example (concurrent): + +- P1 writes x1 -> y1 -> z1 +- P2 writes y2 + +Assume shim stores immediate dependencies. Assume *P* has received `z1`, +and then `y2`. *P* knows did know that it needed `y1`, but doesn't know +about `x1`. The overwrite by `y2` loses the dependency on `x1`, and the +transitive dependency `x1 -> z11` is also lost. + +### Causal Cuts + +Causal cut (paraphrase) +: A set of key-versions, at most one per key, such that all +dependencies of any key are either part of the cut, or concurrent with the +cut's version of that key. + +Example: + + +Some causal cuts: + +- w1, x1, y1 <p> + - yes +- x1, w2, y1 <p> + - yes +- w1, x1, y1, z1 <p> + - yes +- x1, z1 <p> + - NO: need y1, w1 or w2 +- w2, x1, y1, z1 <p> + - yes +- w1 <p> + - yes +- w1, z1 <p> + - NO: need x1, y1 +- w2 <p> + - yes +- w1, x1 <p> <p> + - yes +- y1, z1 <p> + - NO: need x1, w1 or w2 + + + +### Metadata + +Excpected size of dependency set is number of unique keys int he +causal history. + +### Errata + +- *Callback on covergence* allows key-versions to be dropped from +histories once seen everywhere (GC). +- store pruning: drop keys not needed any more from local store; might require +dropping other writes that depend on it + +## Notes/comments +- Nikhil: first read is null, lots of action no reads. +- Anubhav - assume stickiness which can slow things down under load +- Anubhav - long-range dependencies can add up + +- does the shim give eventual consistency? +- Is it possible for one to make a system that will never converge, but still + allow me to lazily claim eventual consistency since it's impossible to prove + otherwise? +- has it caught on? +- surprised that faster than eventual +- fate-sharing, how does rebooting client know what happened? +- scale-out: assume clients scale w/ replicas (especially since co-located) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +# Coordination Avoidance in Database Systems + +When is coordination strictly necessary to maintain application-level +consistency? Ask programmers. + +"Coordination can only be avoided if all local commit decisions are globally valid" + + +Factoids: + +- invariants can be expressed as part of schema DDL +- replicas eventually consistent + +Issues: + +- sometimes programmers are bad +- black-and-white model: some communication might be cheap, some expensive + +### Invariant confluence + +Replica is *i-valid* iff invariant I is true. + +*Globally I-valid system* can execute a set of transactions T w/ +coord freedom, trans avail, convergence iff T is I-confluent wrt I. + +A system provides coordination-free execution for a set of +transactions T iff the progress of executing each *t in T* is only +dependent on the versions of the items t reads (i.e., t's replica +state). + +## Example + +**PRIMARY KEY/UNIQUE constraints**. Assume userids should be +unique. {Stat:5} and {Mary:5} are both I-T-reachable, but {Stan:5, +Mary:5} is not I-valid. Therefore, uniqueness not I-confluent for +inserts of unique values. + +**HOWEVER:** *reads* and *deletes* are both I-confluent under uniqueness. + +**AND FURTHER:** could add a function to generate unique IDs s.t. IDs from +distinct replicas doesn't overlap, and then uniqueness would be +I-confluent. -- GitLab