
Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

Contributions:

  • causal+
  • linearizable within cluster
    • replicated keys using chain replication
  • causal across clusters
    • bolt-on-ish
  • use of explicit dependencies (before bolt-on)
  • get (read) transactions

[figure: architecture]

COPS

  • individual accesses
  • explicit tracking of dependences
    • nearest
    • contexts (threads, or explicit)
    • Lamport Clocks
      • version is CLOCK.nodeID
        • convergent, not necessarily eventual
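
A rough sketch of this versioning scheme (names are mine, not the paper's): a version is a Lamport timestamp paired with the writing node's ID, and comparing versions lexicographically gives every replica the same winner for concurrent writes to a key.

```python
# Sketch: a COPS-style version is (lamport_clock, node_id); lexicographic
# comparison gives a total order, so every replica resolves concurrent writes
# to the same key identically (convergent last-writer-wins), even though the
# winner need not be the most recent write in wall-clock time.

from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Version:
    clock: int       # Lamport clock at the writing node
    node_id: int     # tie-breaker: ID of the writing node

class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.clock = 0

    def observe(self, v: Version) -> None:
        # Lamport rule: never issue a version "behind" one we've already seen.
        self.clock = max(self.clock, v.clock)

    def next_version(self) -> Version:
        self.clock += 1
        return Version(self.clock, self.node_id)

# Two concurrent writes to the same key from different nodes:
a, b = Node(1), Node(2)
va, vb = a.next_version(), b.next_version()   # Version(1, 1) and Version(1, 2)
assert max(va, vb) == vb                      # same winner at every replica
```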

[figure: dependencies]

  • "context" implemented incrementally as set of nearest neighbors
    • read:
      • return value added to context
    • write:
      • context added to write
      • context zero'd out
      • context set to new write
  • propagated with writes to remote clusters
  • write installed at new cluster by
    • depCheck()s to owner of each write in "nearest"
    • depCheck() blocks until true
    • depCheck() re-issued on time-out
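
A minimal sketch of the flow above (names like ClientContext, dep_check, and install_remote_write are illustrative, not the paper's API): reads add the returned version to the context, a write carries the context as its nearest dependencies and then resets the context to that write alone, and a remote cluster reveals the write only after a dep_check on each dependency succeeds.

```python
# Sketch of nearest-dependency tracking and the remote install path.
# Names (ClientContext, dep_check, install_remote_write) are illustrative.

class ClientContext:
    """Per-client (or per-thread) context: the nearest dependencies."""
    def __init__(self):
        self.nearest = set()              # set of (key, version) pairs

    def on_read(self, key, version):
        # Every returned value becomes a dependency of later writes.
        self.nearest.add((key, version))

    def on_write(self, key, version):
        deps = set(self.nearest)          # context attached to the write...
        self.nearest = {(key, version)}   # ...then reset to just the new write
        return deps

# --- remote cluster, receiving a replicated write ------------------------

local_store = {}                          # key -> (version, value)

def dep_check(key, version):
    """True once this cluster has installed `version` (or newer) of `key`."""
    installed = local_store.get(key)
    return installed is not None and installed[0] >= version

def install_remote_write(key, version, value, deps, wait):
    # Ask after each nearest dependency; block (and re-issue on time-out,
    # modeled here by `wait`) until every check passes.
    for dep_key, dep_version in deps:
        while not dep_check(dep_key, dep_version):
            wait()
    # Only now is the write made visible locally.
    current = local_store.get(key)
    if current is None or current[0] < version:
        local_store[key] = (version, value)
```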

COPS-GT

causal not always enough

Straightforward problems

Alice

s0: ACL == "world"
p0: "all is good"

s1: change ACL to exclude Boss
p2: "hate my job"

Assume reads implemented by ACL check followed by read

Boss:

  • check security: see s0, which includes Boss
  • read returns p2
  • Alice is fired

TOCTOU vulnerability (time-of-check-to-time-of-use)

Another implementation

Might consider re-implementing library to issue GETs in reverse order, checking "nearest" as you go

  • works for above example (p2 depends on s1)

But if Alice adds a few more actions:

p3: delete last post
s2: add Boss back to ACL

...the reverse order doesn't help

  • could return p2, s2 (both possibly latest version, causally consistent individually when issued)

A causally-consistent set of reads

  • s0, p0: "world" / "all is good"
  • s1, p0: "no-boss" / "all is good"
  • s1, p2: "no-boss" / "hate job"
  • s1, p3: "no-boss" / "all is good"
  • s2, p3: "world" / "all is good"

GT implementation

  • "all dependences"
    • really means one dependency per key that has been locally modified
    • "nearest" still a big win over this because there might be many keys
  • get transactions implemented without coordinating individual writes, by keeping all dependences (not just nearest); see the sketch below
  • clearly need GC
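
A sketch of the two-round get transaction this enables, assuming each stored value carries its full dependency list and the store can serve a specific version; `store.get` and `store.get_by_version` are assumed interfaces for this sketch.

```python
# Sketch of a COPS-GT style two-round read transaction. Assumes
# store.get(key) -> (version, value, deps) and store.get_by_version(key, v).

def get_trans(store, keys):
    # Round 1: optimistically read every key.
    results = {k: store.get(k) for k in keys}        # key -> (ver, value, deps)

    # For each requested key, find the newest version required by the
    # dependencies of the other values we just read.
    needed = {k: results[k][0] for k in keys}
    for ver, value, deps in results.values():
        for dep_key, dep_ver in deps:
            if dep_key in needed and dep_ver > needed[dep_key]:
                needed[dep_key] = dep_ver

    # Round 2: re-fetch only the keys whose first-round result was too old.
    for k in keys:
        if needed[k] > results[k][0]:
            results[k] = store.get_by_version(k, needed[k])

    return {k: results[k][1] for k in keys}          # key -> value
```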

[figure: dependencies]

Notes

  • Kaitlyn - just measured against themselves
  • Elliot - ALPS
  • Huayang - CAP, what does paxos give up?
  • Nikhil - when not convergent?

Bolt-On Consistency

Fun scenario

  • Sally posts "I think Billy is missing!"
  • Seconds later, Billy calls mom, says "Safe!", Sally posts "False alarm!"
  • James posts "What a relief!"

Henry sees JUST the original post and James's, and is confused.

Shim layer to support causal consistency over weaker (eventually-consistent) systems.

[figure: architecture]

Design and Implementation

Write path:

  • Client write performed in local store, then sent to ECDS w/ metadata.

Read path (causal):

  • Client read satisfied by local store

Read path (pessimistic):

  • synchronous checking of the data and its dependencies in the ECDS, rather than relying on the async resolver process to do this for us (see the sketch below)
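
A rough sketch of both paths, assuming the ECDS exposes plain get/put and that writes carry their dependency metadata (the `after` set described under Operations below); all names here are illustrative, not the paper's code.

```python
# Sketch of a bolt-on shim over an eventually consistent store (ECDS).
# Illustrative only: `ecds` is assumed to offer get(key) / put(key, record).

class Shim:
    def __init__(self, ecds):
        self.ecds = ecds       # underlying eventually consistent store
        self.local = {}        # causally consistent local store: key -> record

    def put(self, key, value, after):
        # `after`: set of (key, version) dependencies declared by the client.
        record = {"version": self._next_version(key),
                  "value": value, "deps": set(after)}
        self.local[key] = record            # immediately visible locally
        self.ecds.put(key, record)          # metadata travels with the data
        return (key, record["version"])

    def get_causal(self, key):
        # Causal read path: served entirely from the local store.
        rec = self.local.get(key)
        return rec["value"] if rec else None

    def get_pessimistic(self, key):
        # Synchronously check the data and its dependencies in the ECDS
        # instead of waiting for the async resolver to cover them.
        # (Simplified: checks only the immediate dependencies.)
        rec = self.ecds.get(key)
        if rec is None:
            return None
        for dep_key, dep_ver in rec["deps"]:
            dep = self.ecds.get(dep_key)
            if dep is None or dep["version"] < dep_ver:
                return None                 # dependency not yet visible
        self.local[key] = rec               # safe to reveal locally
        return rec["value"]

    def _next_version(self, key):
        rec = self.local.get(key)
        return rec["version"] + 1 if rec else 1
```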

Motivation

  • Several NoSQL stores have added stronger consistency, implying a need (Vogels, "Choosing Consistency")
  • HAT work shows causal consistency is achievable even in the face of partitions
  • allows purely local reads (so does eventual!)
  • no production-ready store provides it

Challenges

  • handling overwritten histories:
    • eventually-consistent stores often choose a single winning write; existing cc algs fail because the cc layer might not see all writes
  • maintaining availability

Innovations

  • algorithm that buffers writes and ensures the writes it reveals satisfy a causal cut

Causality

Differentiate between:

  • potential causality: what we usually think of, and which they do not consider further, and
  • explicit causality, i.e. tags provided by app

Bolt-on separation of concerns

  • safety properties (causal shim) from
  • liveness (eventually consistent data store (ECDS))

Bolt-on

Operations:

  • get(key)
  • put(key, value, after), where after is any set of previously returned writes (identifiers)
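
A tiny usage sketch (hypothetical client code over an assumed `shim` object with exactly these two operations): identifiers returned by earlier gets and puts are threaded into the `after` set, which is how the application states explicit causality.

```python
# Sketch of threading explicit causality through get/put. `shim` is assumed
# to expose get(key) -> (write_id, value) and put(key, value, after) ->
# write_id, matching the operations above; key names are made up.

def change_acl_then_post(shim, acl, post):
    # The post should never be visible without the ACL change it depends on.
    acl_id = shim.put("alice:acl", acl, after=set())
    post_id = shim.put("alice:wall", post, after={acl_id})
    return post_id

def reply(shim, post_key, text):
    # Reading a value makes its identifier available as a dependency.
    got = shim.get(post_key)
    after = {got[0]} if got else set()
    return shim.put(post_key + ":replies", text, after=after)
```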

Consistency cases

Let's assume x1 -> y1 (x1 existed in store when/where y1 was created). So any replica reading y1 should have (or have had) x1 as well. Assume x1, y1 created by different processes, and discuss what happens at P when it wants to read y:

  1. P has not received any writes
  2. P has received x1, but not y1
  3. P has received y1, but not x1
  4. P has received both

Answers:

  1. P has not received any writes, returns y=null
  2. P has received x1, but not y1. x1 put in local store, returns y=null.
  3. P has received y1, but not x1. P buffers y1, returns y=null.
  4. P has received both, both put in local store, returns y=y1.

The fun case is 3. We could just go ahead and let P read y1, but we'd have to remember that, and stall the process if it subsequently reads x.

  • this means we give up availability (consider network partition)
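
A sketch of the resolver behavior behind these four cases (illustrative names): an incoming write is applied only once every dependency is already locally visible; until then it sits in a buffer, which is why case 3 still returns y=null and why reads never block.

```python
# Sketch of the async resolver behind cases 1-4. A replicated write becomes
# locally visible only once all of its dependencies are; otherwise it waits
# in a buffer. Names are illustrative.

local_store = {}   # key -> {"key", "version", "value", "deps"}
buffered = []      # writes whose dependencies are not yet locally visible

def covered(dep_key, dep_version):
    w = local_store.get(dep_key)
    return w is not None and w["version"] >= dep_version

def receive(write):
    buffered.append(write)
    drain()

def drain():
    # Keep revealing buffered writes until no more become applicable
    # (revealing one write may satisfy another's dependencies).
    progress = True
    while progress:
        progress = False
        for w in list(buffered):
            if all(covered(k, v) for k, v in w["deps"]):
                local_store[w["key"]] = w
                buffered.remove(w)
                progress = True

def get(key):
    # Case 3: y1 may be sitting in `buffered`, so a read of y still returns
    # null; we never block on missing dependencies, preserving availability.
    w = local_store.get(key)
    return w["value"] if w else None
```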

Overwritten Histories

The following is the motivation for NOT just using the underlying store to hold incoming writes directly.

Example (concurrent):

  • P1 writes x1 -> y1 -> z1
  • P2 writes y2

Assume the shim stores immediate dependencies. Assume P has received z1, and then y2. P did know that it needed y1, but it never learns about x1: the overwrite by y2 loses the dependency on x1, and the transitive dependency x1 -> z1 is also lost.
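
A small trace of the loss (values and dict layout are illustrative): once the single-winner store replaces y1 with y2, nothing reachable from z1's metadata mentions x1 any more.

```python
# Trace of the overwritten-history problem when only immediate dependencies
# are stored with each write and the ECDS keeps a single winner per key.

ecds = {}

def ecds_put(key, write):
    # Eventually consistent store: one "winning" write per key.
    ecds[key] = write

# P1 writes x1 -> y1 -> z1 (each write lists only its immediate deps).
ecds_put("x", {"version": "x1", "deps": []})
ecds_put("y", {"version": "y1", "deps": [("x", "x1")]})
ecds_put("z", {"version": "z1", "deps": [("y", "y1")]})

# P2's concurrent y2 overwrites y1 in the ECDS.
ecds_put("y", {"version": "y2", "deps": []})

# A shim that has received z1 sees only z1's immediate dep on y1; the record
# that y1 (and hence z1, transitively) depends on x1 is gone from the store.
print(ecds["z"]["deps"])   # [("y", "y1")]
print(ecds["y"]["deps"])   # []  -- x1 is no longer reachable via metadata
```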

Causal Cuts

Causal cut (paraphrase)
A set of key-versions, at most one per key, such that all dependencies of any key are either part of the cut, or concurrent with the cut's version of that key.
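
One way to read that definition as code (a sketch; `deps_of` and `concurrent` are assumed helpers, the latter deciding whether two versions of a key are causally concurrent):

```python
# Sketch of the causal-cut test, directly following the paraphrase above.
# `cut` maps key -> version; `deps_of(key, version)` yields that write's
# (key, version) dependencies; `concurrent(a, b)` is an assumed helper.

def is_causal_cut(cut, deps_of, concurrent):
    for key, version in cut.items():
        for dep_key, dep_ver in deps_of(key, version):
            if dep_key not in cut:
                return False      # dependency's key is absent from the cut
            chosen = cut[dep_key]
            if chosen != dep_ver and not concurrent(chosen, dep_ver):
                return False      # neither the dependency itself nor a
                                  # concurrent version of it is in the cut
    return True
```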

Example: [figure: dependency graph over versions w1, w2, x1, y1, z1]

Some causal cuts:

  • w1, x1, y1: yes
  • x1, w2, y1: yes
  • w1, x1, y1, z1: yes
  • x1, z1: NO (need y1, w1 or w2)
  • w2, x1, y1, z1: yes
  • w1: yes
  • w1, z1: NO (need x1, y1)
  • w2: yes
  • w1, x1: yes
  • y1, z1: NO (need x1, w1 or w2)

Metadata

Expected size of the dependency set is the number of unique keys in the causal history.

Errata

  • Callback on convergence allows key-versions to be dropped from histories once seen everywhere (GC).
  • store pruning: drop keys not needed any more from local store; might require dropping other writes that depend on it

Notes/comments