Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
Contributions:
- causal+
- linearizable within cluster
- replicated keys using chain replication
- causal across clusters
- bolt-on-ish
- use of explicit dependencies (before bolt-on)
- get (read) transactions
COPS
- individual accesses
- explicit tracking of dependences
- nearest
- contexts (threads, or explicit)
- Lamport Clocks
- version is CLOCK.nodeID
- convergent, not necessarily eventual
- "context" implemented incrementally as set of nearest neighbors
- read:
- return value added to context
- write:
- context added to write
- context zero'd out
- context set to new write
- dependencies propagated with writes to remote clusters
- write installed at remote cluster by issuing depCheck()s to the owner of each write in "nearest" (see sketch below)
- depCheck() blocks until true
- depCheck() re-issued on time-out
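A minimal sketch of the client-side context tracking and the remote install path just described (Python; `replicate_async`, `dep_check`, and the store layout are stand-ins for illustration, not COPS's actual API):

```python
# Sketch only: versions are Lamport timestamps (CLOCK, nodeID); the context is
# the incrementally-maintained set of "nearest" dependencies.

import time


def replicate_async(key, version, value, deps):
    """Stub: a real system would ship the write (with its deps) to remote clusters."""
    pass


def dep_check(key, version):
    """Stub: ask the key's owner whether `version` (or newer) is installed locally."""
    return True


class CopsContext:
    def __init__(self, local_store, node_id):
        self.store = local_store      # key -> (version, value, deps)
        self.node_id = node_id
        self.clock = 0                # Lamport clock
        self.nearest = set()          # the context: set of (key, version)

    def get(self, key):
        version, value, _deps = self.store[key]
        self.clock = max(self.clock, version[0]) + 1      # Lamport receive rule
        self.nearest.add((key, version))                  # read: value added to context
        return value

    def put(self, key, value):
        self.clock += 1
        version = (self.clock, self.node_id)              # version is CLOCK.nodeID
        deps = frozenset(self.nearest)                    # write: context added to write
        self.store[key] = (version, value, deps)
        self.nearest = {(key, version)}                   # context zero'd out, set to new write
        replicate_async(key, version, value, deps)        # deps travel with the write
        return version


def apply_remote_write(local_store, key, version, value, deps):
    """At the receiving cluster: install only after every 'nearest' dep checks out."""
    for dep_key, dep_version in deps:
        while not dep_check(dep_key, dep_version):        # blocks; re-issued on timeout
            time.sleep(0.1)
    current = local_store.get(key)
    if current is None or current[0] < version:           # last-writer-wins => convergent
        local_store[key] = (version, value, deps)
```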
COPS-GT
causal not always enough
Straightforward problems
Alice
s0: ACL == "world"
p0:"all is good"
s1: change ACL to exclude Boss
p2: "hate my job"
Assume reads implemented by ACL check followed by read
Boss:
- check security: see s0, which includes Boss
- read returns p2
- Alice is fired
TOCTOU vulnerability (time-of-check-to-time-of-use)
Another implementation
Might consider re-implementing library to issue GETs in reverse order, checking "nearest" as you go
- works for above example (p2 depends on s1)
But if Alice adds a few more actions:
p3: delete last post
s2: add Boss back to ACL
...the reverse order doesn't help
- could return p2, s2 (both possibly latest version, causally consistent individually when issued)
A causally-consistent set of reads
s0, p0: "world" / "all is good" s1, p0: "no-boss" / "all is good" s1, p2: "no-boss" / "hate job" s1, p3: "no-boss" / "all is good" s2, p3: "world" / "all is good"
GT implementation
- "all dependences"
- really means one version per key that has been modified in the causal history
- "nearest" still big win over this because there might be many keys
- individual writes not coordinated; enabled by keeping all dependences, not just nearest
- clearly need GC
Notes
- Kaitlyn - just measured against themselves
- Elliot - ALPS
- Huayang - CAP, what does paxos give up?
- Nikhil - when not convergent?
Bolt-On Consistency
Fun scenario
- Sally posts "I think Billy is missing!"
- Seconds later, Billy calls mom, says "Safe!", Sally posts "False alarm!"
- James posts "What a relief!"
Henry sees JUST the original post and James's, and is confused.
Shim layer to support causal consistency over weaker (eventually-consistent) systems.
Design and Implementation
Write path:
- Client write performed in local store, then sent to the ECDS w/ metadata.
Read path (causal):
- Client read satisfied by the local store.
Read path (pessimistic):
- Synchronously check the data and its dependencies in the ECDS, rather than relying on the async resolve process to do this for us.
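A rough sketch of these paths, assuming a toy ECDS with put/get, integer versions, and the put(key, value, after) interface listed under Operations below. It checks only direct dependencies, a simplification of the causal-cut check the paper requires; all names are illustrative, not the paper's code:

```python
# Bolt-on shim sketch. The local store is causally consistent; the ECDS is only
# eventually consistent and may expose writes whose dependencies aren't visible yet.

import itertools

_counter = itertools.count(1)


def new_version():
    """Stub version generator (a real shim needs globally unique versions)."""
    return next(_counter)


def shim_put(local_store, ecds, key, value, after):
    """Write path: perform in local store, then send to the ECDS with metadata."""
    deps = {k: v for (k, v) in after}                     # 'after' = previously returned writes
    version = new_version()
    local_store[key] = (version, value, deps)
    ecds.put(key, {"version": version, "value": value, "deps": deps})
    return (key, version)


def shim_get_causal(local_store, key):
    """Causal read path: satisfied entirely by the local store."""
    entry = local_store.get(key)
    return entry[1] if entry else None


def shim_get_pessimistic(local_store, ecds, key):
    """Pessimistic read path: synchronously check the data and its dependencies
    in the ECDS, instead of waiting for the async resolve process."""
    record = ecds.get(key)
    if record is not None:
        for dep_key, dep_version in record["deps"].items():
            dep = ecds.get(dep_key)
            if dep is None or dep["version"] < dep_version:
                break                                     # a dependency isn't visible yet
        else:
            local_store[key] = (record["version"], record["value"], record["deps"])
            return record["value"]
    return shim_get_causal(local_store, key)              # fall back to causally-safe local value
```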
Motivation
- Several NoSQL stores have added stronger consistency, implying need (Vogels: "Choosing Consistency")
- HAT work shows causal consistency is achievable even in the face of partitions
- allows purely local reads (so does eventual!)
- no production-ready store provides it
Challenges
- handling overwritten histories:
- eventually-consistent stores often choose a single winning write; existing causal-consistency algorithms fail because the causal layer might not see all writes
- maintaining availability
Innovations
- alg that buffers writes and ensures writes satisfy causal cut.
Causality
Differentiate between:
- potential causality: what we usually think of, and which they do not consider further, and
- explicit causality, i.e. tags provided by app
Bolt-on separation of concerns
- safety properties (causal shim) from
- liveness (eventually consistent data store (ECDS))
Bolt-on
Operations:
- get(key)
- put(key, value, after), where after is any set of previously returned writes (identifiers)
Consistency cases
Let's assume x1 -> y1 (x1 existed in store when/where y1 was created). So any replica reading y1 should have (or have had) x1 as well. Assume x1, y1 created by different processes, and discuss what happens at P when it wants to read y:
- P has not received any writes
- P has received x1, but not y1
- P has received y1, but not x1
- P has received both
Answers:
- P has not received any writes, returns y=null
- P has received x1, but not y1. x1 put in local store, returns y=null.
- P has received y1, but not x1. P buffers y1, returns y=null.
- P has received both, both put in local store, returns y=y1.
The fun case is 3. We could just go ahead and let P read y1, but we'd have to remember that, and stall the process if it subsequently reads x.
- this means we give up availability (consider network partition)
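A minimal sketch of the buffering rule behind case 3 (illustrative names, not the paper's code), assuming totally ordered versions where a same-or-newer local version covers a dependency:

```python
# Simplification of the paper's algorithm, which checks for a full causal cut.

def covered(local_store, key, version):
    entry = local_store.get(key)
    return entry is not None and entry[0] >= version


def try_apply(local_store, buffer, write):
    """Apply a received write only if all of its dependencies are already covered
    locally; otherwise buffer it (reads stay local, so availability is preserved)."""
    key, version, value, deps = write
    if all(covered(local_store, k, v) for k, v in deps.items()):
        local_store[key] = (version, value, deps)
        return True
    buffer.append(write)                                  # e.g. y1 waits here until x1 arrives
    return False


def drain(local_store, buffer):
    """Re-check buffered writes until no more progress is made."""
    progress = True
    while progress:
        progress = False
        for write in list(buffer):
            buffer.remove(write)
            if try_apply(local_store, buffer, write):
                progress = True
```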
Overwritten Histories
The following is the motivation to NOT just use the underlying store to hold incoming writes directly.
Example (concurrent):
- P1 writes x1 -> y1 -> z1
- P2 writes y2
Assume the shim stores immediate dependencies, and that P has received z1, and then y2. P did know that it needed y1, but doesn't know about x1. The overwrite by y2 loses the dependency on x1, and the transitive dependency x1 -> z1 is also lost.
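A toy illustration of the loss, assuming the shim writes only each update's immediate dependencies into the underlying store:

```python
# An overwrite in the underlying store discards the previous write's metadata.

store = {}


def put(key, version, immediate_deps):
    store[key] = (version, immediate_deps)        # overwrite loses the previous deps


def known_history(key):
    """What a replica can reconstruct by following the deps stored for `key`."""
    seen, stack = set(), [key]
    while stack:
        k = stack.pop()
        if k in store and k not in seen:
            seen.add(k)
            stack.extend(dep_key for dep_key, _v in store[k][1])
    return seen


put("x", 1, [])                    # P1: x1
put("y", 1, [("x", 1)])            # P1: y1 depends on x1
put("z", 1, [("y", 1)])            # P1: z1 depends on y1 (and transitively on x1)
put("y", 2, [])                    # P2's concurrent y2 overwrites y1's metadata
print(known_history("z"))          # {'z', 'y'}: the dependency on x1 is no longer discoverable
```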
Causal Cuts
- Causal cut (paraphrase)
- A set of key-versions, at most one per key, such that all dependencies of any key are either part of the cut, or concurrent with the cut's version of that key.
Some causal cuts:
- w1, x1, y1 - yes
- x1, w2, y1 - yes
- w1, x1, y1, z1 - yes
- x1, z1 - NO: need y1, w1 or w2
- w2, x1, y1, z1 - yes
- w1 - yes
- w1, z1 - NO: need x1, y1
- w2 - yes
- w1, x1 - yes
- y1, z1 - NO: need x1, w1 or w2
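A sketch of the cut check as paraphrased above. The dependency map and the concurrency predicate are explicit inputs, and the example history is inferred from the cuts just listed (w1 -> x1 -> y1 -> z1, with w2 concurrent to w1):

```python
def is_causal_cut(cut, deps, concurrent):
    """
    cut:        dict key -> version (at most one version per key)
    deps:       dict (key, version) -> set of (key, version) dependencies
    concurrent: predicate on two versions of the same key
    """
    for key, version in cut.items():
        for dep_key, dep_version in deps.get((key, version), set()):
            cut_version = cut.get(dep_key)
            if cut_version is None:
                return False                  # the dependency's key is missing entirely
            if cut_version != dep_version and not concurrent(cut_version, dep_version):
                return False                  # neither the dep itself nor concurrent with it
    return True


# Inferred example history: deps list the full causal history of each version.
deps = {
    ("x", 1): {("w", 1)},
    ("y", 1): {("x", 1), ("w", 1)},
    ("z", 1): {("y", 1), ("x", 1), ("w", 1)},
}


def concurrent(a, b):
    return {a, b} == {1, 2}                   # crude: only w1/w2 coexist in this example


print(is_causal_cut({"w": 2, "x": 1, "y": 1, "z": 1}, deps, concurrent))   # True
print(is_causal_cut({"x": 1, "z": 1}, deps, concurrent))                   # False: need y1 and a w
```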
Metadata
Expected size of the dependency set is the number of unique keys in the causal history.
Errata
- Callback on convergence allows key-versions to be dropped from histories once seen everywhere (GC).
- store pruning: drop keys not needed any more from local store; might require dropping other writes that depend on it