# Implementing Linearizability at Large Scale and Low Latency
## RAMCloud
- basically all RAM
- "durable writes" get replicated in other RAM
- log-structured, *cleaner* etc.
- massively parallel, 5 usec end-to-end RPCs.
**Linearizability**:
- collection of operations is *linearizable* if each operation appears to occur **instantaneously** and exactly once at **some point in time between its invocation and its completion**.
- it **must not be possible** for any client of the system, either the one initiating an operation or other clients operating concurrently, **to observe contradictory behavior**
Good: a retried operation returns the result of its single original execution.

Bad: a retry re-executes an operation that already completed (at-least-once).

Retry of an idempotent operation is also bad: a retried write can clobber a newer write from another client, and observers can see the reordering.
Bottom line:
- *at-least-once* bad
- *exactly-once* good
- detect and stop retry of completed op
- return same value as first execution
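The two exactly-once requirements above can be sketched as a server that keeps a completion record per RPC and returns it on retry. This is a minimal in-memory illustration; the class and method names are invented here, not RIFL's API:

```python
class ExactlyOnceServer:
    """Toy server: one counter, plus a completion record per RPC."""

    def __init__(self):
        self.counter = 0
        self.completed = {}   # (client_id, seq) -> recorded result

    def increment(self, client_id, seq):
        key = (client_id, seq)
        if key in self.completed:
            return self.completed[key]   # retry: same value as first execution
        self.counter += 1                # execute exactly once
        self.completed[key] = self.counter
        return self.counter

server = ExactlyOnceServer()
first = server.increment("c1", 1)
retry = server.increment("c1", 1)   # network-level retry of the same RPC
assert first == retry == 1          # detected, not re-executed
```

With at-least-once semantics the retry would bump the counter to 2; here the completion record both stops the re-execution and supplies the original return value.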
## Architecture
RPCs needed so that clients can be notified of completed operations.
Problems / solutions:
- RPC identification
- globally unique names
- **completion record** durability
- must be atomic wrt the actual mutation
- store completion record w/ object
- retry rendezvous
- find record, even if sent to different server
- store completion record w/ object
- for transactions, pick one datum
- garbage collection: when we know request will never be retried
- after client acks response
- after client crashes
## Client failure detection
- leases
- must renew
- essentially a heartbeat
- want to scale to **a million clients**???!
## Lifetime of an RPC
When received by server:
- `checkDuplicate()` in ResultTracker (on *server*)
  - normal case returns new, proceeds
  - completed previously, returns previous value
  - in-progress (toss the request or nack the client)
  - stale retry - error to client
- *normal case of new RPC*
  - execute the RPC
  - create completion record
  - return to client
- *assumes some local durable log*
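The four `checkDuplicate()` outcomes above can be sketched as a small state machine. This is an illustrative in-memory version, not RAMCloud's actual ResultTracker; the `process_first_incomplete` helper is an assumption standing in for the garbage collection described earlier:

```python
from enum import Enum

class RpcState(Enum):
    NEW = 1
    COMPLETED = 2
    IN_PROGRESS = 3
    STALE = 4

IN_PROGRESS_SENTINEL = object()

class ResultTracker:
    def __init__(self):
        self.records = {}           # (client_id, seq) -> result or sentinel
        self.first_incomplete = {}  # client_id -> lowest seq still of interest

    def check_duplicate(self, client_id, seq):
        if seq < self.first_incomplete.get(client_id, 0):
            return RpcState.STALE, None        # retry of an already-acked RPC
        rec = self.records.get((client_id, seq))
        if rec is None:
            # first time we see this RPC: mark in progress and let it execute
            self.records[(client_id, seq)] = IN_PROGRESS_SENTINEL
            return RpcState.NEW, None
        if rec is IN_PROGRESS_SENTINEL:
            return RpcState.IN_PROGRESS, None  # toss request or nack client
        return RpcState.COMPLETED, rec         # return the recorded result

    def record_completion(self, client_id, seq, result):
        self.records[(client_id, seq)] = result

    def process_first_incomplete(self, client_id, seq):
        # hypothetical GC helper: drop records the client has acknowledged
        self.first_incomplete[client_id] = seq
        for key in [k for k in self.records
                    if k[0] == client_id and k[1] < seq]:
            del self.records[key]
```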
## Design
`RequestTracker` (on client)
- tracks active RPCs
- `firstIncomplete` sequence number added to outgoing RPCs to servers
  - server deletes records for earlier RPCs
- at most 512 outstanding RPCs from a single client
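A minimal sketch of the client-side RequestTracker described above, assuming the 512-RPC window and `firstIncomplete` piggybacking; the method names are illustrative, not RIFL's exact interface:

```python
class RequestTracker:
    MAX_OUTSTANDING = 512   # window limit from the notes

    def __init__(self):
        self.next_seq = 0
        self.active = set()   # sequence numbers of in-flight RPCs

    def new_rpc(self):
        if len(self.active) >= self.MAX_OUTSTANDING:
            raise RuntimeError("too many outstanding RPCs")
        seq = self.next_seq
        self.next_seq += 1
        self.active.add(seq)
        return seq

    def rpc_finished(self, seq):
        self.active.discard(seq)

    def first_incomplete(self):
        # piggybacked on outgoing RPCs; the server may delete completion
        # records for all sequence numbers below this value
        return min(self.active) if self.active else self.next_seq
```

Because `first_incomplete` only advances once the *oldest* RPC finishes, a single slow RPC holds back garbage collection, which is one reason to bound the outstanding-RPC window.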
`LeaseTracker` (on clients and servers)
`ResultTracker` (on server)
### Lifetime of RPC
- new RPC - unique identifier using client ID (assigned by the lease server) w/ new seq num from RequestTracker
- server calls `checkDuplicate`: new / completed / in_progress / stale
- server executes
1. creates RPC identifier
1. creates object identifier (completion record stored with the object, so it migrates with it)
1. result returns
1. operation side effects and completion record **appended to a log atomically**
1. `recordCompletion` on ResultTracker, system-dependent
1. return result to client
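The atomic append in step 4 can be illustrated with a toy log where the mutation and its completion record form one entry, so recovery never sees one without the other. A sketch only, not RAMCloud's log format:

```python
class Log:
    def __init__(self):
        self.entries = []

    def append_atomic(self, mutation, completion_record):
        # one combined entry stands in for an atomic multi-record append:
        # replay after a crash sees the side effects and the completion
        # record together, or neither
        self.entries.append({"mutation": mutation,
                             "completion": completion_record})

wal = Log()
wal.append_atomic(("write", "key", "value"),
                  {"client": "c1", "seq": 7, "result": "ok"})
assert len(wal.entries) == 1
```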
### Lease management
- Zookeeper
- renewal overhead
- low because *stable storage not updated on renewal*
- validation overhead
- in-memory
- *cluster clock* for servers to do most lease validation
- ask lease server only when close to expiration
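The cluster-clock validation above might look like the following sketch; `SAFETY_MARGIN` and the function names are assumptions for illustration, not values from the paper:

```python
SAFETY_MARGIN = 5.0   # seconds; illustrative threshold, not from the paper

def lease_valid(lease_expiration, cluster_clock, ask_lease_server):
    """Validate a client's lease using the server's local cluster-clock
    estimate; contact the lease server only near expiration."""
    if lease_expiration - cluster_clock > SAFETY_MARGIN:
        return True               # cheap in-memory check covers the common case
    return ask_lease_server()     # authoritative answer near expiration
```

The point is the common-case path: since stable storage is not updated on renewal and most validations never leave the server's memory, lease overhead stays low even with many clients.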
## Transactions with RIFL (RAMCloud implementation)
- Sinfonia-ish two-phase commit:
  - `prepare` (version/lock checks)
  - `decision` (this phase in background, client already notified)
("the transaction is effectively committed once a durable lock record has been written for each of the objects")
- updates deferred until commit request
- reads executed normally, *versions recorded*
- writes on commit phase, possibly w/ expected version
- fail *if version-check fails, or locked by another transaction*
- fast case
-*single server* owns all objects in transaction
-*read-only*, even in distributed case only a single round
- on client crash, a recovery coordinator finishes the transaction, aborting unless it already committed
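The prepare/decision flow above can be sketched per object as follows; the class, return strings, and helper names are invented for illustration, and the durable lock record is reduced to an in-memory flag:

```python
class Obj:
    def __init__(self, value, version=0):
        self.value, self.version, self.locked = value, version, False

def prepare(obj, expected_version):
    # fail if the version check fails, or if locked by another transaction
    if obj.locked or (expected_version is not None
                      and obj.version != expected_version):
        return "ABORT"
    obj.locked = True          # stands in for a durable lock record
    return "PREPARED"

def decision(obj, commit, new_value=None):
    # runs in the background; the client was already notified
    if commit and new_value is not None:
        obj.value = new_value
        obj.version += 1
    obj.locked = False

# client: the transaction commits once every prepare returned PREPARED
x = Obj("a", version=3)
if prepare(x, expected_version=3) == "PREPARED":
    decision(x, commit=True, new_value="b")
assert x.value == "b" and x.version == 4
```

This mirrors the quoted claim that the transaction is effectively committed once a durable lock record exists for each object: after every `prepare` succeeds, the outcome is fixed even if the client crashes before `decision`.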
## Issues
- worried that storing completion records will not scale well
- local operation
- either disk or (for RAMCloud) replicating elsewhere
- why optional version checking in transactions? (atomic operation primitive)