From 26440eb0594518aed33b0ae621b0b2a9a0ea8aa6 Mon Sep 17 00:00:00 2001
From: "Peter J. Keleher" <keleher@cs.umd.edu>
Date: Tue, 31 Oct 2023 09:26:48 -0400
Subject: [PATCH] auto

---
 notes/ram.md | 136 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 126 insertions(+), 10 deletions(-)

diff --git a/notes/ram.md b/notes/ram.md
index 4ee086d..4919bfd 100644
--- a/notes/ram.md
+++ b/notes/ram.md
@@ -34,7 +34,6 @@ Flash latencies:
 
 -----
-----
 
 # Fast Crash Recovery in RAMCloud
@@ -119,14 +118,131 @@ Concurrency:
 
 **End-to-end recovery times of 1-2 seconds for 35 GB.**
 
-# Comments
-- (katura) how locality w/ hashed keys?
-- (katura) what when master reboots and rejoins?
-- (rebecca) avail from fast recovery rather than weakened consistency
-- "if crashes happen infrequently"
-- need to keep that map (which would probably need anyway)
-- cleaning
+# Interesting notes
+- scattered backups increase the possibility of data loss (vs. keeping all of a segment's backups on the same replicas: compare the probability that, in a system w/ 2 backups, all three of those specific machines fail, vs. three random failures hitting scattered backups)
+- availability from fast recovery rather than weakened consistency
+
+# Linearizability
+
+# Implementing Linearizability at Large Scale and Low Latency
+
+## RAMCloud
+- basically all RAM
+- "durable writes" get replicated in other RAM
+- log-structured, *cleaner* etc.
+- massively parallel, 5 usec end-to-end RPCs.
+
+**Linearizability**:
+- a collection of operations is *linearizable* if each operation appears to occur **instantaneously** and exactly once at **some point in time between its invocation and its completion**.
+- **must not be possible** for any client of the system, either the one initiating an operation or other clients operating concurrently, **to observe contradictory behavior**
+
+Good:
+
+
+Bad:
+
+
+Retry of idempotent operation bad:
+
+
+Bottom line:
+- *at-least-once* bad
+- *exactly-once* good
+  - detect and stop retry of completed op
+  - return same value as first execution
+
+
+## Architecture
+RPCs are needed so that clients can be notified of completed operations.
+
+Problems / solutions:
+- RPC identification
+  - globally unique names
+- **completion record** durability
+  - must be atomic wrt the actual mutation
+  - store completion record w/ object
+- retry rendezvous
+  - find record, even if sent to a different server
+  - store completion record w/ object
+  - for transactions, pick one datum
+- garbage collection: when we know a request will never be retried
+  - after client acks response
+  - after client crashes
+
+## Client failure detection
+- leases
+  - must renew
+  - essentially a heartbeat
+  - want to scale to **a million clients**???!
+
+## Lifetime of an RPC
+
+When received by server:
+- `checkDuplicate()` in ResultTracker (on *server*)
+  - normal case: returns new, proceeds
+  - completed previously: returns previous value
+  - in-progress: toss the request or nack the client
+  - stale retry: error to client
+- *normal case of new RPC*
+  - execute the RPC
+  - create completion record
+  - return to client
+- *assumes some local durable log*
+
+## Design
+
+`RequestTracker` (on client)
+- tracks active RPCs
+- `firstIncomplete` sequence number added to outgoing RPCs to servers
+  - server deletes records for earlier RPCs
+- only 512 outstanding RPCs from a single client
+
+`LeaseTracker` (on clients and servers)
+
+`ResultTracker` (on server)
+
+
+### Lifetime of RPC
+- new RPC - unique identifier using client ID w/ new seq num from RequestTracker (from server?)
+- server calls `checkDuplicate`: new / completed / in_progress / stale
+- server executes
+  1. creates RPC identifier
+  1. creates object identifier (completion record migrates w/ object)
+  1. result returns
+  1. operation side effects and completion record **appended to a log atomically**
+  1. `recordCompletion` on ResultTracker, system-dependent
+  1. return result to client
+
+### Lease management
+- Zookeeper
+- renewal overhead
+  - low because *stable storage not updated on renewal*
+- validation overhead
+  - in-memory
+  - *cluster clock* for servers to do most lease validation
+  - ask lease server only when close to expiration
+
+## Transactions with RIFL (RAMCloud implementation)
+- Sinfonia-ish two-phase commit:
+  - `prepare` (version/lock checks)
+  - `decision` (this phase in background, client already notified)
+    ("the transaction is effectively committed once a durable lock record has been written for each of the objects")
+- updates deferred until commit request
+- reads executed normally, *versions recorded*
+- writes on commit phase, possibly w/ expected version
+  - fail *if version check fails, or locked by another transaction*
+- fast case
+  - *single server* owns all objects in transaction
+  - *read-only*, even in distributed case only a single round
+- on client crash, recovery coordinator finishes, hoping to abort unless already committed
+
+## Issues
+- worried that storing completion records will not scale well
+  - local operation
+  - either disk or (for RAMCloud) replicating elsewhere
+- why optional version checking in transactions? (atomic operation primitive)
+
+
+## Questions/comments
-- tablet profiles
-- it it used?
-- 
GitLab
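The `checkDuplicate` / `recordCompletion` / `firstIncomplete` flow in the notes can be sketched as a toy in-memory ResultTracker. Tracker and method names follow the notes; the in-progress sentinel, the return shape, and the ack handling are assumptions, and a real system would keep completion records in a durable log rather than a dict:

```python
from enum import Enum

class Check(Enum):
    NEW = 1
    COMPLETED = 2
    IN_PROGRESS = 3
    STALE = 4

class ResultTracker:
    """Toy per-server duplicate detector keyed by (client_id, seq_num)."""
    def __init__(self):
        self.records = {}           # (client, seq) -> result, or in-progress sentinel
        self.first_incomplete = {}  # client -> lowest seq the client may still retry

    def check_duplicate(self, client, seq):
        fi = self.first_incomplete.get(client, 0)
        if seq < fi:
            return Check.STALE, None      # client already acked past this seq
        rec = self.records.get((client, seq))
        if rec is None:
            self.records[(client, seq)] = "IN_PROGRESS"
            return Check.NEW, None        # normal case: proceed to execute
        if rec == "IN_PROGRESS":
            return Check.IN_PROGRESS, None
        return Check.COMPLETED, rec       # return the first execution's result

    def record_completion(self, client, seq, result):
        self.records[(client, seq)] = result

    def process_ack(self, client, first_incomplete):
        # piggybacked firstIncomplete lets the server GC older completion records
        self.first_incomplete[client] = first_incomplete
        stale = [k for k in self.records if k[0] == client and k[1] < first_incomplete]
        for key in stale:
            del self.records[key]
```

A retry of a completed RPC gets back the original result instead of re-executing, which is the exactly-once behavior the notes describe.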
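The lease-validation fast path ("ask lease server only when close to expiration") might look like the toy sketch below; `LEASE_TERM`, `SAFETY`, and `lease_server_check` are made-up stand-ins for the cluster-clock and ZooKeeper machinery:

```python
LEASE_TERM = 30.0   # seconds a lease lasts after renewal (assumed)
SAFETY = 5.0        # margin before expiry when we must re-check centrally (assumed)

class LeaseValidator:
    """Toy server-side lease check against a shared 'cluster clock'.

    lease_server_check(client, now) stands in for asking the lease server;
    it returns a new expiration time, or None if the lease is gone."""
    def __init__(self, lease_server_check):
        self.lease_server_check = lease_server_check
        self.expirations = {}   # client -> expiration time on the cluster clock

    def is_valid(self, client, now):
        exp = self.expirations.get(client, 0.0)
        if now < exp - SAFETY:
            return True                   # fast path: purely in-memory, no RPC
        # close to (or past) expiration: fall back to the lease server
        renewed = self.lease_server_check(client, now)
        if renewed is not None:
            self.expirations[client] = renewed
            return True
        return False
```

The point of the design is that the fast path touches only local memory, so renewal traffic to stable storage stays low even with many clients.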
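The Sinfonia-ish `prepare`/`decision` split with version and lock checks can be illustrated with a toy single-server object store. The names and data layout here are illustrative, not RAMCloud's actual API, and the durable lock records that make the commit point real are elided:

```python
class ObjectStore:
    """Toy store for a Sinfonia-style prepare/decision round."""
    def __init__(self):
        self.data = {}     # key -> (value, version)
        self.locks = {}    # key -> txid holding the lock

    def prepare(self, txid, writes):
        """writes: list of (key, new_value, expected_version or None).
        Fails if a version check fails or an object is locked by another tx."""
        for key, _, expected in writes:
            locked_by = self.locks.get(key)
            _, version = self.data.get(key, (None, 0))
            if (locked_by is not None and locked_by != txid) or \
               (expected is not None and expected != version):
                return False               # abort: lock or version conflict
        for key, _, _ in writes:
            self.locks[key] = txid         # durable lock record in a real system
        return True

    def decision(self, txid, writes, commit):
        """Applies (or drops) the buffered writes; runs after the client
        has already been told the outcome."""
        for key, value, _ in writes:
            if self.locks.get(key) == txid:
                if commit:
                    _, version = self.data.get(key, (None, 0))
                    self.data[key] = (value, version + 1)
                del self.locks[key]
```

Reads would execute normally and record versions, which then flow into `prepare` as the expected versions.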