From 26440eb0594518aed33b0ae621b0b2a9a0ea8aa6 Mon Sep 17 00:00:00 2001
From: "Peter J. Keleher" <keleher@cs.umd.edu>
Date: Tue, 31 Oct 2023 09:26:48 -0400
Subject: [PATCH] auto

---
 notes/ram.md | 136 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 126 insertions(+), 10 deletions(-)

diff --git a/notes/ram.md b/notes/ram.md
index 4ee086d..4919bfd 100644
--- a/notes/ram.md
+++ b/notes/ram.md
@@ -34,7 +34,6 @@ Flash latencies:
 
 -----
-----
 
 # Fast Crash Recovery in RAMCloud
@@ -119,14 +118,131 @@ Concurrency:
 
 **End-to-end recovery times of 1-2 seconds for 35 GB.**
 
-# Comments
-- (katura) how locality w/ hashed keys?
-- (katura) what when master reboots and rejoins?
-- (rebecca) avail from fast recovery rather than weakened consistency
-- "if crashes happen infrequently"
-- need to keep that map (which would probably need anyway)
-- cleaning
+# Interesting notes
+- scattered backups increase the possibility of data loss (vs. keeping all of a segment's backups on the same replicas: compare the probability that, in a system w/ 2 backups, all three of those specific machines fail, vs. three random failures hitting scattered backups)
+- availability from fast recovery rather than weakened consistency
+
+# Linearizability
+
+# Implementing Linearizability at Large Scale and Low Latency
+
+## RAMCloud
+- basically all RAM
+- "durable writes" get replicated in other RAM
+- log-structured, *cleaner* etc.
+- massively parallel, 5 usec end-to-end RPCs.
+
+**Linearizability**:
+- a collection of operations is *linearizable* if each operation appears to occur **instantaneously** and exactly once at **some point in time between its invocation and its completion**.
+- **must not be possible** for any client of the system, either the one initiating an operation or other clients operating concurrently, **to observe contradictory behavior**
+
+Good:
+
+
+Bad:
+
+
+Retry of idempotent operation bad:
+
+
+Bottom line:
+- *at-least-once* bad
+- *exactly-once* good
+  - detect and stop retry of completed op
+  - return same value as first execution
+
+
+## Architecture
+RPCs are needed so that clients can be notified of completed operations.
+
+Problems / solutions:
+- RPC identification
+  - globally unique names
+- **completion record** durability
+  - must be atomic wrt the actual mutation
+  - store completion record w/ object
+- retry rendezvous
+  - find record, even if sent to a different server
+  - store completion record w/ object
+  - for transactions, pick one datum
+- garbage collection: when we know a request will never be retried
+  - after client acks response
+  - after client crashes
+
+## Client failure detection
+- leases
+  - must renew
+  - essentially a heartbeat
+  - want to scale to **a million clients**???!
+
+## Lifetime of an RPC
+
+When received by server:
+- `checkDuplicate()` in ResultTracker (on *server*)
+  - normal case: returns new, proceeds
+  - completed previously: returns previous value
+  - in-progress: toss the request or nack the client
+  - stale retry: error to client
+- *normal case of new RPC*
+  - execute the RPC
+  - create completion record
+  - return to client
+- *assumes some local durable log*
+
+## Design
+
+`RequestTracker` (on client)
+- tracks active RPCs
+- `firstIncomplete` sequence number added to outgoing RPCs to servers
+  - server deletes records for earlier RPCs
+- only 512 outstanding RPCs from a single client
+
+`LeaseTracker` (on clients and servers)
+
+`ResultTracker` (on server)
+
+
+### Lifetime of RPC
+- new RPC - unique identifier using client ID w/ new seq num from RequestTracker (from server?)
+- server calls `checkDuplicate`: new / completed / in_progress / stale
+- server executes
+  1. creates RPC identifier
+  1. creates object identifier (completion record migrates w/ object)
+  1. result returns
+  1. operation side effects and completion record **appended to a log atomically**
+  1. `recordCompletion` on ResultTracker, system-dependent
+  1. return result to client
+
+### Lease management
+- Zookeeper
+- renewal overhead
+  - low because *stable storage not updated on renewal*
+- validation overhead
+  - in-memory
+  - *cluster clock* for servers to do most lease validation
+  - ask lease server only when close to expiration
+
+## Transactions with RIFL (RAMCloud implementation)
+- Sinfonia-ish two-phase commit:
+  - `prepare` (version/lock checks)
+  - `decision` (this phase in background, client already notified)
+    ("the transaction is effectively committed once a durable lock record has been written for each of the objects")
+- updates deferred until commit request
+- reads executed normally, *versions recorded*
+- writes on commit phase, possibly w/ expected version
+  - fail *if version check fails, or locked by another transaction*
+- fast case
+  - *single server* owns all objects in transaction
+  - *read-only*, even in distributed case only a single round
+- on client crash, recovery coordinator finishes, hoping to abort unless already committed
+
+## Issues
+- worried that storing completion records will not scale well
+  - local operation
+  - either disk or (for RAMCloud) replicating elsewhere
+- why optional version checking in transactions? (atomic operation primitive)
+
+
+## Questions/comments
-- tablet profiles
-- it it used?
-- 
GitLab
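The `checkDuplicate` / `recordCompletion` / `firstIncomplete` flow in the notes can be sketched as a toy in-memory ResultTracker. Tracker and method names follow the notes; the in-progress sentinel, the return shape, and the ack handling are assumptions, and a real system would keep completion records in a durable log rather than a dict:

```python
from enum import Enum

class Check(Enum):
    NEW = 1
    COMPLETED = 2
    IN_PROGRESS = 3
    STALE = 4

class ResultTracker:
    """Toy per-server duplicate detector keyed by (client_id, seq_num)."""
    def __init__(self):
        self.records = {}           # (client, seq) -> result, or in-progress sentinel
        self.first_incomplete = {}  # client -> lowest seq the client may still retry

    def check_duplicate(self, client, seq):
        fi = self.first_incomplete.get(client, 0)
        if seq < fi:
            return Check.STALE, None      # client already acked past this seq
        rec = self.records.get((client, seq))
        if rec is None:
            self.records[(client, seq)] = "IN_PROGRESS"
            return Check.NEW, None        # normal case: proceed to execute
        if rec == "IN_PROGRESS":
            return Check.IN_PROGRESS, None
        return Check.COMPLETED, rec       # return the first execution's result

    def record_completion(self, client, seq, result):
        self.records[(client, seq)] = result

    def process_ack(self, client, first_incomplete):
        # piggybacked firstIncomplete lets the server GC older completion records
        self.first_incomplete[client] = first_incomplete
        stale = [k for k in self.records if k[0] == client and k[1] < first_incomplete]
        for key in stale:
            del self.records[key]
```

A retry of a completed RPC gets back the original result instead of re-executing, which is the exactly-once behavior the notes describe.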
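The lease-validation fast path ("ask lease server only when close to expiration") might look like the toy sketch below; `LEASE_TERM`, `SAFETY`, and `lease_server_check` are made-up stand-ins for the cluster-clock and ZooKeeper machinery:

```python
LEASE_TERM = 30.0   # seconds a lease lasts after renewal (assumed)
SAFETY = 5.0        # margin before expiry when we must re-check centrally (assumed)

class LeaseValidator:
    """Toy server-side lease check against a shared 'cluster clock'.

    lease_server_check(client, now) stands in for asking the lease server;
    it returns a new expiration time, or None if the lease is gone."""
    def __init__(self, lease_server_check):
        self.lease_server_check = lease_server_check
        self.expirations = {}   # client -> expiration time on the cluster clock

    def is_valid(self, client, now):
        exp = self.expirations.get(client, 0.0)
        if now < exp - SAFETY:
            return True                   # fast path: purely in-memory, no RPC
        # close to (or past) expiration: fall back to the lease server
        renewed = self.lease_server_check(client, now)
        if renewed is not None:
            self.expirations[client] = renewed
            return True
        return False
```

The point of the design is that the fast path touches only local memory, so renewal traffic to stable storage stays low even with many clients.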
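The Sinfonia-ish `prepare`/`decision` split with version and lock checks can be illustrated with a toy single-server object store. The names and data layout here are illustrative, not RAMCloud's actual API, and the durable lock records that make the commit point real are elided:

```python
class ObjectStore:
    """Toy store for a Sinfonia-style prepare/decision round."""
    def __init__(self):
        self.data = {}     # key -> (value, version)
        self.locks = {}    # key -> txid holding the lock

    def prepare(self, txid, writes):
        """writes: list of (key, new_value, expected_version or None).
        Fails if a version check fails or an object is locked by another tx."""
        for key, _, expected in writes:
            locked_by = self.locks.get(key)
            _, version = self.data.get(key, (None, 0))
            if (locked_by is not None and locked_by != txid) or \
               (expected is not None and expected != version):
                return False               # abort: lock or version conflict
        for key, _, _ in writes:
            self.locks[key] = txid         # durable lock record in a real system
        return True

    def decision(self, txid, writes, commit):
        """Applies (or drops) the buffered writes; runs after the client
        has already been told the outcome."""
        for key, value, _ in writes:
            if self.locks.get(key) == txid:
                if commit:
                    _, version = self.data.get(key, (None, 0))
                    self.data[key] = (value, version + 1)
                del self.locks[key]
```

Reads would execute normally and record versions, which then flow into `prepare` as the expected versions.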