diff --git a/notes/causalReverse.png b/notes/causalReverse.png new file mode 100644 index 0000000000000000000000000000000000000000..184f8afadd8ae7e54e33ac14fe165558d2ea9f1d Binary files /dev/null and b/notes/causalReverse.png differ diff --git a/notes/hatSalt.md b/notes/hatSalt.md new file mode 100644 index 0000000000000000000000000000000000000000..91f97ef8dae86e6c7d3c7c052d9a64ffbe0cfa16 --- /dev/null +++ b/notes/hatSalt.md @@ -0,0 +1,262 @@ +# Highly Available + +``` + authors: Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica + title: "HAT, not CAP: Towards Highly Available Transactions" + where: HotOS 2013 +``` + +### Definitions, in the context of Partitions: + +- **high availability:** + - each user that can contact a non-failing server eventually receives + response, even in presence of arb long partitions +- **sticky availability:** + - whenever client accesses copy that reflects all prior operations, + eventually receives a response +- **transactional replica availability:** + - if T can contact at least one replica for each item +- **aborts:** + - internal + - external (due to system or operation impl) +"**dirty reads**" +- reading uncommitted +``` + T1: wx(1) wx(2) commit + T2: wx(3) + T3: rx(?) +``` +Read should not ever return '1', and shouldn't return '3' if T2 aborts + +"**dirty writes**" +A *dirty write* occurs when one transaction overwrites a value that has previously been +written by another still in-flight transaction. Why bad? Could violate consistency +guarantees. Assume invariant *x == y*: +``` + T1: wx(1) wy(1) + T2: wx(2) wy(2) +``` +Both preserve consistency in isolation, but not w/ this schedule and dirty writes. 
+ + +## Isolation guarantees: + +"**read uncommitted**" (PL-1) +- writes to each obj totally ordered (prohibits dirty writes) +- writes *across* objects consistently ordered +- implement w/ per-trans time, *last-writer-wins* + +"**read committed**" (PL-2) +- no dirty writes, reads +- implement w/ buffers (though doesn't guarantee recency) + +"**repeatable read**" (cut (*snapshot*) isolation) +- item cut iso (multiple different values): buffer reads +- predicate cut iso (cut over "SELECT ..WHERE....") +- impl both w/ buffering + +---- +### Unachievable isolation levels +- snapshot isolation + - read from consistent cut + - commit only if items from writeset not committed by another T since snapshot + - partition either delays or suffer lost updates +- cursor stability + - means DB holds lock on a row while accessing, and no other T can + access it during this time, repeatable read often means holding lock + on entire set of results + - violated if lost writes because of locks not reaching across partition. + - therefore not HAT (because can't prevent lost updates) +---- +### Unachievable properties +- preventing lost updates +``` + lost update (a==1) - + T1: Rx(100), Wx(100+20=120) + T2: Rx(100), Wx(100+30=130) +``` + Final value should be 150. Lost update would be 120 or 130. + W/ partition, T1 and T2 might not see each other, hence lost update. + + *Clearly impossible to prevent in dist environment.* + +- preventing write skew. Write skew generalizes lost update (LU) to multiple keys. Possible problem is +violation of consistency, such as "x == y" +``` + T1: t = x; y = t + T2: t = y; x = t +``` +Can happen w/ snapshot isolation.
+ +- Serializability: + - optimistic requires global validation + - pessimistic requires global coord/locking + + + +- Katura not buying sticky avail (definitional) +- Nao - "really confusing" (yes) +- Patrick/Andrew - causal only w/ sticky (client caching breaks lots of guarantees) + + + +---- + +### HAT-compliant: + + + +---- + +# SALT + +"offering atomicity and isolation at the same granularity is the very reason why ACID +transactions are ill-equipped .....(performance vs programmability)" + +**Pareto Principle:** 80% of effects from 20% of causes + +- splitting ACID transactions up very good for concurrency + - bad for isolation +- key issue is to provide isolation at a finer granularity, same atomicity + - nested transactions give subtrans' atomicity, isolate entire thing + + + *Allow isolation to be specified at smaller granularity than atomicity* + +## Background + +ANSI iso levels: +- read-uncommitted +- read-committed (assumed goal for paper) +- repeatable read +- serializable + +Issues: +- dirty writes: overwrite uncommitted +- dirty reads: read from uncommitted +- non-repeatable reads +- phantom + +## Big Things +- BASE transactions +- salt isolation +- clever names + +## BASE transactions + - **alkaline** subtransactions + - no other transactions can see state of *uncommitted alkaline subtrans* + - **committed** alk subtrans state viewable by other BASE or alkaline transactions + - *not visible* to ACID until entire BASE commit. + - intended for *partition-local* ops + - **salt isolation** allows control of internal states visibility (among other BASE transactions) + - each alkaline sub has associated *exception* + - *BASE transactions look like ACID transactions to other ACID transactions* + - **accepted** once any alkaline trans commits, + - accepted implies commit of entire BASE + - i.e.
all operations successfully executed *or bypassed because of some exception* + - aborted only if they encounter an error before the transaction is accepted (unlike ACID) + +## Salt Isolation + +*"If two operations in different transactions conflict, then the temporal dependency + that exists between the earlier and the later of these operations must extend to the + entire transaction"* (allows SALT to work w/ different isolation levels) + +*Isolation:* Let Q be the set of operation types {*read*, *range-read*, *write*} and let L and S +be subsets of Q. Further, let *o<sub>1</sub>* in *txn<sub>1</sub>* and *o<sub>2</sub>* in *txn<sub>2</sub>*, be two operations, respectively +of type *T<sub>1</sub>* ∈ L and *T<sub>2</sub>* ∈ S, that access the same object in a conflicting +(i.e. non read-read) manner. **If *o<sub>1</sub>* completes before *o<sub>2</sub>* starts, then *txn<sub>1</sub>* must decide +before *o<sub>2</sub>* starts.** + +<img src=saltSets.png width=500> + +The Isolation property holds as long as (a) at least one of txn<sub>1</sub> and +txn<sub>2</sub> is an ACID transaction or (b) both txn<sub>1</sub> and txn<sub>2</sub> are alkaline +subtransactions. + +So: +- ACID transactions isolated from all other +- Alkaline subtrans isolated from ACID and other alkaline +- BASE expose states at alkaline boundaries to other BASE + + +- Design + - locks (because high contention) + - type + - ACID - conflict with alkaline and saline + - alkaline - conflict w/ ACID and other alkaline + - saline: conflict with ACID locks (except for read/read) only + - lock duration + - *long term* (life of (sub-)trans, 2PL) + - *short-term* (just the op) + - acquire only an alkaline lock at operation start + - “downgrade” it to saline at end of subtransaction, hold until after the end of the BASE transaction.
+ - no multi-version concurrency + + + +## Indirect Dirty Reads +<p> + + +<p> +Fixed by: + +- **Read-after-write across transactions** A BASE transaction *B<sub>r</sub>* that reads a value x, which has been written by another BASE transaction *B<sub>w</sub>*, cannot release its saline lock on x until *B<sub>w</sub>* has released its own saline lock on x. + +- **Write-after-read within a transaction** An operation *O<sub>w</sub>* that writes a value x cannot + release its saline lock on x until all previous read operations within the **same** BASE + transaction have released their saline locks on their respective objects. + +These two ensure uncommitted writes keep locks until all prior read locks also released. + +<p> + +## Forward Logging +- after BASE knows it will commit (because an alkaline subtrans committed), log entire BASE +- prevents cascades that would have occurred because of saline visibility + +### Banking, again + + + + +## Performance + + + + + + +## Comments + + + + +//===================================================================== + +# Correctness Anomalies Under Serializable Isolation + +Background: serializability guarantees an execution (a schedule) +equivalent to a serial schedule, but this serial schedule does not +have to be the ordering that actually occurred in real time (wall +clock time). + +Background: *one-copy serializability* (1SR): +``` +...either ... will be ordered first in the equivalent serial order. Whichever transaction is second --- when it reads the balance --- it must read the value written by the first transaction +``` + +### The Immortal Write +<img src=immortalWrite.png width=500> + +### The Stale Read +<img src=staleRead.png width=500> + +### The Stale Read +<img src=staleRead.png width=500> + +### The Causal Reverse +<img src=causalReverse.png width=500> + +*Strict serializability* prevents all of these.
diff --git a/notes/immortalWrite.png b/notes/immortalWrite.png new file mode 100644 index 0000000000000000000000000000000000000000..c16664567b6a93673fd57121593e673b96369892 Binary files /dev/null and b/notes/immortalWrite.png differ diff --git a/notes/salt.md b/notes/salt.md index 139b9e222c62ee59300e64e698053ec7e6b17bd8..ac699dfec95c69e429df377e76d2312b3a6be77b 100644 --- a/notes/salt.md +++ b/notes/salt.md @@ -120,3 +120,33 @@ These two ensure uncommitted writes keep locks until all prior read locks also r ## Comments + + + +//===================================================================== + +# Correctness Anomalies Under Serializable Isolation + +Background: serializability guarantees an execution (a schedule) +equivalent to a serial schedule, but this serial schedule does not +have to be the ordering that actually occurred in real time (wall +clock time). + +Background: *one-copy serializability* (1SR): +``` +...either ... will be ordered first in the equivalent serial order. Whichever transaction is second --- when it reads the balance --- it must read the value written by the first transaction +``` + +### The Immortal Write +<img src=immortalWrite.png width=500> + +### The Stale Read +<img src=staleRead.png width=500> + +### The Stale Read +<img src=staleRead.png width=500> + +### The Causal Reverse +<img src=causalReverse.png width=500> + +*Strict serializability* prevents all of these.
diff --git a/notes/salt.md~ b/notes/salt.md~ new file mode 100644 index 0000000000000000000000000000000000000000..61078849d78debea1a9229a0f864af9918875138 --- /dev/null +++ b/notes/salt.md~ @@ -0,0 +1,124 @@ +# SALT + +"offering atomicity and isolation at the same granularity is the very reason why ACID +transactions are ill-equipped .....(performance vs programmability)" + +**Pareto Principle:** 80% of effects from 20% of causes + +- splitting ACID transactions up very good for concurrency + - bad for isolation +- key issue is to provide isolation at a finer granularity, same atomicity + - nested transactions give subtrans' atomicity, isolate entire thing + +## Background + +ANSI iso levels: +- read-uncommitted +- read-committed (assumed goal for paper) +- repeatable read +- serializable + +Issues: +- dirty writes: overwrite uncommitted +- dirty reads: read from uncommitted +- non-repeatable reads +- phantom + +## Big Things +- BASE transactions +- salt isolation +- clever names + +## BASE transactions + - **alkaline** subtransactions + - no other transactions can see state of *uncommitted alkaline subtrans* + - **committed** alk subtrans state viewable by other BASE or alkaline transactions + - *not visible* (to ACID?) until entire BASE commit. + - **salt isolation** allows control of internal states visibility (among other BASE transactions) + - each alkaline sub has associated *exception* + - *BASE transactions look like ACID transactions to other ACID transactions* + - **accepted** once any alkaline trans commits, + - accepted implies commit of entire BASE + - i.e.
all operations successfully executed *or bypassed because of some exception* + - aborted only if they encounter an error before the transaction is accepted (unlike ACID) + +## Isolation + +*"If two operations in different transactions conflict, then the temporal dependency + that exists between the earlier and the later of these operations must extend to the + entire transaction"* (allows SALT to work w/ different isolation levels) + +*Isolation:* Let Q be the set of operation types {*read*, *range-read*, *write*} and let L and S +be subsets of Q. Further, let $o_1$ in $txn_1$ and $o_2$ in $txn_2$, be two operations, respectively +of type $T_1$ ∈ L and $T_2$ ∈ S, that access the same object in a conflicting +(i.e. non read-read) manner. **If $o_1$ completes before $o_2$ starts, then $txn_1$ must decide +before $o_2$ starts.** + +Isolation property holds if at least one is ACID, or both are alkaline. + + + + + +## Salt Isolation +**Salt Isolation**: +The Isolation property holds as long as (a) at least one of txn<sub>1</sub> and +txn<sub>2</sub> is an ACID transaction or (b) both txn<sub>1</sub> and txn<sub>2</sub> are alkaline +subtransactions. + +So: +- ACID transactions isolated from all other +- Alkaline subtrans isolated from ACID and other alkaline +- BASE expose states at alkaline boundaries to other BASE + + +- Design + - locks (because high contention) + - type + - ACID - conflict with alkaline and saline + - alkaline - conflict w/ ACID and other alkaline + - saline: conflict with ACID locks (except for read/read) only + - lock duration + - *long term* (life of trans, 2PL) + - *short-term* (just the op) + - acquire only an alkaline lock at operation start + - “downgrade” it to saline at end of subtransaction, hold until after the end of the BASE transaction.
+ - no multi-version concurrency + + + +## Indirect Dirty Reads +<p> + + +<p> +Fixed by: + +- **Read-after-write across transactions** A BASE transaction B<sub>r</sub> that reads a value x, which has been written by another BASE transaction B<sub>w</sub>, cannot release its saline lock on x until B<sub>w</sub> has released its own saline lock on x. + +- **Write-after-read within a transaction** An operation O<sub>w</sub> that writes a value x cannot + release its saline lock on x until all previous read operations within the **same** BASE + transaction have released their saline locks on their respective objects. + +These two ensure uncommitted writes keep locks until all prior read locks also released. + +<p> + +## Forward Logging +- after BASE knows it will commit (because an alkaline subtrans committed), log entire BASE +- prevents cascades that would have occurred because of saline visibility + +### Banking, again + + + + +## Performance + + + + + + +## Comments + diff --git a/notes/staleRead.png b/notes/staleRead.png new file mode 100644 index 0000000000000000000000000000000000000000..9d8a0bb23895aba59e504000ebf4e8f66d26c94a Binary files /dev/null and b/notes/staleRead.png differ