# Highly Available Transactions

```
    authors: Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
    title: "HAT, not CAP: Towards Highly Available Transactions"
    where: HotOS 2013
```

### Definitions, in the context of partitions:

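- **high availability:**
  - every user that can contact a non-failing server eventually receives a
    response, even in the presence of arbitrarily long partitions
- **sticky availability:**
  - whenever a client's transactions execute against a copy of the database
    state that reflects all of that client's prior operations, the client
    eventually receives a response
- **transactional (replica) availability:**
  - a transaction T has replica availability if it can contact at least one
    replica for each item it accesses; the system is transactionally available
    if, given replica availability, T eventually commits or internally aborts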
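- **aborts:**
  - internal (initiated by the transaction itself, e.g. a user-requested abort)
  - external (due to system implementation or operation)

"**dirty reads**"
- reading data written by a transaction that has not (yet) committed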
```
     T1: wx(1) wx(2) commit
     T2: wx(3)
     T3: rx(?)
```
T3's read should never return 1 (T1's intermediate value), and should not return 3 if T2 aborts.
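To make the rule concrete, here is a minimal Python sketch that computes which values T3's read may legally return under read committed. The `(txn, action, value)` history format, the function name `allowed_reads`, and the initial value 0 are assumptions for illustration, not from the paper; the model covers the single item x only.

```python
from typing import Optional

# Hypothetical history format for a single item x: (txn, action, value),
# where action is "w" (write x), "commit", or "abort".
History = list[tuple[str, str, Optional[int]]]


def allowed_reads(history: History, initial: int) -> set[int]:
    """Values a read-committed reader of x may return for this history."""
    last_write: dict[str, int] = {}
    committed: set[str] = set()
    for txn, action, value in history:
        if action == "w" and value is not None:
            last_write[txn] = value  # a txn's later write replaces its earlier one
        elif action == "commit":
            committed.add(txn)
    # Only the final write of a committed txn is visible; the initial value is
    # also legal because read committed does not promise recency.
    return {initial} | {v for t, v in last_write.items() if t in committed}


history: History = [
    ("T1", "w", 1), ("T1", "w", 2), ("T1", "commit", None),
    ("T2", "w", 3), ("T2", "abort", None),
]
print(sorted(allowed_reads(history, initial=0)))  # [0, 2]
```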

"**dirty writes**"

A *dirty write* occurs when one transaction overwrites a value previously written by another,
still in-flight transaction. This is harmful because it can violate consistency guarantees.
Assume the invariant *x == y*:
```
     T1: wx(1)            wy(1)
     T2:      wx(2) wy(2)
```
Each transaction preserves *x == y* in isolation, but under this interleaving (with dirty writes allowed) the final state is x = 2, y = 1.
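A minimal sketch replaying both a serial order and the interleaving above against a store with no write locks (the helper `run`, the tuple format, and the initial values x = y = 0 are hypothetical). The serial order keeps *x == y*; the dirty-write interleaving does not.

```python
# Hypothetical sketch: a store with no write locking, so a later write simply
# overwrites an earlier one, whether or not the earlier writer has committed.
def run(schedule: list[tuple[str, str, int]]) -> dict[str, int]:
    store = {"x": 0, "y": 0}  # starts consistent: x == y
    for _txn, key, value in schedule:
        store[key] = value
    return store


serial = [("T1", "x", 1), ("T1", "y", 1), ("T2", "x", 2), ("T2", "y", 2)]
interleaved = [("T1", "x", 1), ("T2", "x", 2), ("T2", "y", 2), ("T1", "y", 1)]

print(run(serial))       # {'x': 2, 'y': 2}  invariant holds
print(run(interleaved))  # {'x': 2, 'y': 1}  invariant x == y violated
```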


## Isolation guarantees:

"**read uncommitted**" (PL-1)
- writes to each object are totally ordered (prohibits dirty writes)
- writes *across* objects are ordered consistently
- implement with a per-transaction timestamp and *last-writer-wins*
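One way to read the last-writer-wins bullet, sketched in Python under the assumption that every transaction gets a unique timestamp (the classes `Versioned` and `LWWStore` and the helper `replay` are hypothetical). Because each key keeps the write with the highest transaction timestamp, the dirty-write interleaving ends with x and y both written by the same transaction, so every object agrees on the order of writers.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: int
    ts: int  # timestamp of the writing transaction


class LWWStore:
    """Per-key last-writer-wins using per-transaction timestamps (assumed unique per txn)."""

    def __init__(self) -> None:
        self.data: dict[str, Versioned] = {}

    def apply(self, key: str, value: int, ts: int) -> None:
        current = self.data.get(key)
        # Keep the write from the transaction with the larger timestamp; ">=" lets a
        # transaction's own later write to the same key replace its earlier one.
        if current is None or ts >= current.ts:
            self.data[key] = Versioned(value, ts)

    def get(self, key: str) -> int:
        return self.data[key].value


def replay(writes: list[tuple[str, int, int]]) -> tuple[int, int]:
    store = LWWStore()
    for key, value, ts in writes:
        store.apply(key, value, ts)
    return store.get("x"), store.get("y")


# The dirty-write interleaving, with T1 at ts=1 and T2 at ts=2.
interleaved = [("x", 1, 1), ("x", 2, 2), ("y", 2, 2), ("y", 1, 1)]
print(replay(interleaved))        # (2, 2): T2 wins on both keys
print(replay(interleaved[::-1]))  # (2, 2): reversed arrival order, same result
```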

"**read committed**" (PL-2)
- no dirty writes and no dirty reads
- implement by buffering writes until commit (or by servers never serving uncommitted data); this does not guarantee recency
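A sketch of the buffering approach, assuming a single shared dictionary stands in for the server (`BufferingClient` is a hypothetical name). It replays the dirty-read example: T3 never sees T1's intermediate write or T2's aborted one, but nothing forces it to see the newest commit either.

```python
class BufferingClient:
    """Read committed via client-side write buffering (hypothetical sketch).

    Writes stay in a local buffer until commit, so other clients only see the
    final values of committed transactions. Nothing here guarantees that a
    reader sees the *latest* commit (no recency).
    """

    def __init__(self, server: dict[str, int]) -> None:
        self.server = server  # stands in for a shared replica
        self.buffer: dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        self.buffer[key] = value  # invisible to other clients until commit

    def read(self, key: str, default: int = 0) -> int:
        # Read your own buffered writes first, then committed server state.
        return self.buffer.get(key, self.server.get(key, default))

    def commit(self) -> None:
        self.server.update(self.buffer)  # publish only each key's final value
        self.buffer.clear()

    def abort(self) -> None:
        self.buffer.clear()  # aborted writes never reach the server


server: dict[str, int] = {}
t1, t2, t3 = BufferingClient(server), BufferingClient(server), BufferingClient(server)
t1.write("x", 1)
t1.write("x", 2)
t2.write("x", 3)
print(t3.read("x"))  # 0: nothing committed yet
t1.commit()
t2.abort()
print(t3.read("x"))  # 2: never 1 (intermediate) and never 3 (aborted)
```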

"**repeatable read**"  (cut (*snapshot*) isolation)
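- item cut isolation: reading the same item multiple times never returns different values; implement by buffering reads
- predicate cut isolation: the same guarantee for predicate-based reads (a cut over `SELECT ... WHERE ...`)
- both implementable with buffering (a "cut" here is weaker than full snapshot isolation, below)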
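A sketch of both variants via client-side read buffering; the class `CutIsolatedTxn` and naming each predicate by a string key are assumptions for illustration. Once a key or a predicate has been read, later commits by other transactions do not change what this transaction sees.

```python
from collections.abc import Callable


class CutIsolatedTxn:
    """Cut isolation by client-side read buffering (a sketch, not the paper's code)."""

    def __init__(self, server: dict[str, int]) -> None:
        self.server = server  # shared, possibly changing, committed state
        self.item_cache: dict[str, int] = {}
        self.pred_cache: dict[str, dict[str, int]] = {}

    def read(self, key: str) -> int:
        # Item cut isolation: the first read of a key fixes the value for this txn.
        if key not in self.item_cache:
            self.item_cache[key] = self.server[key]
        return self.item_cache[key]

    def read_where(self, name: str, pred: Callable[[int], bool]) -> dict[str, int]:
        # Predicate cut isolation: the first evaluation of a named predicate fixes
        # its result set, so later inserts or updates do not create phantoms.
        if name not in self.pred_cache:
            self.pred_cache[name] = {k: v for k, v in self.server.items() if pred(v)}
        return dict(self.pred_cache[name])


server = {"a": 1, "b": 5}
txn = CutIsolatedTxn(server)
first = txn.read("a")
big = txn.read_where("big", lambda v: v > 3)
server["a"] = 7  # another transaction commits new values...
server["c"] = 9
print(txn.read("a") == first, txn.read_where("big", lambda v: v > 3) == big)  # True True
```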

----
### Unachievable isolation levels
- snapshot isolation
  - reads from a consistent cut (snapshot)
  - commits only if no item in its write set was committed by another transaction since its snapshot
  - under a partition, the system must either delay commits or suffer lost updates
- cursor stability
  - the DB holds a lock on a row while a cursor accesses it, and no other transaction can
    modify the row during that time (repeatable read, by contrast, often means holding
    locks on the entire result set)
  - violated by lost updates, because locks cannot be enforced across a partition
  - therefore not HAT (it requires preventing lost updates)
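To see why a partition forces that choice, here is a sketch of the write-set check as *first-committer-wins* validation (a common way to implement it; the `Replica` class and integer timestamps are hypothetical). With one reachable validator, the second writer of x aborts; with two partitioned validators, both commit.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical validator tracking the latest commit timestamp per key."""
    last_commit: dict[str, int] = field(default_factory=dict)
    clock: int = 0

    def try_commit(self, write_set: set[str], snapshot_ts: int) -> bool:
        # First-committer-wins: abort if any written key was committed after our snapshot.
        if any(self.last_commit.get(k, -1) > snapshot_ts for k in write_set):
            return False
        self.clock += 1
        for k in write_set:
            self.last_commit[k] = self.clock
        return True


# No partition: both transactions validate against the same replica.
shared = Replica()
print(shared.try_commit({"x"}, snapshot_ts=0), shared.try_commit({"x"}, snapshot_ts=0))
# True False  (the second writer of x is aborted)

# Partition: each side validates locally and cannot see the other's commit.
left, right = Replica(), Replica()
print(left.try_commit({"x"}, snapshot_ts=0), right.try_commit({"x"}, snapshot_ts=0))
# True True  (both commit: a lost update once the sides merge)
```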
----
### Unachievable properties
- preventing lost updates
```
     lost update (initially x = 100):
     T1: Rx(100), Wx(100+20=120)
     T2: Rx(100), Wx(100+30=130)
```
  The final value should be 150; a lost update leaves 120 or 130.
  Under a partition, T1 and T2 might not see each other's writes, hence a lost update.

  *Impossible to prevent in a partitioned distributed environment while staying highly available.*
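The same anomaly in a few lines of Python, assuming each side of the partition holds its own copy of x (the helper `increment` is hypothetical):

```python
def increment(replica: dict[str, int], amount: int) -> None:
    balance = replica["x"]           # Rx
    replica["x"] = balance + amount  # Wx


# Serial execution on one copy: 100 + 20 + 30.
single = {"x": 100}
increment(single, 20)
increment(single, 30)
print(single["x"])  # 150

# Partitioned: T1 and T2 each run against their own copy of x = 100.
side_a, side_b = {"x": 100}, {"x": 100}
increment(side_a, 20)
increment(side_b, 30)
print(side_a["x"], side_b["x"])  # 120 130
```

However the two copies are later merged (e.g. last-writer-wins), one of the increments is gone.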
  
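- preventing write skew. Write skew generalizes lost update to multiple keys. The possible problem is
a violation of consistency, such as the requirement that x == y once both transactions have run:
```
     T1: t = x;  y = t
     T2: t = y;  x = t
```
Either serial order ends with x == y, but under snapshot isolation both transactions can read the same
snapshot, write disjoint keys, and both commit, leaving x and y swapped.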
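A sketch of the anomaly, modeling snapshot isolation as "both transactions read the same snapshot, and disjoint write sets both pass validation" (an assumption consistent with first-committer-wins). `run_si` and `run_serial` are hypothetical helpers, and the initial values x = 1, y = 2 are chosen only for illustration.

```python
def run_si(initial: dict[str, int]) -> dict[str, int]:
    """T1 and T2 under snapshot isolation: both read the same snapshot."""
    snapshot = dict(initial)          # both start before either commits
    t1_writes = {"y": snapshot["x"]}  # T1: t = x; y = t
    t2_writes = {"x": snapshot["y"]}  # T2: t = y; x = t
    # Write sets {y} and {x} are disjoint, so first-committer-wins lets both commit.
    final = dict(initial)
    final.update(t1_writes)
    final.update(t2_writes)
    return final


def run_serial(initial: dict[str, int]) -> dict[str, int]:
    state = dict(initial)
    state["y"] = state["x"]  # T1
    state["x"] = state["y"]  # T2 sees T1's write
    return state


start = {"x": 1, "y": 2}
print(run_serial(start))  # {'x': 1, 'y': 1}  x == y
print(run_si(start))      # {'x': 2, 'y': 1}  x != y: write skew
```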

- Serializability:
  - optimistic concurrency control requires global validation
  - pessimistic concurrency control requires global coordination (locking)



- Katura: not buying sticky availability (seems definitional)
- Nao: "really confusing" (yes)
- Patrick/Andrew: causal consistency only with sticky availability (client caching breaks lots of guarantees)



----

### HAT-compliant:

![pic](hat.png)

----

# SALT

"offering atomicity and isolation at the same granularity is the very reason why ACID
transactions are ill-equipped .....(performance vs programmability)"

**Pareto Principle:** 80% of effects from 20% of causes

- splitting ACID transactions into smaller ones is very good for concurrency
  - but bad for isolation
- key issue is to provide isolation at a finer granularity while keeping the same atomicity
  - nested transactions give each subtransaction its own atomicity, but isolate the entire thing


*Allow isolation to be specified at a smaller granularity than atomicity*

## Background

ANSI iso levels:
- read-uncommitted
- read-committed (assumed goal for paper)
- repeatable read
- serializable

Issues:
- dirty writes: overwriting uncommitted data
- dirty reads: reading uncommitted data
- non-repeatable reads
- phantoms

## Big Things
- BASE transactions
- salt isolation
- clever names

## BASE transactions
  - **alkaline** subtransactions
    - no other transaction can see the state of an *uncommitted alkaline subtransaction*
    - the state of a **committed** alkaline subtransaction is visible to other BASE transactions and their alkaline subtransactions
    - it is *not visible* to ACID transactions until the entire BASE transaction commits
    - intended for *partition-local* operations
  - **salt isolation** controls the visibility of internal states (among BASE transactions)
  - each alkaline subtransaction has an associated *exception*
  - *BASE transactions look like ACID transactions to other ACID transactions*
  - **accepted** once its first alkaline subtransaction commits
    - acceptance implies the entire BASE transaction will commit
    - i.e. all its operations are either executed successfully *or bypassed because of some exception*
  - aborted only if an error occurs before the transaction is accepted (unlike ACID)
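A minimal state-machine reading of the acceptance rules above. The class `BaseTxn`, its method names, and modeling a BASE transaction as just a count of alkaline subtransactions are assumptions; this is not SALT's implementation.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    ACCEPTED = auto()
    COMMITTED = auto()
    ABORTED = auto()


class BaseTxn:
    """Hypothetical model of a BASE transaction's lifecycle as described above."""

    def __init__(self, num_alkaline: int) -> None:
        self.remaining = num_alkaline  # alkaline subtransactions not yet finished
        self.state = State.RUNNING

    def _finish_one(self) -> None:
        self.remaining -= 1
        if self.remaining == 0:
            self.state = State.COMMITTED

    def alkaline_committed(self) -> None:
        if self.state in (State.COMMITTED, State.ABORTED):
            raise RuntimeError("transaction already finished")
        # The first committed alkaline subtransaction makes the BASE txn accepted.
        self.state = State.ACCEPTED
        self._finish_one()

    def alkaline_failed(self) -> None:
        if self.state in (State.COMMITTED, State.ABORTED):
            raise RuntimeError("transaction already finished")
        if self.state is State.RUNNING:
            self.state = State.ABORTED  # error before acceptance: abort everything
        else:
            self._finish_one()  # after acceptance: bypassed via its exception


txn = BaseTxn(num_alkaline=3)
txn.alkaline_committed()  # accepted
txn.alkaline_failed()     # bypassed rather than aborting the BASE transaction
txn.alkaline_committed()
print(txn.state)  # State.COMMITTED
```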

## Salt Isolation

*"If two operations in different transactions conflict, then the temporal dependency
    that exists between the earlier and the later of these operations  must extend to the
    entire transaction"* (allows SALT to work w/ different isolation levels)

*Isolation:* Let Q be the set of operation types {*read*, *range-read*, *write*} and let L and S
be subsets of Q. Further, let *o<sub>1</sub>* in *txn<sub>1</sub>* and *o<sub>2</sub>* in *txn<sub>2</sub>* be two operations, respectively
of type *T<sub>1</sub>* ∈ L and *T<sub>2</sub>* ∈ S, that access the same object in a conflicting
(i.e. non read-read) manner. **If *o<sub>1</sub>* completes before *o<sub>2</sub>* starts, then *txn<sub>1</sub>* must decide
before *o<sub>2</sub>* starts.**

<img src=saltSets.png width=500>

The Isolation property holds as long as (a) at least one of txn<sub>1</sub> and
txn<sub>2</sub> is an ACID transaction or (b) both txn<sub>1</sub> and txn<sub>2</sub> are alkaline
subtransactions.  

So:
- ACID transactions are isolated from all other transactions
- alkaline subtransactions are isolated from ACID transactions and from other alkaline subtransactions
- BASE transactions expose their state at alkaline boundaries to other BASE transactions


- Design
  - locks (because contention is high)
    - types
      - ACID: conflicts with ACID, alkaline, and saline locks
      - alkaline: conflicts with ACID and other alkaline locks
      - saline: conflicts only with ACID locks (except read/read)
    - lock duration
      - *long-term* (life of the (sub)transaction, as in 2PL)
      - *short-term* (just the operation)
    - acquire only an alkaline lock at operation start
      - "downgrade" it to saline at the end of the subtransaction, and hold it until after the end of the BASE transaction
  - no multi-version concurrency control
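The conflict rules above, transcribed into a small lookup (a hypothetical sketch; the string names and the `conflicts` function are not SALT's API). It also shows why the downgrade matters: a saline lock still blocks ACID operations but lets other BASE transactions' alkaline subtransactions through.

```python
# Pairs of lock types that conflict unless both operations are reads.
# Transcribed from the lock types listed above; the relation is symmetric,
# and frozenset({"ACID"}) stands for ACID vs ACID.
CONFLICTING = {
    frozenset({"ACID"}),
    frozenset({"ACID", "alkaline"}),
    frozenset({"ACID", "saline"}),
    frozenset({"alkaline"}),
}


def conflicts(type1: str, mode1: str, type2: str, mode2: str) -> bool:
    """True if a (type1, mode1) lock and a (type2, mode2) lock on one object conflict."""
    if mode1 == "read" and mode2 == "read":
        return False  # read/read is always compatible
    return frozenset({type1, type2}) in CONFLICTING


# After an alkaline write lock is downgraded to saline at the end of its
# subtransaction, other BASE transactions' alkaline operations can proceed,
# while ACID operations still wait until the BASE transaction finishes.
print(conflicts("saline", "write", "alkaline", "write"))   # False
print(conflicts("saline", "write", "ACID", "read"))        # True
print(conflicts("alkaline", "write", "alkaline", "read"))  # True
```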

![sets](saltConcurrent.png)

## Indirect Dirty Reads
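
![fig4](saltFig4.png)

Problem: an ACID transaction can observe an uncommitted BASE write *indirectly*, through another BASE transaction that read it and then exposed a value derived from it.

Fixed by: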

- **Read-after-write across transactions** A BASE transaction *B<sub>r</sub>* that reads a value x, which has been written by another BASE transaction *B<sub>w</sub>*, cannot release its saline lock on x until *B<sub>w</sub>* has released its own saline lock on x.

- **Write-after-read within a transaction** An operation *O<sub>w</sub>* that writes a value x cannot
  release its saline lock on x until all previous read operations within the **same** BASE
  transaction have released their saline locks on their respective objects.

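Together, these rules ensure that a saline lock on any value that depends (directly or transitively) on an uncommitted write is held until the saline lock on that write is released, so ACID transactions cannot indirectly read uncommitted data.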
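A sketch of how the two rules chain together, with hypothetical lock bookkeeping (`SalineLock` and its `waits_for` list are illustrative, not SALT's data structures). B<sub>r</sub>'s write lock on y cannot be released while B<sub>w</sub> still holds its saline lock on x:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class SalineLock:
    owner: str  # BASE transaction holding the lock
    obj: str
    mode: str   # "read" or "write"
    released: bool = False
    # Locks that must be released before this one may be (hypothetical bookkeeping).
    waits_for: list["SalineLock"] = field(default_factory=list)

    def can_release(self) -> bool:
        return all(dep.released for dep in self.waits_for)

    def release(self) -> None:
        if not self.can_release():
            raise RuntimeError(f"{self.owner} must keep its saline lock on {self.obj}")
        self.released = True


# Hypothetical chain: B_w writes x; B_r reads x, then writes y.
bw_x = SalineLock("B_w", "x", "write")
br_x = SalineLock("B_r", "x", "read", waits_for=[bw_x])   # read-after-write across txns
br_y = SalineLock("B_r", "y", "write", waits_for=[br_x])  # write-after-read within a txn

print(br_y.can_release())  # False: y depends on B_w's still-held write of x
bw_x.release()             # B_w's BASE transaction finishes
br_x.release()
print(br_y.can_release())  # True
```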


## Forward Logging
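- once a BASE transaction knows it will commit (it is accepted because an alkaline subtransaction committed), log the entire BASE transaction
- prevents the cascading rollbacks that would otherwise occur, because other BASE transactions may already have seen its state (saline visibility)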

### Banking, again

![fig1](saltAcidApp.png)
![banking](saltBanking.png)

## Performance

![saltPerf1](saltPerf1.png)

![saltPerf2](saltPerf2.png)


## Comments




----

# Correctness Anomalies Under Serializable Isolation

Background: serializability guarantees an execution (a schedule)
equivalent to a serial schedule, but this serial schedule does not
have to be the ordering that actually occurred in real time (wall
clock time).

Background: *one-copy serializability* (1SR):
```
...either ... will be ordered first in the equivalent serial order. Whichever transaction is second --- when it reads the balance --- it must read the value written by the first transaction
```

### The Immortal write
<img src=immortalWrite.png width=500>

### The Stale Read
<img src=staleRead.png width=500>

### The Causal Reverse
<img src=causalReverse.png width=500>

*Strict serializability* prevents all of these.
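A sketch of the extra, real-time condition that strict serializability adds on top of serializability, assuming each transaction has known wall-clock start and commit times (`Txn` and `respects_real_time` are hypothetical). It checks only that a given serial order respects real time, not that the order is a valid serialization.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    start: float  # wall-clock time the transaction began (hypothetical units)
    end: float    # wall-clock time it committed


def respects_real_time(serial_order: list[Txn]) -> bool:
    """If one txn ends before another starts (in real time), it must come first."""
    for i, earlier in enumerate(serial_order):
        for later in serial_order[i + 1:]:
            if later.end < earlier.start:  # `later` finished before `earlier` began
                return False
    return True


write = Txn("T1: write x", 0, 1)
read = Txn("T2: read x", 2, 3)
print(respects_real_time([write, read]))  # True
print(respects_real_time([read, write]))  # False: a stale read, legal under plain serializability
```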
