# Transactional storage for geo-replicated systems
Walter

## Why?

- snapshot isolation imposes a total order on the commit times of all transactions, even those that do not conflict
- writes of a committed transaction must be immediately visible to later transactions
  - so a transaction can commit only after its writes have propagated everywhere

## What?
- per-transaction *site*
- per-object *preferred sites*
  - vs. *primary site* (objects must be modified at their primary; eh)
- *csets* (counting sets)
  - commutative; like multisets, but counts may be negative
- two-phase commit across multiple preferred sites
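A minimal Python sketch of a cset, assuming a plain map from element to integer count (the `CSet` class and its method names are illustrative, not Walter's API). Because `add` and `rem` only increment or decrement a counter, replicas that apply the same operations in different orders end up with the same counts. Treating an element as a member when its count is positive is an assumption of this sketch.

```python
from collections import Counter


class CSet:
    """Hypothetical counting set: each element maps to an integer count.

    Counts may go negative, so add/rem commute and replicas can apply
    them in any order and still agree on the final counts.
    """

    def __init__(self):
        self.counts = Counter()

    def add(self, elem):
        self.counts[elem] += 1

    def rem(self, elem):
        # Unlike a multiset, removing an absent element is allowed:
        # its count simply drops below zero.
        self.counts[elem] -= 1

    def count(self, elem):
        return self.counts[elem]

    def members(self):
        # Assumption: an element is "in" the set when its count is positive.
        return {e for e, c in self.counts.items() if c > 0}
```

For example, applying `add("x")`, `rem("y")`, `add("y")` at one site and the same operations in reverse order at another leaves both with `x: 1, y: 0`, with no conflict-resolution step needed.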

## Big things

**parallel snapshot isolation**
- each site has its own commit order (timeline)
- causal ordering among transactions across sites
- no write-write conflicts
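One way to make the *no write-write conflicts* property concrete, sketched in Python under stated assumptions: each committed transaction is summarized by its commit site (0-indexed), its per-site sequence number, its starting vector timestamp, and its write set. The `Txn` type and helper names are hypothetical, and "concurrent" here is an approximation: neither transaction's commit was in the other's starting snapshot. PSI forbids both members of such a conflicting pair from committing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    site: int              # commit site, 0-indexed (hypothetical encoding)
    seqno: int             # per-site sequence number assigned at commit
    start_vts: tuple       # startVTS: number of txns from each site in the snapshot
    writes: frozenset      # ids of the objects the transaction wrote


def visible_in(t1: Txn, t2: Txn) -> bool:
    """True if t1's commit is part of t2's starting snapshot."""
    return t2.start_vts[t1.site] >= t1.seqno


def concurrent(t1: Txn, t2: Txn) -> bool:
    """Neither transaction saw the other's commit when it started."""
    return not visible_in(t1, t2) and not visible_in(t2, t1)


def write_write_conflict(t1: Txn, t2: Txn) -> bool:
    """PSI forbids committing both t1 and t2 if this returns True."""
    return concurrent(t1, t2) and bool(t1.writes & t2.writes)
```

Two transactions that write disjoint objects never conflict, no matter how they interleave, which is what lets PSI skip the global commit order that snapshot isolation needs.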

Assumes *each user communicates with one site at a time*.
- a user can modify *any* object, not just those whose preferred site is the user's site
  - this is the difference from a *primary site* design, which requires all writes to an object to be
    done at its primary. Not all that different, though, as the preferred site must still be consulted.

## Features
- asynchronous propagation across sites
- efficient update-anywhere (kinda-sorta)
- no conflict-resolution logic (**but:** they serialize at preferred sites)
- isolation within a site

![snapshot](walterIsolation.png)

## PSI
- snapshot isolation locally
- different commit orderings across sites
- but xactions **with overlapping write sets are ordered the same everywhere**
- causal propagation across sites after the fact
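The causal propagation above can be sketched as a receive loop, assuming a simplified model with no failures and 0-indexed sites: a site buffers a remote transaction until it is the next one from its origin site and every transaction counted in its startVTS has already been applied locally. `Site`, `RemoteTxn`, and the field names are hypothetical; the condition approximates Walter's propagation rule rather than reproducing its implementation.

```python
from dataclasses import dataclass


@dataclass
class RemoteTxn:
    origin: int        # site that committed the transaction (0-indexed)
    seqno: int         # its sequence number at the origin site
    start_vts: list    # startVTS the transaction executed against
    updates: dict      # object id -> new value


class Site:
    def __init__(self, site_id, n_sites):
        self.site_id = site_id
        self.committed_vts = [0] * n_sites  # txns from each site applied here
        self.store = {}
        self.pending = []

    def can_apply(self, t):
        # Must be the next transaction from its origin site...
        in_order = t.seqno == self.committed_vts[t.origin] + 1
        # ...and everything it read from must already be applied here.
        deps_met = all(self.committed_vts[k] >= v for k, v in enumerate(t.start_vts))
        return in_order and deps_met

    def receive(self, t):
        self.pending.append(t)
        progress = True
        while progress:  # applying one txn may unblock buffered ones
            progress = False
            for p in list(self.pending):
                if self.can_apply(p):
                    self.store.update(p.updates)
                    self.committed_vts[p.origin] = p.seqno
                    self.pending.remove(p)
                    progress = True
```

If a site receives a transaction whose startVTS includes a commit it has not yet applied, the transaction waits in `pending` until that commit arrives. Sites may therefore interleave unrelated transactions differently, but never apply an effect before its cause.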

![PSI](walterParallel.png)




### Transaction startup
- transaction has a *site* where it will commit
- transaction is assigned a vector timestamp *startVTS* when it starts.
  - For example, if startVTS = `⟨2, 4, 5⟩` then the transaction reads from the snapshot
    containing 2 transactions from site 1, 4 from site 2, and 5 from site 3.
  - startVTS holds, for each site, the sequence number of the latest transaction from that site
    *that has committed at the local site*
  - A version number (or simply version) is a pair ⟨site, seqno⟩ assigned to a
    transaction when it commits: the site where the transaction executed, and a
    sequence number local to that site.
- Paxos configuration server maintains preferred sites, system info
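With 0-indexed sites (an assumption; the notes number sites from 1), checking whether a version ⟨site, seqno⟩ belongs to a snapshot is a single comparison against startVTS. The `visible` and `read` helpers are hypothetical illustrations; `read` scans one object's local history, newest first, for the latest version the transaction is allowed to see.

```python
def visible(version, start_vts):
    """version is (site, seqno) with 0-indexed sites; it is in the snapshot
    iff the snapshot includes at least seqno transactions from that site."""
    site, seqno = version
    return seqno <= start_vts[site]


def read(history, start_vts):
    """history: list of (version, value) for one object, oldest first, in the
    order the local site committed them. Returns the newest visible value,
    or None if no version is in the snapshot."""
    for version, value in reversed(history):
        if visible(version, start_vts):
            return value
    return None
```

With startVTS = ⟨2, 4, 5⟩, version ⟨site 1, seqno 2⟩ (index 0 here) is visible but ⟨site 1, seqno 3⟩ is not, so a read skips any local version committed after the snapshot was taken.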

### Committing
- `startVTS` is the local site's version vector (vv) when the transaction starts
- `fast commit` (every object in the write set has the local site as its preferred site)
  - check that written objects are *unmodified* since start
  - check *none locked* (by slow commit protocol)
  - **only abort point**
  - transaction then given *per-site sequence number*
- `slow commit` (at least one non-local preferred site)
  - local site acts as coordinator in two-phase commit
    - remote site says *yes* if object unmodified, unlocked
      - and locks object
    - second phase says to commit and unlock objects.
  - Note:
    - "unmodified" means unmodified since the version the transaction read (its startVTS snapshot), which might be stale
    - the entire transaction executes locally; the list of updates is pushed to other sites at the end
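A compact Python sketch of both commit paths, assuming a single-process model: each site tracks the latest committed version of each object plus a set of locked objects, there are no failures or timeouts, and lock ordering (deadlock avoidance) is ignored. `SiteState`, `prepare`, `finish`, and `slow_commit` are hypothetical names. "Unmodified" is checked as in the notes: the object's latest version must be visible in the transaction's startVTS. As a simplification, participants record the new version in phase two and the data itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SiteState:
    site_id: int
    latest: dict = field(default_factory=dict)  # obj -> (site, seqno) of last write
    locked: set = field(default_factory=set)    # objs locked by slow commit
    next_seqno: int = 1

    def unmodified(self, obj, start_vts):
        # Unmodified since the snapshot: no write, or last write is visible in it.
        v = self.latest.get(obj)
        return v is None or v[1] <= start_vts[v[0]]

    def ok(self, objs, start_vts):
        return all(o not in self.locked and self.unmodified(o, start_vts) for o in objs)

    def fast_commit(self, writes, start_vts):
        """All written objects are preferred here; this check is the only abort point."""
        if not self.ok(writes, start_vts):
            return None  # abort
        version = (self.site_id, self.next_seqno)  # per-site sequence number
        self.next_seqno += 1
        for o in writes:
            self.latest[o] = version
        return version

    def prepare(self, objs, start_vts):
        """Phase 1 at a preferred site: vote yes and lock, or vote no."""
        if not self.ok(objs, start_vts):
            return False
        self.locked |= set(objs)
        return True

    def finish(self, objs, version=None):
        """Phase 2: record the commit (if any), then unlock."""
        if version is not None:
            for o in objs:
                self.latest[o] = version
        self.locked -= set(objs)


def slow_commit(coordinator, participants, start_vts):
    """participants: list of (SiteState, objs preferred at that site)."""
    prepared = []
    for site, objs in participants:
        if not site.prepare(objs, start_vts):
            for s, o in prepared:
                s.finish(o)  # abort: just release locks already taken
            return None
        prepared.append((site, objs))
    version = (coordinator.site_id, coordinator.next_seqno)
    coordinator.next_seqno += 1
    for site, objs in participants:
        site.finish(objs, version)
    return version
```

In this sketch the coordinator also appears in `participants` when some written objects are preferred locally, and a "no" vote in phase one simply unlocks whatever was already prepared.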

## Performance

- the BerkeleyDB comparison is a straw man; BerkeleyDB is widely known to be very slow, especially across the wide area
- *one site per data center!*
	 
![anomalies](walterAnomolies.png)



# Comments
"Across sites, weaker consistency is acceptable, because users can tolerate a small delay for their actions to be seen by other users"
- They seem to be confusing consistency with *latency*.
- "Sets are interesting and the authors discuss them a lot, but I don't really see their
  point...are they just a data structure optimized and intended to be used to implement the applications in the paper?"




