
Tango: Distributed Data Structures over a Shared Log

Why?

Existing systems build abstractions for computing over massive data sets:

  • Hadoop
  • Spark

They also need "application metadata", with persistence and high availability:

  • maps
  • counters
  • queues
  • graphs
  • job assignments
  • network topologies ...

How?

  • a client modifies an object by appending an update to the log
  • a client accesses an object by sync'ing its local view w/ the log
  • elasticity - linearizable read throughput scales by adding new views, w/o slowing write throughput ("until saturation" of the log)
  • transaction atomicity and isolation from log
  • streams to filter log seen at clients
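The append/sync pattern above can be sketched in a few lines. This is a hypothetical illustration with invented names (real Tango runs over the CORFU shared log and is not a Python library); only the structure matters: writes go to the log, and reads first replay the log into the local view.

```python
# Hypothetical sketch of Tango's modify-by-append / read-by-sync pattern.
# All names (SharedLog, TangoMap) are illustrative, not the paper's API.

class SharedLog:
    """A totally ordered, append-only log shared by all clients."""
    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)
        return len(self.entries) - 1   # log position of the new entry

    def read(self, pos):
        return self.entries[pos]

    def tail(self):
        return len(self.entries)

class TangoMap:
    """A map object whose state is a view materialized from the log."""
    def __init__(self, log):
        self.log = log
        self.state = {}
        self.applied = 0               # log position the view has played up to

    def put(self, key, value):
        # Modify by appending an update; the local view is NOT touched here.
        self.log.append(("put", key, value))

    def get(self, key):
        # Linearizable read: sync the view with the log tail, then read locally.
        self.sync()
        return self.state.get(key)

    def sync(self):
        while self.applied < self.log.tail():
            op, key, value = self.log.read(self.applied)
            if op == "put":
                self.state[key] = value
            self.applied += 1

# Two client views of the same object, backed by one log:
log = SharedLog()
a, b = TangoMap(log), TangoMap(log)
a.put("x", 1)
assert b.get("x") == 1   # b's view syncs from the log and sees a's write
```

Note how adding more views (more `TangoMap` instances) scales read capacity without adding work to the write path - each writer still performs a single append.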


Transactions:


  • optimistic concurrency control
    • writes entered in log as speculative
    • commit record contains a read set w/ versions.
    • transaction succeeds if the objects read are still current when the commit record is reached in the log
    • each reader deterministically evaluates commit record
    • read transactions:
      • nothing inserted in log
      • locally track time (offset) of first read (start of transaction), and last read (end of transaction)
      • commit/abort decided as in ordinary read/write transactions
    • write transactions always commit.
  • can use fine-grained per-app versions
    • opaque key parameters in helper funcs
  • crashed client's transaction aborted by others appending crash record
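The key point in the list above - that every reader deterministically evaluates the commit record - can be sketched as a pure function of log state. The record layout below is invented for illustration (the paper does not specify this Python form): the record carries the read set with the version of each object observed, and the transaction commits only if those versions are still current at the record's log position.

```python
# Hypothetical sketch of deterministic commit-record evaluation (OCC).
# Every client playing the log reaches the same verdict, because the
# verdict depends only on log order, never on local client state.

def evaluate_commit(commit_record, current_versions):
    """Return True (commit) iff every object in the read set is still at
    the version the transaction observed when it read it."""
    for obj_id, read_version in commit_record["read_set"].items():
        if current_versions.get(obj_id) != read_version:
            return False    # a read object was updated in between: abort
    return True             # all reads still current: commit

# Example: T read object A at version 3 and object B at version 7.
record = {"read_set": {"A": 3, "B": 7}}
assert evaluate_commit(record, {"A": 3, "B": 7}) is True    # commits
assert evaluate_commit(record, {"A": 4, "B": 7}) is False   # A changed: abort
```

Read-only transactions run the same check locally over their tracked first-read/last-read window, without appending anything to the log; write-only transactions trivially pass (empty read set), which is why they always commit.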

Streams

  • per-object stream
  • transaction commit records are multi-appended to the streams of all objects involved
  • remote-write transactions
  • decision records
  • "a client executing a transaction must insert a decision record for a transaction if there’s some other client in the system that hosts an object in its write set but not all the objects in its read set."
  • generating clients cannot do a remote read in a transaction (would require RPCs ...)
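The decision-record rule quoted above can be sketched as a predicate. The representation is an assumption for illustration: each client is modeled as the set of objects whose streams it plays, and a decision record is needed whenever some client sees part of the write set but cannot check the whole read set itself.

```python
# Hypothetical sketch of the decision-record rule for stream-partitioned
# transactions. A client that hosts an object in the write set but not
# all objects in the read set cannot evaluate the commit record on its
# own, so the executing client must append an explicit decision record.

def needs_decision_record(clients_hosted_objects, read_set, write_set):
    """clients_hosted_objects: one set of object ids per client."""
    for hosted in clients_hosted_objects:
        sees_a_write = bool(hosted & write_set)
        can_check_reads = read_set <= hosted
        if sees_a_write and not can_check_reads:
            return True
    return False

# Client 1 hosts objects {A, B}; client 2 hosts only {B}.
hosting = [{"A", "B"}, {"B"}]
# T reads A and writes B: client 2 sees the write to B but cannot
# verify A's version, so a decision record is required.
assert needs_decision_record(hosting, {"A"}, {"B"}) is True
# T reads and writes only B: every hoster of B can check B itself.
assert needs_decision_record(hosting, {"B"}, {"B"}) is False
```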

Comments/Questions

  • (claude) why metadata?