# Log-Structured File Systems 

![disk pic](harddisk.jpg)

### Why log-structured? 
- Track-caching and large buffer caches make reads fast
- FFS writes can require many disk operations; a file create does:
    - write inode 
    - write dir data 
    - write dir inode 
    - write file data 
    - write inode 

# Main Idea 

Use a log:
- all modified bytes (both data and inode) written to a sequential, append-only log
    - write only to end of log
    - cache log in memory
    - periodic flush (aggregation)
    - inode map
        - inumbers don't change, new copies of inodes written
        - periodically re-written
        - cached (small enough to fit in memory)

- Does well on: 
    - spatial locality 
    - most reads: because of large buffer caches 
    - writes: 
        - no seeks 
        - aggregate writes for inodes and data
    - recover: 
        - log is always consistent
        - though it might not be up to date

- Potential problems: 
    - file fragmentation 
    - “cleaning” 
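
The write path above can be sketched roughly as follows. This is a toy model, not the Sprite LFS implementation: `TinyLog`, `segment_size` (counted in records rather than bytes), and the record tuples are all hypothetical names chosen for illustration.

```python
class TinyLog:
    """Toy append-only log: an in-memory buffer flushed in whole segments."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size  # records per segment (assumed unit)
        self.disk = []       # flushed records; list index = log address
        self.buffer = []     # records cached in memory, not yet on disk
        self.inode_map = {}  # inumber -> log address of the newest inode copy
        self.flushes = 0

    def _append(self, record):
        # New records only ever go at the end of the log.
        addr = len(self.disk) + len(self.buffer)
        self.buffer.append(record)
        return addr

    def write_file(self, inumber, data):
        # Data and a fresh copy of the inode are appended together.
        data_addr = self._append(("data", inumber, data))
        inode_addr = self._append(("inode", inumber, data_addr))
        # The inumber stays the same; only its location in the log changes.
        self.inode_map[inumber] = inode_addr
        if len(self.buffer) >= self.segment_size:
            self.flush()

    def flush(self):
        # One large sequential write instead of many small seeks.
        if self.buffer:
            self.disk.extend(self.buffer)
            self.buffer.clear()
            self.flushes += 1


log = TinyLog(segment_size=4)
log.write_file(1, b"a")
log.write_file(2, b"b")   # buffer reaches 4 records, so it is flushed
log.write_file(1, b"c")   # rewrite: new data and inode copies at the tail
```

After the rewrite, the old inode copy for file 1 is still in the log but no longer referenced by the inode map; that dead space is what the cleaner later reclaims.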

How is the disk laid out?
- Superblock: #segments, segment size, etc.
- Checkpoint region
- Segments
    - Segment summary
        - inumber, version, offset for each block
    - Log entries:
        - Inode: mtime, data/indirect block locations
        - Inode map: inode locations, inode versions, atimes
        - Indirect block
        - Directory change log: link/unlink/rename
        - Segment usage table: #live bytes in each segment, last write times

How to find the data for a file?
-   Checkpoint region gives you inode map
-   Inode map gives you inode, which gives you file data
-   Changes after checkpoint, generally in memory
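
A minimal sketch of that lookup chain, assuming a dictionary stands in for the on-disk log. The record shapes, `imap_addrs`, and `recent_imap` are hypothetical names, not the on-disk format.

```python
# Hypothetical log contents: address -> (record kind, payload).
log = {
    0: ("data", b"hello"),
    1: ("inode", {"blocks": [0]}),
    2: ("data", b"world"),
    3: ("inode", {"blocks": [0, 2]}),  # newer copy of the inode for inumber 7
    4: ("imap", {7: 3}),               # inode map block: inumber -> inode address
}
checkpoint = {"imap_addrs": [4]}       # checkpoint region locates inode map blocks


def read_file(inumber, checkpoint, log, recent_imap=None):
    """Follow checkpoint -> inode map -> inode -> data blocks."""
    imap = {}
    for addr in checkpoint["imap_addrs"]:
        _kind, entries = log[addr]
        imap.update(entries)
    # Changes made after the checkpoint are held in memory and take priority.
    imap.update(recent_imap or {})
    _kind, inode = log[imap[inumber]]
    return b"".join(log[a][1] for a in inode["blocks"])


current = read_file(7, checkpoint, log)
older = read_file(7, checkpoint, log, recent_imap={7: 1})
```

The second call only demonstrates that whichever inode address the map holds decides what the file contains; in a real system the in-memory entries would point at newer copies, not older ones.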

![disk layout](log.png)


What happens in a checkpoint?
- Write all modified files: data blocks, indirect blocks, inodes,
   inode map table, segment usage table.
- Write to checkpoint region:
    - addresses of all blocks in the inode map
    - addresses of all blocks in the segment usage table
    - current time
    - pointer to last written segment

   
# Recovery
- checkpoints
    - inode map
    - segment info
    - not necessarily current!
- roll-forward 
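
A rough sketch of checkpoint plus roll-forward, assuming the log is a list of `(kind, inumber, payload)` records and the checkpoint stores the inode map together with the address of the last record it covers. The names `recover`, `inode_map`, and `last_addr` are hypothetical.

```python
def recover(log, checkpoint):
    """Rebuild the inode map from a checkpoint, then roll forward."""
    imap = dict(checkpoint["inode_map"])  # state as of the checkpoint
    # Scan only the records written after the checkpoint was taken.
    for addr in range(checkpoint["last_addr"] + 1, len(log)):
        kind, inumber, _payload = log[addr]
        if kind == "inode":
            imap[inumber] = addr          # a newer inode copy wins
    return imap


log = [
    ("data", 1, b"a"), ("inode", 1, 0),
    # checkpoint taken here: covers addresses 0..1
    ("data", 2, b"b"), ("inode", 2, 2),
    ("data", 1, b"c"), ("inode", 1, 4),
]
checkpoint = {"inode_map": {1: 1}, "last_addr": 1}

recovered = recover(log, checkpoint)
```

Using the checkpoint alone would give a consistent but stale view (`{1: 1}`); rolling forward picks up file 2 and the newer copy of file 1.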

# Cleaner
- read segments that still contain live data
- write only the live data back out, into fewer segments

```
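    write cost  = (total bytes read and written) / (new bytes written)
       = (read segs + write live + write new) / write new
       = (N         + N*u        + N*(1-u))  / (N * (1-u))
       = 2/(1-u)

    N = segments read by the cleaner, u = fraction of each segment still live (0 <= u < 1)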
```

"write amplification"

If utilization goes up?
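
To get a feel for the answer, the formula can be evaluated at a few utilizations. This is just the 2/(1-u) expression from above wrapped in a hypothetical helper, `write_cost`; the Sprite LFS analysis special-cases u = 0 (an empty segment need not be read at all), which this sketch ignores.

```python
def write_cost(u):
    """Write cost 2 / (1 - u) for cleaning segments with live fraction u."""
    if not 0 <= u < 1:
        raise ValueError("u must be in [0, 1)")
    return 2 / (1 - u)


for u in (0.0, 0.5, 0.8, 0.9):
    print(f"u={u:.1f}  write cost={write_cost(u):.1f}")
```

The cost grows without bound as u approaches 1: going from half-full to 90% live segments multiplies the cost by about five.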

many policies:
- segregate on age, file size etc.
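
As one illustration of why age matters, here is a sketch comparing a greedy policy (clean the emptiest segment) with the cost-benefit ratio described in the Sprite LFS paper, (1 - u) * age / (1 + u). The `Segment` class and the sample numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    u: float    # fraction of the segment that is still live
    age: float  # age of the data in the segment (arbitrary units)


def greedy_key(seg):
    # Prefer the segment that frees the most space right now.
    return 1 - seg.u


def cost_benefit_key(seg):
    # benefit / cost = (space freed * how long it stays free) / (read 1 + write u)
    return (1 - seg.u) * seg.age / (1 + seg.u)


segments = [
    Segment("hot", u=0.60, age=1),    # emptier, but its data changes often
    Segment("cold", u=0.75, age=20),  # fuller, but its data is stable
]

greedy_pick = max(segments, key=greedy_key).name
cost_benefit_pick = max(segments, key=cost_benefit_key).name
```

Greedy picks the hot segment; cost-benefit picks the cold one, on the reasoning that free space recovered from stable data stays free longer, while a hot segment will keep emptying on its own.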


# Comments from students

- most interesting is crash recovery (fast!)
- couldn't work today because not enough memory to cache
- how to deal w/ SSDs?  (answer: log still gives: aggregation and fault tolerance)
- Sprite LFS "Outperform Unix in all cases but one (files read sequentially after being written randomly)." How often does this happen and why does Sprite fail here?
- what happens to a write request when there is not enough space on disk? (force clean) (performance degrades badly when the disk is nearly full)
- turn immutable by skipping the cleaner!

## LFS summary: 
- Handle reads through caching, and
- Writes by appending large segments to a log
- Impact:
    - Greatly increases disk performance on writes, file creates, deletes, ...
    - Reads not handled by the buffer cache perform the same as in a normal file system
    - Requires cleaning daemon to produce clean space
        - disk bandwidth
        - cpu time


# Immutability Changes Everything

![im](immut1.png)

Trends:
- increasing storage
- increasing distribution
- increasing ambiguity
- LFS, COW, LSM

## Append-only computing
- logs are **the truth**
- changes are applied sequentially, via a single master or consensus
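
A small sketch of "the log is the truth": current state is never stored directly, only derived by replaying the log in order. The `(key, value)` record shape and the `replay` helper are hypothetical.

```python
def replay(log, upto=None):
    """Fold an append-only list of (key, value) changes into a state dict."""
    state = {}
    for key, value in log[:upto]:
        state[key] = value  # later entries supersede earlier ones
    return state


log = []                      # a single writer appends here, in order
log.append(("balance", 100))
log.append(("owner", "ann"))
log.append(("balance", 70))

now = replay(log)
earlier = replay(log, upto=2)
```

Because entries are never modified in place, any earlier state can be reconstructed by replaying a prefix of the log.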

## Data on the:
- inside
  - mutable relational data
- outside: think artifacts, reports, summaries
  - immutable
  - identity
  - (possibly) versioned

## Immutable Is Not Always Immutable
- optimizing for read access: indexes, de-normalization
- farming out portions of work, with re-try.
- bit meanings change
- normalization is there to eliminate update anomalies
  - table decomposition takes a single large table and makes multiple small tables
  - updating multiple copies of the same fact causes *update anomalies*
  - anomalies are addressed by *normalization*, which makes the tables a bit less efficient
  - if immutable, no anomalies, so....
  - versioned data changes, but versions do not

## Bottom line
1. Immutability enables unambiguous identity
1. Immutability enables massive replication/caching/parallelism
1. Immutability eliminates locking
1. Immutability enables re-computation
   - from immutable
   - of immutable
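
One way to make the first two points concrete is content addressing, where an immutable blob's identity is a hash of its bytes. This technique is not named in the notes above; it is brought in here as an illustration, and `ImmutableStore` is a hypothetical class.

```python
import hashlib


class ImmutableStore:
    """Toy write-once store: a blob's key is the SHA-256 of its contents."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Same bytes always map to the same key, so re-puts are harmless.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = ImmutableStore()
k1 = store.put(b"report v1")
k2 = store.put(b"report v1")  # duplicate content, same identity
k3 = store.put(b"report v2")  # a new version is a new blob, not an update
```

Since a key can only ever refer to one value, copies can be cached or replicated anywhere without invalidation or locking.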






