# Log-Structured File Systems 

### Why log-structured? 
- FFS writes can require a lot of disk operations: 

#### (FFS) Creating a file (each requires a seek, possibly sync): 
- write inode 
- write dir data 
- write dir inode
- write file data 
- write inode 

or

- write file data  ->
- write inode      ->
- write dir data   ->
- write dir inode  

# Main Idea 

Use a log:
- write only to end of log
- cache log in memory
- periodic flush (aggregation)
- inode map
  - inumbers don't change; new copies of inodes are written to the log
  - the map itself is periodically re-written
  - cached in memory (small enough to fit)
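
The pieces above can be put together in a small sketch. This is a toy model only, not the Sprite LFS implementation: the class name `TinyLog`, the `flush_threshold` parameter, and the record tuples are all made up for illustration, and a Python list stands in for the disk.

```python
class TinyLog:
    """Toy log-structured store; a Python list stands in for the disk."""

    def __init__(self, flush_threshold=4):
        self.disk = []        # append-only log; list index = disk address
        self.buffer = []      # writes cached in memory until the next flush
        self.inode_map = {}   # inumber -> address of the newest inode copy
        self.flush_threshold = flush_threshold

    def write_file(self, inumber, data):
        """Buffer a write; nothing touches the disk until flush()."""
        self.buffer.append((inumber, data))
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Append all buffered data and fresh inode copies to the end of the log."""
        for inumber, data in self.buffer:
            data_addr = len(self.disk)
            self.disk.append(("data", data))
            inode_addr = len(self.disk)
            self.disk.append(("inode", inumber, data_addr))
            # The inumber stays the same; only the map entry moves.
            self.inode_map[inumber] = inode_addr
        self.buffer.clear()

    def read_file(self, inumber):
        """Serve from the in-memory buffer if possible, else follow the inode map."""
        for buffered_inumber, data in reversed(self.buffer):
            if buffered_inumber == inumber:
                return data
        _, _, data_addr = self.disk[self.inode_map[inumber]]
        return self.disk[data_addr][1]
```

Rewriting a file never overwrites the old blocks: the old data and inode copies stay in the log as dead space, which is what the cleaner later reclaims.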

![disk layout](log.png)


append-only log
- Does well on: 
  - spatial locality 
  - most reads: because of large buffer caches 
  - writes: 
    - no seeks 
    - aggregate writes for inodes and data
  - recovery: 
    - log is always consistent, 
    - though it might not be up to date 

- Potential problems: 
  - file fragmentation 
  - “cleaning” 




# Recovery
- checkpoints
  - inode map
  - segment info
  - not necessarily current!
- roll-forward 
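
A minimal sketch of checkpoint plus roll-forward, under simplifying assumptions: the log is a list of hypothetical `("data", payload)` and `("inode", inumber, data_addr)` records, the checkpoint is a dict with made-up keys `inode_map` and `log_end`, and segment info is ignored.

```python
def recover(log, checkpoint):
    """Rebuild the inode map after a crash.

    log        -- list of ("data", payload) / ("inode", inumber, data_addr) records
    checkpoint -- {"inode_map": {...}, "log_end": int}; may be stale
    """
    # Start from the last checkpoint, which is consistent but possibly old.
    inode_map = dict(checkpoint["inode_map"])
    # Roll forward: replay every record written after the checkpoint.
    for addr in range(checkpoint["log_end"], len(log)):
        record = log[addr]
        if record[0] == "inode":
            inode_map[record[1]] = addr   # newest inode copy wins
    return inode_map
```

Recovery only scans the tail of the log written since the checkpoint, which is why it is fast compared with checking the whole disk.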

# Cleaner
- read segments that still contain live data
- write only the live data back, into fewer segments
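```
    write cost  = (total bytes read and written) / (new data written)
                = (read segs + write live + write new) / (write new)
                = (N         + N*u        + N*(1-u))   / (N*(1-u))
                = 2/(1-u)
```
Here N is the number of segments the cleaner reads and u is their utilization (the fraction of data still live, 0 <= u < 1). This ratio is the "write amplification".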

If utilization goes up?

![write amplification](logWriteCost.png)
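
To get a feel for the numbers, the formula can be evaluated directly. This is just the `2/(1-u)` expression from above; the function name `write_cost` is made up for this sketch.

```python
def write_cost(u):
    """Write cost 2/(1-u) for segments cleaned at utilization u."""
    if not 0 <= u < 1:
        raise ValueError("utilization must satisfy 0 <= u < 1")
    return 2 / (1 - u)


for u in (0.2, 0.5, 0.8, 0.9):
    print(f"u={u:.1f}  write cost={write_cost(u):.1f}")
```

The cost grows without bound as u approaches 1: cleaning segments that are 80% live costs about 10 bytes of disk traffic per byte of new data, and 90% live costs about 20.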

Many cleaning policies are possible:
- segregate data by age, file size, etc.
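
One possible policy, sketched under stated assumptions: clean every segment at or below a utilization threshold and regroup the surviving blocks by age. The function name `clean`, the `(block_id, age, live)` tuples, and the threshold rule are all hypothetical choices for this example, not the policy from the paper.

```python
def clean(segments, segment_size, max_utilization=0.5):
    """Compact sparse segments.

    segments -- list of segments, each a list of (block_id, age, live) tuples
    Returns (kept_segments, new_segments).
    """
    kept, live_blocks = [], []
    for seg in segments:
        live = [block for block in seg if block[2]]
        if len(live) / segment_size <= max_utilization:
            live_blocks.extend(live)   # read the segment, keep only live data
        else:
            kept.append(seg)           # too full to be worth cleaning
    # Segregate by age: oldest blocks first, so cold data ends up together.
    live_blocks.sort(key=lambda block: block[1], reverse=True)
    new_segments = [live_blocks[i:i + segment_size]
                    for i in range(0, len(live_blocks), segment_size)]
    return kept, new_segments
```

Grouping old (cold) blocks together means the segments they land in tend to stay full, so the cleaner can leave them alone later.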


## In summary, LFS: 
- The basic idea is to:
  - handle reads through **caching**, and
  - writes by appending **large segments to a log**
- Greatly increases disk performance on writes, file creates, deletes, ...
- Reads that miss the buffer cache perform the same as in a normal file system
- Requires a cleaning daemon to produce clean space, which costs:
  - disk bandwidth
  - CPU time

## Seltzer compared a more modern FFS to LFS. That FFS has:
- bigger block size
- clustering
- rotational-latency-aware file layout

### Results
- LFS is an order of magnitude faster on small file creates and deletes.
- The systems are comparable on creates of large files (one-half megabyte or more).
- The systems are comparable on reads of files less than 64 kilobytes.
- LFS read performance is superior between 64 kilobytes and four megabytes, after which FFS is comparable.
- LFS write performance is superior for files of 256 kilobytes or less.
- FFS write performance is superior for files larger than 256 kilobytes.

# Comments
- most interesting is crash recovery (fast!)
- couldn't work today because there is not enough memory to cache
- how to deal with SSDs?  (answer: a log still gives aggregation, fault tolerance,
  and the ability to deal with block writes)
- Sprite LFS "Outperform Unix in all cases but one (files read sequentially after being written randomly)." How often does this happen and why does Sprite fail here?
- what happens to a write request when there is not enough free space on disk? (forces a clean; performance is very poor when the disk is nearly full)
- make the file system immutable by skipping the cleaner!



# LBFS

![lbfsChanges](lbfsChanges.png)

![lbfsSystem](lbfsSystem.png)

Key ideas:
- chunkifying files
- Rabin fingerprints (over overlapping 48-byte regions)
  - rather than hashes
- server chunk database
- approach used first by?


![lbfsReading](lbfsReading.png)

So what are the implications? 
- commonality in related files
- commonality in unrelated files? Not so much.
- deduplication of duplicated files detected by simpler fixed-size block checks
- MS Single Instance Storage (SIS) was introduced with the Remote
  Installation Services feature of Windows 2000 Server. A typical
  server might hold ten or more unique installation configurations
  (perhaps with different drivers or software suites) but perhaps only
  20% of the data may be unique between configurations
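
A small sketch of what fixed-size block checks can and cannot catch. The 4096-byte block size and the function names are assumptions for this example.

```python
import hashlib

BLOCK = 4096  # hypothetical fixed block size


def block_hashes(data, block=BLOCK):
    """SHA-1 of each fixed-size block, cut at fixed offsets."""
    return [hashlib.sha1(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def shared_fraction(old, new):
    """Fraction of blocks in `new` that already appear somewhere in `old`."""
    known = set(block_hashes(old))
    hashes = block_hashes(new)
    if not hashes:
        return 0.0
    return sum(h in known for h in hashes) / len(hashes)
```

An exact duplicate, or a file with data appended, shares nearly all of its blocks. Inserting a single byte at the front shifts every block boundary, so essentially nothing matches; that is the case content-defined chunk boundaries are meant to handle.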

Why are we not all using the ideas in this paper?

Comments:
- privacy, couldn't we encrypt?
- "last write wins" seems bad
- why not rsync?  (two files)
- but isn't that enough?
- consistency
- leases
- could reduce all files stored within the file system to a series of hashes

