# Log-Structured File Systems 

### Why log-structured? 
- Track-caching and large buffer caches make reads fast
- FFS writes can require many disk operations; a single file create does five:
    - write inode 
    - write dir data 
    - write dir inode 
    - write file data 
    - write inode 

# Main Idea 

Use a log.... 
- all modified bytes (both data and inode) written to a sequential, append-only log
    - write only to end of log
    - cache log in memory
    - periodic flush (aggregation)
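    - inode map: inumber -> current location of the inode in the log
        - inumbers don't change; each update writes a new copy of the inode
        - map blocks are themselves periodically rewritten to the log
        - cached in memory (small enough to fit)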

- Does well on: 
    - spatial locality 
    - most reads: because of large buffer caches 
    - writes: 
        - no seeks 
        - aggregate writes for inodes and data
    - recover: 
        - log is always consistent, 
        - though it might not be up to date 

- Potential problems: 
    - file fragmentation 
    - “cleaning” 

How is the disk laid out?
- Superblock: #segments, segment size, etc.
- Checkpoint region
- Segments
    - Segment summary
        - inumber, version, offset for each block
    - Log entries:
        - Inode: mtime, data/indirect block locations
        - Inode map: inode locations, inode versions, atimes
        - Indirect block
        - Directory change log: link/unlink/rename
        - Segment usage table: #live bytes left in each segment, last write times

How to find the data for a file?
-   Checkpoint region gives you inode map
-   Inode map gives you inode, which gives you file data
-   Changes made after the last checkpoint are generally still in memory (cached inode map)
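
The lookup path above (inode map -> inode -> data) can be sketched with a toy in-memory log. This is a minimal illustration, not the paper's on-disk format: the class `TinyLFS`, its record tuples, and the use of list indices as "disk addresses" are all assumptions made for the example.

```python
class TinyLFS:
    """Toy in-memory log: every write appends; the inode map tracks the newest inode."""

    def __init__(self):
        self.log = []          # append-only list of records; list index = "disk address"
        self.inode_map = {}    # inumber -> log address of the latest inode copy

    def _append(self, record):
        self.log.append(record)
        return len(self.log) - 1

    def write(self, inumber, blocks):
        # Data blocks go to the end of the log first...
        addrs = [self._append(("data", b)) for b in blocks]
        # ...then a fresh inode copy pointing at them; the inumber is unchanged.
        self.inode_map[inumber] = self._append(("inode", inumber, addrs))

    def read(self, inumber):
        # inode map -> inode -> data blocks
        _, _, addrs = self.log[self.inode_map[inumber]]
        return [self.log[a][1] for a in addrs]


fs = TinyLFS()
fs.write(7, [b"hello"])
fs.write(7, [b"hello", b"world"])   # overwrite: the old records stay in the log as garbage
print(fs.read(7))                   # [b'hello', b'world']
print(len(fs.log))                  # 5 records: (1 data + inode), then (2 data + inode)
```

Note that nothing is ever updated in place: the second write leaves two dead records behind, which is the space the cleaner later has to reclaim.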

![disk layout](log.png)


What happens in a checkpoint?
- Write all modified files: data blocks, indirect blocks, inodes,
   inode map table, segment usage table.
- Write to checkpoint region:
    - addresses of all blocks in the inode map and segment usage table
    - current time
    - pointer to last written segment

   
# Recovery
- checkpoints
    - inode map
    - segment info
    - not necessarily current!
- roll-forward: scan the log written after the last checkpoint to recover recent changes
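
A rough sketch of how a checkpoint and roll-forward fit together, under simplifying assumptions: the log is a Python list, a checkpoint is just a snapshot of the inode map plus the log length it covers, and only inode records are replayed. The function names `checkpoint` and `recover` are hypothetical.

```python
def checkpoint(log, inode_map):
    # Snapshot the inode map and remember how much of the log it covers.
    return {"inode_map": dict(inode_map), "log_end": len(log)}


def recover(log, ckpt, roll_forward=True):
    # Start from the (possibly stale) state recorded in the checkpoint.
    inode_map = dict(ckpt["inode_map"])
    if roll_forward:
        # Scan records written after the checkpoint and re-apply inode updates.
        for addr in range(ckpt["log_end"], len(log)):
            record = log[addr]
            if record[0] == "inode":
                inode_map[record[1]] = addr
    return inode_map


log = [("data", b"a"), ("inode", 1)]
imap = {1: 1}
ckpt = checkpoint(log, imap)
log += [("data", b"b"), ("inode", 2)]   # written after the checkpoint, then "crash"

print(recover(log, ckpt, roll_forward=False))  # {1: 1}
print(recover(log, ckpt))                      # {1: 1, 2: 3}
```

Without roll-forward the file system is consistent but has lost inode 2; with it, the post-checkpoint tail of the log is recovered.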

# Cleaner
- read a number of segments and identify their live data
- write only the live data back, compacted into fewer clean segments

```
    write cost  = (total bytes read and written)/write new 
       = (read segs + write live + write new)/ write new 
       = (N         + N*u        + N*(1-u))  / (N * (1-u))
       = 2N / (N * (1-u))
       = 2 / (1-u)
```
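
To get a feel for the formula, it can be evaluated at a few utilizations. Here `u` is taken to be the fraction of live data in the segments being cleaned; the function name `write_cost` and the sample values are just for illustration. The `u == 0` case follows the LFS paper's observation that an empty segment need not be read at all.

```python
def write_cost(u):
    """Write cost 2 / (1 - u) for cleaned segments with live fraction u, 0 <= u < 1."""
    if not 0 <= u < 1:
        raise ValueError("u must be in [0, 1)")
    if u == 0:
        return 1.0  # empty segment: nothing to read or copy, only new data is written
    return 2 / (1 - u)


for u in (0.2, 0.5, 0.8, 0.9):
    print(f"u={u:.1f}  write cost={write_cost(u):.1f}")
```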
"write amplification"


Many cleaning policies:
- segregate data by age, file size, etc.
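
As one concrete example of an age-based policy, the LFS paper's cost-benefit rule ranks segments by `(1 - u) * age / (1 + u)` and cleans the highest-ranked one, instead of greedily picking the emptiest segment. The sketch below compares the two choices; the segment names and their `(u, age)` values are made up for illustration.

```python
def cost_benefit(u, age):
    # Benefit: free space gained (1 - u), weighted by how long it is likely to stay free (age).
    # Cost: read the whole segment (1) plus write back the live data (u).
    return (1 - u) * age / (1 + u)


# Hypothetical segments: name -> (utilization u, age of youngest data)
segments = {"hot": (0.5, 1), "cold": (0.75, 100)}

greedy_choice = min(segments, key=lambda s: segments[s][0])
cost_benefit_choice = max(segments, key=lambda s: cost_benefit(*segments[s]))

print(greedy_choice)        # hot
print(cost_benefit_choice)  # cold
```

Greedy picks the less-utilized hot segment, while cost-benefit prefers the cold one, whose freed space is less likely to be dirtied again soon.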


# Comments from students

- a bit similar to virtual memory - yancheng
- overstated performance gains, large cache a vulnerability - dev
- how are async writes combined into one? Also, balance w/ disaster recovery - minwei
- FFS vs LFS picture a bit unclear
- likes the discussion of "log as truth" and compaction in DBs - armaan
- informal tone
- log-structuring and defrag, temporal locality - olivia
- should have looked at read performance, data integrity 



## LFS summary: 
- Handle reads through caching, and
- Writes by appending large segments to a log
- Impact:
    - Greatly increases disk performance on writes, file creates, deletes, ...
    - Reads that miss the buffer cache perform about the same as in a conventional file system
    - Requires cleaning daemon to produce clean space
        - disk bandwidth
        - cpu time



# LBFS

### Motivation

Much inter-file commonality:
- Editing/word processing workloads
    - Often only modify one part of a large file
    - Generate "autosave" files with mostly redundant content
- Software development workloads
    - Modify header & recompile -> recreate similar object files
    - Concatenate object files into a library

LBFS exploits commonality to save bandwidth

### Key ideas:

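- chunkifying files: split each file into variable-size chunks at content-defined boundaries
- Rabin fingerprints over every overlapping 48-byte region; a region whose fingerprint's low-order 13 bits equal a chosen value ends a chunk
- SHA-1 hash of each chunk delineated by the fingerprints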
- server chunk database
- done first by!!!!
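
A small sketch of content-defined chunking in the spirit of the ideas above. It is deliberately simplified: `zlib.crc32` over each 48-byte window stands in for a rolling Rabin fingerprint, the 6-bit `MASK` (about 64-byte average chunks) is a demo value chosen for small inputs, and no minimum or maximum chunk size is enforced. All names here are assumptions for the example, not LBFS's actual code.

```python
import hashlib
import random
import zlib

WINDOW = 48        # bytes per fingerprinted region, as in the notes above
MASK = 0x3F        # demo value: roughly 64-byte average chunks (assumed, for small inputs)
MAGIC = 0          # a boundary is declared when the masked fingerprint equals this


def chunk(data):
    """Split data at content-defined boundaries."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # Stand-in fingerprint of the 48 bytes ending at `end` (not a real Rabin fingerprint).
        if zlib.crc32(data[end - WINDOW:end]) & MASK == MAGIC:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])   # trailing bytes after the last boundary
    return chunks


def chunk_hashes(data):
    return [hashlib.sha1(c).hexdigest() for c in chunk(data)]


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4000))
edited = b"X" + original            # insert one byte at the very front

old, new = chunk_hashes(original), chunk_hashes(edited)
shared = set(old) & set(new)
print(len(old), len(new), len(shared))
```

Because boundaries depend only on the bytes inside each window, the insertion can disturb only the first chunk; every later chunk keeps its content and therefore its SHA-1 hash, which is what fixed-offset blocks would not give you.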

### Usage:

![](lbfsMsgs.png)


### Issues:
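- pathologically small chunks (enforce a minimum chunk size; 2 KB in the paper)
- pathologically big chunks (enforce a maximum chunk size; 64 KB in the paper)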


### Implications? 
- commonality in related files
- commonality in unrelated files? Not so much.
- deduplication of duplicated files detected by simpler fixed-size block checks
- MS Single Instance Storage (SIS) was introduced with the Remote
  Installation Services feature of Windows 2000 Server. A typical
  server might hold ten or more unique installation configurations
  (perhaps with different drivers or software suites), but perhaps only
  20% of the data is unique between configurations
- close-to-open consistency
- file "recipes"
    - metadata
    - ordered list of SHA1 hashes
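
The file "recipe" idea can be illustrated with a dictionary standing in for the server chunk database. This is a sketch under assumptions: the chunks are given as pre-split byte strings, and the helper names (`make_recipe`, `missing`, `rebuild`) are hypothetical rather than part of LBFS's protocol.

```python
import hashlib


def sha1(data):
    return hashlib.sha1(data).hexdigest()


def make_recipe(chunks):
    # A recipe is the ordered list of chunk hashes that reconstructs the file.
    return [sha1(c) for c in chunks]


def missing(recipe, db):
    # Hashes the receiver does not have yet; only these chunks must cross the network.
    return [h for h in recipe if h not in db]


def rebuild(recipe, db):
    return b"".join(db[h] for h in recipe)


server_db = {}                                   # hash -> chunk bytes
v1 = [b"header", b"body", b"footer"]
for c in v1:
    server_db[sha1(c)] = c

v2 = [b"header", b"BODY v2", b"footer"]          # edited file: one chunk changed
recipe = make_recipe(v2)
need = missing(recipe, server_db)
for c in v2:
    if sha1(c) in need:
        server_db[sha1(c)] = c                   # "send" only the new chunk

print(len(need))                                 # 1
print(rebuild(recipe, server_db))                # b'headerBODY v2footer'
```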

### Why are we not all using this paper?
    
### Comments:
- why not rsync?  (two files)
    - but isn't that enough?
- consistency
- leases
- if I entered a word in the first chunk of a file, would that offset the whole file and change the hash value of every chunk? 
- did some of the techniques and concepts they introduced get implemented in Subversion (rsync 2-file stuff)?
- could reduce all files stored within the file system to a series of hashes
- forcing the bounds to be at certain positions will lead to the same shifting problems in some file positions.

