# LBFS

### Motivation

Much inter-file commonality:
- Editing/word processing workloads
    - Often only modify one part of a large file
    - Generate "autosave" files with mostly redundant content
- Software development workloads
    - Modify header & recompile -> recreate similar object files
    - Concatenate object files into a library

LBFS exploits commonality to save bandwidth

### Key ideas:

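- chunkifying files: split each file into variable-size chunks whose boundaries depend on content
- Rabin fingerprints over every overlapping 48-byte region; a region whose fingerprint has a chosen value in its low-order 13 bits marks a chunk boundary (breakpoint)
- SHA-1 hash of each chunk delineated by the fingerprints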
- server chunk database
- done first by!!!!
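
The chunking idea above can be sketched in a few lines of Python. This is a minimal sketch, not the LBFS implementation: it swaps in a simple polynomial rolling hash for true Rabin fingerprints. The 48-byte window comes from the notes; the 13-bit mask (about 8 KiB expected chunk size), `MAGIC`, `BASE`, `MOD`, and the function names are assumptions made up for illustration.

```python
import hashlib

WINDOW = 48             # bytes covered by each fingerprint
MASK = (1 << 13) - 1    # inspect the low 13 bits (assumed) -> ~8 KiB chunks
MAGIC = 0               # hypothetical breakpoint value
BASE = 257              # assumed rolling-hash parameters, not Rabin polynomials
MOD = (1 << 61) - 1


def breakpoints(data):
    """Yield the end offset (exclusive) of every content-defined chunk."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the oldest byte so h always covers the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == MAGIC:
            yield i + 1


def chunkify(data):
    """Split data at the breakpoints; the tail becomes the last chunk."""
    chunks, start = [], 0
    for end in breakpoints(data):
        chunks.append(data[start:end])
        start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_hashes(data):
    """SHA-1 digest of each chunk, in file order."""
    return [hashlib.sha1(c).hexdigest() for c in chunkify(data)]
```

Because a breakpoint depends only on the 48 bytes before it, the same content produces the same boundaries wherever it appears in a file.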

### Usage:

![](lbfsMsgs.png)


### Issues:
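- chunks too small, e.g. breakpoints falling close together (enforce a minimum chunk size)
- chunks too big, e.g. long runs with no breakpoint (enforce a maximum chunk size)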
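
Both fixes can be sketched as two extra conditions in the chunking loop. This is a rough sketch under assumptions: the rolling hash is a plain polynomial hash rather than a Rabin fingerprint, `chunk_bounded` is a hypothetical name, and the 2 KiB / 64 KiB limits are assumed defaults (they match what the LBFS paper reports, but check the paper).

```python
WINDOW = 48
MASK = (1 << 13) - 1             # assumed: low 13 bits select breakpoints
BASE, MOD = 257, (1 << 61) - 1   # assumed rolling-hash parameters
MIN_CHUNK = 2 * 1024             # assumed minimum chunk size
MAX_CHUNK = 64 * 1024            # assumed maximum chunk size


def chunk_bounded(data, min_size=MIN_CHUNK, max_size=MAX_CHUNK):
    """Content-defined chunking with minimum and maximum chunk sizes."""
    top = pow(BASE, WINDOW - 1, MOD)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = i + 1 - start
        at_breakpoint = i + 1 >= WINDOW and (h & MASK) == 0
        # Ignore breakpoints that would make a chunk too small,
        # and force a cut once a chunk reaches the maximum.
        if (at_breakpoint and size >= min_size) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

A file of all zero bytes is a pathological case for this hash: every window matches the breakpoint value, so without the minimum each byte would end a chunk; with it, the chunks come out at exactly `min_size`.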


### Implications? 
- commonality in related files
- commonality in unrelated files? Not so much.
- deduplication: duplicated files can be detected by simpler fixed-size
  block checks
- MS Single Instance Storage (SIS) was introduced with the Remote
  Installation Services feature of Windows 2000 Server. A typical
  server might hold ten or more unique installation configurations
  (perhaps with different drivers or software suites), but perhaps only
  20% of the data may be unique between configurations
- close-to-open consistency
- file "recipes"
    - metadata
    - ordered list of SHA1 hashes
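
A recipe plus a chunk database is enough to rebuild a file. The sketch below assumes an in-memory dict as the chunk database and takes already-split chunks from the caller; `Recipe`, `ChunkStore`, `make_recipe`, and `rebuild` are hypothetical names, and the metadata fields are placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Recipe:
    name: str      # metadata (placeholder fields)
    size: int
    hashes: list   # ordered SHA-1 hex digests, one per chunk


class ChunkStore:
    """Toy chunk database: SHA-1 digest -> chunk bytes."""

    def __init__(self):
        self.by_hash = {}

    def put(self, chunk):
        digest = hashlib.sha1(chunk).hexdigest()
        self.by_hash.setdefault(digest, chunk)  # identical chunks stored once
        return digest

    def get(self, digest):
        return self.by_hash[digest]


def make_recipe(name, chunks, store):
    """Store the chunks and return the recipe describing the file."""
    hashes = [store.put(c) for c in chunks]
    return Recipe(name, sum(len(c) for c in chunks), hashes)


def rebuild(recipe, store):
    """Reassemble a file by looking up each hash in order."""
    data = b"".join(store.get(h) for h in recipe.hashes)
    assert len(data) == recipe.size
    return data
```

Two versions of a file that differ in one chunk share every other entry in the store, so the second version costs only the changed chunk plus its recipe.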

### Why are we not all using the ideas in this paper?
    
### Comments:
- like to see chunking implemented!  (awesome, I will add to project2)
- close-to-open consistency limiting for collaboration - olivia


- why not rsync?  (two files)
    - but isn't that enough?
- consistency
- leases
- if I entered a word in the first chunk of a file, would that offset the whole file and change the hash value of every chunk? 
- did some of the techniques and concepts they introduced get implemented in subversion (rsync 2-file stuff)?
- could reduce all files stored within the file system to a series of hashes
- forcing the bounds to be at certain positions will lead to the same shifting problems in some file positions.
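
The shifting question above can be checked with a small experiment: insert a word at the start of a file and count how many chunk hashes survive. This is a sketch under assumptions, not LBFS itself: it uses a polynomial rolling hash in place of Rabin fingerprints, a 10-bit mask so the demo produces many small chunks, no min/max bounds, and hypothetical helper names.

```python
import hashlib
import random

WINDOW, MASK = 48, (1 << 10) - 1   # 10-bit mask (assumed) -> ~1 KiB chunks
BASE, MOD = 257, (1 << 61) - 1     # assumed rolling-hash parameters


def content_chunks(data):
    """Cut wherever the hash of the last WINDOW bytes has zero low bits."""
    top = pow(BASE, WINDOW - 1, MOD)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data, size=1024):
    """Cut at fixed offsets, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digests(chunks):
    return [hashlib.sha1(c).hexdigest() for c in chunks]


def shared(old, new, chunker):
    """Return (old chunks whose hash also appears in new, total old chunks)."""
    old_h, new_h = digests(chunker(old)), set(digests(chunker(new)))
    return sum(h in new_h for h in old_h), len(old_h)


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(50_000))
new = b"word " + old               # insert a word at the very start

print("fixed-size:     ", shared(old, new, fixed_chunks))
print("content-defined:", shared(old, new, content_chunks))
```

With fixed-size blocks the insertion shifts every block boundary, so no hashes match. With content-defined boundaries only the chunk containing the edit changes, because each later breakpoint depends only on the 48 bytes before it.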

