Project 1: Learning Go, Starting to Build UbiStore

Due: Sept 10, 2023, 11:59:59 pm, v1

Systems students often learn best by doing; "Eat your own dogfood!" is a common refrain. In this class we will build UbiStore: a reliable, distributed, consistent, and permanent personal data storage system. The general idea is similar to that of PerKeep, with some necessary simplifications and a few additions.

The vision behind this class's series of six projects is to build a storage meta-service that can hold, replicate, and allow sharing of any kind of data. Replace backups through geo-replicated mirroring! Replace Dropbox through consistent, cooperative sharing! Replace git with...okay, maybe going too far.

The main focus of p1 is getting up to speed on Go, as most of you have not seen it before. We will build an extremely simple version of UbiStore that will allow you to store files and directories in UbiStore, and to retrieve them.

Go (aka Golang) Resources

At golang.org:

Editors / IDEs

  • As an old-school hacker, I've long used Emacs, together w/ go modes.
  • I've recently switched to Goland, an IDE that has good debugger support and is free for students.
  • VSCode is another great alternative.

Folks in industry seem pretty evenly split between Goland and VSCode, w/ a smattering of others.

Goals

  1. Build the bones of UbiStore, a simple blob-store.
  2. Graft a simple webserver on top of it.

My version of blobs.go is about 350 LOC, whereas serve.go is 87.

Setup

Download the starter files for this project here.

NOT YET Each of you will have a repository on gitlab.cs.umd.edu for version control, and to submit projects. More info on this soon. However, the intent is that each of you will have a gitlab repository accessible as gitlab.cs.umd.edu/cmsc818efall2023students/blather, where 'blather' is your university directory ID.

If you can not clone your repository, first log into gitlab.cs.umd.edu's web interface (using your university directory ID) and try again. If you still have problems, post to piazza.

Your individual projects will live in directories p1, p2, etc. under the above directory. Each project will be a golang module containing a file ubi.go at the top level, which imports subdirectories containing the real code. Do not change this directory structure! NOT YET

You should also minimise changes to ubi.go, as I will be running a script across all the projects to capture output and changes.

Set up your environment as follows:

  1. Install Go (current version 1.21.0), and set up GOPATH, possibly GOBIN.
  2. Grab your initial files here.
  3. All your changes should be made here and pushed back to the repository. I will download your files from here at project deadlines.
  4. Test your setup by modifying or adding a file, pushing to gitlab (i.e. git commit -a -m auto; git push origin master), and checking that the change was committed through gitlab's web interface.

You can run Go on many different platforms. However, the Go team develops on Linux, and I will do all my testing on a Mac. If this starts to cause problems I can make a mac available for remote testing.

Terminology and Overview

Project p1 will allow you to read and write files and directories to and from a local ubistore repository.

Data in UbiStore is stored in "blob files", up to 8k bytes per blob. Blob files might contain raw file data, or JSON-encoded objects. In either case, the blob file's name is an ASCII-armored representation of the SHA256 hash of the blob contents.

JSON is a lightweight, human-readable way to encode an object as a byte array, or string. In this case, we are marshalling data about files and directories by writing JSON descriptions of the file/directory attributes.

We create a file recipe (similarly, directory recipes) in JSON as a set of attributes and an array of blob signatures (or "sigs"), one per blob. For example:

io:~/818/outf23/p1> go run ubi.go put ubi.go
sha256_32_LZUGEAXS5ATYWWRI67PJ25HRBKBMAZE6GTGOPWDOKTE7Q7AGVLLQ====
io:~/818/outf23/p1> go run ubi.go desc sha256_32_LZUGEAXS5ATYWWRI67PJ25HRBKBMAZE6GTGOPWDOKTE7Q7AGVLLQ====
{
    "Name": "ubi.go",
    "Size": 712,
    "Mode": 420,
    "ModTime": "2023-08-26T12:36:17.508365849-04:00",
    "IsDir": false,
    "Version": 1,
    "PrevSig": "",
    "ChildSigs": null,
    "ChildNames": null,
    "DataBlockSigs": [
        "sha256_32_HKRKFA74B6HTV4VVIDP5MKFTI6BKNPLTF4Q3CTLPD5X2U56WYDCA===="
    ]
}

The first command does a "put" on the ubi.go source. The JSON recipe is written to a block whose signature is sha256_32_LZUGEAXS5ATYWWRI67PJ25HRBKBMAZE6GTGOPWDOKTE7Q7AGVLLQ====, the value printed by put. The file's data lives in the single block listed in DataBlockSigs. The recipe's sig is created by:

  • Marshalling the recipe to JSON as a string (actually a byte array, but we can easily convert between the two formats).
  • Taking the sha256 hash (package crypto/sha256), and then ASCII-armoring it via the base32 "std" encoding from encoding/base32.
  • Prepending "sha256_32_" to the resulting string, just so that we remember how we got there.

The second command describes the block referenced by an existing sig. For right now we follow these rules:

  • 'Version' is always '1'.
  • DataBlockSigs is an array of sigs that name 8k or shorter blocks comprising the file's data.

TASKS

Your tasks consist of defining the three commands put, get, and desc in store/blobs.go. Additionally, you will implement an http interface.

You will use the UbiNode data structure to describe both files and directories, w/ not all fields applicable to both:

type UbiNode struct {
	Name    string      // base name of the file
	Size    int64       // length in bytes for regular files; system-dependent for others
	Mode    fs.FileMode // file mode bits
	ModTime time.Time   // modification time
	IsDir   bool        // abbreviation for Mode().IsDir()

	Version int
	PrevSig string

	ChildSigs     []string
	ChildNames    []string
	DataBlockSigs []string

	sig       string
	dirty     bool
	metaDirty bool
	expanded  bool
	parent    *UbiNode
	kids      map[string]*UbiNode
	data      []byte
}

Note that in Go, capitalized field names are exported, i.e. visible outside the package, which means that only capitalized fields will be serialized via JSON.
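To see the effect of capitalization, here is a minimal sketch using a cut-down stand-in for UbiNode. The type demoNode and the helper marshalDemo are hypothetical names made up for this illustration; they are not part of the starter files.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// demoNode is a hypothetical, cut-down stand-in for UbiNode.
type demoNode struct {
	Name    string // exported: appears in the JSON
	Version int    // exported: appears in the JSON
	sig     string // unexported: skipped by encoding/json
}

// marshalDemo returns the compact JSON encoding of n as a string.
func marshalDemo(n demoNode) string {
	b, err := json.Marshal(n)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	n := demoNode{Name: "ubi.go", Version: 1, sig: "hidden"}
	fmt.Println(marshalDemo(n)) // prints {"Name":"ubi.go","Version":1}
}
```

The lowercase sig field never reaches the output, which is why in-memory bookkeeping fields such as dirty or parent can sit in the same struct without affecting the hash of the recipe.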

Command put <pathname>

put takes a single argument, which will be either a file or a directory. Create recipes and data blocks, and write them to the blobstore. The blobstore is just a directory that you will write blocks to, each block named by its base32-encoded sha256 hash (prefixed with "sha256_32_").

Use .blobstore in the current directory to store your blobs, creating it if it does not yet exist.
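As a rough sketch of what writing one block could look like, the following stores a byte slice in a directory under its sig and reads it back. The helpers writeBlob and roundTrip are hypothetical names, and the demo writes into a temporary directory rather than the real .blobstore; computeSig is copied from the Details section.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"os"
	"path/filepath"
)

// computeSig is the helper given in the Details section.
func computeSig(buf []byte) string {
	sha := sha256.Sum256(buf)
	return "sha256_32_" + base32.StdEncoding.EncodeToString(sha[:])
}

// writeBlob (hypothetical) creates dir if needed, stores buf in a file
// named by its sig, and returns that sig.
func writeBlob(dir string, buf []byte) (string, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	sig := computeSig(buf)
	return sig, os.WriteFile(filepath.Join(dir, sig), buf, 0o644)
}

// roundTrip (hypothetical) writes buf to a throwaway blobstore, reads it
// back by sig, and reports whether the bytes match.
func roundTrip(buf []byte) (bool, error) {
	tmp, err := os.MkdirTemp("", "ubi-blob-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(tmp)
	dir := filepath.Join(tmp, ".blobstore")
	sig, err := writeBlob(dir, buf)
	if err != nil {
		return false, err
	}
	got, err := os.ReadFile(filepath.Join(dir, sig))
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, buf), nil
}

func main() {
	ok, err := roundTrip([]byte("hello, blob"))
	fmt.Println(ok, err) // prints true <nil>
}
```

Because the file name is derived from the contents, writing the same block twice simply overwrites a file with identical bytes.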

A recipe is turned into a JSON encoding by calling json.MarshalIndent(u, "", " "), where u is a UbiNode specifying either a file or directory. UbiNode is defined in store/blobs.go.

The exact number of spaces you add during marshalling matters only in that any deviation from the above parameters will result in different hash values than my code. This makes my life more difficult.

For a file, the DataBlockSigs array must be an array of sigs, each of which corresponds to a blob of file data. Every blob except possibly the last is exactly 8192 bytes in size. Data blobs are not currently encoded in any way.
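One way to produce these blocks is sketched below, assuming the whole file is already in memory. The names blockSize and splitBlocks are hypothetical; computeSig is copied from the Details section.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// blockSize is the maximum data blob size from the handout.
const blockSize = 8192

// computeSig is the helper given in the Details section.
func computeSig(buf []byte) string {
	sha := sha256.Sum256(buf)
	return "sha256_32_" + base32.StdEncoding.EncodeToString(sha[:])
}

// splitBlocks (hypothetical) cuts data into blockSize-byte slices;
// only the final slice may be shorter. Empty input yields no blocks.
func splitBlocks(data []byte) [][]byte {
	var blocks [][]byte
	for len(data) > 0 {
		n := blockSize
		if len(data) < n {
			n = len(data)
		}
		blocks = append(blocks, data[:n])
		data = data[n:]
	}
	return blocks
}

func main() {
	data := make([]byte, 20000) // stand-in for a file's contents
	for i, b := range splitBlocks(data) {
		// Block lengths are 8192, 8192, 3616.
		fmt.Println(i, len(b), computeSig(b))
	}
}
```

The sigs printed in order are what would go into DataBlockSigs. Note that the first two blocks here are identical (all zero bytes) and therefore share a sig.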

A put on a directory recurses into its subdirectories, thereby creating signatures for files and subdirectories before the directories that contain them.

Each put command should define a special sig value of last with the top-level sig created by put-ing a file or directory. Anywhere a sig could be passed in at the command line (as an argument to get, desc or in the webserver, see below), last can be used as a synonym for this last created top-level signature.
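A minimal sketch of one way to make last survive between runs, assuming the sig is kept in a small text file. The file name "last", the directory argument, and the helpers saveLast, resolveSig, and demo are all hypothetical choices for illustration; os.ReadFile and os.WriteFile are the current standard-library equivalents of the ioutil functions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// saveLast (hypothetical) records sig in a file named "last" under dir.
func saveLast(dir, sig string) error {
	return os.WriteFile(filepath.Join(dir, "last"), []byte(sig+"\n"), 0o644)
}

// resolveSig (hypothetical) maps the synonym "last" to the stored sig and
// returns any other argument unchanged.
func resolveSig(dir, arg string) (string, error) {
	if arg != "last" {
		return arg, nil
	}
	b, err := os.ReadFile(filepath.Join(dir, "last"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// demo saves a made-up sig in a temporary directory and resolves "last".
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "ubi-last-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := saveLast(dir, "sha256_32_EXAMPLE"); err != nil {
		return "", err
	}
	return resolveSig(dir, "last")
}

func main() {
	sig, err := demo()
	fmt.Println(sig, err) // prints sha256_32_EXAMPLE <nil>
}
```

Routing every command-line and URL sig through a single function like resolveSig keeps the synonym working the same way in get, desc, and the webserver.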

Command get <sig> <pathname>

get is the converse of put and takes two arguments: the signature of the recipe for the file or directory being retrieved, and the pathname where the result is created. The following two commands result in blah being duplicated as blah2 in the same directory:

     > go run ubi.go put blah
     sha256_32_LPPPLYS3MR6D6HQHOECC53SQOUVQDTZLSDS6BJHZYWDTM===
     > go run ubi.go get sha256_32_LPPPLYS3MR6D6HQHOECC53SQOUVQDTZLSDS6BJHZYWDTM=== blah2

Note that the last line could also be written as:

     > go run ubi.go get last blah2
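For a regular file, the core of get is concatenating the data blocks in the order the recipe lists them. The sketch below shows only that step, with blobs served from an in-memory map; assemble, memFetch, and the fetch callback are hypothetical names, not part of the starter code.

```go
package main

import (
	"bytes"
	"fmt"
)

// assemble (hypothetical) concatenates the blocks named by sigs, in order.
// fetch is a callback that returns the bytes of one blob.
func assemble(sigs []string, fetch func(sig string) ([]byte, error)) ([]byte, error) {
	var out bytes.Buffer
	for _, s := range sigs {
		b, err := fetch(s)
		if err != nil {
			return nil, fmt.Errorf("fetching %s: %w", s, err)
		}
		out.Write(b)
	}
	return out.Bytes(), nil
}

// memFetch (hypothetical) serves blobs from an in-memory map for the demo;
// a real implementation would read files from the blobstore directory.
func memFetch(store map[string][]byte) func(string) ([]byte, error) {
	return func(sig string) ([]byte, error) {
		b, ok := store[sig]
		if !ok {
			return nil, fmt.Errorf("no such blob")
		}
		return b, nil
	}
}

func main() {
	store := map[string][]byte{
		"sigA": []byte("hello, "),
		"sigB": []byte("world"),
	}
	data, err := assemble([]string{"sigA", "sigB"}, memFetch(store))
	fmt.Printf("%s %v\n", data, err) // prints hello, world <nil>
}
```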

Command desc <sig>

Prints out information about the block. If the block is a recipe, print to stderr. If it is a datablock, just print the size in bytes.

Command serve <portnum>

You should also create a serve command that listens on port portnum and interprets the URL path as a sig for a file or directory recipe. Sending a GET request, as in http://localhost:8080/sha256_32_C5BIBH.... where the sig is that of a file recipe, returns the entire corresponding file with the proper MIME type.

The sig can also correspond to a directory, in which case contained subdirectories and files are listed as hyperlinks, allowing files to be retrieved and subdirectories to be entered.

Note that last should work here as in a get or desc command.

Response status codes: Your server should set the status code of each reply to one of the following:

  • 200: Means all good.
  • 404: Means invalid sig or other errors.
  • 500: Means some other error.
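A handler along these lines might look as follows. This is only a sketch under stated assumptions: lookup is a hypothetical stand-in for the blobstore that knows a single made-up sig, http.DetectContentType is one possible way to pick a MIME type, and directory listings are left out entirely.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// lookup is a hypothetical stand-in for the blobstore: it returns the
// file bytes for a sig, or false if the sig is unknown.
func lookup(sig string) ([]byte, bool) {
	if sig == "sha256_32_DEMO" {
		return []byte("hello from UbiStore\n"), true
	}
	return nil, false
}

// handle treats the URL path as a sig and replies with the file contents,
// or with 404 if the sig is not known.
func handle(w http.ResponseWriter, r *http.Request) {
	sig := strings.TrimPrefix(r.URL.Path, "/")
	data, ok := lookup(sig)
	if !ok {
		http.Error(w, "unknown sig", http.StatusNotFound)
		return
	}
	w.Header().Set("Content-Type", http.DetectContentType(data))
	w.Write(data) // the status code defaults to 200
}

// statusFor runs handle against an in-memory request and returns the
// status code, so the demo needs no open port.
func statusFor(path string) int {
	rec := httptest.NewRecorder()
	handle(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/sha256_32_DEMO"), statusFor("/nope")) // prints 200 404
}
```

To serve real traffic, the same handler could be passed to http.ListenAndServe, for example http.ListenAndServe(":8080", http.HandlerFunc(handle)).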

Details

  • Create a sig using the following:
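// computeSig returns the sig for buf: "sha256_32_" followed by the
// base32 (standard encoding) form of buf's SHA-256 hash.
// Requires imports "crypto/sha256" and "encoding/base32".
func computeSig(buf []byte) string {
	sha := sha256.Sum256(buf)
	shasl := sha[:] // convert the [32]byte array to a slice
	return "sha256_32_" + base32.StdEncoding.EncodeToString(shasl)
}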

Hints

  • Use marshalled, err := json.MarshalIndent(f, "", " ") to marshal your recipes and get the same white space as I have. White space will count in this project.

  • os.ReadFile and os.WriteFile (the replacements for the now-deprecated ioutil.ReadFile and ioutil.WriteFile) are your friends for making "last" persistent.

Testing, and Grading.

Testing

Test parts separately when possible. Go makes it easy and fast to take small sections of code and run them by themselves. The testing package allows you to automatically test small chunks of code by typing go test at the command line.

The commands should be put, get, and desc, as discussed above.

Grading

I will use the following rough guide to assign grades:

  • 40%: put functionality.
  • 40%: get functionality.
  • 10%: desc functionality.
  • 10%: serve functionality.

I will run the included test script as follows: sh testscript.sh >& testscript.out (redirects both STDERR and STDOUT on /bin/tcsh) and compare your results w/ mine. The only differences should be due to different timestamps.

Submit

Submit by committing back to your gitlab repository. Since you should be doing this at least hourly anyway, this is not actually an extra step.

Video walkthrough!

p1 walkthrough