Peter J. Keleher authored
Project 1: Learning Go, Starting to Build UbiStore
Due: Sept 10, 2023, 11:59:59 pm, v1
Systems students often learn best by doing; "Eat your own dogfood!" is
a common refrain. In this class we will build UbiStore: a reliable,
distributed, consistent, and permanent personal data storage system.
The general idea is similar to that of PerKeep, with some necessary
simplifications and a few additions.
The vision behind this class's series of six projects is to build a
storage meta-service that can hold, replicate, and allow sharing of
any kind of data.
Replace backups through geo-replicated mirroring! Replace Dropbox
through consistent, cooperative sharing! Replace git with...okay,
maybe going too far.
The main focus of this p1 is getting up to speed on Go, as most of you
have not seen it before. We will build an extremely simple version of
UbiStore that will allow you to store files and directories to
UbiStore, and to retrieve them.
Go (aka Golang) Resources
At golang.org:
Editors / IDEs
- As an old-school hacker, I've long used Emacs, together w/ go modes.
- I've recently switched to Goland, an IDE that has good debugger support and is free for students.
- VSCode is another great alternative.
Folks in industry seem pretty evenly split between Goland and VSCode, w/ a smattering of others.
Goals
- Build the bones of UbiStore, a simple blob-store.
- Graft a simple webserver on top of it.
My version of blobs.go is about 350 LOC, whereas serve.go is 87.
Setup
Download the starter files for this project here.
NOT YET
Each of you will have a repository on gitlab.cs.umd.edu for version control, and to
submit projects. More info on this soon. However, the intent is that each of you will have a gitlab
repository accessible as gitlab.cs.umd.edu/cmsc818efall2023students/blather, where `blather' is your university directory ID.
If you cannot clone your repository, first log into gitlab.cs.umd.edu's web interface (using your university directory ID) and try again. If you still have problems, post to Piazza.
Your individual projects will live in directories p1, p2, etc. under the above
directory. Each project will be a golang module containing a file
ubi.go at the top level, which imports subdirectories containing the
real code.
Do not change this directory structure!
NOT YET
You should also minimise changes to ubi.go, as I will be running a script across all the projects to capture output and changes.
Set up your environment as follows:
- Install Go (current version 1.21.0), and set up GOPATH, possibly GOBIN.
- Grab your initial files here.
- All your changes should be made here and pushed back to the repository. I will download your files from here at project deadlines.
- Test your setup by modifying or adding a file, pushing to gitlab (i.e. git commit -a -m auto; git push origin master), and checking that the change was committed through gitlab's web interface.
You can run Go on many different platforms. However, the Go team develops on Linux, and I will do all my testing on a Mac. If this starts to cause problems I can make a Mac available for remote testing.
Terminology and Overview
Project p1 will allow you to read and write files and directories to and from a local ubistore repository.
Data in UbiStore is stored in "blob files", up to 8k
bytes per blob. Blob files might contain raw file data, or
JSON-encoded objects. In either
case, the blob file's name is an ASCII-armored representation of the
SHA256 hash of the blob contents.
JSON is a lightweight, human-readable way to encode an object as a byte array, or string. In this case, we are marshalling data about files and directories by writing JSON descriptions of the file/directory attributes.
We create a file recipe (similarly, directory recipes) in JSON as a set of attributes and an array of blob signatures (or "sigs"), one per blob. For example:
io:~/818/outf23/p1> go run ubi.go put ubi.go
sha256_32_LZUGEAXS5ATYWWRI67PJ25HRBKBMAZE6GTGOPWDOKTE7Q7AGVLLQ====
io:~/818/outf23/p1> go run ubi.go desc sha256_32_LZUGEAXS5ATYWWRI67PJ25HRBKBMAZE6GTGOPWDOKTE7Q7AGVLLQ====
{
"Name": "ubi.go",
"Size": 712,
"Mode": 420,
"ModTime": "2023-08-26T12:36:17.508365849-04:00",
"IsDir": false,
"Version": 1,
"PrevSig": "",
"ChildSigs": null,
"ChildNames": null,
"DataBlockSigs": [
"sha256_32_HKRKFA74B6HTV4VVIDP5MKFTI6BKNPLTF4Q3CTLPD5X2U56WYDCA===="
]
}
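The first command does a "put" on the ubi.go source. The JSON recipe is
written to a block whose signature is
sha256_32_LZUGEAXS5ATYWWRI67PJ25HRBKBMAZE6GTGOPWDOKTE7Q7AGVLLQ====, the value printed by put.
(sha256_32_HKRKFA74B6HTV4VVIDP5MKFTI6BKNPLTF4Q3CTLPD5X2U56WYDCA==== is the sig of the file's single data block.)
The recipe's sig is created by: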
- Marshalling the recipe to JSON as a string (actually a byte array, but we can easily convert between the two formats).
- Taking the sha256 hash (package crypto/sha256), and then ASCII-armoring it via the base32 "std" encoding from encoding/base32.
- Prepending the resulting string w/ "sha256_32_" just so that we remember how we got there.
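The three steps above can be sketched end to end as follows. This is only an illustration: the `recipe` type is a hypothetical, cut-down stand-in for the real recipe structure, `sigOf` is an illustrative name, and the indent string passed to `json.MarshalIndent` is copied as it appears in this handout, so check it against the starter code before relying on the resulting hashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"encoding/json"
	"fmt"
)

// recipe is a hypothetical, cut-down stand-in for the real recipe type.
type recipe struct {
	Name          string
	Size          int64
	DataBlockSigs []string
}

// sigOf hashes buf with SHA-256, base32-encodes the digest using the
// standard alphabet, and prepends the "sha256_32_" tag.
func sigOf(buf []byte) string {
	sum := sha256.Sum256(buf)
	return "sha256_32_" + base32.StdEncoding.EncodeToString(sum[:])
}

func main() {
	data := []byte("hello")
	r := recipe{
		Name:          "hello.txt",
		Size:          int64(len(data)),
		DataBlockSigs: []string{sigOf(data)},
	}
	// Step 1: marshal the recipe to JSON bytes.
	buf, err := json.MarshalIndent(r, "", " ")
	if err != nil {
		panic(err)
	}
	// Steps 2 and 3: hash, armor, and prefix.
	fmt.Println(sigOf(buf))
}
```

A SHA-256 digest is 32 bytes, which base32 encodes as 52 characters plus four `=` padding characters, so every sig is 66 characters long including the prefix. That matches the shape of the sigs shown in the example session.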
The second command describes the block referenced by an existing sig. For right now we follow these rules:
- 'Version' is always '1'.
- DataBlockSigs is an array of sigs that name 8k or shorter blocks comprising the file's data.
TASKS
Your tasks consist of defining the three commands put, get, and
desc in store/blobs.go. Additionally, you will implement an http
interface.
You will use the UbiNode data structure to describe both files and
directories, w/ not all fields applicable to both:
type UbiNode struct {
	Name          string      // base name of the file
	Size          int64       // length in bytes for regular files; system-dependent for others
	Mode          fs.FileMode // file mode bits
	ModTime       time.Time   // modification time
	IsDir         bool        // abbreviation for Mode().IsDir()
	Version       int
	PrevSig       string
	ChildSigs     []string
	ChildNames    []string
	DataBlockSigs []string
	sig           string
	dirty         bool
	metaDirty     bool
	expanded      bool
	parent        *UbiNode
	kids          map[string]*UbiNode
	data          []byte
}
Note that in Go, only capitalized (exported) field names are visible outside
the package, implying that only capitalized field names will be
serialized via JSON.
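A small sketch of that rule, using a hypothetical three-field `node` type rather than the full UbiNode: the lowercase `dirty` field is set, but `encoding/json` leaves it out of the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node is a hypothetical type used only to show which fields are encoded.
type node struct {
	Name  string // exported: appears in the JSON
	Size  int64  // exported: appears in the JSON
	dirty bool   // unexported: skipped by encoding/json
}

// encode returns the compact JSON form of n, or "" on error.
func encode(n node) string {
	buf, err := json.Marshal(n)
	if err != nil {
		return ""
	}
	return string(buf)
}

func main() {
	fmt.Println(encode(node{Name: "ubi.go", Size: 712, dirty: true}))
	// prints {"Name":"ubi.go","Size":712}
}
```

This is why in-memory bookkeeping fields such as `dirty` or `parent` never affect a recipe's sig.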
Command put <pathname>
put takes a single argument, which will be either a file or a
directory. Create
recipes and data blocks, and write them to the blobstore. The
blobstore is just a directory that you will write blocks to, each
block named by its base32-encoded sha256 hash.
Use .blobstore in the current directory to store your blobs, creating it
if it does not yet exist.
A file recipe is turned into a JSON encoding by calling
json.MarshalIndent(u, "", " "), where u is a UbiNode
specifying either a file or directory.
UbiNode is defined in store/blobs.go.
The exact number of spaces you add during marshalling matters only in that any deviation from the above parameters will result in different hash values than my code. This makes my life more difficult.
For a file, the DataBlockSigs array must be an array of sigs, each of which
corresponds to a blob of file data. Every blob except possibly the last is
exactly 8192 bytes in size. Data blobs are not currently encoded in any way.
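One way to cut file data into blobs of that shape is sketched below. `splitBlobs` and `blobSize` are illustrative names, and treating an empty file as having zero data blobs is an assumption; the handout does not say how empty files should be handled.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

const blobSize = 8192

func computeSig(buf []byte) string {
	sum := sha256.Sum256(buf)
	return "sha256_32_" + base32.StdEncoding.EncodeToString(sum[:])
}

// splitBlobs cuts data into blobSize-byte pieces; only the final piece
// may be shorter. Empty input yields no pieces (an assumption).
func splitBlobs(data []byte) [][]byte {
	var blobs [][]byte
	for len(data) > 0 {
		n := blobSize
		if len(data) < n {
			n = len(data)
		}
		blobs = append(blobs, data[:n])
		data = data[n:]
	}
	return blobs
}

func main() {
	data := make([]byte, 20000) // 8192 + 8192 + 3616
	for _, b := range splitBlobs(data) {
		fmt.Println(len(b), computeSig(b))
	}
}
```

Here the first two blobs hold identical bytes and therefore share one sig, so the blobstore ends up with a single file for both.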
Putting a directory recurses into its subdirectories, thereby creating signatures for files and sub-directories before those of the directories that contain them.
Each put command should set a special sig value, last, to the top-level
sig created by put-ing a file or directory.
Anywhere a sig could be passed in at the command line (as an
argument to get, desc or in the
webserver, see below), last can be used as a synonym
for this last created top-level signature.
Command get <sig> <pathname>
get is the converse of put
and takes two arguments: the signature of the recipe for the file or
directory being retrieved, and the pathname where the result
is created.
The following two commands result in blah being duplicated
as blah2 in the same directory:
> go run ubi.go put blah
sha256_32_LPPPLYS3MR6D6HQHOECC53SQOUVQDTZLSDS6BJHZYWDTM===
> go run ubi.go get sha256_32_LPPPLYS3MR6D6HQHOECC53SQOUVQDTZLSDS6BJHZYWDTM=== blah2
Note that the last line could also be written as:
> go run ubi.go get last blah2
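For a single file, the core of get is to read the recipe, then concatenate its data blobs in order. The sketch below shows only that step, under stated assumptions: `recipe` is a hypothetical subset of UbiNode, `fileData` is an illustrative name, a map stands in for the blobstore, and the short keys `d1`, `d2`, and `r` are placeholders rather than real sigs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recipe is a hypothetical subset of the real recipe fields.
type recipe struct {
	Name          string
	DataBlockSigs []string
}

// fileData rebuilds a file's bytes from the recipe stored under sig.
// store is an in-memory stand-in for the blobstore.
func fileData(sig string, store map[string][]byte) ([]byte, error) {
	buf, ok := store[sig]
	if !ok {
		return nil, fmt.Errorf("no such blob: %s", sig)
	}
	var r recipe
	if err := json.Unmarshal(buf, &r); err != nil {
		return nil, err
	}
	var out []byte
	for _, s := range r.DataBlockSigs {
		blob, ok := store[s]
		if !ok {
			return nil, fmt.Errorf("missing data blob: %s", s)
		}
		out = append(out, blob...)
	}
	return out, nil
}

func main() {
	store := map[string][]byte{
		"d1": []byte("hello, "),
		"d2": []byte("world"),
		"r":  []byte(`{"Name":"greeting","DataBlockSigs":["d1","d2"]}`),
	}
	data, err := fileData("r", store)
	fmt.Println(string(data), err)
	// prints: hello, world <nil>
}
```

A full get would also write the bytes to the target pathname and, for directory recipes, recurse over the child sigs.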
Command desc <sig>
Prints out information about the block. If the block is a recipe,
print it to stderr. If it is a data block, just print the size in bytes.
Command serve <portnum>
You should also create a serve command that listens on port
portnum and interprets the request path as a sig for a file or directory recipe.
Sending a GET
request, as in http://localhost:8080/sha256_32_C5BIBH.... where the
sig is that of a file recipe, returns the entire corresponding file
with the proper MIME type.
The sig can also correspond to a directory, in which case contained subdirectories and files are listed as hyperlinks, allowing files to be retrieved and subdirectories to be entered.
Note that last should work here as in a get or desc command.
Request error codes: Your server should set the status code of each reply to one of the following:
- 200: Means all good.
- 404: Means invalid sig or other errors.
- 500: Means some other error.
Details
- Create a sig using the following:
import (
	"crypto/sha256"
	"encoding/base32"
)

// computeSig returns the base32 (stringified) version of the SHA-256
// hash of buf, prefixed with "sha256_32_".
func computeSig(buf []byte) string {
	sha := sha256.Sum256(buf)
	shasl := sha[:]
	return "sha256_32_" + base32.StdEncoding.EncodeToString(shasl)
}
Hints
- Use marshalled, err := json.MarshalIndent(f, "", " ") to marshal your recipes and get the same white space as I have. White space will count in this project.
- ioutil.ReadFile and ioutil.WriteFile are your friends for making "last" persistent.
Testing, and Grading.
Testing
Test parts separately when possible. Go makes it
easy and fast to take small sections of code and run them by
themselves. The testing package allows
you to automatically test small chunks of code by typing go test at
the command line.
The commands should be put, get, and desc, as discussed above.
Grading
I will use the following rough guide to assign grades:
- 40%: put functionality.
- 40%: get functionality.
- 10%: desc functionality.
- 10%: serve functionality.
I will run the included test script as follows: sh testscript.sh >& testscript.out
(redirects both STDERR and STDOUT on /bin/tcsh)
and compare your results w/ mine. The only differences should be due to different timestamps.
Submit
Submit by committing back to your gitlab repository. Since you should be doing this at least hourly anyway, this is not actually an extra step.