updating to a more useful tutorial

a09d6ee5 · Michael Marsh · 49c0fe89 · a09d6ee5
Commit a09d6ee5 authored 7 years ago by Michael Marsh
--- a/README
+++ b/README
-You may have noticed that, when commiting to your git repositories,
-you get a message along the lines of:
+In this course, we will be using git quite heavily. It's worth taking some time
+to familiarize ourselves with some basic, and not-so-basic, concepts.

-  Your name and email address were configured automatically based
-  on your username and hostname. Please check that they are accurate.
-  You can suppress this message by setting them explicitly:
+Exercise 1: Basic Git Operations, Part 1
+========================================

-      git config --global user.name "Your Name"
-      git config --global user.email you@example.com
+Here we have the basic commands that you'll use on a regular basis. You should
+become familiar with all of these. Before we start, we have to introduce the
+concept of a *repository*. This is a directory hierarchy that's managed as a
+single unit under source control. Generally, this is some software project,
+possibly the entire thing or a component (for very large projects). It can
+be anything that's predominantly text files, though. Many people keep documents
+that they're writing in git (or some other source control), especially when
+using LaTeX, HTML, docbook, or any other non-graphical text preparation systems.

-As a result, git has pulled what information it can from your local
-user account, and used that. The result is log entries that look
-something like:
+init

-  commit 852e77ea537eda3a73ac3389babbad04c35e6729
-  Author: Seed <seed@ubuntu.(none)>
-  Date:   Fri Oct 20 06:45:24 2017 -0700
+  This creates a repository, either in an empty directory, or one already
+  containing code. Create a new directory on your VM, let's call it "testing":

-This isn't terribly useful, especially when multiple people might
-be editing a repository.
+    mkdir ~/testing
+  
+  Now, let's go into this directory and create a file:
+  
+    cd ~/testing
+    date > created_on

-Your task in this assignment is to run the git config commands listed
-above. To make sure this was successful, you should do the following
-in this directory:
+  So far, all we have is a directory with a single file in it, nothing
+  special:
+  
+    find .

-  touch testing
-  git add testing
-  git commit
-  # add your commit log comment
-  git log
-  # verify that your name and email address are correct, and if so...
-  git push origin master
+  The find command is an incredibly useful utility, and you'll learn new
+  features of it for years to come, even when using it heavily. Run "man find"
+  to read the documentation.
+  
+  Now let's make this a git repository:
+  
+    git init .

-On ELMS, rather than pasting in the hash of the commit, you should
-paste in the git log entries for the *last two* commits.  Here's
-an example (for a different repository):
+  If we run our find command again, we see that there's now a directory called
+  ".git" with lots of stuff in it. You can also run
+  
+    ls -A

-  commit 47ff823891e44bb7e8b17f5c28c647b827c1a008 (HEAD -> f17, origin/f17)
-  Author: Michael Marsh <mmarsh@cs.umd.edu>
-  Date:   Wed Oct 18 10:06:42 2017 -0400
+  if you don't want to see the whole recursive list of files. There's one file
+  in particular that we're going to look at later: .git/config

-      added anon comms readings
+status

-  commit 8e9c49805ec80fec496755c23db21f72432a4826
-  Author: Michael Marsh <mmarsh@cs.umd.edu>
-  Date:   Tue Oct 17 14:28:49 2017 -0400
+  As you might guess from the name, this is going to tell you things about
+  your repository and working directory. At this point, we need to go into
+  terminology a little bit.
+  
+  The repository is the collection of data currently under git's revision
+  control. It's generally kept in a compressed format, for efficient storage.
+  
+  The working directory, in contrast, is the set of "normal" files that you're
+  working with. Some of these will be in the repository, and some won't be.
+  Let's go back to our example repository and see how these relate.
+  
+  Start by running

-      added lecture slides
+    git status

+  You should see something like the following:
+  
+    On branch master
+    
+    No commits yet
+    
+    Untracked files:
+      (use "git add <file>..." to include in what will be committed)
+    
+    	created_on
+    
+    nothing added to commit but untracked files present (use "git add" to track)

+  This is telling us that the repository is empty (no commits -- more on that
+  later!), but the working directory contains files that aren't in the
+  repository (Untracked files).
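+
+  As a rough sketch, the steps so far can be replayed in a throwaway
+  directory. The variable names here are our own, and we use the "--porcelain"
+  flag to get a short, script-friendly version of the status output:
+

```shell
set -e
# Work in a scratch directory so nothing under ~ is touched.
# "scratch" and "status" are names chosen for this sketch.
scratch=$(mktemp -d)
cd "$scratch"
date > created_on
git init -q .
# --porcelain prints one short line per file; "??" marks an untracked file.
status=$(git status --porcelain)
echo "$status"    # prints: ?? created_on
```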
+
+add
+
+  OK, so let's add something to the repository! We're going to shorten
+  "repository" to "repo", because that's the term people most commonly use. The
+  *add* command is what will tell git that you want to include a file in a
+  commit:
+  
+    git add created_on
+
+  You can specify a directory, or a wildcard, in your add command. The risk
+  with these is that you end up with derived binary files (or log files) in
+  your repo. These aren't useful, and binary files generally don't compress
+  or diff well in git, so they waste space. Try to avoid adding directories
+  or using wildcards unless you really know what you're doing.
+
+  Now run "git status" again. You should see something like:
+  
+    On branch master
+    
+    No commits yet
+    
+    Changes to be committed:
+      (use "git rm --cached <file>..." to unstage)
+    
+    	new file:   created_on
+
+  Note the little comment on unstaging files. If you add more than you'd
+  intended to, this can save your bacon.
+
+  This is also how you "stage" modified files for a commit. That is, if a file
+  is under revision control (it's in the repo), but the working directory has
+  a newer version, you would use "git add" to include it in a commit.
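+
+  Here is a small sketch of staging too much and then backing out, using the
+  "git rm --cached" hint from the status output. The file "notes.log" is a
+  made-up example of something we don't want in the repo:
+

```shell
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
date > created_on
echo "scratch notes" > notes.log   # hypothetical file we did not mean to add
git add created_on notes.log
# Unstage the log file. --cached leaves the file itself on disk.
git rm -q --cached notes.log
staged=$(git status --porcelain)
echo "$staged"
```

+  After this, created_on is still staged (shown as "A"), while notes.log is
+  back to being untracked (shown as "??") and is still in the directory.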
+
+commit
+
+  So far, we still don't have anything in our repo. That's what *commit* is
+  for. Why are these separate? Let's say we have some new files to add, some
+  to rename, and a few to delete (we'll talk about these last two later). We
+  can run several git commands to stage the commit, without writing them to
+  the repo yet. This gives us the chance to review what we're planning to do,
+  and fix things as necessary. The commit is then an atomic unit of change to
+  the repo. Consequently, we generally want to group closely related changes
+  in a single commit. Don't be afraid to do multiple commits in a row -- they're
+  cheap!
+  
+  So, what does a commit look like? There are a couple of common ways to do
+  this (there are actually many options you can use):
+  
+    git commit -m 'Committing my first file!'
+
+  or
+  
+    git commit
+
+  The difference between these is how you supply the log message, which every
+  commit must include. The message is how you know, at a high level, what the
+  purpose of this commit is.
+  By specifying the "-m" flag and a string, we're passing the log message on
+  the command line. If we omit this, we're put into an editor, where we can
+  add our message interactively. The first line is going to be a short summary,
+  but the log entry itself can be as long as you need it to be. I often use
+  the log message to keep track of things that still need to be done, or some
+  additional information about the commit that's useful to know.
+  
+  If you're using the interactive editor, just save the file and exit, and
+  git will take care of the rest. Let's say we used the command-line message
+  flag.  Let's run "git status" again:
+  
+    On branch master
+    nothing to commit, working tree clean
+
+  Ta-da!
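+
+  The whole add-then-commit cycle can be sketched as a script. The name and
+  email below are placeholders, set through environment variables so that the
+  sketch doesn't depend on (or change) your git configuration:
+

```shell
set -e
# Placeholder identity for this sketch only.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
date > created_on
git add created_on                            # stage the file
git commit -q -m 'Committing my first file!'  # write the staged change to the repo
remaining=$(git status --porcelain)           # empty when the working tree is clean
count=$(git rev-list --count HEAD)            # number of commits reachable from HEAD
echo "commits: $count, pending changes: '$remaining'"
```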
+
+log
+
+  So, we now have a repo with something in it. How do we know what the state
+  of the repo is? The easiest way is with the *log* command:
+  
+    git log
+
+  When I run this, I see the following:
+  
+    commit 702223c70752248a5d54f16586f6501a47fd2e52 (HEAD -> master)
+    Author: Michael Marsh <mmarsh@cs.umd.edu>
+    Date:   Tue Jan 16 16:01:48 2018 -0500
+    
+        Committing my first file!
+    
+  We can get more information with
+  
+    git log -p
+
+  This gives you the log with "patches" that modify the repo from the previous
+  commit to the one listed.
+
+  One thing you might have noticed is that the commit is identified by a long
+  hexadecimal number. This is a SHA-1 hash, which is what git uses to identify
+  absolutely everything: files, directories, commits, etc.
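+
+  To see that hashes really do name different kinds of things, here is a
+  sketch using "git rev-parse" and "git cat-file -t" (which prints the type
+  of an object). It assumes git's default SHA-1 object format, and the
+  identity is a placeholder:
+

```shell
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
date > created_on
git add created_on
git commit -q -m 'Committing my first file!'
hash=$(git rev-parse HEAD)          # full hash of the latest commit
git log --oneline                   # abbreviated hash plus the summary line
echo "hash length: ${#hash}"        # 40 hex digits with the default SHA-1 format
commit_type=$(git cat-file -t HEAD)             # the commit itself
tree_type=$(git cat-file -t "HEAD^{tree}")      # the top-level directory
file_type=$(git cat-file -t HEAD:created_on)    # the file's contents
echo "$commit_type $tree_type $file_type"
```

+  git calls a directory a "tree" and a file's contents a "blob".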
+
+
+Exercise 2: Basic Git Operations, Part 2
+========================================
+
+clone
+
+  git lets you share a repo between users and machines. It does this very
+  well, which is why it's so popular. The way you get a repo from elsewhere
+  is by *cloning* it. Let's see this in action:
+  
+    cd ~
+    mkdir another
+    cd another
+    git clone ~/testing
+
+  Take a look at ~/another/testing, using the commands we've been using so far.
+  Let's do even more! From ~/another:
+  
+    git clone ~/testing more_testing
+
+  Compare ~/another/testing and ~/another/more_testing. They should be
+  identical! Here we've illustrated the ability to specify a destination
+  directory for *clone*. If unspecified, the repo name of our source will
+  be the name of the destination.
+
+  Here, we've cloned a repo in a local directory. This is of limited usefulness,
+  since generally you're going to want to clone repos stored on other machines.
+  We'll deal with this later, but the general thing we'll see is a command
+  like one of the following:
+  
+    git clone https://example.com/repo_name
+    git clone user@example.com:repo_name
+
+  The latter is what we'll mostly be using in this course. More specifically:
+  
+    git clone git@gizmonic.cs.umd.edu:repo_name
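+
+  The local cloning steps above can be sketched as follows. Everything lives
+  under a scratch directory ("base" is our own name), and the identity is a
+  placeholder:
+

```shell
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
base=$(mktemp -d)
# Build a small source repo to clone from.
git init -q "$base/testing"
cd "$base/testing"
date > created_on
git add created_on
git commit -q -m 'Committing my first file!'
mkdir "$base/another"
cd "$base/another"
git clone -q "$base/testing"                # destination defaults to "testing"
git clone -q "$base/testing" more_testing   # explicit destination directory
a=$(git -C testing rev-parse HEAD)
b=$(git -C more_testing rev-parse HEAD)
echo "$a"
echo "$b"
```

+  Both clones report the same commit hash, which is a quick way to check that
+  they hold the same history.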
+
+init --bare
+
+  We're now going to create another repo, this time slightly differently:
+  
+    mkdir ~/testing2
+    cd ~/testing2
+    git init --bare .
+
+  What we've now done is create a *bare* repository. This is a repo without
+  a corresponding working directory. Delete ~/another/testing and
+  ~/another/more_testing, and re-clone them from ~/testing2 instead of
+  ~/testing.
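+
+  One way to see the difference between a bare and a normal repo is to ask
+  git directly. This is a sketch in a scratch directory, so the paths are not
+  the ones used in the exercise:
+

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/testing2"
git init -q "$base/testing"
bare=$(git -C "$base/testing2" rev-parse --is-bare-repository)
normal=$(git -C "$base/testing" rev-parse --is-bare-repository)
echo "testing2 bare? $bare"    # prints: testing2 bare? true
echo "testing bare? $normal"   # prints: testing bare? false
# In a bare repo, the files that normally live in .git sit at the top level.
ls "$base/testing2"
```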
+
+push
+
+  Go into ~/another/testing (or testing2, depending on whether you provided a
+  destination directory), and create a file. It doesn't matter what
+  you call it, or what's in it. Add and commit it to the repo. Now run
+  
+    git push
+  
+  You should see something like:
+  
+    Counting objects: 3, done.
+    Writing objects: 100% (3/3), 205 bytes | 205.00 KiB/s, done.
+    Total 3 (delta 0), reused 0 (delta 0)
+    To /Users/mmarsh/classes/another/../testing2
+     * [new branch]      master -> master
+
+  What we've just done is to send our commit to the repo that we cloned. In
+  fact, this will send all local commits we may have that the cloned repo
+  (called the "remote") does not yet have. The default remote is named "origin".
+
+pull
+
+  Now go to ~/another/more_testing, and run
+  
+    git pull
+
+  Take a look at the working directory and the git log. They should be identical
+  to the repo and working directory from which we just pushed.
+
+  As with push, this will synchronize our local repo and working directory with
+  whatever was newer at the remote.
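+
+  Here is a sketch of the full push/pull round trip through a bare repo. The
+  directory names "first" and "second" and the identity are placeholders. To
+  avoid depending on default settings, the sketch names the remote and branch
+  explicitly instead of running a plain "git push" or "git pull":
+

```shell
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
base=$(mktemp -d)
git init -q --bare "$base/testing2"
# Cloning an empty repo works; git may print a warning about it.
git clone -q "$base/testing2" "$base/first"
git clone -q "$base/testing2" "$base/second"
cd "$base/first"
echo hello > greeting
git add greeting
git commit -q -m "adding greeting"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
git push -q origin "$branch"              # send our commit to the remote
cd "$base/second"
git pull -q origin "$branch"              # fetch it into the other clone
pushed=$(git -C "$base/first" rev-parse HEAD)
pulled=$(git rev-parse HEAD)
echo "$pushed"
echo "$pulled"
```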
+
+config
+
+  You may have noticed that your log messages have a rather generic-looking
+  committer name and email address. You were probably also warned about this.
+  The log message I showed you above, however, had a real name and email
+  address. This seems like it would be really useful!
+  
+  There are a couple of ways to set these. One (editing the user's
+  configuration file by hand) is in your setup exercise, which hopefully
+  you've already done. The other way is to run the *config* command:
+  
+    git config --global user.name "Your Name"
+    git config --global user.email "your_email@example.com"
+
+  Any subsequent commits will now have more useful attribution. Make sure you
+  do this on your VM (or anywhere else you use git)!
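+
+  If you want to try this without touching your real settings, here is a
+  sketch that sets the values for a single scratch repo only (by leaving off
+  "--global") and then checks the attribution of a commit:
+

```shell
set -e
# Environment variables take precedence over config, so clear them first.
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
# Without --global, these are written to this repo's .git/config only.
git config user.name "Your Name"
git config user.email "your_email@example.com"
date > created_on
git add created_on
git commit -q -m "checking attribution"
author=$(git log -1 --format='%an <%ae>')
echo "$author"    # prints: Your Name <your_email@example.com>
```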
+
+Exercise 3: More Advanced Git Operations, Part 1
+================================================
+
+You can get pretty far with the previous commands, but there's a lot you'll
+need to do beyond what these cover.
+
+rm
+
+  Projects accumulate garbage. It happens. That means sometimes we need to get
+  rid of a file. That's where *rm* comes in. Let's go back to
+  ~/testing. Now run
+  
+    git rm created_on
+
+  What does "git status" tell us?  Let's commit it now:
+  
+    git commit -m "removed created_on"
+
+  The file is now gone from your working directory, and the repo! But only
+  sort-of...
+  
+  A key feature of git (or any revision control) is the ability to *revert*
+  to previous versions of the repo. Run "git log", and you'll see something
+  like:
+  
+    commit eef4f0ba06411f678bb741aaf6d06d580d82011a (HEAD -> master)
+    Author: Michael Marsh <mmarsh@cs.umd.edu>
+    Date:   Tue Jan 16 16:32:25 2018 -0500
+    
+        removed created_on
+    
+    commit 702223c70752248a5d54f16586f6501a47fd2e52
+    Author: Michael Marsh <mmarsh@cs.umd.edu>
+    Date:   Tue Jan 16 16:01:48 2018 -0500
+    
+        Committing my first file!
+
+  The earlier commit is still there! Let's not worry about this just yet.
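+
+  If you're curious, here is a sketch showing that the removed file's
+  contents can still be read from the earlier commit. "HEAD~1" means "the
+  commit before the current one", and the identity is a placeholder:
+

```shell
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
date > created_on
original=$(cat created_on)
git add created_on
git commit -q -m 'Committing my first file!'
git rm -q created_on                  # removes the file and stages the removal
git commit -q -m "removed created_on"
ls -A                                 # only .git is left in the working directory
old=$(git show HEAD~1:created_on)     # the contents as of the previous commit
echo "$old"
```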
+
+mv
+
+  Sometimes you need to rename a file. Let's do the following (in ~/testing):
+  
+    touch foobar
+    git add foobar
+    git commit -m "adding foobar"
+
+  Now we have a file named "foobar". Let's say we really wanted to just call
+  it "foo". There are two ways we can do this. The hard way:
+  
+    mv foobar foo
+    git add foo
+    git rm foobar
+
+  If you run "git status", you'll see:
+
+    On branch master
+    Changes to be committed:
+      (use "git reset HEAD <file>..." to unstage)
+    
+      	renamed:    foobar -> foo
+  
+  git is smart enough to see that there used to be a file that looks extremely
+  close (or identical) to the new file, so it was probably just renamed! We
+  can tell git about a rename explicitly with a single command. For example,
+  to rename "foo" once more, this time to "bar":
+  
+    git mv foo bar
+
+  Now "git status" shows essentially the same result, a staged rename. The
+  file has been renamed in the working directory, and the rename has been
+  staged for the next commit.
+
+  Commit this to update the repo. Don't forget to push your commit if you're
+  working with a remote!
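+
+  As a self-contained sketch of the easy way, starting from a fresh scratch
+  repo (so this is not a continuation of the commands above), with a
+  placeholder identity and a non-empty file:
+

```shell
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
echo "some content" > foobar
git add foobar
git commit -q -m "adding foobar"
git mv foobar foo                     # rename on disk and stage the rename
renamed=$(git status --porcelain)
echo "$renamed"                       # prints: R  foobar -> foo
git commit -q -m "renamed foobar to foo"
```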
+
+.git/config
+
+  There's a lot we can configure about a repo. All of this can be done with
+  command-line utilities, but it's often easier to go right to the configuration
+  file. This is often the only file in the .git directory you'll have to
+  worry about. Here's what my version of ~/testing/.git/config looks like:
+
+    [core]
+            repositoryformatversion = 0
+            filemode = true
+            bare = false
+            logallrefupdates = true
+            ignorecase = true
+            precomposeunicode = true
+
+  This isn't very interesting. Let's look at ~/another/more_testing/.git/config
+  
+    [core]
+            repositoryformatversion = 0
+            filemode = true
+            bare = false
+            logallrefupdates = true
+            ignorecase = true
+            precomposeunicode = true
+    [remote "origin"]
+            url = /Users/mmarsh/classes/another/../testing2
+            fetch = +refs/heads/*:refs/remotes/origin/*
+    [branch "master"]
+            remote = origin
+            merge = refs/heads/master
+
+  There's a lot more going on here! In particular, we've defined a remote and
+  a *branch*. We already saw that a remote is another repo with which we're
+  going to synchronize. Let's look at this a bit more.
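+
+  The same values can be read back with "git config --get", using the section
+  and key names from the file. This is a sketch in a scratch directory with a
+  placeholder identity, and the clone directory name "work" is our own:
+

```shell
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
base=$(mktemp -d)
git init -q "$base/testing"
cd "$base/testing"
date > created_on
git add created_on
git commit -q -m 'Committing my first file!'
git clone -q "$base/testing" "$base/work"
cd "$base/work"
branch=$(git symbolic-ref --short HEAD)
# [remote "origin"] url = ...   becomes the key remote.origin.url
url=$(git config --get remote.origin.url)
fetch=$(git config --get remote.origin.fetch)
# [branch "<name>"] remote = ...   becomes branch.<name>.remote
tracked=$(git config --get "branch.$branch.remote")
echo "$url"
echo "$fetch"
echo "$tracked"
```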
+
+remote
+
+  You can have many remotes for a local repo. In most cases, you only have one.
+  In this course, because we're Computer Scientists, we're usually going to
+  have two, and sometimes three!
+
+  The "git remote" command tells you the names of your defined remote repos.
+  More useful is to add the "-v" flag:
+  
+    git remote -v
+
+  should produce something like
+  
+    origin /home/vmuser/testing2 (fetch)
+    origin /home/vmuser/testing2 (push)
+
+  You can add another remote to your .git/config, either by editing the file
+  by hand or with the "git remote add" command.
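+
+  As a sketch of the command-line route, "git remote add" takes a name and a
+  URL (or path). The remote name "backup" and all of the paths here are made
+  up for illustration:
+

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/testing2"
git init -q --bare "$base/backup"            # hypothetical second remote
# Cloning an empty repo works; git may print a warning about it.
git clone -q "$base/testing2" "$base/work"
cd "$base/work"
git remote add backup "$base/backup"         # writes a [remote "backup"] section
names=$(git remote)
git remote -v                                # lists both remotes with their URLs
backup_url=$(git config --get remote.backup.url)
echo "$backup_url"
```

+  Afterwards, .git/config contains a second [remote "..."] section next to
+  the one for origin.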
+
+branch
+
+checkout
+
+tag
+
+merge
+
+dealing with conflicts
+
+gitk