Bash Quick-Reference
====================

The bash man page is long and detailed, which can make it difficult
to find what you're looking for.

Variables
---------

There are many shell variables that can come in handy, both
interactively and in scripts. We typically write them with a dollar
sign in front, because that is how they are referenced (but not
set!).

| *Variable*  | *Contents*                                        |
| ----------  | ----------                                        |
| $$          | Current process (shell) PID                       |
| $!          | PID of last subprocess started in the background  |
| $?          | Exit status of the last foreground command        |
| $0          | Name with which the shell/script was invoked      |
| $1, $2, ... | Positional parameters to the shell/script         |
| $#          | Number of positional parameters to shell/script   |
| $@          | Positional parameters; "$@" gives one word each   |
| $*          | Positional parameters; "$*" joins them into one   |
| $HOME       | The current user's home directory                 |
| $OLDPWD     | The previous working directory (see "cd -")       |
| $PATH       | The directories searched for commands             |
| $PPID       | The PID of the shell/script's parent process      |
| $PWD        | The current working directory                     |
| $RANDOM     | A random int in the range [0,32767]               |
| $EUID       | The current effective user numeric ID             |
| $UID        | The current user numeric ID                       |
| $USER       | The current username                              |

You set a variable with

    my_var=123

Referencing variables is done with a dollar sign, but there is more
than one way to do this. These are equivalent:

    $HOME
    ${HOME}

Why would we use the longer version? Try the following:

    for f in *; do echo $f0; done

Now try:

    for f in *; do echo ${f}0; done

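In the first loop, bash looks up a variable named `f0`, which is unset,
so it prints empty lines. The braces mark where the name ends. They give
us considerably more control, and some extra features, as we'll see.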

It is also possible to set variables for a single command. There
are two ways to do this:

    /usr/bin/env a=foo my_command
    a=foo my_command

Both of these set the variable `a` to the value `foo`, but *only*
for the environment seen by `my_command`. This is used frequently
to override default variables without changing them:

    JAVA_HOME=${HOME}/my_java some_java_program
    LD_LIBRARY_PATH=${HOME}/build/lib my_c_program
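
To check that the assignment really is limited to the one command, we can ask a child shell and then look in the parent. This is a small sketch; `greeting` is a made-up variable name, and `bash -c` stands in for `my_command`:

```shell
unset greeting   # hypothetical variable name

# The assignment prefix applies only to the bash -c child process.
seen_by_child=$(greeting=hello bash -c 'echo "${greeting}"')

# ${greeting:-unset} falls back to the word "unset" if greeting has no value.
seen_by_parent=${greeting:-unset}

echo "child saw: ${seen_by_child}"
echo "parent sees: ${seen_by_parent}"
```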

By convention, script-local variables are lowercase, and more global
variables (like HOME) are uppercase.

Subprocesses typically don't get passed the variables you define. To
change this, you need to export the variable:

    export PATH
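
A quick way to see the difference is to ask a child shell what it can see. The sketch below uses two made-up variable names, `plain_var` and `exported_var`, with `bash -c` as the subprocess:

```shell
plain_var="not exported"     # hypothetical names and values
exported_var="exported"
export exported_var

# Single quotes stop the parent from expanding the variables,
# so the child shell does the lookup itself.
child_plain=$(bash -c 'echo "${plain_var:-unset}"')
child_exported=$(bash -c 'echo "${exported_var:-unset}"')

echo "child sees plain_var as: ${child_plain}"
echo "child sees exported_var as: ${child_exported}"
```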


Parameter Expansion
-------------------

See the "Parameter Expansion" section of the bash manpage for more
details and other expansion options.

| *Expansion* | *Effect*                                            |
| ----------- | --------                                            |
| ${#var}     | The length of *var*                                 |
| ${var:-def} | *var*, if set and non-empty, otherwise *def*        |
| ${var:=def} | As above, but also assigns *def* to *var*           |
| ${var:off}  | Substring of *var* from 0-based offset *off*        |
| ${var:o:l}  | As above, but at most *l* characters                |
| ${var/p/s}  | Expand *var*, replace first match of *p* with *s*   |
|             | If *s* not provided, remove the match               |
|             | A leading # or % in *p* anchors at start or end     |

We can use this in a script along with variable overriding for handling
script inputs symbolically, rather than positionally:

    : "${target:=foo.txt}"  # ":" is a no-op; without it, bash would try to run the value
    grep foobar "${target}"

We would call this (assuming it's called "myscript"):

    target=/etc/hosts myscript
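
The expansions in the table are easiest to learn by trying them. Here is a short sketch; the path and the variable names `path` and `maybe` are made up for illustration:

```shell
path="/home/alice/notes.txt"    # hypothetical example value

echo "${#path}"                 # 21
echo "${path:6}"                # alice/notes.txt
echo "${path:6:5}"              # alice
echo "${path/notes/todo}"       # /home/alice/todo.txt
echo "${path/%.txt/.md}"        # /home/alice/notes.md

unset maybe
first=${maybe:-fallback}        # expands to the default, maybe stays unset
still_unset=${maybe:-yes}
second=${maybe:=fallback}       # expands to the default and assigns it
echo "${first} ${still_unset} ${second} ${maybe}"
```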


Quoting
-------

How a language handles single- vs double-quotes varies quite a bit.
Python treats them equivalently, while C only allows a single
character between single-quotes. Bash works a bit differently:
single-quoted strings do not have parameters expanded, while
double-quoted strings do. For example, compare:

    echo '${HOME}'
    echo "${HOME}"

You will most often want double-quotes, but single-quotes are very
useful when preparing input to another program. For example:

    find . -name '*.txt'

which is equivalent to

    find . -name \*.txt
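
Quotes also control word splitting, which matters as soon as a value contains spaces. A small sketch, using a made-up helper `count_args` that reports how many arguments it received:

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

title="three word title"   # hypothetical value containing spaces

unquoted=$(count_args $title)     # split on spaces into three words
double=$(count_args "$title")     # one word, with the variable expanded
single=$(count_args '$title')     # one word, the literal text $title

echo "unquoted: ${unquoted}, double: ${double}, single: ${single}"
```

This is why double-quoting variable references is a good default.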


Command Execution
-----------------

There are multiple ways to run a command. Consider an executable `foo`:

| *Invocation* | *Effect*                                                 |
| ------------ | --------                                                 |
| `foo`        | foo is run as a subprocess normally                      |
| `foo &`      | foo is run in the background, as execution continues     |
| `. foo`      | foo is run *in the current shell*                        |
| `(foo)`      | a subshell is started, and foo is run as a subproc of it |
| `` `foo` ``  | foo is run as a subprocess, and its STDOUT is returned   |
| `$(foo)`     | Same as the above                                        |

The last two are equivalent, but the $() form is preferable, because
it is clearer and can be nested.
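
To illustrate the nesting point, here is the same lookup written both ways. The path is a made-up example; `basename` and `dirname` are the standard utilities:

```shell
path="/usr/local/bin"   # hypothetical path

# Nested $( ): the inner command runs first, and no escaping is needed.
parent=$(basename "$(dirname "${path}")")

# The same thing with backquotes needs the inner pair escaped.
parent_bq=`basename \`dirname "${path}"\``

echo "${parent} ${parent_bq}"
```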

The dot-execution is useful for snippets of bash code which set
environment variables or define functions. Think of it like an
"include" statement.

The advantage of running in a subshell is that it doesn't impact
the current shell. Here's an example:

    for d in *  # "*" expands to the contents of the current directory
    do
        if [ -d "$d" ]  # Test that $d is a directory
        then
            (cd "$d" && git pull origin master)  # only pull if cd worked
        fi
    done

If we ran in the parent shell, we would have to make this longer:

    cwd=$(pwd)
    for d in *  # "*" expands to the contents of the current directory
    do
        if [ -d "$d" ]  # Test that $d is a directory
        then
            cd "$d"
            git pull origin master
            cd "$cwd"
        fi
    done

We could use `..` instead of `$cwd`, but then we'd have to worry
about `cd $d` failing. With a subshell, we don't need to worry about
this at all.
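
To see how a subprocess, a subshell, and dot-execution differ, we can run the same snippet all three ways and check what the current shell is left with. This is only a sketch: the snippet file and the variable `greeting` are made up.

```shell
# Hypothetical snippet file that sets one variable.
snippet=$(mktemp)
echo 'greeting="hello from snippet"' >"${snippet}"
unset greeting

bash "${snippet}"                    # subprocess: its variables die with it
after_subprocess=${greeting:-unset}

(. "${snippet}")                     # subshell: changes stay in the subshell
after_subshell=${greeting:-unset}

. "${snippet}"                       # current shell: the variable sticks
after_dot=${greeting:-unset}

echo "subprocess: ${after_subprocess}"
echo "subshell:   ${after_subshell}"
echo "dot:        ${after_dot}"
rm -f "${snippet}"
```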


Working with Positional Parameters
----------------------------------

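Sometimes, `"$@"` is good enough. The double quotes matter: without
them, parameters containing spaces are split into several words.

    foo "$@"                # Pass all parameters to this other command
    for a in "$@"; do ...   # Loop over the parameters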

If the parameters have different meanings, we can do the following:

    a=$1
    b=$2

We can make this more robust, with defaults:

    a=${1:-foo}
    b=${2:-bar}

We can also use `shift`, which pops the first positional parameter:

    a=$1
    shift
    b=$1
    shift

or, more compactly:

    a=$1; shift
    b=$1; shift

The advantage of the `$1; shift` form is that we can add more
positional parameters without having to keep count. We'll see other
uses later.
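
The difference between the ways of passing parameters along shows up once a parameter contains a space. A sketch, with two made-up functions `show_args` and `forward`:

```shell
# Hypothetical helper: print each argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        echo "[${arg}]"
    done
}

# Hypothetical wrapper that forwards its parameters in three ways.
forward() {
    echo 'With "$@":'; show_args "$@"   # one word per parameter
    echo 'With "$*":'; show_args "$*"   # everything joined into one word
    echo 'With $*:';   show_args $*     # re-split on whitespace
}

forward "first arg" second
```

Only the `"$@"` form hands over exactly the two parameters that were given.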


Mathematical Expressions
------------------------

Many mathematical operations can be put in $(( )). This will only
perform integer math, however. Here's an example:

    total=0
    for thing in "$@"
    do
        total=$(( ${total} + ${#thing} ))
    done
    echo ${total}

This will sum the lengths of the positional parameters, and print
the result to STDOUT.

For floating-point math, we have to use other options (such as awk).
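
As a sketch of the awk route, here is the same division done both ways. `LC_ALL=C` is set for the one command so that the decimal separator is a dot regardless of locale:

```shell
int_result=$(( 7 / 2 ))                                        # truncates to 3
float_result=$(LC_ALL=C awk 'BEGIN { printf "%.2f", 7 / 2 }')  # 3.50

echo "integer: ${int_result}, floating-point: ${float_result}"
```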


File Descriptors
----------------

There are three automatic file descriptors (in addition to any files
your program opens):

| *File* | *Descriptor* | *Meaning*                                 |
| ------ | ------------ | ---------                                 |
| STDIN  |      0       | Standard input, reading from the terminal |
| STDOUT |      1       | Standard output, writing to the terminal  |
| STDERR |      2       | Standard error, writing to the terminal   |

The normal thing for a program to do is read from STDIN and write to STDOUT.
Many programs will also write to STDERR, but if all you have is the terminal,
it's hard to tell STDOUT from STDERR. However, the shell still knows, and
lets us treat these differently. Here's an example. First, try:

    grep bash /etc/*

Now, try this:

    grep bash /etc/* 2>/dev/null

The second form told the shell that STDERR (2) should be redirected
to (>) the special file /dev/null.  We could also do:

    grep bash /etc/* 2>/dev/null >bash_in_etc

You shouldn't see any output now, but take a look at the new file
`bash_in_etc`. If unspecified, output redirection applies to STDOUT.

We can also merge STDOUT and STDERR:

    grep bash /etc/* 2>&1

Here, we've specified the redirection target as `&1`, which means
"whatever file descriptor 1 points to".

To append, instead of overwriting, we can use ">>" instead of ">".

We can also redirect STDIN, by using "<":

    wc -l <bash_in_etc # This counts the number of lines in the file
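
Redirections are processed from left to right, so the order matters once you combine them. The sketch below uses a made-up function `both`, which writes one line to each stream:

```shell
# Hypothetical helper that writes one line to each stream.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# 2>&1 first: STDERR is pointed at the current STDOUT (the capture),
# and only then is STDOUT sent to /dev/null.
errors_only=$(both 2>&1 >/dev/null)

# >/dev/null first: STDOUT goes to /dev/null, then STDERR follows it there.
nothing=$(both >/dev/null 2>&1)

echo "errors_only='${errors_only}' nothing='${nothing}'"
```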


Here Documents
--------------

There's a special form of input redirection, using "<<":

    wc <<EOFWC
    this
    is
    a
    test
    EOFWC

The string "EOFWC" is arbitrary, but "EOF<command>" is fairly common. We
can also pass a single string as STDIN with "<<<":

    wc <<<"this is a test"
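
Here documents follow quoting rules of their own: parameters are expanded unless the delimiter is quoted. A small sketch, with a made-up variable `name`:

```shell
name="world"   # hypothetical variable

# Unquoted delimiter: parameters are expanded inside the document.
expanded=$(cat <<EOFCAT
hello ${name}
EOFCAT
)

# Quoted delimiter: the text is passed through literally.
literal=$(cat <<'EOFCAT'
hello ${name}
EOFCAT
)

echo "${expanded}"
echo "${literal}"
```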


Pipelines
---------

The Unix philosophy is that a given tool should do one thing, and
if you have to do multiple things, you should compose different
tools. Pipelines are the shell's way to do this. In short:

    foo | bar | baz

takes STDOUT from foo, redirects that to the STDIN of bar, and
redirects bar's STDOUT to baz's STDIN.

You should become comfortable with this pattern, because it is one
of the keys to creating powerful scripts. We can also combine with
other things we've seen:

    echo "grep produced $(grep bash /etc/* 2>&1 >/dev/null | wc -l) errors"
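
As another sketch of composing small tools, this pipeline finds the most common line in its input. The word list is made up; each stage is a standard utility:

```shell
# Hypothetical input: one word per line, with repeats.
most_common=$(printf '%s\n' apple pear apple fig pear apple |
    sort |       # group identical lines together
    uniq -c |    # collapse each group into "count word"
    sort -rn |   # highest count first
    head -n 1 |  # keep the top line
    awk '{ print $2 }')

echo "most common word: ${most_common}"
```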


Control Flow
------------

The simplest form of control flow uses boolean operators to combine
commands:

| *Combination* | *Execution*                                      |
| ------------- | -----------                                      |
| `foo ; bar`   | Execute `foo`, then execute `bar`                |
| `foo && bar`  | Execute `foo`; if successful execute `bar`       |
| `foo || bar`  | Execute `foo`; if *not* successful execute `bar` |

The return status is always the status of the last command executed.
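
A short sketch of these operators, using the built-in commands `true` and `false` (which do nothing except succeed or fail); the variable names are made up:

```shell
unset and_result skipped or_result

true && and_result="ran"     # left side succeeded, so this runs
false && skipped="ran"       # left side failed, so this is skipped
false || or_result="ran"     # left side failed, so this runs

false || true
status=$?                    # status of true, the last command executed

echo "and_result=${and_result} skipped=${skipped:-unset}"
echo "or_result=${or_result} status=${status}"
```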

Bash has an if/then/elif/else/fi construction. The minimal version is
if/then/fi, as in:

    if $foo
    then
        echo "foo"
    fi

The full form would be:

    if $foo
    then
        echo "foo"
    elif $bar
    then
        echo "bar"
    else
        echo "baz"
    fi

The if statement uses command return codes, so you can put a command
in the test (above, `$foo` is assumed to hold a command name such as
`true` or `false`), or use the `test` command (usually written `[`):

    grep foo /etc/hosts
    have_foo=$?
    grep localhost /etc/hosts
    have_local=$?
    if [ 0 -eq ${have_foo} ]
    then
        echo "We have foo"
    elif [ 0 -eq ${have_local} ]
    then
        echo "We have localhost"
    fi

See the `test` manpage for details; there are many tests you can
perform, and the manpage is fairly compact.
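
Putting the command directly in the test avoids the detour through `$?`. Here is a sketch of the same check; it uses a temporary stand-in for /etc/hosts so that it is self-contained, and `found` is a made-up variable:

```shell
# Hypothetical stand-in for /etc/hosts.
hosts_file=$(mktemp)
echo "127.0.0.1 localhost" >"${hosts_file}"

# grep -q prints nothing and only reports success or failure.
if grep -q foo "${hosts_file}"
then
    found="foo"
elif grep -q localhost "${hosts_file}"
then
    found="localhost"
else
    found="neither"
fi

echo "We have ${found}"
rm -f "${hosts_file}"
```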

We can also construct loops in bash, as we've already seen briefly.
There are "for" loops and "while" loops, and they behave as you'd
expect. Both have the format:

    <for or while>
    do
        # ...
    done

A "for" statement looks like

    for loop_var in <sequence>

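The sequence can be something like `"$@"`, `$a $b $c`, or a glob such
as `/etc/*`. Command substitution such as `$(ls /etc)` also works, but
its output is split on whitespace, so filenames with spaces break.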
If you want to iterate over numbers, you can do something like

    for loop_var in $(seq 10)
    do
        echo "foo${loop_var}"
    done

The `while` statement takes a conditional, much like `if`. We can loop
indefinitely with it:

    while true
    do
        # ...
        if $condition
        then
            break
        fi
    done

Finally, bash has a `case` statement:

    case ${switch_var} in
        foo) echo "foo";;
        bar|baz) echo "bar"; echo "baz";;
        *) echo "default";;
    esac

Let's combine this for command-line argument parsing:

    a="foo"
    b="bar"
    c=""
    while [ $# -gt 0 ]
    do
        case $1 in
          -a) shift; a=$1; shift;;
          -b) shift; b=$1; shift;;
          *) c="$c $1"; shift;;
        esac
    done
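
To try the loop without writing a script file, we can simulate a command line with `set --`, which replaces the current positional parameters. The arguments here are made up:

```shell
set -- -a one extra -b two more   # simulate command-line arguments

a="foo"
b="bar"
c=""
while [ $# -gt 0 ]
do
    case $1 in
      -a) shift; a=$1; shift;;
      -b) shift; b=$1; shift;;
      *) c="$c $1"; shift;;
    esac
done

echo "a=${a} b=${b} c=${c}"
```

Note that `c` collects everything that is not an option, each with a leading space.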

Functions
---------

Defining a function:

    function my_func {
        local a=$1
        echo $a
    }

Functions begin with the `function` keyword, then a name, and then
the body of the function, in curly braces. The `local` keyword
defines a variable in the function's scope. If not used, the variable
will be defined in the global scope, and hence visible outside of
the function. Positional parameters are redefined for the function's
scope.

Once defined, the function behaves like any other command:

    my_func "hello"
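
The scoping rules can be checked with a small sketch. The function `scope_demo` and the variables `inner` and `outer` are made up; the function is written in the equivalent `name() { ... }` form:

```shell
scope_demo() {
    local inner="only inside"     # disappears when the function returns
    outer="visible everywhere"    # no local, so this is global
    echo "first parameter inside the function: $1"
}

unset inner outer
scope_demo "hello"

echo "inner after the call: ${inner:-unset}"
echo "outer after the call: ${outer:-unset}"
```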

You can define particularly useful functions in your ~/.bashrc,
which will be executed (using .) whenever you start a new shell.


Aliases
-------

Common one-liners are often nice to put into simple aliases, which
are defined like:

    alias ls='ls -FC'

Here are some aliases I find useful:

    alias ls='ls -FC'
    alias la='ls -A'
    alias ll='ls -l'
    alias ltr='ls -ltr'
    alias lsd='ls -lsd'

These are defined in my ~/.bashrc. There's also a file called either
~/.bash_profile or ~/.profile, which is only run for a login shell
(that is, only once when you first log in). When you modify one of
these files, make sure you re-dot it:

    . ~/.bashrc
    . ~/.bash_profile
    . ~/.profile


Scripting
---------

Some principles I live by:

 1. If I'm going to do something more than once, I script it.
 2. If I do something once, there's a good chance I'm going to have
    to do it again.

Most shells, bash included, treat executable text files specially.
Given no other information, they run them as scripts for the current
shell. *Do not assume that file extensions mean anything.* At the
very least, bash does not care about the name of your file, it only
cares about the content. Naming a file `foo.py` does not mean bash
will treat it as a python file, for instance.

To run the script correctly, there's an easy way and a hard way.
The hard way is to call the appropriate interpreter explicitly:

    bash my_shell_script.sh
    python my_python_script.py

The easy way is to use a convention called *shebang* (short for
"hash-bang"). When you run an executable text file, the kernel looks
at its first line. If that line begins with `#!`, the rest of the
line names the program with which to run the script:

    #! /bin/bash

You can even provide options:

    #! /bin/bash -x

For python, there's a better way to call it:

    #! /usr/bin/env python

What does this do? We're not actually invoking python directly.
Instead, we invoke env, which passes the parent environment. In
particular, this means it's also using the parent shell's $PATH
variable to determine how to find python. This has a couple of
advantages:

 * The location of python may vary from installation to installation,
   but env is almost always found at /usr/bin/env.
 * If you use python virtual environments, this will pick up your
   virtualenv python.
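
The same trick works for bash itself. The sketch below writes a tiny script with an env-style shebang to a temporary file, marks it executable, and runs it by path. The script contents are made up, and it assumes the temporary directory allows execution:

```shell
# Hypothetical script written to a temporary file.
script=$(mktemp)
cat >"${script}" <<'EOFCAT'
#! /usr/bin/env bash
echo "hello from a script"
EOFCAT

chmod a+x "${script}"     # without this, running it by path is refused
output=$("${script}")     # the shebang line selects the interpreter

echo "${output}"
rm -f "${script}"
```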

As a final note, make sure your scripts are executable! See "chmod"
in the filesystems quick-ref for details, but 99 times out of 100,
you will want to run `chmod a+x` on your script. git keeps track
of the executable bit in addition to the contents.