This article is part 2 of the Shell readability series.


The shell supports defining functions, which, as we learned in the previous post, you should embrace and use. Unfortunately, they are fairly primitive and their use can, paradoxically, introduce other readability problems.

One specific problem is that function parameters are numbered, not named, so the risk of cryptic code is high. Let’s see why this is a problem.

Tell me what this function does and how to invoke it:

fetch() {
    local rflag=
    [ -z "${3}" ] || rflag="-r${3}"
    mkdir -p "${4}"
    ( cd "${4}" && cvs -d"${1}" -q checkout -P ${rflag} "${2}" )
}

As its name says, the fetch function fetches something, and as its body shows, it does so from CVS. All good, right? But how do you invoke this function? With some effort, you can see that it takes up to four parameters, but what is each one meant to be? Will you pass them in the right order?

Sadly, the style above (or lack thereof) plagues much of the shell code out there, giving the shell an even worse reputation than it already has.

You could slightly improve the snippet above with a comment documenting each parameter, but comments easily get out of sync with the code, so you are better off avoiding them. (Docstrings are a different story, which I may save for another day, but the shell has no such thing as docstrings.)

A nicer solution is to make the parameter names part of the code. Take a look at this idiom:

fetch() {
    local cvsroot="${1}"; shift
    local module="${1}"; shift
    local tag="${1}"; shift
    local dir="${1}"; shift

    local rflag=
    [ -z "${tag}" ] || rflag="-r${tag}"
    mkdir -p "${dir}"
    ( cd "${dir}" && cvs -d"${cvsroot}" -q checkout -P ${rflag} "${module}" )
}

The four local variable definitions at the beginning of the function exist purely to assign names to the parameters. With that trick, the function’s prototype is obvious: you immediately know there are four parameters, you know in which order to pass them, and you can reasonably guess their purpose based on their names. Just like in a “real” programming language.
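To try the idiom without a CVS server, here is a minimal sketch of a dry-run variant. The name fetch_dry_run and the sample arguments (/var/cvsroot, src, RELEASE_1, /tmp/checkout) are hypothetical; the function prints the command that fetch would run instead of running it. Like the original, it assumes a shell that supports local.

```shell
#!/bin/sh
set -e

# Hypothetical dry-run variant of fetch: same named-parameter idiom, but it
# prints the cvs command instead of executing it.
fetch_dry_run() {
    local cvsroot="${1}"; shift
    local module="${1}"; shift
    local tag="${1}"; shift
    local dir="${1}"; shift

    local rflag=
    [ -z "${tag}" ] || rflag="-r${tag}"
    # ${rflag} is deliberately unquoted so that it vanishes when empty.
    echo cd "${dir}" '&&' cvs -d"${cvsroot}" -q checkout -P ${rflag} "${module}"
}

# Arguments in prototype order: cvsroot, module, tag, dir.
fetch_dry_run /var/cvsroot src RELEASE_1 /tmp/checkout
# prints: cd /tmp/checkout && cvs -d/var/cvsroot -q checkout -P -rRELEASE_1 src

# An empty tag omits the -r flag entirely.
fetch_dry_run /var/cvsroot src "" /tmp/checkout
# prints: cd /tmp/checkout && cvs -d/var/cvsroot -q checkout -P src
```

Because the names sit at the top of the function, checking a call site against the definition only requires reading four lines in order.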

A few caveats, though, because every time I suggest this idiom in a code review, people “tweak” it in harmful ways:

  • Use shift instead of numbering the parameters. The reason is twofold. First, using shift combined with set -e makes a function call fail if it is not given enough arguments. Second, the order of the statements corresponds exactly to the order in which the parameters are expected, so there is no room for mistakes. An edit that reorders those lines would break the function, because each name would then receive a different argument, instead of leaving it working but more confusing to read.

  • Keep the shift call on the same line as the variable definition, and resist the temptation to split them. Putting each shift on its own line adds unnecessary vertical noise, and that noise makes it more tempting to drop the shift calls and number the parameters instead, which, as explained above, is not a good idea.

  • Use ${@} to refer to an unknown number of parameters. The only safe way to handle a variable number of parameters is to expand them via "${@}", and a plain shell variable cannot hold that expansion without losing the boundaries between arguments. In particular, do not write local args="${*}", which joins all arguments into a single string: just keep those arguments unnamed and use "${@}" wherever you need to refer to them. (Sure, if you already depend on Bash, you can store them in an array, which is not portable… but think twice before adding a Bash dependency.)
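The first and last caveats can be checked with a small script. This is a sketch that assumes a shell supporting local (such as bash or dash), and the functions pair, count_args, forward_all and forward_joined are hypothetical names made up for the demonstration. The first part shows that a missing argument makes the second shift fail, which aborts a subshell running under set -e. The second part contrasts forwarding with "${@}" against the local args="${*}" anti-pattern.

```shell
#!/bin/sh
# No top-level "set -e": the failing call below runs in its own subshell, so
# the script can report that subshell's exit status instead of aborting.

# Hypothetical function using the idiom with two named parameters.
pair() {
    local first="${1}"; shift
    local second="${1}"; shift
    echo "first=${first} second=${second}"
}

( set -e; pair a b )                    # prints: first=a second=b
( set -e; pair a; echo "not reached" )  # the second shift fails
echo "pair with one argument exited with status $?"

# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "${#}"
}

# Forwards its arguments with "${@}", which preserves their boundaries.
forward_all() {
    count_args "${@}"
}

# The anti-pattern: "${*}" joins the arguments into a single string, and the
# unquoted re-expansion then splits it on IFS whitespace, not on the original
# argument boundaries.
forward_joined() {
    local args="${*}"
    count_args ${args}
}

forward_all a "b c" d     # prints: 3
forward_joined a "b c" d  # prints: 4
```

The exact status of the failed call varies by shell. Bash's shift simply returns a non-zero status when there is nothing left to shift, which is why set -e matters there; POSIX also allows a non-interactive shell to exit on an out-of-range shift, and dash does, printing an error message.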

The code above is a simplified and edited version of the shtk_cvs_checkout function. shtk uses this idiom throughout.