The problem

The code examples below make my eyes bleed, and if you don’t see why, read on – or stop writing Bash.

for f in $(ls *.mp3); do
    rm -rf "$f"
done

for f in $(find -name '*.mp3'); do
    rm -rf $f
done

We will go over what’s wrong with such code, but first we need to understand Bash a little better.

How Bash executes commands

The argument vector and execve

At its core, Bash is simply a command executor. You tell Bash the name of a command and give it a few arguments. A simplified form of a simple command is:

command argument argument argument

Bash’s job is to take this information and transform it into the execve system call, which has the following signature:

#include <unistd.h>
int execve(const char *pathname, char *const argv[], char *const envp[]);

In practice, it looks like this when Bash calls it (super simplified: the real argv, for one, also starts with the command name itself):

execve("/path/to/command", ["argument", "argument", "argument"], env)

So, for example, when you run this command:

cat file1 file2 file3

Internally, Bash runs this C function:

execve("/bin/cat", ["file1", "file2", "file3"], ...)
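
One way to look at the argument vector from inside the shell is a small helper that prints every argument it receives. The sketch below assumes a hypothetical function named show_args; it is not part of Bash, just an illustration of what a command gets to see.

```shell
# show_args is a hypothetical helper (not a Bash builtin): it prints the
# number of arguments it received, then each argument in angle brackets.
show_args() {
    printf '%d args:' "$#"
    # Only print the list when there is at least one argument.
    if [ "$#" -gt 0 ]; then
        printf ' <%s>' "$@"
    fi
    printf '\n'
}

show_args file1 file2 file3
```

Swapping cat for show_args in any command line shows the words that Bash would hand over to the real program.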

Shell expansions

Each and every word that you give to Bash might be expanded and modified. By “word” I mean the command name and the arguments!

Consider the following:

filename=myfile.txt
cat $filename  # Parameter expansion

In this case, Bash will perform parameter expansion before invoking the system call. This means that Bash, quite literally, replaces $filename with myfile.txt, so that the system call looks like this:

execve("/bin/cat", ["myfile.txt"], ...)

In fact, before executing the system call, Bash might manipulate the argument array in 8 ways! These are called shell expansions:

#  Expansion type             Example
1  Brace Expansion            a{b,c} → ab ac
2  Tilde Expansion            ~/projects → /home/lucas/projects
3  Shell Parameter Expansion  $filename → myfile.txt
4  Command Substitution       $(echo hello) → hello
5  Arithmetic Expansion       $((1+1)) → 2
6  Process Substitution       <(ls) → /dev/fd/63
7  Word Splitting             Read on for examples.
8  Filename Expansion         *.mp3 → ["Stairway to heaven.mp3", "How deep is your love.mp3", "Levels.mp3"]
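
A few of these expansions are easy to try out directly. The following is a minimal sketch that needs Bash rather than a plain POSIX sh (brace expansion is a Bash feature); the variable names are made up for the illustration.

```shell
# Each assignment captures the result of one expansion so it can be inspected.
filename=myfile.txt

brace=$(echo a{b,c})     # brace expansion: yields two words, echo joins them
param=$filename          # shell parameter expansion
cmd=$(echo hello)        # command substitution
arith=$((1 + 1))         # arithmetic expansion

printf '%s\n' "$brace" "$param" "$cmd" "$arith"
```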

I’m sorry, but knowing the above list by heart is mandatory for every serious Bash programmer. But worry not, the most important thing to know is:

Word splitting is number 7

The shell treats each character of $IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators.

In other words, Bash splits each argument when it finds any of the IFS (Internal Field Separator) characters in it.

By default, IFS consists of the usual “blank” characters:

  • Spaces
  • Tabs
  • Newlines

Out of these, spaces are by far the most common. They can and will appear in file names, paths, and all sorts of unexpected places.

The fact that word splitting happens after six other expansions is the cause of much suffering in this world.
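
To make the role of IFS concrete, here is a small sketch that counts how many fields a value splits into. The helper name count_fields is hypothetical, and the colon-separated value is only an example.

```shell
# count_fields is a hypothetical helper: it reports how many fields an
# unquoted expansion of its first argument splits into under the current IFS.
count_fields() {
    set -f          # turn off filename expansion so only splitting is observed
    set -- $1       # intentionally unquoted: this is where splitting happens
    set +f
    echo "$#"
}

# Default IFS: the value splits on its two spaces.
default_count=$(count_fields 'Stairway to heaven.mp3')

# IFS set to a colon (inside the subshell only): spaces no longer split.
colon_count=$(IFS=:; count_fields 'Stairway to heaven.mp3:Levels.mp3')

echo "$default_count $colon_count"
```

The IFS change is made inside the command substitution, so it does not leak into the rest of the script.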

Word splitting in action

Let’s say you wanted to check the permissions of a file using the ls -l command:

$ myfile='Stairway to heaven.mp3'
$ ls -l $myfile
ls: heaven.mp3: No such file or directory
ls: Stairway: No such file or directory
ls: to: No such file or directory

That did not do what we wanted. What is going on?

Internally, the following expansions are performed on the argument $myfile:

["-l", "$myfile"]  // Initially parsed word
["-l", "Stairway to heaven.mp3"]  // After parameter expansion
["-l", "Stairway", "to", "heaven.mp3"]  // After word splitting

// And the (extremely simplified version of the) system call
execve("/bin/ls", ["-l", "Stairway", "to", "heaven.mp3"]);

We asked Bash to run the ls command with the single positional argument $myfile, but three arguments were passed instead. The value of $myfile contained spaces that we did not account for, and because parameter expansion happens before word splitting, the expanded value got split into words.
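
The same splitting can be observed without touching any files by capturing the words in a Bash array. This is only a sketch: the array name args is an assumption, and arrays require Bash rather than a plain POSIX sh.

```shell
myfile='Stairway to heaven.mp3'

# Unquoted on purpose: the array ends up holding the same words that
# would be handed to ls as arguments.
args=(-l $myfile)

echo "${#args[@]}"                 # number of resulting words
printf '<%s>\n' "${args[@]}"       # one word per line
```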

Doing things the right way

Quoting for the win

More often than not we do not want word splitting to happen! Hence, you should get in the habit of double quoting your variables. Double quotes prevent word splitting altogether, while still allowing:

  1. Parameter expansion
  2. Command substitution
  3. Arithmetic expansion

i.e. the $ sign retains its special powers.
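
As a quick illustration, the sketch below puts all three expansions inside one pair of double quotes and checks that the result is still a single word. The variable names message and count are made up for the example.

```shell
myfile='Stairway to heaven.mp3'

# Parameter expansion, command substitution and arithmetic expansion all
# still work inside the double quotes.
message="file: $myfile, upper: $(printf '%s' "$myfile" | tr '[:lower:]' '[:upper:]'), sum: $((1 + 1))"

# Passing the quoted value on keeps it as one argument, spaces included.
set -- "$message"
count=$#

echo "$count"
echo "$message"
```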

Let’s go back to the example where we wanted to use ls -l to check the permissions of a file. Using double quotes we get the correct result:

$ myfile='Stairway to heaven.mp3'
$ ls -l "$myfile"  # notice the quotes here
-rw-r--r--  1 lviana  lviana  0 Mar 13 13:42 Stairway to heaven.mp3

Globbing vs. word splitting

The one expansion that happens after word splitting is filename expansion, so the file names it produces are never split. This is easy to remember: it is the last item on the list. The following will therefore always do the right thing, whether or not file names contain spaces.

$ ls -l *

The implication is deep. This will also work as god intended:

for f in *.mp3; do
    echo "$f"
done

And that, ladies and gentlemen, is the best way to iterate through the files in a directory.

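No need for $(ls *.mp3) or $(find -name '*.mp3'). The output of a command substitution goes through word splitting like any other unquoted expansion, so both break on file names with spaces. They are also wasteful: they have to create new processes, search through the PATH, and so on. All you need is the good old builtin glob.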
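
The difference is easy to measure in a scratch directory. The following sketch assumes mktemp -d is available and uses two made-up file names; it counts how many loop iterations each approach produces.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
touch "$tmpdir/Stairway to heaven.mp3" "$tmpdir/Levels.mp3"
cd "$tmpdir" || exit 1

# Broken: the output of ls is word-split, so the loop sees name fragments.
ls_count=0
for f in $(ls *.mp3); do
    ls_count=$((ls_count + 1))
done

# Correct: each glob match is one word, spaces included.
glob_count=0
for f in *.mp3; do
    [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
    glob_count=$((glob_count + 1))
done

cd - >/dev/null || exit 1
rm -rf "$tmpdir"

echo "$ls_count $glob_count"
```

With two files on disk, the ls loop runs four times and the glob loop runs twice. The [ -e "$f" ] guard is there because, by default, a glob that matches nothing is left as the literal pattern.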

When is word splitting good?

The short answer is – almost never.

The longer answer involves a little historical background. The reason word splitting even exists is that arrays did not exist in the original Bourne shell, and still don’t exist in strictly POSIX-compliant shells. In these shells, iterating over things required some trickery, so word splitting was devised.
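
For comparison, this is roughly what iteration looks like when arrays are available. It is a minimal sketch that requires Bash (not a strictly POSIX sh), and the array contents are just sample names.

```shell
# An array keeps every element intact, spaces and all.
songs=('Stairway to heaven.mp3' 'How deep is your love.mp3' 'Levels.mp3')

song_count=0
for song in "${songs[@]}"; do      # quoted [@] expands to one word per element
    song_count=$((song_count + 1))
    printf '%s\n' "$song"
done

echo "$song_count"
```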

Nowadays, though, Bash is ubiquitous on Linux systems, and macOS uses Zsh as its default shell, which, unless configured otherwise, does not apply word splitting to parameter expansions at all! And by the way, if you’re on macOS, do yourself a favor and install Bash.

Speaking of macOS, it comes with an (utterly broken) version of the getopt argument parsing utility. Its man page gives this recommendation:

$ args=`getopt abo: $*`
...
$ set -- $args

And there you see the unquoted $args! This is exactly what makes the getopt utility so broken: it relies on word splitting.

So don’t use getopt (use getopts instead), and don’t rely on word splitting. Even if you need POSIX compliance.
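
For reference, here is a minimal getopts sketch that handles the same "abo:" option string as the man page excerpt without relying on word splitting. The function name parse_opts and the variables it sets are hypothetical; they are left global on purpose so they can be inspected afterwards.

```shell
# parse_opts is a hypothetical function: flags -a and -b, plus -o with an
# argument, followed by any number of positional arguments.
parse_opts() {
    flag_a=0 flag_b=0 output='' rest_count=0
    OPTIND=1                        # reset so the function can be called again
    while getopts 'abo:' opt; do
        case "$opt" in
            a) flag_a=1 ;;
            b) flag_b=1 ;;
            o) output=$OPTARG ;;
            *) return 1 ;;          # unknown option or missing argument
        esac
    done
    shift $((OPTIND - 1))           # drop the options that were parsed
    rest_count=$#                   # what is left are positional arguments
}

parse_opts -a -o 'my output.txt' 'Stairway to heaven.mp3'
echo "$flag_a $flag_b $output $rest_count"
```

In a real script the function would be called as parse_opts "$@", with the quotes, so that arguments containing spaces arrive intact.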