Speaking UNIX, Part 6: Automate, automate, automate!

Mechanize personal and system chores with shell scripts

Discover how shell scripts can mechanize virtually any personal or system task. Scripts can monitor, archive, update, report, upload, and download. Indeed, no job is too small or too great for a script. Here's an introduction.

Martin Streicher (martin.streicher@gmail.com), Chief Technology Officer, McClatchy Interactive

Martin Streicher is a freelance Ruby on Rails developer and the former Editor-in-Chief of Linux Magazine. Martin holds a Masters of Science degree in computer science from Purdue University and has programmed UNIX-like systems since 1986. He collects art and toys. You can reach Martin at martin.streicher@gmail.com.



03 January 2007

If you peer over a longtime UNIX® user's shoulder while he or she works, you might be mesmerized by the strange incantations being murmured at the command line. If you've read any of the previous articles in the Speaking UNIX series (see Resources), at least some of the mystical runes being typed -- such as tilde (~), pipe (|), variables, and redirection (< and >) -- will look familiar. You might also recognize certain UNIX command names and combinations, or realize when an alias is being used as a sorcerer's shorthand.

Still, other command-line conjurations might elude you, because it's typical for an experienced UNIX user to amass a large arsenal of small, highly specialized spells in the form of shell scripts to simplify or automate oft-repeated tasks. Rather than type and re-type a (potentially) complex series of commands to accomplish a chore, a shell script mechanizes the work.

In Part 6 of the Speaking UNIX series (see Resources), you'll learn how to write shell scripts and pick up more command-line tricks.

Ben, just one word: "automation"

Some shell scripts run exactly the same commands, processing the same set of files time and again. For instance, a Z shell script to propagate the entire contents of your home directory to three remote computers could be as simple as Listing 1.

Listing 1. A simple shell script to synchronize your home directory across many remote machines
#! /bin/zsh

foreach machine (groucho chico harpo)
    rsync -e ssh --times --perms --recursive --delete $HOME $machine:
end

To use Listing 1 as a shell script, save the contents above to a file -- say, simpleprop.zsh -- and run chmod +x simpleprop.zsh to make the file executable. You can then run the script by typing ./simpleprop.zsh.

If you'd like to see how Z shell expands each command, add the -x option to the end of the #! line of the script (the octothorpe-exclamation pair is commonly called a shebang), like so:

#! /bin/zsh -x

For each computer, groucho, chico, and harpo, the script runs the rsync command, replacing $HOME with your home directory (for example, /home/joe) and $machine with a computer name.

As demonstrated in Listing 1, variables and script control structures, such as loops, make scripts easier to write and simpler to maintain. If you'd like to add a fourth computer, such as zeppo, to your pool, simply add it to the list. If you must change the rsync command, say, to add another option, there's only one instance to edit. As in traditional programming, you should strive to avoid cut-and-paste in shell scripts, too.
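
One low-risk way to check a loop like the one in Listing 1 before it touches any remote machine is to print each command instead of running it. The sketch below does only that. The function name show_sync is hypothetical (it is not part of the original listing), and the loop uses the portable for ... in ... do ... done form so that it runs under bash as well as zsh.

```shell
# show_sync (hypothetical name): print the rsync command that a loop like
# Listing 1 would run for each machine, without contacting any of them.
show_sync() {
  local machine
  for machine in "$@"
  do
    echo "rsync -e ssh --times --perms --recursive --delete $HOME ${machine}:"
  done
}

# Adding a fourth computer means adding one word to the list.
show_sync groucho chico harpo zeppo
```

Once the printed commands look right, the echo can be dropped so that the loop runs rsync for real.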


Making a good argument

Other shell scripts require arguments, or a dynamic list of things -- files, directories, computer names -- to process. As an example, consider Listing 2, a variation of the previous example that allows you to use the command line to name the computers you'd like to synchronize to.

Listing 2. A variation of Listing 1 that allows you to name which computers to process
#! /bin/zsh

foreach machine
    rsync -e ssh --times --perms --recursive --delete $HOME $machine:
end

Assuming that you save Listing 2 in a file called synch.zsh, you'd invoke the script as zsh synch.zsh moe larry curly to copy your home directory to the computers moe, larry, and curly.

The missing list on the foreach line isn't a typo: If you omit a list, the foreach structure processes the list of arguments given on the command line. Command-line arguments are also called positional parameters, because the position of an argument on the command line is usually semantically important.
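
The same rule holds for the portable for keyword and for functions, which receive their own positional parameters. The following is a minimal sketch rather than part of the original script; greet_all is a hypothetical name chosen for illustration.

```shell
# greet_all (hypothetical name): with no "in ..." list, the for loop
# walks the function's own positional parameters, $1, $2, and so on.
greet_all() {
  local name
  for name
  do
    echo "hello, $name"
  done
}

greet_all moe larry curly
```

Called with no arguments, the loop body simply never runs.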

As an example, Listing 2 can leverage the existence or non-existence of positional parameters to provide a helpful usage message if you specify no arguments. The enhanced script is shown in Listing 3.

Listing 3. Many scripts provide helpful messages if no arguments are provided
#! /bin/zsh

if [[ -z $1 || $1 == "--help" ]]
then
    echo "usage: $0 machine [machine ...]"
    exit 1
fi

foreach machine
    rsync -e ssh --times --perms --recursive --delete $HOME $machine:
end

Each space-delimited string on the command line becomes a positional parameter, including the name of the script being invoked. Hence, the command synch.zsh has only one positional parameter, $0. The synch.zsh --help command has two: $0 and $1, where $1 is the string --help.

So, Listing 3 says, "If the first positional parameter is empty (the -z operator tests for an empty string) or (denoted by ||) if the first parameter is equal to '--help', then print a usage message." (If you start writing scripts, consider providing a usage message in each one as a hint. It reminds others -- and even you, if you forget -- how to use the script.)
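
To try the test in isolation, it can be wrapped in a small function. This is a sketch in portable sh syntax, using the single-bracket test ([ ... ]) in place of zsh's [[ ... ]]; the names wants_usage and sync_plan are hypothetical.

```shell
# wants_usage (hypothetical name): succeed when the first argument is
# missing or empty, or is exactly --help.
wants_usage() {
  [ -z "$1" ] || [ "$1" = "--help" ]
}

# sync_plan (hypothetical name): print a usage hint and fail, or report
# which machines would be processed.
sync_plan() {
  if wants_usage "$1"
  then
    echo "usage: sync_plan machine [machine ...]"
    return 1
  fi
  echo "would synchronize to: $*"
}

sync_plan moe larry curly
sync_plan --help || echo "(exit status was $?)"
```

Returning a non-zero status along with the usage message lets a calling script notice that nothing was done.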

The phrase [[ -z $1 || $1 == "--help" ]] is the condition of the if statement, but you can also use the same conditional as a command and combine it with other commands to control flow through your script. Take a look at Listing 4. It enumerates all the executable commands in your $PATH, and it uses conditions in combination with other commands to perform suitable work.

Listing 4. List the commands in your $PATH
#! /bin/zsh

directories=(`echo $PATH | column -s ':' -t`) 

for directory in $directories
do
  [[ -d $directory ]] || continue
  
  pushd "$directory"
  
  for file in *
  do
      [[ -x $file && ! -d $file ]] || continue
      echo $file
  done
  
  popd
done | sort | uniq

There's quite a bit going on in the script, so let's break it down into pieces:

  1. The first actual line of the script -- directories=(`echo $PATH | column -s ':' -t`) -- creates an array named directories. You create an array in zsh by placing parentheses around your arguments, as in directories=(...). In this case, the elements of the array are generated by splitting $PATH at each colon (column -s ':') to yield a space-delimited list (the -t argument of column) of directories.
  2. For each directory in the list, the script attempts to enumerate the executable files in the directory. Steps 3 through 6 describe the process.
  3. The [[ -d $directory ]] || continue line is an example of a so-called short-circuiting command. A short-circuiting command terminates "as soon as" its logical conditions yield a definitive result.

    For instance, the [[ -d $directory ]] || continue phrase uses a logical OR (||) -- it executes the first command and executes the second command if -- and only if -- the first command fails. So, if the entry in $directory exists and is a directory (the -d operator), the test succeeds, evaluation ends, and the continue command, which skips processing of the current element, never executes.

    However, if the first test fails, the next command in the logical OR, continue, executes. (continue always succeeds, so it typically appears last in a short-circuiting command.)

    Short-circuiting based on logical AND (&&) executes the first command, and then executes the second command if, and only if, the first command succeeds.

  4. The pushd and accompanying popd are used to change to a new directory before processing and change to the previous directory after processing, respectively. Using the directory stack is a good scripting technique to maintain your place in the file system.
  5. The inner for loop enumerates all the entries in the current working directory -- the wild card * (asterisk) matches everything -- and then tests whether each entry is an executable file. The line [[ -x $file && ! -d $file ]] || continue says, "If $file exists and is executable and isn't a directory, then process it; otherwise, continue."
  6. Finally, if all the former conditions are met, the name of the file is printed with echo.
  7. Did you catch the last line of the script? You can send the output of most control structures to another UNIX command -- after all, the shell treats the control structure as a command. Therefore, the output of the entire script is piped through sort, and then uniq to yield an alphabetized list of unique commands found in your $PATH.

If you save Listing 4 to an executable file named listcmds.zsh, the output might look like this:

$ ./listcmds.zsh
[
a2p
ab
ac
accept
accton
aclocal

A short-circuiting command is very useful in scripts. It combines a conditional and an operation in one. And because every UNIX command returns a status code reflecting success or failure, you can use any command as a conditional -- not just the test operators. By convention, UNIX commands return zero (0) for success and non-zero for failure, where the non-zero value reflects the kind of error that occurred.
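
The status-code convention is easy to observe directly. The sketch below runs whatever command it is given and reports the result; describe is a hypothetical helper name, and the exact non-zero value a failing command returns depends on the command.

```shell
# describe (hypothetical name): run any command and report whether its
# exit status signalled success (0) or failure (non-zero).
describe() {
  if "$@" > /dev/null 2>&1
  then
    echo "ok: $*"
  else
    echo "failed ($?): $*"
  fi
}

describe true
describe false
describe ls /no/such/directory
```

Here the if statement itself does the short-circuiting: it branches on the command's status code, with no test operator involved.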

For example, pushd and popd could have been eliminated from Listing 4 if the line [[ -d $directory ]] || continue were replaced with cd $directory || continue. If the cd command succeeds, it returns 0 and evaluation of the logical OR can end immediately. However, if cd fails, it returns non-zero, evaluation proceeds, and continue executes.
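
A variation on Listing 4 along those lines might look like the following. It is a sketch under stated assumptions, not the author's script: list_commands is a hypothetical name, the directory list is passed as an argument, it is split with tr instead of column, and the code is written in portable sh syntax. It also assumes every directory is an absolute path, because each cd starts from wherever the previous one landed.

```shell
# list_commands (hypothetical name): print the unique names of the
# executable files found in a colon-separated list of directories.
# Assumes the directories are absolute paths.
list_commands() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r directory
  do
    # cd doubles as the test: if it fails, skip this entry.
    cd "$directory" 2> /dev/null || continue

    for file in *
    do
      [ -x "$file" ] && [ ! -d "$file" ] || continue
      echo "$file"
    done
  done | sort -u
}
```

Running list_commands "$PATH" prints the alphabetized command names. Because the loop sits in the middle of a pipeline, it runs in a subshell, so the cd commands do not change the caller's working directory. Under zsh, a directory with no entries makes the bare * glob report "no matches found"; writing the glob as *(N) avoids that.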


Don't remove. Archive!

Modern UNIX shells -- bash, ksh, zsh -- offer many control structures and operations to create complex scripts. Because you can call upon all the UNIX commands to massage data from one form to another, shell scripting is nearly as rich as programming in a complete language, such as C or Perl.

You can use scripts to mechanize virtually any personal or system task. Scripts can monitor, archive, update, upload, download, and transform data. A script can be a single line or an enormous subsystem. No job is too small or too great (almost) for a shell script. Indeed, if you look at your /etc/init.d directory, you'll find a variety of shell scripts that launch services each time you start your computer. If you create a very useful script, you can even deploy it as a system-wide utility. Just drop it into a directory that is on your users' $PATH.

Let's create a utility to exercise your newfound mojo. The script, myrm, is a replacement for the system's own rm. Rather than deleting a file outright, myrm copies the file to an archive, names it uniquely so you can find it later, and then removes the original file. The myrm script is functional but simple, and you can add many bells and whistles. You can also write an extensive unrm ("un-remove") script as a companion. (You can search the Internet to find a variety of implementations.)

The myrm script is shown in Listing 5.

Listing 5. A simple utility to back up a file before it's removed from the file system
#! /bin/zsh

backupdir=$HOME/.tomb
systemrm=/bin/rm

if [[ -z $1 || $1 == "--help" ]]
then
  exec $systemrm
fi

if [[ ! -d $backupdir ]]
then
  mkdir -m 0700 $backupdir || { echo "$0: Cannot create $backupdir" >&2; exit 1; }
fi

args=$( getopt dfiPRrvw $* ) || exec $systemrm

count=0
flags=""
for argument in $=args
do
  case $argument in
    --) break;
        ;;

     *) flags="$flags $argument";
        (( count=$count + 1 ));
        ;;
  esac
done
shift $(( $count ))

for file
do
  [[ -e $file ]] || continue
  copyfile=$backupdir/$(basename $file).$(date "+%m.%d.%y.%H.%M.%S")
  /bin/cp -R $file $copyfile
done

exec $systemrm $=flags "$@"

You should find the shell script readable, although there are a few new things that haven't been discussed before. Let's cover those, and then review the entire script.

  1. When a shell launches a command, such as cp or ls, it spawns a new process for the command, and then waits for the (sub)process to finish before proceeding. The exec command also launches a command, but instead of spawning a new process, exec "replaces" the task of the current process -- that is, the shell (or script) process -- with the new command. In other words, exec reuses the same process to start a new task. In the context of the script, an exec immediately "terminates" the script and starts the specified task.
  2. The getopt UNIX utility scans the positional parameters for the named arguments you specify. Here, the dfiPRrvw list looks for -d, -f, -i, -P, -R, -r, -v, and -w. If another option appears, getopt fails. Otherwise, getopt returns a string of the options ending with the special string, --.
  3. The shift command removes positional parameters from left to right. For example, if the command line were myrm -r -f -P file1 file2 file3, shift 3 would remove $1, $2, and $3 (that is, -r, -f, and -P, respectively). file1, file2, and file3 are renumbered as the new $1, $2, and $3. $0, the name of the script, is unaffected.
  4. The case statement works like its counterparts in traditional programming languages: It compares its argument to each pattern in a list; when a match is found, the corresponding code executes. Much like in the shell, * matches anything and can be used as the default action if no other match is found.
  5. The sigil, $@, expands to all the (remaining) positional parameters.
  6. The zsh operator, $=, splits words at whitespace boundaries. $= is useful when you have a long string and want to split the string into individual arguments. For instance, if the variable x contains the string '-r -f' -- which is one word with five characters -- $=x becomes two separate words, -r and -f.
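
Several of these pieces can be tried on their own. The sketch below is a simplified stand-in, not the getopt-based logic of Listing 5: a hypothetical function, split_flags, uses case and shift to peel leading options off its argument list, and a one-line sh -c command shows the effect of exec. It is written in portable sh syntax.

```shell
# split_flags (hypothetical name): peel leading options off the argument
# list with case and shift, then report what is left.
split_flags() {
  local flags=""
  while [ $# -gt 0 ]
  do
    case $1 in
      --) shift; break ;;              # explicit end of options
      -*) flags="$flags $1"; shift ;;  # collect the option, then drop it
       *) break ;;                     # first non-option: stop
    esac
  done
  echo "flags:$flags"
  echo "files: $*"
}

split_flags -r -f -P file1 file2 file3

# exec replaces the child shell with echo, so the second echo never runs.
sh -c 'exec echo "replaced by echo"; echo "never reached"'
```

After the loop, only the file names remain in the positional parameters, which is the state Listing 5 reaches with its single shift $(( $count )).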

Given those illuminations, you should now be able to dissect the script fully. Let's look at the code in blocks:

  • The first block sets variables that are used throughout the script.
  • The next block should look familiar: It prints a usage message if no arguments are provided. Why does it exec the real rm utility? If you name this script "rm" and place it earlier in your $PATH, it can act as a surrogate for /bin/rm. A bad option to the script is also a bad option to /bin/rm, so the script lets /bin/rm provide the usage message.
  • The next block creates the backup directory if it does not exist. If the mkdir fails, the script dies with an appropriate error message.
  • The next block finds the dash arguments in the list of positional arguments. If getopt succeeds, $args has a list of options. If getopt fails, which occurs when it doesn't recognize an option, it prints an error message, and the script exits with a usage message.
  • The following block captures all the options intended for rm in a string. Accumulation stops when the special getopt option, --, is encountered. shift removes all the processed arguments from the argument list, leaving the list of files and directories to process.
  • The block that begins for file is where each file or directory is copied for safekeeping in your personal "tomb." Each file or directory is copied recursively (-R) to the tomb, and the copy's name is suffixed with the current date and time to make sure it is unique and does not clobber a previously archived entry that shares the same name.
  • Finally, the file or directory is removed using the same command-line options passed to the script.

If you later need the file or directory you just deleted (by accident?), you can look in your archive for a pristine copy!
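
The archiving step can also be exercised without removing anything. The following is a minimal sketch of that step alone, in portable sh syntax. The function name archive_file and its optional second argument (the archive directory) are assumptions made for illustration; the naming scheme and the $HOME/.tomb default follow Listing 5.

```shell
# archive_file (hypothetical name): copy a file or directory into an
# archive directory under a time-stamped name and print the copy's path.
# The optional second argument is an assumption of this sketch; it
# defaults to the $HOME/.tomb directory used in Listing 5.
archive_file() {
  local original="$1"
  local tomb="${2:-$HOME/.tomb}"
  local copy

  [ -e "$original" ] || return 1
  mkdir -p -m 0700 "$tomb" || return 1

  copy="$tomb/$(basename "$original").$(date "+%m.%d.%y.%H.%M.%S")"
  cp -R "$original" "$copy" && echo "$copy"
}
```

For example, archive_file notes.txt copies notes.txt into the tomb and prints the path of the copy. Because the time stamp has one-second resolution, two entries with the same name archived within the same second would share a name.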


Go forth and automate

The more you work with UNIX, the more likely you are to create scripts. A script saves the time and energy required to retype long, complex sequences of commands, and it prevents mistakes, too. The Web is full of helpful scripts that others have created for many purposes. Soon, you'll be posting your own incantations as well.

Resources

Learn

Get products and technologies

Discuss
