Speaking UNIX, Part 13

Ten more command-line concoctions

Discover more shortcuts and power at the UNIX command line

This is the thirteenth installment of the "Speaking UNIX" series: an ominous number, I thought, until I scanned the Internet for the origin of the number's nefariousness. As it turns out, thirteen is "good" and "bad" in roughly equal proportions (see Related topics).

The good: Thirteen is the atomic number of aluminum, the container of choice for countless libations; basketball pro Wilt Chamberlain wore number thirteen (and you all know how "lucky" Wilt was); and in a kind of taboo transform, thirteen is the seventh prime number, and seven is very lucky.

The bad: There are (evidently) thirteen steps to the gallows; party crashers Loki and Judas were thirteenth to arrive; and no matter how you cut it—by two, three, four, or six—a table of thirteen is going to be hard to seat in a restaurant, which might explain why Loki and Judas are remembered as outsiders.

At best, the jury is hung on thirteen. So, unless you're reading this on the Friday the 13th, on the thirteenth floor of an office building erected at 1313 Mockingbird Lane (a spot of land with its own history), it's time to celebrate. "Speaking UNIX" is now a pimply-faced teenager. Here are ten command-line concoctions and shell snookers to celebrate its passage into puberty. Mazel tov!

Set an environment variable temporarily

Environment variables, such as EDITOR and TZ, influence the results of commands. (The former chooses what program to launch to edit text; the latter specifies your time zone.) You typically set environment variables in your shell startup files to affect your shell session as a whole, and you can change the value for a shell session at any time with a command like export TZ=GMT.

Additionally, you can temporarily alter the value of an environment variable for a single command. Simply set the environment variable at the start of the command line, immediately before the command you want to run. For example, to change your preferred editor for a single command, preface it with EDITOR=editor, as in:

$ printenv
$ EDITOR="pico" less bigfile

This combination pages bigfile in less. If you type v in less to edit the file, pico is launched instead of vi. Here's another practical use:

$ date
Sun Aug  5 16:14:17 EDT 2007
$ TZ="Japan" date
Mon Aug  6 05:14:06 JST 2007

The temporary change to TZ affects how the immediate instance of date interprets the current date and time of the system.
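
To confirm that the override really is temporary, a minimal sketch such as the following may help. It assumes a POSIX-style shell and uses the POSIX time zone string JST-9 (an abbreviation plus an offset) in place of the zone name Japan, so it does not depend on the installed zone database. The variable names inside and after are illustrative only.

```shell
# Session-wide setting: every later command sees TZ=UTC.
export TZ=UTC

# One-off override: only this invocation of date sees JST-9
# (POSIX syntax for "abbreviation JST, nine hours east of UTC").
inside=$(TZ=JST-9 date '+%Z')

# The session value is untouched once the command returns.
after=$TZ

echo "during the command: $inside"
echo "after the command:  $after"
```

The assignment is placed in the environment of that one command only, which is why the session value survives.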

Discover what you're really running

A great number of shell features affect how the command name you type is interpreted. Each shell has an assortment of built-in commands; the PATH environment variable specifies the list and order of directories to search; and each alias acts as shorthand. With so many ways to run a program, how do you know what you're actually executing? Use the built-in type command of the shell to reveal the truth.

Say that you have these shell settings:

alias vi=pico

You can find copies of Perl in both /usr/bin and /usr/local/bin. To find which Perl you're using, type type perl.

$ perl -v 
This is perl, v5.8.7 built for darwin-2level
$ type perl
perl is /usr/local/bin/perl
$ type -a perl
perl is /usr/local/bin/perl
perl is /usr/bin/perl
$ type -a -w perl
perl: command
perl: command

The type perl command reveals how the perl command is interpreted on the command line. Here, /usr/local/bin/perl is the expansion. The type -a command reveals all instances of Perl that the shell is aware of, which depends largely on the PATH variable.

Try type with some other commands you typically use:

$ type -a vi
vi is an alias for pico
vi is /usr/bin/vi
$ type -a cd
cd is a shell builtin
cd is /usr/bin/cd

The type command reveals that vi is actually an alias for pico. The type command also shows that cd is a built-in command and is duplicated externally as /usr/bin/cd.
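
The same lookup is handy inside scripts. The sketch below assumes a POSIX-style shell; it uses type for a human-readable answer and command -v, a standard companion that prints only the resolved name or path. The variable names cd_kind and sh_path are made up for the example.

```shell
# Ask the shell how it would resolve two names.
cd_kind=$(type cd)          # cd is normally built into the shell
sh_path=$(command -v sh)    # command -v prints just the resolved path

echo "$cd_kind"
echo "sh resolves to: $sh_path"

# A script can branch on whether a program exists at all.
if command -v perl >/dev/null 2>&1; then
    echo "perl found at $(command -v perl)"
else
    echo "perl is not on the PATH"
fi
```

Note that aliases are usually not expanded in scripts, so results for names such as vi can differ from what an interactive session reports.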

Make find more portable

You've seen many, many uses of find over the past year, but I omitted one option that makes the find command lines portable to other operating systems.

By convention, it's unusual to find file names with spaces on a UNIX® system. However, lengthier, more descriptive file names are common in Mac OS X and Microsoft® Windows®, and they are becoming more commonplace on UNIX as the operating system accumulates more desktop features. After all, saving a report as 2007 Business Plan is much more obvious than saving it under a terse, abbreviated name.

The find command enumerates long file names with embedded special characters but, if you want to combine find with another command, it's safest to separate the individual file names in the list with a NUL character rather than a space. Let's see the difference.

Let's say you have three folders, each with one or more spaces in the name:

$ ls -1
Business Plan 2007
Expense Report
Pictures from Spain

If you run find on such a batch of files and pass the list of results to xargs, the spaces in the file names cause errors:

$ find . -type f -print | xargs ls -1
ls: ./Business: No such file or directory
ls: ./Expense: No such file or directory
ls: ./Pictures: No such file or directory
ls: 2007: No such file or directory
ls: Plan: No such file or directory
ls: Report: No such file or directory
ls: Spain: No such file or directory
ls: from: No such file or directory

The result passed to xargs is the single string . ./Business Plan 2007 ./Expense Report ./Pictures from Spain. By default, xargs delimits input with a space (or newline) to produce a list of files to operate on. Here, because the file names embed spaces, the rule produces the wrong list, as evidenced above.

The proper, portable technique is to use find -print0, combined with xargs -0, to delimit file names with the NUL character. Here's the favored approach:

$ find . -type f -print0 | xargs -0 ls -1
./Business Plan 2007
./Expense Report
./Pictures from Spain

By the way, if you want to preview the commands that xargs produces, add the option -p or -t. The -p option displays each fabricated command and prompts you for authorization. Type upper- or lowercase y to run the command and anything else to reject it. The -t option echoes each command to stderr before each command is executed.
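
To see the difference without touching real files, here is a small sketch that builds a scratch directory and counts how many arguments xargs hands to a command in each case. It assumes find and xargs support -print0 and -0 (GNU and BSD versions do); the directory and file names are invented for the demonstration.

```shell
# Build a scratch tree whose directory names contain spaces.
workdir=$(mktemp -d)
mkdir "$workdir/Business Plan 2007" "$workdir/Expense Report"
touch "$workdir/Business Plan 2007/draft.txt" "$workdir/Expense Report/july.txt"

# Whitespace-delimited: each path is broken apart at its spaces.
naive=$(find "$workdir" -type f -print | xargs sh -c 'echo $#' sh)

# NUL-delimited: each path arrives as exactly one argument.
safe=$(find "$workdir" -type f -print0 | xargs -0 sh -c 'echo $#' sh)

echo "arguments without -print0: $naive"
echo "arguments with -print0:    $safe"

rm -rf "$workdir"
```

Only two files exist, so any count other than two means that file names were split.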

Have even more fun with find

While find is infinitely useful, it has two implicit settings that limit its results (and might lead you to scratch your head): -name matching is case-sensitive and file system traversals do not follow symbolic links.

Hence, a command that begins find -name '*plan*' omits files with the string Plan anywhere in the name, and it fails to catalog your music when your home directory has a symbolic link named music that points to your terabyte-scale media store mounted on /media/music.

You can override case-sensitive matches with -iname, and you can traverse symbolic links with -follow. Here's an example that applies both options:

$ alias ls='ls -aF'
$ ls -1 
$ find . -name '*music*' -type f -print 
$ find . -iname '*music*' -type f -print 
$ find . -name '*music*' -type f -follow -print 
$ find . -iname '*music*' -type f -follow -print 
./tunes/Muse/Origin Of Symmetry/04 Hyper Music.m4a
./tunes/Radiohead/OK Computer/04 Exit Music (For A Film).mp3

As indicated by the @ sign annotation produced by the -F option, tunes is a symbolic link. To find all songs with any variant of the string "music" in their names, you must use -iname '*music*'. To traverse into the hierarchy that tunes points to, you must use -follow.

To make find even more portable and akin to the search features of Spotlight, say, use -print0 -follow -iname pattern.
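
The effect of each option can be reproduced in a scratch directory. This sketch assumes a find that supports -iname and -follow (GNU and BSD versions do; -L is a commonly available alternative to -follow). It places -follow ahead of the tests, and the paths and the count helper are invented for the example.

```shell
# Scratch layout: home/tunes is a symbolic link to the media directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/media" "$workdir/home"
touch "$workdir/media/04 Hyper Music.m4a"
ln -s "$workdir/media" "$workdir/home/tunes"

count() { wc -l | tr -d ' '; }    # helper: number of lines on standard input

# Case-sensitive, links not followed: nothing is found.
plain=$(find "$workdir/home" -name '*music*' -type f -print | count)

# Case-insensitive, but the link is still not traversed.
nocase=$(find "$workdir/home" -iname '*music*' -type f -print | count)

# Case-insensitive and traversing the link: the song turns up.
both=$(find "$workdir/home" -follow -iname '*music*' -type f -print | count)

echo "plain=$plain nocase=$nocase both=$both"
rm -rf "$workdir"
```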

Collect the output of many commands the easy way

You can easily capture the output of a command line by using the > output and >> output modifiers, where the former creates or overwrites the file output and the latter appends to output. You can combine the two modifiers to generate a transcript of a series of commands, which is useful if you're trying to snapshot system state, for example:

$ ps > state.`date '+%F'`
$ w >> state.`date '+%F'`

The back tick or back quote operator (``) expands commands in place. A command between back ticks runs as the shell interprets the command line, and the output of the command is used in the final expansion. Here, the single quotation marks around the argument keep it intact, preventing the shell from interpreting + and %.

After the two commands, the file state.YYYY-MM-DD, such as state.2007-08-05, is created with contents similar to this:

  PID TTY          TIME CMD
 9997 pts/1    00:00:00 zsh
10351 pts/1    00:00:00 ps

 17:56:04 up 21 days,  2:53,  2 users,  load average: 0.89, 0.94, 0.91
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
adamgood pts/0    c-67-169-182-255 Sat17    0.00s  0.37s  0.36s pine
mstreich pts/1    cpe-071-065-224- 17:17    0.00s  0.01s  0.00s w

Typing the back tick operation each time is a hassle, though. You could replace the sequence with this:

$ file=state.`date '+%F'`
$ ps > $file
$ w >> $file

But that's only a little more efficient and still error prone, because it's rather easy to use > instead of >> in the second or subsequent command. The easiest way to capture the output of a series of commands is to combine them within braces ({ }).

$ { ps; w; } > state.`date '+%F'`

The ps command runs (listing the user's current processes), followed by w (which shows who is using the machine), and the collected output is captured in a file.
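
Here is a self-contained sketch of the same grouping that can be run anywhere. It substitutes date and pwd for ps and w (which are not installed on every system) and writes into a scratch directory; the names outdir and snapshot are placeholders.

```shell
# Scratch directory so the demonstration leaves nothing behind.
outdir=`mktemp -d`
snapshot="$outdir/state.`date '+%Y-%m-%d'`"

# One redirection covers every command in the group, so there is
# no chance of typing > where >> was intended.
{ date; pwd; } > "$snapshot"

lines=`wc -l < "$snapshot" | tr -d ' '`
echo "$snapshot holds $lines lines"

rm -rf "$outdir"
```

Some shells, bash among them, require the semicolon (or a newline) before the closing brace, so including it keeps the command portable.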

Note: You can also embed a sequence of commands in parentheses to achieve the same result; however, there is one important difference. The series of commands collected in parentheses runs in a subshell and does not affect the state of the current shell.

For example, you might expect the sequence:

$ { cd $HOME; ls -1; }; pwd

to produce the same output as:

$ (cd $HOME; ls); pwd

The commands in braces change the working directory of the current shell. The commands in parentheses do not: the cd runs in a subshell, so the working directory of the current shell stays where it was. Whether to use a combination or a subshell depends on your intentions, although the subshell is much more powerful, as described next.
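
A short sketch makes the difference observable. It assumes only a POSIX-style shell; the variable names are arbitrary.

```shell
start=$PWD

# Parentheses: the cd happens in a subshell and is forgotten.
( cd / )
after_parens=$PWD

# Braces: the cd happens in the current shell and sticks.
{ cd /; }
after_braces=$PWD

echo "after ( ): $after_parens"
echo "after { }: $after_braces"

cd "$start"    # return to where the demonstration began
```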

Subshells to the rescue!

While it's common to run a subshell to pipe aggregated output to a single command, you can also use a subshell to expand a command in place, just like back ticks. Better yet, a subshell can contain another subshell, so expansions can be nested, too.

Let's start simply.

$ { ps; w; } > state.$(date '+%F')

This command is equivalent to { ps; w; } > state.`date '+%F'`. The $( ) notation runs the commands within the parentheses and then replaces itself with the output. In other words, $( ) expands in place, just like back ticks. However, unlike back ticks, which must be escaped to nest, $( ) can be very complex and can include other $( ) expansions without extra quoting. Here are some examples:

$ (cd $(grep strike /etc/passwd | cut -f6 -d':'); ls)

This command searches the password file for the entry for user strike, clips the home directory field (field six in the password file, counting from one), changes to that directory, and lists its contents. The output of grep strike /etc/passwd | cut -f6 -d':' is expanded in place before any other operation.
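
One caveat: grep matches its pattern anywhere on a line, so a similar account name could slip through. The sketch below shows one possible way to tighten the lookup by comparing the first field exactly with awk. It runs against a made-up sample file, not the real /etc/passwd, and the account names in it are hypothetical.

```shell
# A hypothetical stand-in for /etc/passwd with two similar account names.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wstrike:x:1001:1001:W. Strike:/home/wstrike:/bin/sh
strike:x:1002:1002:Strike:/home/strike:/bin/sh
EOF

# grep matches the text anywhere on the line, so both entries survive.
loose=$(grep strike "$sample" | cut -f6 -d':')

# awk compares field one exactly, then prints field six (the home directory).
exact=$(awk -F: '$1 == "strike" { print $6 }' "$sample")

echo "grep and cut: $loose"
echo "awk:          $exact"
rm -f "$sample"
```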

Here's another example, this time with the user name taken from the environment using whoami:

$ (cd $(grep $(whoami) /etc/passwd | cut -f6 -d':'); ls)

Because the subshell has so many uses, you might prefer to use it always instead of a combination or the back tick operators.
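
As a small illustration of nesting, the following sketch extracts the name of a file's parent directory in two ways. The path is hypothetical, and basename and dirname are the standard utilities of those names.

```shell
path=/home/strike/projects/report.txt    # hypothetical path for illustration

# Nested $( ): the inner expansion runs first, and quoting stays simple.
parent=$(basename "$(dirname "$path")")

# The back tick equivalent works, but the inner pair must be escaped.
parent_bt=`basename \`dirname "$path"\``

echo "with \$( ):      $parent"
echo "with back ticks: $parent_bt"
```

Both forms give the same answer; the $( ) version is simply easier to read and extend.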

Stop typing long path names

Features, such as the PATH and MANPATH environment variables, conserve typing. Both variables define a series of directories to search for executables and man pages, respectively.

The shell supports another search path: CDPATH. As its name implies, CDPATH enumerates a list of directories to search for a named directory. Let's see how it works.

Assume that you have three directories—tomb, current, and personal—in your home directory. The tomb directory contains old work projects; current contains things you actively work on; and personal contains files and the like for your interests. Performing ls -R tomb current personal reveals something like this:

$ ls -R tomb current personal
current:
./        ../       einstein/ herbie/

personal:
./       ../      fishing/ novel/

tomb:
./       ../      mariner/ marvin/  voyager/

Given this structure, and without CDPATH, changing to any directory requires that you remember where a folder is located and type its fully qualified (or relative) path name:

$ cd ~/tomb/mariner
$ cd ~/personal/novel
$ cd ~/current/einstein

To simplify this work, set CDPATH to the list of directories you'd like to search for a named directory:

$ export CDPATH=.:~/:..:../..:

This is the minimum setting for CDPATH. It searches, in order, the current directory (., or "dot"), your home directory (~/), the parent directory (.., or "dot dot"), and the grandparent directory (../..). The minimum setting tends to prefer local directories and relatively close directories.

With this CDPATH set, you can quickly change to any of your topmost directories:

$ pwd
$ cd current
$ cd personal/fishing
$ cd novel
$ cd /tmp
$ cd personal/novel
$ cd /tmp
$ cd novel
cd: no such file or directory: novel

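In each but the last cd command, the argument matched a directory found through the CDPATH. The last command fails because the personal directory itself is not yet in the CDPATH, so novel cannot be found by name alone from /tmp; it is reachable only through a path such as personal/novel, which resolves relative to your home directory.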

If you want to search the personal directory and the other two directories, add them after the last colon or in whatever order you prefer to search. Add the three directories, assuming that the previous export command is in your shell startup file:

$ export CDPATH=$CDPATH:~/current:~/tomb:~/personal

Now, you can simply type the name of the directory you want to switch to:

$ cd current
$ cd /tmp
$ cd einstein
$ cd fishing
$ cd personal/novel

As with PATH and MANPATH, if more than one entry in the CDPATH contains a match, searching ends at the first match. For example, if you add a directory named novel to tomb, a cd novel command yields ~/tomb/novel.

$ mkdir ~/tomb/novel
$ cd /tmp
$ cd novel
$ cd personal/novel

CDPATH works best when its entries contain unique directory names. Otherwise, type enough of the path to differentiate, as was done with personal/novel.
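
The search order can be tried safely in a scratch directory. This sketch assumes a POSIX-style shell; it sets CDPATH without exporting it, and the directory names mirror the example above but live under a temporary location (base is a placeholder).

```shell
# Scratch tree with the same directory name in two places.
base=$(mktemp -d)
mkdir -p "$base/personal/novel" "$base/tomb/novel"
start=$PWD

# Search the current directory, then base, then tomb, then personal.
CDPATH=".:$base:$base/tomb:$base/personal"

# cd prints the directory it found through CDPATH; discard that here.
cd / && cd novel > /dev/null
first=$PWD               # the tomb copy wins because tomb is listed first

cd / && cd personal/novel > /dev/null
second=$PWD              # a longer path picks the other copy

echo "cd novel          -> $first"
echo "cd personal/novel -> $second"

cd "$start"
unset CDPATH
rm -rf "$base"
```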

Make less work more

You've seen many, many examples of how extensively text files are used in a UNIX system. Most system startup files are text files, as are shell scripts, configuration files and, of course, data files. After a text editor, the next most useful utility is a pager, an application that lets you browse text files page by page.

The application less is one of the most popular pagers, and it offers a raft of options to tweak its behavior. In fact, you can set the LESS environment variable to a list of options to control how less works by default. Here's a collection of useful options:

export LESS="-Nmsx4"
  • -N displays line numbers.
  • -m displays the current position in the file as a percentage.
  • -s "squeezes," or reduces, multiple blank lines into a single blank line.
  • -x4 sets a tab stop every four spaces.

Spend some time with the less man page to find the options most helpful to you.

Read a file from bottom to top

Many files on a UNIX system grow and grow until truncated or archived. For instance, most important system processes, such as e-mail transport and remote access, continuously log activity, appending each new entry to the end of the file. And it's the end of the log file that's most interesting. If a service crashes, the events that occurred at the very end provide the most clues.

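There are two ways to display the lines in a file in reverse order: tac (the reverse of cat) and the command tail -r. Neither is universal: tac is part of GNU coreutils, while tail -r is found on BSD-derived systems and is not supported by GNU tail.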

$ cat smallfile
$ tac smallfile
$ tail -r smallfile

You might find tac more practical, because it emits the entire file, unlike tail, which truncates the output to some number of lines. For instance, you can combine tac and less to create an alias that pages files in reverse:

$ alias rless="LESSOPEN='|tac %s' less"
$ rless smallfile

The rless alias temporarily sets LESSOPEN, an environment variable specific to less, to |tac %s. This forces each file (the %s is a placeholder for the file name) to be pre-processed (hence the pipe, |) by tac.

Here's another variation of the same trick, but one that leverages perl instead of tac, which might not be available on your system:

$ LESSOPEN="|perl -e 'print reverse (<>)' %s" less small

The line of perl says, "Read all input lines into an anonymous array (<>), reverse the order of the elements, and print the new array."
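
If you move between systems, one possible approach is a small wrapper that uses whichever tool is present. The function name reverse_lines is invented for this sketch, and the awk fallback is an addition of mine, not something the tools above provide; it holds the whole file in memory, so it suits modest files only.

```shell
# Print the lines of one file in reverse order, using whatever is available.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$1"
    elif tail -r /dev/null >/dev/null 2>&1; then
        tail -r "$1"
    else
        # Fallback: store every line, then print from last to first.
        awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }' "$1"
    fi
}

# Try it on a three-line scratch file.
smallfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$smallfile"
reversed=$(reverse_lines "$smallfile")
echo "$reversed"
rm -f "$smallfile"
```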

Do new math

If you need to calculate a result, there's no need to jump to a new application. You can stay comfortably at the command line. You can use dc, a reverse Polish notation calculator, or bc, an entire scripting language for math. Or, if you just need an answer fast, use the command line and the $(( )) operator.

$ echo $(( 100 / 10 ))
$ echo $(( 10 ** 2 ))

The shell's collection of arithmetic operators isn't large, but it includes bitwise shifts, remainders, and comparisons, which is sufficient for most programming tasks.
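
A few of those operators in action, as a minimal sketch limited to operators that POSIX-style shells share (the variable names are arbitrary). Keep in mind that $(( )) works on integers only, so division discards any fractional part; reach for bc when you need decimals.

```shell
quotient=$(( 7 / 2 ))      # integer division drops the fractional part
remainder=$(( 7 % 2 ))     # what is left over after the division
shifted=$(( 1 << 4 ))      # shift a single bit left four places
bigger=$(( 3 > 2 ))        # comparisons yield 1 for true and 0 for false

echo "7 / 2  = $quotient"
echo "7 % 2  = $remainder"
echo "1 << 4 = $shifted"
echo "3 > 2  = $bigger"
```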

Plenty of room to grow

"Speaking UNIX" might be thirteen, but there's still a lot to be experienced. There are more commands and tricks to learn, a vast array of concepts to explore, not to mention an enormous universe of open source software to boost your productivity.

Oh, and lest I forget, the braces have to come off. There's the ritual hazing by upperclassmen, some really embarrassing moments, and going steady. Or perhaps I'm showing my age . . . kids still go steady, right?

Thanks for reading. I hope you've enjoyed the column so far.

Related topics
