Speaking UNIX
Peering into pipes
Track the progress of protracted operations with Pipe Viewer
One of the cleverest and most powerful innovations in UNIX is the shell. It's more efficient than a GUI, and you can write scripts to automate many tasks. Better yet, the pipe operator assembles ad hoc programs right at the command line. The pipe chains commands in sequence, where the output of an earlier command becomes the input of a subsequent command.
But the pipe has one major drawback: It's something of a black box. If you string commands together, the only evidence of progress is the output that the last command in the series generates. Yes, you can interject tee in the sequence, and you can watch an output file grow with tail, but those solutions work best when used only once in a pipeline, lest the standard output (stdout) and standard error (stderr) of multiple phases commingle. Further, both solutions are crude indicators and likely mask how much computation each step requires.
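As a rough illustration of the tee approach, the following sketch taps the middle of a three-command pipeline. The scratch directory, file names, and sample data are assumptions made up for the example.

```shell
#!/bin/sh
# Hypothetical scratch directory and sample data, made up for this sketch.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\napple\n' > "$workdir/input.txt"

# tee copies the stream between sort and uniq into an interim file
# while passing it along unchanged to the next command.
sort "$workdir/input.txt" | tee "$workdir/sorted.txt" | uniq -c > "$workdir/counts.txt"

# The interim file shows what flowed through the middle of the pipeline.
interim_lines=$(wc -l < "$workdir/sorted.txt" | tr -d ' ')
echo "lines seen by tee: $interim_lines"
```

While a long job runs, tail -f on the interim file in another terminal would show it growing, but neither tool reports a rate or an estimate of the time remaining.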
Of course, you could deconstruct a complex sequence into multiple individual steps, each with its own interim output file. And indeed, if you want to verify results at each interval, decomposition is ideal. Write a script, produce one data file for each step, use a data file between each pair of steps as input, and collect the final file as the ultimate result. However, such a practice is not well suited to the impromptu nature of the command line.
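A decomposed job of that kind might look like the sketch below. The steps, file names, and sample input are hypothetical; the point is only that each step reads the previous step's file and leaves its own file behind for inspection.

```shell
#!/bin/sh
# Hypothetical three-step job; each step leaves an interim file to inspect.
work=$(mktemp -d)
printf 'b.c\na.c\nnotes.txt\na.h\n' > "$work/step0.txt"   # sample input (made up)

grep '^a' "$work/step0.txt" > "$work/step1.txt"   # step 1: keep names starting with a
sort "$work/step1.txt"      > "$work/step2.txt"   # step 2: order them
wc -l < "$work/step2.txt"   > "$work/result.txt"  # step 3: final count

result=$(tr -d ' ' < "$work/result.txt")
echo "final result: $result"
```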
What's needed is a progress meter that you can embed in the command line to measure throughput. Ideally, the meter could be repeated to benchmark each step—and because the sky's the limit, the tool would be open source and portable to multiple UNIX variants, such as Linux® and Mac OS X.
Well, wish no more: Pipe Viewer (pv), written by systems administrator Andrew Wood and enhanced by many other developers over the course of the past four years, provides a peek into command-line "plumbing." As stated on its project page, pv "can be inserted into [a] pipeline between two processes to give a visual indication of how quickly data is passing through, how much time has elapsed so far, and how near completion [it is]." Remarkably, you can also insert multiple instances of pv into the same command line to show relative throughput.
This article shows you how to build pv on a UNIX system and apply it to simple and complex command-line combinations. Let's start, though, with a review of how pipes connect processes.
UNIX pipes: Plumbing for processes
Figure 1 shows the steps for creating a pipe to connect two independent processes.
Figure 1. Creating a pipe to connect two processes

At the outset, in Phase 1, the progenitor process reads from standard input (stdin), writes output to stdout, and emits errors to stderr. Each of stdin, stdout, and stderr is a file descriptor, or a handle to a file. Each operation on a file handle (open, read, write, rewind, truncate, and close, for example) affects the state of the file.
Next, in Phase 2, the progenitor creates a pipe. A pipe is composed of a queue and two file descriptors: one to enqueue data and the other to dequeue data. A pipe is a first-in, first-out (FIFO) data structure.
By itself, a pipe has little use; its purpose is to connect a producer to a consumer. Hence, the progenitor forks, or creates, a second process in Phase 3 to act as a counterpart.
In Phase 4 (and assuming that the new process is the consumer), the original process replaces its stdout with the producer end of the pipe and rewires the newly forked process to treat the consumer end of the pipe as its stdin. After these adjustments, each write by the original process (now the producer) is enqueued and subsequently read by the new process (now the consumer).
Phases 1 through 4 mirror the process your shell uses to connect one utility to another with the command-line pipe operator (|), although the shell spawns a new process for each utility and leaves itself untouched to perform job control.
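The shell hides the pipe creation, the fork, and the descriptor rewiring behind the | operator. One way to see the producer and the consumer as separate processes is to use a named pipe, created with mkfifo. Treat the following as an analogy under that assumption, not as the mechanism the shell uses for |; the paths are made up for the sketch.

```shell
#!/bin/sh
# Sketch only: a named pipe (FIFO) stands in for the anonymous pipe
# that the shell creates for the | operator. Paths are made up.
dir=$(mktemp -d)
mkfifo "$dir/queue"

# Consumer: a background process whose stdin is the read end of the FIFO.
wc -l < "$dir/queue" > "$dir/count.txt" &
consumer=$!

# Producer: its stdout is rewired to the write end of the FIFO.
printf 'one\ntwo\nthree\n' > "$dir/queue"

# The consumer sees end-of-file once the producer closes its end.
wait "$consumer"
lines=$(tr -d ' ' < "$dir/count.txt")
echo "consumer read $lines lines"
```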
For example, Figure 2 shows how a find, grep, and wc command might be connected via pipes to find and count all files with names that begin with lowercase a. The shell remains independent; find is a producer, and grep acts as a consumer (for find) and as a producer (for wc). wc acts as a consumer and a producer, too: It consumes from grep and produces output to stdout. Typically, the shell connects stdout to a terminal, but redirection can reroute the output to a file.
Figure 2. Connecting commands using pipes
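A pipeline of the kind that Figure 2 depicts could be written as follows. The article does not give the exact command, so the grep pattern and the throwaway file names here are assumptions chosen for illustration.

```shell
#!/bin/sh
# Build a throwaway directory (names are made up) to run the pipeline against.
dir=$(mktemp -d)
touch "$dir/apple.txt" "$dir/avocado.c" "$dir/banana.txt" "$dir/Apricot.txt"

# find produces one path per line, grep keeps paths whose final component
# begins with a lowercase a, and wc counts the surviving lines.
count=$(find "$dir" -type f | grep '/a[^/]*$' | wc -l | tr -d ' ')
echo "files beginning with a: $count"
```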

If you want two UNIX processes to exchange data in both directions, create two pipes and rewire the file descriptors of each process so that it acts as both a producer and a consumer. Figure 3 shows an interprocess exchange that overrides both processes' stdin and stdout.
Figure 3. Looking into two UNIX processes

Given that brief review, let's look at Pipe Viewer.
Pipe Viewer: Conspicuous conduit
Pipe Viewer is an open source application. You can download its source code and build the application from scratch or, if available, pull an existing binary from your UNIX distribution's repository.
To build from scratch, download the latest source tarball from the Pipe Viewer project page (see Related topics). As of mid-September 2009, the latest version of the code is 1.1.4. Unpack the tarball, change to the newly created directory, and type ./configure followed by make and sudo make install. By default, the build process installs the executable named pv into /usr/local/bin. (For a list of configuration options, type ./configure --help.) Listing 1 shows the installation code.
Listing 1. Pipe Viewer installation code
$ wget http://pipeviewer.googlecode.com/files/pv-1.1.4.tar.bz2
$ tar xjf pv-1.1.4.tar.bz2
$ cd pv-1.1.4
$ ./configure
$ make
$ sudo make install
$ which pv
/usr/local/bin/pv
To pull the pv binary from a repository, use your distribution's package manager and search for either pv or pipe viewer. For example, a search using Ubuntu version 9's APT package manager yields this match:
$ apt-cache search pipe viewer
pv - Shell pipeline element to meter data passing through
To continue, use your package manager to download and install the package.
For Ubuntu, the command is apt-get install:
$ sudo apt-get install pv
Once installed, give pv a try. The simplest use replaces the traditional cat utility with pv to feed bytes to another program and measure overall throughput. For instance, you can use pv to monitor a lengthy compression operation:
$ ls -lh listings.txt
-r--r--r-- 1 supergiantrobot staff 109M Sep 1 20:47 listings.txt
$ pv listings.txt | gzip > listings.gz
96.1MB 0:00:09 [11.3MB/s] [=====================> ] 87% ETA 0:00:01
When the command launches, pv posts a progress bar and continually updates the gauge to show headway. From left to right, the typical pv display shows how much data has been processed so far, the time elapsed, throughput in megabytes/second, a visual and numeric representation of work complete, and an estimate of how much time remains. In the display above, 96.1MB of 109MB has been processed, leaving about 13 percent of the file to go after 9 seconds of work.
By default, pv renders all the status indicators for which it is able to calculate values. For instance, if the input to pv is not a file and no specific size is manually specified, the progress bar advances from left to right to show activity, but it cannot measure the percent complete without a baseline. Here's an example:
$ ssh faraway tar cf - projectx | pv --wait > projectx.tar
Password:
4.34MB 0:00:07 [ 611kB/s] [ <=> ]
This example runs tar on a remote machine and sends the output of the remote command to the local system to create projectx.tar. Because pv cannot calculate the total number of bytes to expect in the transfer, it shows throughput so far, time elapsed, and a special indicator that reflects activity. The little "car" (<=>) travels left to right as long as data is streaming through.
The --wait option delays the rendering of the progress meter(s) until the first byte is actually received. Here, --wait is useful, because the ssh command may prompt for a password.
You can enable individual indicators at your discretion with eponymous flags:
$ ssh faraway tar cf - projectx | \
  pv --wait --bytes > projectx.tar
Password:
268kB
This command enables the running byte count with --bytes. The other options are --progress, --timer, --eta, --rate, and --numeric. If you specify one or more display options, all remaining (unnamed) indicators are automatically disabled.
There is one other simple use of pv. The --rate-limit option can throttle throughput. The argument to this option is a number and a suffix, such as m to indicate megabytes/second:
$ ssh faraway tar cf - projectx | \
  pv --wait --quiet --rate-limit 1m > projectx.tar
The previous command hides all indicators (--quiet) and limits throughput to 1MB/s.
Advanced usage of Pipe Viewer
So far, the examples shown employ a single instance of Pipe Viewer as the producer or consumer in a pair of commands. However, more complex combinations are also possible. You can use pv multiple times in the same command line, with some provisos. Specifically, you must name each instance of pv using --name, and you must enable multiline mode with --cursor. Combined, the two options create a series of labeled indicators, one indicator per named instance.
For example, imagine you want to monitor the progress of a data transfer and its compression separately and simultaneously. You can assign one instance of pv to the former operation and another to the latter, like so:
$ ssh faraway tar cf - projectx | pv --wait --cursor --name ssh | \
  gzip | pv --wait --cursor --name gzip > projectx.tgz
After you type a password, the Pipe Viewer commands produce a two-line progress meter:
ssh: 4.17MB 0:00:07 [ 648kB/s] [ <=> ]
gzip: 592kB 0:00:06 [62.1kB/s] [ <=> ]
The first line is labeled ssh and shows the progress of the transfer; the second line, tagged gzip, shows the progression of the compression. Because neither instance of pv can determine the number of bytes in its respective operation, each line shows only the accumulated total and the activity bar.
If you know or are able to approximate or calculate the number of bytes in an operation, use the --size option. Adding this option provides some finer-grained detail in the progress bars.
For instance, if you want to monitor the progress of a significant archiving task, you can use other UNIX utilities to approximate the total size of the original files. The df utility can show statistics for an entire file system, while du can calculate the size of an arbitrarily deep hierarchy:
$ tar cf - work | pv --size `du -sh work | cut -f1` > work.tar
Here, the command substitution du -sh work | cut -f1 yields the total size of the work directory in a format compatible with pv. Namely, du -h produces a human-readable format such as 17M for 17 megabytes, which is the form of number and suffix that pv accepts. (The ls and df commands also support -h for human-readable format.) Because pv now expects a specific number of bytes to transit through the pipe, it can render a true progress bar:
700kB 0:00:07 [ 100kB/s] [> ] 4% ETA 0:02:47
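If the human-readable output of du on your system does not suit pv, one alternative is to pass a plain byte count. The sketch below is an assumption-laden variant of the command above: it uses du -sk (kilobytes) and multiplies by 1024, and it builds a small made-up work directory so that it can run anywhere. The figure is an estimate of disk usage, not the exact size of the tar stream, so the percentage is approximate.

```shell
#!/bin/sh
# Sketch: estimate the archive size in bytes instead of relying on du -h output.
# The directory name and contents are made up for the example.
src=$(mktemp -d)
mkdir "$src/work"
dd if=/dev/zero of="$src/work/data.bin" bs=1000 count=200 2>/dev/null

# du -sk reports kilobytes; convert to bytes for --size.
kbytes=$(du -sk "$src/work" | cut -f1)
bytes=$((kbytes * 1024))

if command -v pv >/dev/null 2>&1; then
    (cd "$src" && tar cf - work) | pv --size "$bytes" > "$src/work.tar"
else
    # Fallback so the sketch still runs where pv is not installed.
    (cd "$src" && tar cf - work) > "$src/work.tar"
fi
echo "estimated $bytes bytes"
```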
Finally, there is one additional technique you're sure to find useful. Besides counting bytes, Pipe Viewer can visualize progress by counting lines. If you specify the modifier --line-mode, pv advances the progress meter each time a newline is encountered. You can also provide --size, and the number is interpreted as the expected number of lines.
Here's an example. Oftentimes, find is helpful for locating a needle in a haystack, such as locating all the uses of a particular system call in a large body of application code. In such circumstances, you might run something like this:
$ find . -type f -name '*.c' -exec grep --files-with-matches fopen \{\} \; > results
This code finds all C source files and emits each file's name if the string fopen appears anywhere in the file. Output is collected in a file named results. To reflect activity, add pv to the mix:
$ find . -type f -name '*.c' -exec grep --files-with-matches fopen \{\} \; | \
  pv --line-mode > results
Line mode is phenomenal, because many UNIX commands, like find, operate on a file's metadata, not on the contents of the file. Line mode is ideal for systems administration scripts that copy or compress large collections of files.
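To get a percentage and an estimate in line mode, pv needs an expected line count. One possible approach, sketched here with a made-up directory of empty files, is to run the producer once just to count lines and then run it again through pv. Whether the extra pass is worth the cost depends on how expensive the producer is.

```shell
#!/bin/sh
# Sketch: give pv an expected line count so line mode can show percent and ETA.
# The file list is generated twice: once to count, once to feed the pipeline.
dir=$(mktemp -d)
touch "$dir/a.c" "$dir/b.c" "$dir/c.c" "$dir/readme.txt"

total=$(find "$dir" -type f -name '*.c' | wc -l | tr -d ' ')

if command -v pv >/dev/null 2>&1; then
    find "$dir" -type f -name '*.c' | pv --line-mode --size "$total" > "$dir/results"
else
    find "$dir" -type f -name '*.c' > "$dir/results"   # fallback without pv
fi
echo "expected $total lines"
```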
In general, you can inject Pipe Viewer into command lines and scripts whenever rate is measurable. You may have to get creative, though. For example, to measure how quickly a directory is copied, switch from cp -pr to tar:
$ # an equivalent of cp -pr old/somedir new
$ (cd old; tar cf - somedir) | pv | (cd new; tar xf - )
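The same pattern can be wrapped in a script that degrades gracefully on machines without pv. The following is a minimal sketch: the directory layout is made up, and the fallback to cat is an assumption of the sketch rather than anything pv provides.

```shell
#!/bin/sh
# Sketch with made-up directory names: copy old/somedir into new/ through a pipe.
base=$(mktemp -d)
mkdir -p "$base/old/somedir" "$base/new"
echo "hello" > "$base/old/somedir/file.txt"

# Use pv as the meter when it is installed; otherwise fall back to cat,
# which passes data through unchanged but shows nothing.
if command -v pv >/dev/null 2>&1; then meter=pv; else meter=cat; fi

(cd "$base/old" && tar cf - somedir) | "$meter" | (cd "$base/new" && tar xf -)

# Confirm that the copy matches the original.
if cmp -s "$base/old/somedir/file.txt" "$base/new/somedir/file.txt"; then
    copied=yes
else
    copied=no
fi
echo "copy verified: $copied"
```

Using && instead of ; after cd keeps tar from running in the wrong directory if the cd fails.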
You might also consider line mode for use with networking utilities such as wget, curl, and scp. For instance, you can use pv to measure the progress of a sizable upload. And because many of the networking tools can take input from a file, you can use the length of such a file as an argument to --size.
A little gem
Pipe Viewer is one of those little-known gems that once you find it, you can't recall how you lived without it. You may find some applications of pv in your daily command-line use, but you are likely to find oodles of uses for it in your automation scripts. Rather than stare at a blinking cursor waiting patiently for some indication that all is well, you can now insert a probe to give you real-time feedback. Pipe Viewer adds a heartbeat to the soul of the machine.
Related topics
- Speaking UNIX: Check out other parts in this series.
- UNIX shells: Learn more about UNIX shells.
- Pipe Viewer: Find more information about and download Pipe Viewer.