Over the past 20 years, the use of computer networks has exploded. The growth of the Internet, commensurate and reciprocal investments in national and international backbone infrastructure, and the plummeting price of networking and computing hardware have driven usage. Today, networks are both pervasive and commonplace, and applications still push the envelope of network scale and speed. The Internet may have gotten its start on a handful of tiny workstations, but it and its private analogs now connect countless computers.
Over the same period, UNIX® has grown as well and kept pace with increasingly
capable networking software. FTP was among the first tools to share files between
systems and remains in widespread use. rcp, short for
"remote copy," improved on FTP, because it mimicked the traditional
cp utility but copied files from machine to machine.
rdist, based on rcp, distributed
files from one machine to many systems automatically.
Today, rcp and rdist are antiques: both were inherently insecure, and scp
took their place. While FTP remains in wide use, SFTP, the SSH-based file
transfer protocol, should be used whenever possible. Other options exist, too,
WebDAV and BitTorrent™ among them. Of course, the more machines you have, the more
difficult it is to keep them all in sync, or at least in a known state, and
scp and WebDAV offer no respite, unless you want to script
a solution yourself.
The best tool for distributing files is rsync. rsync
can resume a transfer after interruption; it transfers only those portions of a file that
differ between source and destination; and it can perform
full or incremental backups. Better yet, rsync is available
on every flavor of UNIX, including Mac OS X, so it's easy to interconnect virtually any set
of systems.
Let's look at some common uses of rsync as review, then look
at more advanced applications. The demonstration systems employed here are Mac
OS X version 10.5 Leopard (a BSD-derived system) and Ubuntu Linux® version 8. If
you use a different operating system, chances are, most of the examples here are
portable; check your machine's rsync man page to verify
proper operation.
Much like cp, rsync copies
files from a source to a destination. Unlike cp, the
source and destination of an rsync operation can be
local or remote. For instance, the command in Listing 1
copies the directory /tmp/photos and its entire contents verbatim to a home
directory.
Listing 1. Copy the contents of a directory verbatim
    $ rsync -n -av /tmp/photos ~
    building file list ... done
    photos/
    photos/Photo 2.jpg
    photos/Photo 3.jpg
    photos/Photo 6.jpg
    photos/Photo 9.jpg

    sent 218 bytes  received 56 bytes  548.00 bytes/sec
    total size is 375409  speedup is 1370.11
The -v option enables verbose messages. The
-a option (where a stands for archive)
is shorthand for -rlptgoD (recurse, copy symbolic
links as symbolic links, preserve permissions, preserve file times, preserve group,
preserve owner, and preserve devices and special files, respectively). Typically,
-a mirrors files; exceptions occur when the destination
cannot or does not support the same attributes. For example, copying a directory
from UNIX to Windows® does not map perfectly. Some suggestions for unusual
cases appear below.
rsync has a lot of options. If you worry that your options or
source or destination specifications are incorrect, use -n
to perform a dry run. A dry run previews what will happen to each file but
does not move a single byte. When you are confident of all the settings, drop the
-n and proceed.
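To see that a dry run really leaves the destination untouched, here is a minimal sketch that works entirely inside a scratch directory. The directory and file names are hypothetical stand-ins for /tmp/photos and your home directory, and the script assumes only that rsync and mktemp are installed.

```shell
#!/bin/sh
# Sketch: preview a copy with -n, then run it for real.
# Every path lives under a throwaway directory; nothing touches real data.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical source and destination directories.
mkdir -p "$work/photos" "$work/home"
echo "image data" > "$work/photos/Photo 2.jpg"

# Dry run: rsync reports what it would transfer but copies nothing.
rsync -n -av "$work/photos" "$work/home"
dry_count=$(ls -A "$work/home" | wc -l | tr -d ' ')

# Real run: the same command without -n performs the copy.
rsync -av "$work/photos" "$work/home"
real_count=$(ls -A "$work/home" | wc -l | tr -d ' ')

echo "entries after dry run: $dry_count, after real run: $real_count"
```

The destination should be empty after the first command and hold the photos directory after the second.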
Listing 2 provides an example where -n
is invaluable. The command in Listing 1 and the following
command yield different results.
Listing 2. Copy the contents of a named directory
    $ rsync -av /tmp/photos/ ~
    ./
    Photo 2.jpg
    Photo 3.jpg
    Photo 6.jpg
    Photo 9.jpg

    sent 210 bytes  received 56 bytes  532.00 bytes/sec
    total size is 375409  speedup is 1411.31
What is the difference? The difference is the trailing slash on the source argument. If the source has a trailing slash, the contents of the named directory but not the directory itself are copied. A slash on the end of the destination is immaterial.
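The trailing-slash rule is easy to check for yourself. The following sketch copies one hypothetical directory twice, once with and once without the trailing slash, into two scratch destinations (dest1 and dest2 are names invented for this illustration).

```shell
#!/bin/sh
# Sketch: the same source directory with and without a trailing slash.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/photos" "$work/dest1" "$work/dest2"
echo "image data" > "$work/photos/Photo 2.jpg"

# No trailing slash: the directory itself is copied into the destination.
rsync -a "$work/photos" "$work/dest1"

# Trailing slash: only the contents of the directory are copied.
rsync -a "$work/photos/" "$work/dest2"

# Show where the file landed in each case.
(cd "$work" && find dest1 dest2 -type f)
```

The file should appear as dest1/photos/Photo 2.jpg in the first case and as dest2/Photo 2.jpg in the second.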
And Listing 3 provides an example of copying the same directory to another system.
Listing 3. Copy a directory to a remote system
    $ rsync -av /tmp/photos/ example.com:album
    created directory album
    Photo 2.jpg
    Photo 3.jpg
    Photo 6.jpg
    Photo 9.jpg

    sent 210 bytes  received 56 bytes  21.28 bytes/sec
    total size is 375409  speedup is 1411.31
Assuming that you have the same login name on the remote machine,
rsync prompts you for a password and, given the
proper credential, creates the directory album and copies the images to
that directory. By default, rsync uses Secure Shell (SSH)
as its transport mechanism; you can reuse your machine aliases and public keys
with rsync.
The examples in Listing 2 and Listing 3
demonstrate two of rsync's four modes. The first
example is shell mode, also dubbed local mode. The second example
is remote shell mode, so named because SSH powers the underlying
connection and transfers. rsync has two additional
modes. List mode acts like ls: It lists the contents
of the source, as shown in Listing 4.
Listing 4. List the contents of a source
    $
    drwxr-xr-x         238 2009/08/22 18:49:50 photos
    -rw-r--r--        6148 2008/07/03 01:36:18 photos/.DS_Store
    -rw-r--r--       71202 2008/06/18 04:51:36 photos/Photo 2.jpg
    -rw-r--r--       69632 2008/06/18 04:51:45 photos/Photo 3.jpg
    -rw-r--r--       61046 2008/07/14 00:31:17 photos/Photo 6.jpg
    -rw-r--r--      167381 2008/07/14 00:31:56 photos/Photo 9.jpg
The fourth mode is server mode. Here, the rsync
daemon runs continuously on a machine, accepting requests to transfer files. A
transfer can send files to the daemon or request files from it. Server mode is
ideal for creating a central backup server or project repository.
To differentiate between remote shell mode and server mode, the latter employs two
colons (::) in the source and destination names. Assuming
that whatever.example.com exists, the next command copies files from the source
to a local destination:
    $ rsync -av whatever.example.com::src /tmp
And what exactly is src? It's an rsync
module that you define and configure on the daemon's host. A module has a name,
a path that contains its files, and some other parameters, such as
read only, which protects the contents from modification.
To run an rsync daemon, type:
    $ sudo rsync --daemon
Running the rsync daemon as the superuser, root, is not
strictly necessary, but the practice protects other files on your machine. Running as
root, rsync restricts itself to the module's directory
hierarchy (its path) using chroot. After a
chroot, all other files and directories seem to vanish.
If you choose to run the rsync daemon with your own
privileges, choose an unused port and make sure its modules have sufficient
permissions to allow download and/or upload. Listing 5 shows
a minimal configuration to share some files in your home directory
without the need for sudo. The configuration is stored
in the file rsyncd.conf.
Listing 5. Simple configuration for sharing files
    motd file = /home/strike/rsyncd/rsync.motd_file
    pid file = /home/strike/rsyncd/rsyncd.pid
    port = 7777
    use chroot = no

    [demo]
    path = /home/strike
    comment = Martin home directory
    list = no

    [dropbox]
    path = /home/strike/public/dropbox
    comment = A place to leave things for Martin
    read only = no

    [pickup]
    path = /home/strike/public/pickup
    comment = Get your files here!
The file has two segments. The first segment—here, the first four
lines—configures the operation of the rsync
daemon. (Other options are available, too.) The first line points to a file with a
friendly message to identify your server. The second line points to another file to
record the process ID of the server. This is a convenience in the event you must
manually kill the rsync daemon:
    kill -INT `cat /home/strike/rsyncd/rsyncd.pid`
The two files are in a home directory, because this example does not use superuser
privileges to run the software. Similarly, the port chosen for the daemon is above
1023, the top of the range reserved for privileged processes, so any user can
claim it. The fourth line turns off chroot.
The remaining segment is subdivided into small sections, one section per module. Each
section, in turn, has a header line and a list of key-value pairs to set options
for that module. By default, all modules are read only; set
read only = no to allow write operations. Also by
default, all modules are listed in the module catalog; set
list = no to hide the module.
To start the daemon, run:
    $ rsync --daemon --config=rsyncd.conf
Now, connect to the daemon from another machine, and omit a module name. You should see this:
    $ rsync --port=7777 mymachine.example.com::
    Hello! Welcome to Martin's rsync server.

    dropbox         A place to leave things for Martin
    pickup          Get your files here!
If you do not name a module after the colons (::), the
daemon responds with a list of available modules. If you name a module but do not
name a specific file or directory within the module, the daemon provides a catalog
of the module's contents, as shown in Listing 6.
Listing 6. Catalog output of a module's contents
    $ rsync --port=7777 mymachine.example.com::pickup
    Hello! Welcome to Martin's rsync server.

    drwxr-xr-x        4096 2009/08/23 08:56:19 .
    -rw-r--r--           0 2009/08/23 08:56:19 article21.html
    -rw-r--r--           0 2009/08/23 08:56:19 design.txt
    -rw-r--r--           0 2009/08/23 08:56:19 figure1.png
And naming a module and a file, followed by a local destination, copies the file locally. Without a destination, as in Listing 7, rsync only lists what the module offers.
Listing 7. Name a module to copy files locally
    $ rsync --port=7777 mymachine.example.com::pickup/
    Hello! Welcome to Martin's rsync server.

    drwxr-xr-x        4096 2009/08/23 08:56:19 .
    -rw-r--r--           0 2009/08/23 08:56:19 article21.html
    -rw-r--r--           0 2009/08/23 08:56:19 design.txt
    -rw-r--r--           0 2009/08/23 08:56:19 figure1.png
You can also perform an upload by reversing the source and destination, then pointing to the module for writes, as shown in Listing 8.
Listing 8. Reverse source and destination directories
    $ rsync -v --port=7777 application.js mymachine.example.com::dropbox
    Hello! Welcome to Martin's rsync server.

    application.js

    sent 245 bytes  received 38 bytes  113.20 bytes/sec
    total size is 164  speedup is 0.58
That's a quick but thorough review. Next, let's see how you can apply
rsync to daily tasks. rsync
is especially useful for backups. And because it can synchronize a local file with
its remote counterpart—and can do that for an entire file system, too—it's
ideal for managing large clusters of machines that must be (at least partially)
identical.
Performing backups on a frequent basis is a critical but typically ignored chore. Perhaps it's the demands of running a lengthy backup each day or the need to have large external media to store files; whatever the excuse, copying data somewhere for safekeeping should be an everyday practice.
To make the task painless, use rsync and point to a
remote server—perhaps one that your service provider hosts and backs
up. Each of your UNIX machines can use the same technique, and it's ideal for
keeping the data on your laptop safe.
Establish SSH keys and an rsync daemon on the remote
machine, and create a backup module to permit writes. Once established, run
rsync to create a daily backup that takes hardly any
space, as shown in Listing 9.
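Listing 9. Create daily backups
    #!/bin/sh
    # This script based on work by Michael Jakl (jakl.michael AT gmail DOTCOM) and used
    # with express permission.

    HOST=mymachine.example.com
    SOURCE=$HOME
    PATHTOBACKUP=home-backup

    # Timestamp that names this backup uniquely, such as 2009-08-23T12:32:18.
    date=`date "+%Y-%m-%dT%H:%M:%S"`

    # Copy changed files; hard-link unchanged files to the previous backup.
    # Note: rsync resolves a relative --link-dest path against the destination directory.
    rsync -az --link-dest=$PATHTOBACKUP/current $SOURCE $HOST:$PATHTOBACKUP/back-$date

    # Repoint the "current" symbolic link at the backup just made.
    ssh $HOST "rm -f $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"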
Replace HOST with the name of your backup host and
SOURCE with the directory you want to save. Change
PATHTOBACKUP to the name of your module. (You can
also embed the three final lines of the script in a loop, dynamically change
SOURCE, and back up a series of separate directories
on the same system.) Here's how the backup works:
- To begin, date is set to the current date and time and yields a string like 2009-08-23T12:32:18, which identifies the backup uniquely.
- The rsync command performs the heavy lifting. -az preserves all file information and compresses the transfers. The magic lies in --link-dest=$PATHTOBACKUP/current, which specifies that if a file has not changed, do not copy it to the new backup. Instead, create a hard link from the new backup to the same file in the existing backup. In other words, the new backup only contains files that have changed; the rest are links.

  More specifically (and expanding all variables), mymachine.example.com:home-backup/current is the current archive. The new archive for /home/strike is targeted to mymachine.example.com:home-backup/back-2009-08-23T12:32:18. If a file in /home/strike has not changed, the file is represented in the new backup by a hard link to the current archive. Otherwise, the new file is copied to the new archive.

  If you touch but a few files or perhaps a handful of directories each day, the additional space required for what is effectively a full backup is paltry. Moreover, because each daily backup (except the very first) is so small, you can keep a long history of the files on hand.
- The last step is to alter the organization of the backups on the remote machine to promote the newly created archive to be the current archive, thereby minimizing the differences to record the next time this script runs. The last command removes the current archive (which is merely a symbolic link) and recreates the same symbolic link pointing to the new archive.
Keep in mind that a hard link to a hard link points to the same file. Hard links are very cheap to create and maintain, so a full backup is simulated using only an incremental scheme.
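The hard-link behavior can be observed on a single machine before you trust it with real backups. The sketch below is a local illustration, not the remote script from Listing 9: the directory layout, the file names, and the inode helper function are all hypothetical. It takes two snapshots of a small source tree and compares inode numbers to show which files are shared.

```shell
#!/bin/sh
# Sketch: two local snapshots made with --link-dest.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src" "$work/backups"
echo "never changes" > "$work/src/stable.txt"
echo "version 1" > "$work/src/notes.txt"

# First snapshot: a plain full copy.
rsync -a "$work/src/" "$work/backups/back-1"

# Change one file (its size differs, so rsync sees it as modified).
echo "version 2, now longer" > "$work/src/notes.txt"

# Second snapshot: unchanged files become hard links into back-1.
# An absolute --link-dest path sidesteps the relative-path rule.
rsync -a --link-dest="$work/backups/back-1" "$work/src/" "$work/backups/back-2"

# Hypothetical helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

stable_1=$(inode "$work/backups/back-1/stable.txt")
stable_2=$(inode "$work/backups/back-2/stable.txt")
notes_1=$(inode "$work/backups/back-1/notes.txt")
notes_2=$(inode "$work/backups/back-2/notes.txt")

echo "stable.txt inodes: $stable_1 $stable_2"
echo "notes.txt inodes:  $notes_1 $notes_2"
```

Matching inode numbers for stable.txt mean both snapshots share one copy on disk; differing numbers for notes.txt mean the changed file was stored anew.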
Other advanced tricks and tips
Once you begin using remote rsync in daily tasks,
you'll likely find it necessary to keep your daemon running at all times. Linux
and UNIX machines have a startup script for rsync,
usually in /etc/init.d/rsync. Check your operating system for a startup script and
the utility that enables and disables components. In contrast, if you are running
rsync as a daemon for your own use, or if you do
not have access to the startup scripts, you can still start rsync
with cron:
    @reboot /usr/bin/rsync --daemon --port=7777 --config=/home/strike/rsyncd/rsyncd.conf
This command launches the daemon each time the machine restarts. Place this line in your crontab file, and save the file.
You saw how a preview with -n can reveal problems
before any occur. You can also monitor the state of your transfers with two
options: --progress and --stats.
The former displays a progress indicator for each file as it transfers. The latter
reports statistics on the transfer, such as how compression and transmission performed.
Further, you can hasten the transfer between two machines with
--compress. Rather than send raw data, the data is
compressed by the sender and decompressed by the receiver, making the transit across
the wire faster: fewer bytes translate to better times.
By default, rsync ensures that all files in the source are
copied to the destination. This is duplication. If you want a mirror, where
the destination is an exact copy of the source, provide --delete.
For example, if the source has files A, B, and C, a standard rsync
copy duplicates A, B, and C to the destination. However, if you delete B from the
source and duplicate again, the destination no longer mirrors the source: it still
holds a stale copy of B. The --delete option removes
files in the destination that no longer exist in the source, restoring the mirror.
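The A, B, and C scenario can be replayed in a scratch directory. This is a minimal sketch; the src and dst directory names are invented for the illustration.

```shell
#!/bin/sh
# Sketch: duplication versus mirroring with --delete.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src" "$work/dst"
for name in A B C; do
    echo "file $name" > "$work/src/$name"
done

# Initial copy: the destination receives A, B, and C.
rsync -a "$work/src/" "$work/dst/"

# Remove B from the source, then copy again without --delete.
rm "$work/src/B"
rsync -a "$work/src/" "$work/dst/"
without_delete=$(ls "$work/dst" | tr '\n' ' ')

# Copy once more with --delete: B disappears from the destination.
rsync -a --delete "$work/src/" "$work/dst/"
with_delete=$(ls "$work/dst" | tr '\n' ' ')

echo "without --delete: $without_delete"
echo "with --delete:    $with_delete"
```

Because --delete removes files, it pairs well with a -n dry run before the real transfer.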
Oftentimes, there are files you never want to copy to a backup or an archive. These
include scratch files created by editors and other utilities (usually denoted by a
trailing tilde [~]) and a wide variety of files that
are nonessential, such as the MP3 files in your home directory that can be recreated
if need be. You can exclude files from processing using patterns. You can specify a
pattern on the command line or a list of patterns in a text file. You can also combine
the patterns with the --delete-excluded option to remove
matching files from the destination.
To exclude files based on a pattern using the command line, use --exclude.
Remember that if any characters in the pattern have special meaning to the shell,
such as *, wrap the pattern in single quotes:
    $ rsync -a --exclude='*~' /home/strike/data example.com::data
Assuming that the file /home/strike/excludes had a list of patterns like this:
    *~
    *.old
    *.mp3
    tmp
you can exclude all files that match any of those patterns with:
    $ rsync -a --exclude-from=/home/strike/excludes /home/strike/data example.com::data
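To check a pattern file before pointing it at real data, you can try it on a scratch tree. In this sketch the pattern list is the one shown above, while the sample file names and directories are made up for the test.

```shell
#!/bin/sh
# Sketch: try an exclude file against a throwaway directory tree.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical sample data: one file to keep, several to exclude.
mkdir -p "$work/data/tmp" "$work/dst"
echo "keep me"       > "$work/data/report.txt"
echo "editor backup" > "$work/data/report.txt~"
echo "old draft"     > "$work/data/draft.old"
echo "music"         > "$work/data/song.mp3"
echo "scratch"       > "$work/data/tmp/scratch"

# The pattern list, one pattern per line.
cat > "$work/excludes" <<'EOF'
*~
*.old
*.mp3
tmp
EOF

rsync -a --exclude-from="$work/excludes" "$work/data/" "$work/dst/"

# List what actually arrived at the destination.
copied=$(cd "$work/dst" && find . -type f)
echo "$copied"
```

Only report.txt should reach the destination; the tmp pattern keeps the whole tmp directory out.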
Now that you know about rsync, you have no excuse to
skip a healthy backup regimen. What's that? Your dog ate your hard disk? (Plausible
these days, no?) See, and you said your data would be just fine. Now your valuable
files live in FIDOnet.
Martin Streicher is a freelance Ruby on Rails developer and the former Editor-in-Chief of Linux Magazine. Martin holds a Masters of Science degree in computer science from Purdue University and has programmed UNIX-like systems since 1986. He collects art and toys. You can reach Martin at martin.streicher@gmail.com.





