Advanced applications of rsync
This content is part of the series: Speaking UNIX
Over the past 20 years, the use of computer networks has exploded. The growth of the Internet, commensurate and reciprocal investments in national and international backbone infrastructure, and the plummeting price of networking and computing hardware have driven usage. Today, networks are both pervasive and commonplace, and applications still push the envelope of network scale and speed. The Internet may have gotten its start on a handful of tiny workstations, but it and its private analogs now connect countless computers.
Over the same period, UNIX® has grown as well and kept pace with increasingly
capable networking software. FTP was among the first tools to share files between
systems and remains in widespread use.
rcp, short for
"remote copy," improved on FTP, because it mimicked the traditional
cp utility but copied files from machine to machine.
rdist, based on rcp, distributed
files from one machine to many systems automatically.
Today, the latter two utilities are antiques: rcp and rdist
were made obsolete because both were inherently insecure. The Secure Shell (SSH) suite, including scp,
took their place. While FTP remains in wide use, Secure FTP (SFTP), which offers FTP-like
operations over an encrypted SSH connection, should be used whenever possible. Other options exist, too:
WebDAV and BitTorrent™ among them. Of course, the more machines you have, the more
difficult it is to keep them all in sync (or at least in a known state), and
scp and WebDAV offer no respite, unless you want to script
a solution yourself.
The best tool for distributing files is rsync. It
can resume a transfer after interruption; it transfers only those portions of a file that
differ between source and destination; and it can perform
full or incremental backups. Better yet, rsync is available
on every flavor of UNIX, including Mac OS X, so it's easy to interconnect virtually any set
of machines.
Let's look at some common uses of
rsync as review, then look
at more advanced applications. The demonstration systems employed here are Mac
OS X version 10.5 Leopard (whose UNIX userland derives largely from FreeBSD) and Ubuntu Linux® version 8. If
you use a different operating system, chances are that most of the examples here are
portable; check your machine's
rsync man page to verify the options it supports.
A quick review
rsync copies files from a source to a destination. Unlike cp, the
source and destination of an
rsync operation can be
local or remote. For instance, the command in Listing 1
copies the directory /tmp/photos and its entire contents verbatim to your home directory. (The -n option, explained below, makes this a dry run.)
Listing 1. Copy the contents of a directory verbatim
$ rsync -n -av /tmp/photos ~
building file list ... done
photos/
photos/Photo 2.jpg
photos/Photo 3.jpg
photos/Photo 6.jpg
photos/Photo 9.jpg

sent 218 bytes received 56 bytes 548.00 bytes/sec
total size is 375409 speedup is 1370.11
The -v option enables verbose messages. The
-a option (where a stands for archive)
is a shorthand for
-rlptgoD (recurse, copy symbolic
links as symbolic links, preserve permissions, preserve file times, preserve group,
preserve owner, and preserve devices and special files, respectively). Typically,
-a mirrors files; exceptions occur when the destination
cannot or does not support the same attributes. For example, copying a directory
from UNIX to Windows® does not map perfectly. Some suggestions for unusual
cases appear below.
rsync has a lot of options. If you worry that your options or
source or destination specifications are incorrect, use -n
to perform a dry run. A dry run previews what will happen to each file but
does not move a single byte. When you are confident of all the settings, drop the
-n and proceed.
Listing 2. Copy the contents of a named directory
$ rsync -av /tmp/photos/ ~
./
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg

sent 210 bytes received 56 bytes 532.00 bytes/sec
total size is 375409 speedup is 1411.31
What is the difference between Listing 1 and Listing 2? Aside from the dropped -n, it is the trailing slash on the source argument. If the source has a trailing slash, the contents of the named directory, but not the directory itself, are copied. A slash on the end of the destination is immaterial.
And Listing 3 provides an example of copying the same directory to another system.
Listing 3. Copy a directory to a remote system
$ rsync -av /tmp/photos/ example.com:album
created directory album
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg

sent 210 bytes received 56 bytes 21.28 bytes/sec
total size is 375409 speedup is 1411.31
Assuming that you have the same login name on the remote machine,
rsync prompts you for a password and, given the
proper credential, creates the directory album and copies the images to
that directory. By default,
rsync uses Secure Shell (SSH)
as its transport mechanism, so you can reuse your SSH machine aliases and public keys with rsync.
The examples in Listing 2 and Listing 3
demonstrate two of
rsync's four modes. The first
example used local mode. The second
was remote shell mode, so named because SSH powers the underlying
connection and transfers.
rsync has two additional
modes. List mode acts like
ls: It lists the contents
of the source, as shown in Listing 4.
Listing 4. List the contents of a source
$
drwxr-xr-x 238 2009/08/22 18:49:50 photos
-rw-r--r-- 6148 2008/07/03 01:36:18 photos/.DS_Store
-rw-r--r-- 71202 2008/06/18 04:51:36 photos/Photo 2.jpg
-rw-r--r-- 69632 2008/06/18 04:51:45 photos/Photo 3.jpg
-rw-r--r-- 61046 2008/07/14 00:31:17 photos/Photo 6.jpg
-rw-r--r-- 167381 2008/07/14 00:31:56 photos/Photo 9.jpg
The fourth mode is server mode. Here, the rsync
daemon runs continuously on a machine, accepting requests to transfer files. A
transfer can send files to the daemon or request files from it. Server mode is
ideal for creating a central backup server or project repository.
To differentiate between remote shell mode and server mode, the latter employs two
colons (::) rather than one in the remote source or destination name. Assuming
that whatever.example.com exists, the next command copies files from the source
to a local destination:
$ rsync -av whatever.example.com::src /tmp
And what exactly is
src? It's an rsync
module that you define and configure on the daemon's host. A module has a name,
a path that contains its files, and some other parameters, such as
read only, which protects the contents from modification.
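A module is a bracketed section in the daemon's configuration file. The sketch below writes a minimal rsyncd.conf that defines the src module; the path /srv/src, the comment text, and the default output file name are hypothetical placeholders, while path, comment, and read only are standard rsyncd.conf parameters.

```shell
#!/bin/sh
# Sketch: write a minimal rsyncd.conf defining a read-only module named "src".
# /srv/src is a hypothetical placeholder; point it at the files to share.
set -e

conf=${1:-./rsyncd.conf}

# Refuse to clobber an existing configuration file.
if [ -e "$conf" ]; then
    echo "refusing to overwrite $conf" >&2
    exit 1
fi

cat > "$conf" <<'EOF'
# The module name goes in brackets; its parameters follow.
[src]
    path = /srv/src
    comment = Source files
    read only = yes
EOF

echo "wrote $conf"
```

With the file in place, a daemon started with --config pointing at it would serve the module to clients that ask for whatever.example.com::src.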
To run an
rsync daemon, type:
$ sudo rsync --daemon
Running the rsync daemon as the superuser, root, is not
strictly necessary, but the practice protects other files on your machine. Running as root,
rsync restricts itself to the module's directory
hierarchy (its path) using
chroot. After a
chroot, all other files and directories seem to vanish.
If you choose to run the
rsync daemon with your own
privileges, choose an unused port and make sure its modules have sufficient
permissions to allow download and/or upload. Listing 5 shows
a minimal configuration to share some files in your home directory
without the need for
sudo. The configuration is stored
in the file rsyncd.conf.
Listing 5. Simple configuration for sharing files
motd file = /home/strike/rsyncd/rsync.motd_file
pid file = /home/strike/rsyncd/rsyncd.pid
port = 7777
use chroot = no

[demo]
    path = /home/strike
    comment = Martin home directory
    list = no

[dropbox]
    path = /home/strike/public/dropbox
    comment = A place to leave things for Martin
    read only = no

[pickup]
    path = /home/strike/public/pickup
    comment = Get your files here!
The file has two segments. The first segment (here, the first four
lines) configures the operation of the
rsync daemon. (Other options are available, too.) The first line points to a file with a
friendly message to identify your server. The second line points to another file to
record the process ID of the server. This is a convenience in the event you must
manually kill the rsync daemon:
kill -INT `cat /home/strike/rsyncd/rsyncd.pid`
The two files are in a home directory, because this example does not use superuser
privileges to run the software. Similarly, the port chosen for the daemon is above
1023, in the unprivileged range that any user can claim for any application. The fourth line turns off
chroot, which only the superuser can use.
The remaining segment is subdivided into small sections, one section per module. Each
section, in turn, has a header line and a list of (key-value) pairs to set options
for each module. By default, all modules are read only; set
read only = no to allow write operations. Also by
default, all modules are listed in the module catalog; set
list = no to hide the module.
To start the daemon, run:
$ rsync --daemon --config=rsyncd.conf
Now, connect to the daemon from another machine, and omit a module name. You should see this:
$ rsync --port=7777 mymachine.example.com::
Hello! Welcome to Martin's rsync server.

dropbox A place to leave things for Martin
pickup Get your files here!
If you do not name a module after the colons (::), the
daemon responds with a list of available modules. If you name a module but do not
name a specific file or directory within the module, the daemon provides a catalog
of the module's contents, as shown in Listing 6.
Listing 6. Catalog output of a module's contents
$ rsync --port=7777 mymachine.example.com::pickup
Hello! Welcome to Martin's rsync server.

drwxr-xr-x 4096 2009/08/23 08:56:19 .
-rw-r--r-- 0 2009/08/23 08:56:19 article21.html
-rw-r--r-- 0 2009/08/23 08:56:19 design.txt
-rw-r--r-- 0 2009/08/23 08:56:19 figure1.png
And naming a module and a file copies the file locally, as shown in Listing 7.
Listing 7. Name a module to copy files locally
$ rsync --port=7777 mymachine.example.com::pickup/
Hello! Welcome to Martin's rsync server.

drwxr-xr-x 4096 2009/08/23 08:56:19 .
-rw-r--r-- 0 2009/08/23 08:56:19 article21.html
-rw-r--r-- 0 2009/08/23 08:56:19 design.txt
-rw-r--r-- 0 2009/08/23 08:56:19 figure1.png
You can also perform an upload by reversing the source and destination, then pointing to the module for writes, as shown in Listing 8.
Listing 8. Reverse source and destination directories
$ rsync -v --port=7777 application.js mymachine.example.com::dropbox
Hello! Welcome to Martin's rsync server.

application.js

sent 245 bytes received 38 bytes 113.20 bytes/sec
total size is 164 speedup is 0.58
That's a quick but thorough review. Next, let's see how you can apply
rsync to daily tasks. rsync
is especially useful for backups. And because it can synchronize a local file with
its remote counterpart (and can do that for an entire file system, too), it's
ideal for managing large clusters of machines that must be (at least partially) identical.
Back up your data with rsync
Performing backups on a frequent basis is a critical but often neglected chore. Perhaps it's the demands of running a lengthy backup each day or the need to have large external media to store files; whatever the excuse, copying data somewhere for safekeeping should be an everyday practice.
To make the task painless, use
rsync and point to a
remote server—perhaps one that your service provider hosts and backs
up. Each of your UNIX machines can use the same technique, and it's ideal for
keeping the data on your laptop safe.
Establish SSH keys so that you can log in to the remote
machine without a password, and create a directory there to hold the backups. Once that is in place, run
rsync to create a daily backup that takes hardly any
space, as shown in Listing 9.
Listing 9. Create daily backups
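#!/bin/sh
# This script based on work by Michael Jakl (jakl.michael AT gmail DOTCOM) and used
# with express permission.

HOST=mymachine.example.com
SOURCE=$HOME
PATHTOBACKUP=home-backup

# Timestamp that names this backup, for example 2009-08-23T12:32:18
date=$(date "+%Y-%m-%dT%H:%M:%S")

# A relative --link-dest is resolved from the destination directory
# ($PATHTOBACKUP/back-$date), so ../current means $PATHTOBACKUP/current.
rsync -az --link-dest=../current "$SOURCE" "$HOST:$PATHTOBACKUP/back-$date"

# Point "current" at the new backup; rm -f keeps the very first run from failing.
ssh "$HOST" "rm -f $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"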
Replace HOST with the name of your backup host and
SOURCE with the directory you want to save. Change
PATHTOBACKUP to the directory on the backup host (relative to your home directory there) that holds the backups. (You can
also embed the three final lines of the script in a loop, dynamically change
SOURCE, and back up a series of separate directories
on the same system.) Here's how the backup works:
- To begin, date is set to the current date and time and yields a string like
2009-08-23T12:32:18, which identifies the backup uniquely.
- Next, the rsync command performs the heavy lifting.
The -az options preserve all file information and compress the transfers. The magic lies in
the --link-dest option, which specifies that if a file has not changed, rsync does not copy it to the new backup. Instead, it creates a hard link from the new backup to the same file in the existing backup. In other words, the new backup contains copies of only the files that have changed; the rest are links.
More specifically (and expanding all variables),
mymachine.example.com:home-backup/current is the current archive. The new archive for /home/strike is targeted to
mymachine.example.com:home-backup/back-2009-08-23T12:32:18. If a file in /home/strike has not changed, the file is represented in the new backup by a hard link to its copy in the current archive. Otherwise, the new file is copied to the new archive.
If you touch but a few files or perhaps a handful of directories each day, the additional space required for what is effectively a full backup is paltry. Moreover, because each daily backup (except the very first) is so small, you can keep a long history of the files on hand.
- The last step is to alter the organization of the backups on the remote machine to promote the newly created archive to be the current archive, thereby minimizing the differences to record the next time this script runs. The last command removes the current archive (which is merely a symbolic link) and recreates the link so that it points to the new archive.
Keep in mind that a hard link to a hard link points to the same file. Hard links are very cheap to create and maintain, so a full backup is simulated using only an incremental scheme.
Other advanced tricks and tips
Once you begin using remote
rsync in daily tasks,
you'll likely find it necessary to keep your daemon running at all times. Many Linux
and UNIX machines have a startup script for rsync,
usually in /etc/init.d/rsync. Check your operating system for a startup script and
the utility that enables and disables components. In contrast, if you are running
rsync as a daemon for your own use, or if you do
not have access to the startup scripts, you can still start the daemon at boot time with cron:
@reboot /usr/bin/rsync --daemon --port=7777 --config=/home/strike/rsyncd/rsyncd.conf
This entry launches the daemon each time the machine restarts. Run crontab -e, add the line to your crontab file, and save the file.
You saw how a preview with
-n can reveal problems
before any occur. You can also monitor the state of your transfers with two options, --progress and --stats.
The former renders a progress indicator for each file. The latter prints statistics on compression and transmission when the transfer finishes.
Further, you can hasten the transfer between two machines with
--compress (or -z). Rather than send raw data, the sender compresses the data and the
receiver decompresses it, so fewer bytes cross the wire; on slow links, fewer bytes
means faster transfers, at the cost of some CPU time on each end.
By default, rsync ensures that all files in the source are
copied to the destination. This is duplication. If you want a mirror, where
the destination is an exact copy of the source, provide --delete.
For example, if the source has files A, B, and C, a standard
copy duplicates A, B, and C to the destination. However, if you delete B from the
source and duplicate again, the destination no longer mirrors the source: B is no
longer valid. The
--delete option mirrors the source by removing
files in the destination that no longer exist in the source.
Oftentimes, there are files you never want to copy to a backup or an archive. These
include scratch files created by editors (usually denoted by a trailing tilde,
~) and other utilities, and a wide variety of files that
are nonessential, such as the MP3 files in your home directory that can be recreated
if need be. You can exclude files from processing using patterns. You can specify a
pattern on the command line or a list of patterns in a text file. You can also combine
the patterns with the
--delete-excluded option to remove matching
files from the destination.
To exclude files based on a pattern using the command line, use --exclude.
Remember that if any characters in the pattern have special meaning to the shell,
such as *, wrap the pattern in single quotes:
$ rsync -a --exclude='*~' /home/strike/data example.com::data
Assuming that the file /home/strike/excludes contains a list of patterns like this:
*~
*.old
*.mp3
tmp
you can exclude all files that match any of those patterns with:
$ rsync -a --exclude-from=/home/strike/excludes /home/strike/data example.com::data
Sync 'em up
Now that you know about
rsync, you have no excuse to
skip a healthy backup regimen. What's that? Your dog ate your hard disk? (Plausible
these days, no?) See, and you said your data would be just fine. Now your valuable
files live in FIDOnet.