Speaking UNIX: Advanced applications of rsync

Keeping multiple machines synced can be challenging. Fortunately, a powerful tool is available to make the task easier: rsync.

Martin Streicher, Software Developer, Pixel, Byte, and Comma

Martin Streicher is a freelance Ruby on Rails developer and the former Editor-in-Chief of Linux Magazine. Martin holds a Masters of Science degree in computer science from Purdue University and has programmed UNIX-like systems since 1986. He collects art and toys. You can reach Martin at martin.streicher@gmail.com.



22 September 2009


Over the past 20 years, the use of computer networks has exploded. The growth of the Internet, commensurate and reciprocal investments in national and international backbone infrastructure, and the plummeting price of networking and computing hardware have driven usage. Today, networks are both pervasive and commonplace, and applications still push the envelope of network scale and speed. The Internet may have gotten its start on a handful of tiny workstations, but it and its private analogs now connect countless computers.

Frequently used acronyms

  • FTP: File Transfer Protocol
  • WebDAV: Web-based Distributed Authoring and Versioning

Over the same period, UNIX® has grown as well and kept pace with increasingly capable networking software. FTP was among the first tools to share files between systems and remains in widespread use. rcp, short for "remote copy," improved on FTP, because it mimicked the traditional cp utility but copied files from machine to machine. rdist, based on rcp, distributed files from one machine to many systems automatically.

Today, all the latter utilities are antiques: rcp and rdist were made obsolete because both were inherently insecure. scp took their place. While FTP remains in wide use, Secure FTP (SFTP), the secure version of FTP, should be used whenever possible. Other options exist, too—WebDAV and BitTorrent™ among them. Of course, the more machines you have, the more difficult it is to keep all in sync—or at least in a known state—and scp and WebDAV offer no respite, unless you want to script a solution yourself.

The best tool for distributing files is rsync. rsync can resume a transfer after interruption; it transfers only those portions of a file that differ between source and destination; and rsync can perform entire or incremental backups. Better yet, rsync is available on every flavor of UNIX, including Mac OS X, so it's easy to interconnect virtually any set of systems.

Let's look at some common uses of rsync as review, then look at more advanced applications. The demonstration systems employed here are Mac OS X version 10.5 Leopard (a BSD-derived UNIX) and Ubuntu Linux® version 8. If you use a different operating system, most of the examples here are likely portable; check your machine's rsync man page to verify proper operation.

A quick review

Much like cp, rsync copies files from a source to a destination. Unlike cp, the source and destination of an rsync operation can be local or remote. For instance, the command in Listing 1 copies the directory /tmp/photos and its entire contents verbatim to a home directory.

Listing 1. Copy the contents of a directory verbatim
$ rsync -n -av /tmp/photos ~
building file list ... done
photos/
photos/Photo 2.jpg
photos/Photo 3.jpg
photos/Photo 6.jpg
photos/Photo 9.jpg

sent 218 bytes  received 56 bytes  548.00 bytes/sec
total size is 375409  speedup is 1370.11

The -v option enables verbose messages. The -a option (where a stands for archive) is shorthand for -rlptgoD (recurse, copy symbolic links as symbolic links, preserve permissions, preserve file times, preserve group, preserve owner, and preserve devices and special files, respectively). Typically, -a mirrors files; exceptions occur when the destination cannot or does not support the same attributes. For example, attributes do not map perfectly when you copy a directory from UNIX to Windows®. Some suggestions for unusual cases appear below.

rsync has a lot of options. If you worry that your options or source or destination specifications are incorrect, use -n to perform a dry run. A dry run previews what will happen to each file but does not move a single byte. When you are confident of all the settings, drop the -n and proceed.
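
As a minimal sketch of the dry-run workflow, the following commands preview a copy and then perform it. The scratch directories come from mktemp, and the file name is made up for illustration.

```shell
# Build a scratch source and an empty destination (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/photos" "$work/dest"
echo "fake image data" > "$work/photos/Photo 2.jpg"

# -n previews the transfer: rsync reports what it would copy...
rsync -n -av "$work/photos" "$work/dest"

# ...but the destination is still empty afterward.
before=$(ls -A "$work/dest")

# Drop -n to perform the copy for real.
rsync -av "$work/photos" "$work/dest"
after=$(ls -A "$work/dest")

echo "after the dry run: '$before'; after the real run: '$after'"
```

The preview and the real run print the same file list, which is what makes -n a trustworthy rehearsal.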

Listing 2 provides an example where -n is invaluable. The command in Listing 1 and the following command yield different results.

Listing 2. Copy the contents of a named directory
$ rsync -av /tmp/photos/ ~
./
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg

sent 210 bytes  received 56 bytes  532.00 bytes/sec
total size is 375409  speedup is 1411.31

What is the difference? The difference is the trailing slash on the source argument. If the source has a trailing slash, the contents of the named directory but not the directory itself are copied. A slash on the end of the destination is immaterial.
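
The effect of the trailing slash is easy to confirm locally. This is a small sketch that uses throwaway directories; the names with_dir and contents_only are invented for the example.

```shell
# Scratch source with two files, plus two empty destinations.
work=$(mktemp -d)
mkdir -p "$work/photos" "$work/with_dir" "$work/contents_only"
touch "$work/photos/Photo 2.jpg" "$work/photos/Photo 3.jpg"

# No trailing slash: the directory itself is recreated in the destination.
rsync -a "$work/photos" "$work/with_dir"

# Trailing slash: only the directory's contents are copied.
rsync -a "$work/photos/" "$work/contents_only"

ls "$work/with_dir"        # shows the photos directory
ls "$work/contents_only"   # shows the two image files directly
```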

Listing 3 provides an example of copying the same directory to another system.

Listing 3. Copy a directory to a remote system
$ rsync -av /tmp/photos/ example.com:album
created directory album
Photo 2.jpg
Photo 3.jpg
Photo 6.jpg
Photo 9.jpg

sent 210 bytes  received 56 bytes  21.28 bytes/sec
total size is 375409  speedup is 1411.31

Assuming that you have the same login name on the remote machine, rsync prompts you for your password and, given the proper credentials, creates the directory album and copies the images to that directory. By default, rsync uses Secure Shell (SSH) as its transport mechanism; you can reuse your machine aliases and public keys with rsync.


rsync modes

The examples in Listing 2 and Listing 3 demonstrate two of rsync's four modes. The first example used shell mode, also dubbed local mode. The second used remote shell mode, so named because SSH powers the underlying connection and transfers. rsync has two additional modes. List mode acts like ls: it lists the contents of the source, as shown in Listing 4.

Listing 4. List the contents of a source
$ 
drwxr-xr-x         238 2009/08/22 18:49:50 photos
-rw-r--r--        6148 2008/07/03 01:36:18 photos/.DS_Store
-rw-r--r--       71202 2008/06/18 04:51:36 photos/Photo 2.jpg
-rw-r--r--       69632 2008/06/18 04:51:45 photos/Photo 3.jpg
-rw-r--r--       61046 2008/07/14 00:31:17 photos/Photo 6.jpg
-rw-r--r--      167381 2008/07/14 00:31:56 photos/Photo 9.jpg
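
One way to produce a listing of this kind is to give rsync a source and no destination. The sketch below assumes a scratch directory with invented file names, and adds -r so that rsync descends into the directory.

```shell
# Scratch source (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/photos"
touch "$work/photos/Photo 2.jpg" "$work/photos/Photo 3.jpg"

# A source with no destination puts rsync in list mode: nothing is copied.
listing=$(rsync -r "$work/photos")
printf '%s\n' "$listing"
```

Each line of the result carries the permissions, size, modification time, and path of one entry, in the same layout as Listing 4.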

The fourth mode is server mode. Here, the rsync daemon runs perennially on a machine, accepting requests to transfer files. A transfer can send files to the daemon or request files from it. Server mode is ideal for creating a central backup server or project repository.

To differentiate between remote shell mode and server mode, the latter employs two colons (::) in the source and destination names. Assuming that whatever.example.com exists, the next command copies files from the source to a local destination:

$ rsync -av whatever.example.com::src /tmp

And what exactly is src? It's an rsync module that you define and configure on the daemon's host. A module has a name, a path that contains its files, and some other parameters, such as read only, which protects the contents from modification.

To run an rsync daemon, type:

$ sudo rsync --daemon

Running the rsync daemon as the superuser, root, is not strictly necessary, but the practice protects other files on your machine. When it runs as root, rsync uses chroot to restrict itself to the module's directory hierarchy (its path). After a chroot, all other files and directories seem to vanish. If you choose to run the rsync daemon with your own privileges, choose an unused port and make sure its modules have sufficient permissions to allow download, upload, or both. Listing 5 shows a minimal configuration to share some files in your home directory without the need for sudo. The configuration is stored in the file rsyncd.conf.

Listing 5. Simple configuration for sharing files
motd file = /home/strike/rsyncd/rsync.motd_file
pid file = /home/strike/rsyncd/rsyncd.pid
port = 7777
use chroot = no

[demo]
path = /home/strike
comment = Martin home directory
list = no

[dropbox]
path = /home/strike/public/dropbox
comment = A place to leave things for Martin
read only = no

[pickup]
path = /home/strike/public/pickup
comment = Get your files here!

The file has two segments. The first segment—here, the first four lines—configures the operation of the rsync daemon. (Other options are available, too.) The first line points to a file with a friendly message to identify your server. The second line points to another file to record the process ID of the server. This is a convenience in the event you must manually kill the rsync daemon:

kill -INT `cat /home/strike/rsyncd/rsyncd.pid`

The two files are in a home directory, because this example does not use superuser privileges to run the software. Similarly, the port chosen for the daemon is above 1023; ports below 1024 are reserved for the superuser, while higher ports can be claimed by any user's application. The fourth line turns off chroot.

The remaining segment is subdivided into small sections, one section per module. Each section, in turn, has a header line and a list of key-value pairs that set options for that module. By default, all modules are read only; set read only = no to allow write operations. Also by default, all modules are listed in the module catalog; set list = no to hide the module.

To start the daemon, run:

$ rsync --daemon --config=rsyncd.conf

Now, connect to the daemon from another machine, and omit a module name. You should see this:

$ rsync --port=7777 mymachine.example.com::
Hello! Welcome to Martin's rsync server.

dropbox        	A place to leave things for Martin
pickup         	Get your files here!

If you do not name a module after the colons (::), the daemon responds with a list of available modules. If you name a module but do not name a specific file or directory within the module, the daemon provides a catalog of the module's contents, as shown in Listing 6.

Listing 6. Catalog output of a module's contents
$ rsync --port=7777 mymachine.example.com::pickup
Hello! Welcome to Martin's rsync server.

drwxr-xr-x        4096 2009/08/23 08:56:19 .
-rw-r--r--           0 2009/08/23 08:56:19 article21.html
-rw-r--r--           0 2009/08/23 08:56:19 design.txt
-rw-r--r--           0 2009/08/23 08:56:19 figure1.png

Naming a module and a file, together with a local destination, copies the file locally. With no destination, as in Listing 7, rsync only lists what the source contains.

Listing 7. Name a module to copy files locally
$ rsync --port=7777 mymachine.example.com::pickup/
Hello! Welcome to Martin's rsync server.

drwxr-xr-x        4096 2009/08/23 08:56:19 .
-rw-r--r--           0 2009/08/23 08:56:19 article21.html
-rw-r--r--           0 2009/08/23 08:56:19 design.txt
-rw-r--r--           0 2009/08/23 08:56:19 figure1.png

You can also perform an upload by reversing the source and destination, then pointing to the module for writes, as shown in Listing 8.

Listing 8. Reverse source and destination directories
$ rsync -v --port=7777 application.js mymachine.example.com::dropbox
Hello! Welcome to Martin's rsync server.

application.js

sent 245 bytes  received 38 bytes  113.20 bytes/sec
total size is 164  speedup is 0.58

That's a quick but thorough review. Next, let's see how you can apply rsync to daily tasks. rsync is especially useful for backups. And because it can synchronize a local file with its remote counterpart—and can do that for an entire file system, too—it's ideal for managing large clusters of machines that must be (at least partially) identical.


Back up your data with rsync

Performing backups on a frequent basis is a critical but typically ignored chore. Perhaps it's the demands of running a lengthy backup each day or the need to have large external media to store files; never mind the excuse, copying data somewhere for safekeeping should be an everyday practice.

To make the task painless, use rsync and point to a remote server—perhaps one that your service provider hosts and backs up. Each of your UNIX machines can use the same technique, and it's ideal for keeping the data on your laptop safe.

Establish SSH keys and an rsync daemon on the remote machine, and create a backup module to permit writes. Once established, run rsync to create a daily backup that takes hardly any space, as shown in Listing 9.

Listing 9. Create daily backups
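#!/bin/sh
# This script based on work by Michael Jakl (jakl.michael AT gmail DOTCOM) and used
# with express permission.
HOST=mymachine.example.com
SOURCE=$HOME
# rsync resolves a relative --link-dest against the destination directory,
# so an absolute path on the backup host is the safer choice here.
PATHTOBACKUP=home-backup

# Timestamp that names this backup, for example 2009-08-23T12:32:18
date=$(date "+%Y-%m-%dT%H:%M:%S")

# Copy changed files; hard-link unchanged ones to the previous backup
rsync -az --link-dest="$PATHTOBACKUP/current" "$SOURCE" "$HOST:$PATHTOBACKUP/back-$date"

# Repoint "current" at the new backup (-f keeps the first run from failing)
ssh "$HOST" "rm -f $PATHTOBACKUP/current && ln -s back-$date $PATHTOBACKUP/current"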

Replace HOST with the name of your backup host and SOURCE with the directory you want to save. Change PATHTOBACKUP to the directory on the backup host that holds your archives. (You can also embed the three final lines of the script in a loop, dynamically change SOURCE, and back up a series of separate directories on the same system.) Here's how the backup works:

  • To begin, date is set to the current date and time and yields a string like 2009-08-23T12:32:18, which identifies the backup uniquely.
  • The rsync command performs the heavy lifting. -az preserves all file information and compresses the transfers. The magic lies in --link-dest=$PATHTOBACKUP/current, which specifies that if a file has not changed, do not copy it to the new backup. Instead, create a hard link from the new backup to the same file in the existing backup. In other words, the new backup only contains files that have changed; the rest are links.

    More specifically (and expanding all variables), mymachine.example.com:home-backup/current is the current archive. The new archive for /home/strike is targeted to mymachine.example.com:home-backup/back-2009-08-23T12:32:18. If a file in /home/strike has not changed, the file is represented in the new backup by a hard link to the current archive. Otherwise, the new file is copied to the new archive.

    If you touch but a few files or perhaps a handful of directories each day, the additional space required for what is effectively a full backup is paltry. Moreover, because each daily backup (except the very first) is so small, you can keep a long history of the files on hand.

  • The last step is to alter the organization of the backups on the remote machine to promote the newly created archive to be the current archive, thereby minimizing the differences to record the next time this script runs. The last command removes the current archive (which is merely a symbolic link) and recreates the same symbolic link pointing to the new archive.

Keep in mind that a hard link to a hard link points to the same file. Hard links are very cheap to create and maintain, so a full backup is simulated using only an incremental scheme.
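
The hard-link behavior can be observed without a remote host. The following is a local sketch of the same scheme, assuming scratch directories from mktemp and invented names (back-1, back-2, stable.txt, notes.txt); it is an illustration of --link-dest, not a replacement for the script in Listing 9.

```shell
# Scratch source with one file that stays put and one that will change.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backups"
echo "unchanged" > "$work/src/stable.txt"
echo "version 1" > "$work/src/notes.txt"

# First backup: a full copy, promoted to "current".
rsync -a "$work/src/" "$work/backups/back-1"
ln -s back-1 "$work/backups/current"

# Change one file, then take a second backup that links against the first.
echo "version 2, with more text" > "$work/src/notes.txt"
rsync -a --link-dest="$work/backups/current" "$work/src/" "$work/backups/back-2"

# Promote the new backup, as the script's final step does.
rm -f "$work/backups/current"
ln -s back-2 "$work/backups/current"

# Print inode numbers: stable.txt shares one inode across both backups
# (a hard link), while notes.txt has a separate inode in each.
ls -i "$work"/backups/back-1/* "$work"/backups/back-2/*
```

Both backup directories look complete when browsed, yet only notes.txt occupies new space in back-2.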


Other advanced tricks and tips

Once you begin using remote rsync in daily tasks, you'll likely find it necessary to keep your daemon running at all times. Many Linux and UNIX machines have a startup script for rsync, usually in /etc/init.d/rsync. Check your operating system for a startup script and the utility that enables and disables components. In contrast, if you are running rsync as a daemon for your own use, or if you do not have access to the startup scripts, you can still start rsync with cron:

@reboot /usr/bin/rsync --daemon --port=7777 --config=/home/strike/rsyncd/rsyncd.conf

This command launches the daemon each time the machine restarts. Place this line in your crontab file, and save the file.

You saw how a preview with -n can reveal problems before any occur. You can also monitor the state of your transfers with two options: --progress and --stats. The former renders a progress display for each file. The latter shows how compression and transmission performed. Further, you can hasten the transfer between two machines with --compress. Rather than send raw data, the sender compresses the data and the receiver decompresses it, making the transit across the wire faster: fewer bytes translate to better times.
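
To see what --stats reports, a local copy is enough. This sketch uses scratch directories and invented file names; the exact wording and set of counters in the summary vary between rsync versions.

```shell
# Scratch source and destination (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "some data" > "$work/src/a.txt"
echo "more data" > "$work/src/b.txt"

# --stats prints a transfer summary after the copy finishes.
stats=$(rsync -a --stats "$work/src/" "$work/dest/")
printf '%s\n' "$stats"
```

For a transfer between two machines, adding --progress and --compress to the same command shows per-file progress and compresses the data in transit.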

By default, rsync ensures that all files in the source are copied to the destination. This is duplication. If you want a mirror, where the destination is an exact copy of the source, provide --delete. For example, if the source has files A, B, and C, a standard rsync copy duplicates A, B, and C to the destination. However, if you delete B from the source and duplicate again, the destination no longer mirrors the source: B still lingers in the destination. The --delete option removes files in the destination that no longer exist in the source, restoring the mirror.
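
The A, B, and C scenario can be replayed in scratch directories. This is a sketch only; the directory names are invented.

```shell
# Source holds A, B, and C; destination starts empty.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
touch "$work/src/A" "$work/src/B" "$work/src/C"

rsync -a "$work/src/" "$work/dest/"            # dest now holds A, B, C

rm "$work/src/B"
rsync -a "$work/src/" "$work/dest/"            # B lingers in dest
without_delete=$(ls "$work/dest" | tr '\n' ' ')

rsync -a --delete "$work/src/" "$work/dest/"   # B is removed from dest
with_delete=$(ls "$work/dest" | tr '\n' ' ')

echo "without --delete: $without_delete"
echo "with --delete:    $with_delete"
```

Because --delete removes files, it pairs well with a -n preview before the real run.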

Oftentimes, there are files you never want to copy to a backup or an archive. These include scratch files created by editors and other utilities (usually denoted by a trailing tilde [~]) and a wide variety of nonessential files, such as the MP3 files in your home directory that can be recreated if need be. You can exclude files from processing using patterns. You can specify a pattern on the command line or a list of patterns in a text file. You can also combine the patterns with the --delete-excluded option to remove excluded files from the destination.

To exclude files based on a pattern using the command line, use --exclude. Remember that if any characters in the pattern have special meaning to the shell, such as *, wrap the pattern in single quotes:

$ rsync -a --exclude='*~' /home/strike/data example.com::data

Assuming that the file /home/strike/excludes had a list of patterns like this:

*~
*.old
*.mp3
tmp

you can exclude all files that match any of those patterns with:

$ rsync -a --exclude-from=/home/strike/excludes /home/strike/data example.com::data
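
Before pointing an exclude list at real data, it can help to try it on a scratch tree. The sketch below reuses the four patterns above against invented file names in a local directory.

```shell
# Scratch tree with one file worth keeping and several that match the patterns.
work=$(mktemp -d)
mkdir -p "$work/data/tmp" "$work/dest"
touch "$work/data/report.txt" "$work/data/report.txt~" \
      "$work/data/draft.old" "$work/data/song.mp3" "$work/data/tmp/scratch"

# One pattern per line, as in the excludes file above.
cat > "$work/excludes" <<'EOF'
*~
*.old
*.mp3
tmp
EOF

rsync -a --exclude-from="$work/excludes" "$work/data" "$work/dest"

# Only report.txt should have been copied.
find "$work/dest" -type f
```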

Sync 'em up

Now that you know about rsync, you have no excuse to skip a healthy backup regimen. What's that? Your dog ate your hard disk? (Plausible these days, no?) See, and you said your data would be just fine. Now your valuable files live in FIDOnet.
