Windows-to-Linux roadmap: Part 8. Backup and recovery

A quick guide to Linux backup and recovery

IBM e-business architect Chris Walden is your guide through a nine-part developerWorks series on moving your operational skills from a Windows to a Linux environment. He covers everything from logging to networking, and from the command-line to help systems -- even compiling packages from available source code. In this part, we take stock of what is on the system, and plan and implement regular backups with an eye to recovery as well as security.

Chris Walden (dwinfo@us.ibm.com), e-business Architect, IBM

Chris Walden is an e-business Architect for IBM Developer Relations Technical Consulting in Austin, Texas, providing education, enablement, and consulting to IBM Business Partners. He is the official Linux fanatic on his hallway and does his best to spread the good news to all who will hear it. In addition to his architect duties, he manages the area's all-Linux infrastructure servers, which include file, print, and other application services in a mixed-platform user environment. Chris has ten years of experience in the computer industry ranging from field support to Web application development and consulting.



11 November 2003


Linux is a stable and reliable environment. But any computing system can have unforeseen events, such as hardware failures. Having a reliable backup of critical configuration information and data is part of any responsible administration plan. There is a wide variety of approaches to doing backups in Linux. Techniques range from very simple script-driven methods to elaborate commercial software. Backups can be done to remote network devices, tape drives, and other removable media. Backups can be file-based or drive-image based. There are many options available and you can mix and match your techniques to design the perfect backup plan for your circumstances.

What's your strategy?

There are many different approaches to backing up a system. For some perspectives on this, you may want to read the article "Introduction to Backing Up and Restoring Data" listed in the Resources section at the end of this article.

What you back up depends a lot on your reason for backing up. Are you trying to recover from critical failures, such as hard drive problems? Are you archiving so that old files can be recovered if needed? Do you plan to start with a cold system and restore, or a preloaded standby system?


What to back up?

The file-based nature of Linux is a great advantage when backing up and restoring the system. In a Windows system, the registry is very system specific. Configurations and software installations are not simply a matter of dropping files on a system. Therefore, restoring a system requires software that can deal with these idiosyncrasies. In Linux, the story is different. Configuration files are text based and, except for when they deal directly with hardware, are largely system independent. The modern approach to hardware drivers is to have them available as modules that are dynamically loaded, so kernels are becoming more system independent. Rather than a backup having to deal with the intricacies of how the operating system is installed on your system and hardware, Linux backups are about packaging and unpackaging files.

In general, there are some directories that you want to back up:

  • /etc
    contains all of your core configuration files. This includes your network configuration, system name, firewall rules, users, groups, and other global system items.
  • /var
    contains information used by your system's daemons (services), including DNS configurations, DHCP leases, mail spool files, HTTP server files, DB2 instance configuration, and others.
  • /home
    contains the default user home directories for all of your users. This includes their personal settings, downloaded files, and other information your users don't want to lose.
  • /root
    is the home directory for the root user.
  • /opt
    is where a lot of non-system software is installed. IBM software goes here. OpenOffice, JDKs, and other software are also installed here by default.

There are directories that you should consider not backing up.

  • /proc
    should never be backed up. It is not a real file system, but rather a virtualized view of the running kernel and environment. It includes files such as /proc/kcore, which is a virtual view of the system's entire memory. Backing these up only wastes resources.
  • /dev
    contains the file representations of your hardware devices. If you are planning to restore to a blank system, then you can back up /dev. However, if you are planning to restore to an installed Linux base, then backing up /dev will not be necessary.

The other directories contain system files and installed packages. In a server environment, much of this information is not customized. Most customization occurs in the /etc and /home directories. But for completeness, you may wish to back them up.

In a production environment where I wanted to be assured that no data would be lost, I would back up the entire system, except for the /proc directory. If I were mostly worried about users and configuration, I would back up only the /etc, /var, /home, and /root directories.


Backup tools

As mentioned before, Linux backups are largely about packaging and unpackaging files. This allows you to use existing system utilities and scripting to perform your backups rather than having to purchase a commercial software package. In many cases, this type of backup will be adequate, and it provides a great deal of control for the administrator. The backup script can be automated using cron, the daemon that runs scheduled jobs in Linux.
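As a rough illustration of a cron-driven backup, the sketch below wraps a tar command (tar is covered in the next section) in a small function that writes a dated archive. The function name nightly_backup, the BACKUP_SRC and BACKUP_DEST variables, and the script path in the crontab line are all assumptions made for this example, not names from any standard tool.

```shell
#!/bin/sh
# Sketch of a nightly backup job. nightly_backup, BACKUP_SRC, BACKUP_DEST and
# the script path in the crontab line are hypothetical names for this example.
#
# Example crontab entry (edit with "crontab -e") to run it at 02:30 every day:
#   30 2 * * * BACKUP_SRC=/home BACKUP_DEST=/mnt/backup /usr/local/sbin/nightly-backup.sh

nightly_backup() {
    src=$1        # directory to archive
    dest=$2       # directory that receives the archive
    stamp=$(date +%Y%m%d)
    archive="$dest/backup-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C changes into the parent directory so names in the archive are relative
    tar -czpf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# Run only when both settings are supplied, as in the crontab line above.
if [ -n "${BACKUP_SRC:-}" ] && [ -n "${BACKUP_DEST:-}" ]; then
    nightly_backup "$BACKUP_SRC" "$BACKUP_DEST"
fi
```

Putting the date in the file name keeps each night's archive separate, at the cost of needing some cleanup policy for old archives.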

tar

tar is a classic UNIX command that has been ported into Linux. tar is short for tape archive, and was originally designed for packaging files onto tape. You have probably already encountered tar files if you have downloaded any source code for Linux. It is a file-based command that essentially serially stacks the files end to end.

Entire directory trees can be packaged with tar, which makes it especially suited to backups. Archives can be restored in their entirety, or files and directories can be expanded individually. Backups can go to file-based devices or tape devices. Upon restoration, files can be redirected to a different directory (or system) from the one where they were originally saved. tar is file system-independent. It can be used on ext2, ext3, jfs, Reiser, and other file systems.

Using tar is very much like using a file archiving utility, such as PKZip. You point it toward a destination, which is a file or a device, and then name the files that you want to package. You can compress archives on the fly with standard compression types, or specify an external compression program of your choice. To compress or uncompress archives through gzip, use tar -z; for bzip2, use tar -j.

To back up the entire file system using tar to a SCSI tape drive, excluding the /proc directory:

tar -cpf /dev/st0 --exclude=/proc /

In the above example, the -c switch indicates that the archive is being created. The -p switch indicates that we want to preserve the file permissions, critical for a good backup. The -f switch points to the filename for the archive. In this case, we are using the raw tape device, /dev/st0. The --exclude switch leaves out the /proc directory, since it doesn't contain anything we need to save. It is placed before the path because newer versions of GNU tar may ignore an --exclude that follows the paths it is meant to affect. The / indicates what we want to back up. Since we wanted the entire file system, we specified the root. tar automatically recurses when pointed to a directory. If the backup will not fit on a single tape, we will add the -M switch (not shown), for multi-volume.
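To try the same create-and-exclude pattern without a tape drive or root access, a sketch like the following can be pointed at an ordinary file. The scratch directory, the file names, and the "proc" subdirectory standing in for /proc are all invented for the demonstration, and GNU tar is assumed.

```shell
#!/bin/sh
# Sketch: create-with-exclude against a scratch tree instead of / and a tape.
# Every path below is made up for this example.
work=$(mktemp -d)
mkdir -p "$work/tree/etc" "$work/tree/proc"
echo "hostname=demo" > "$work/tree/etc/sysconfig"
echo "runtime data"  > "$work/tree/proc/status"

# -c create, -p preserve permissions, -f archive file.
# -C switches into the tree so member names are relative;
# --exclude drops the stand-in for /proc and comes before the path it affects.
tar -cpf "$work/backup.tar" -C "$work/tree" --exclude=./proc .

# List what was actually stored.
tar -tf "$work/backup.tar"
```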

Just in case

Don't forget that Linux is case sensitive. The tar command, for example, must always be typed in lowercase. Switches may be uppercase or lowercase, and the case matters: -t and -T perform different functions. File and directory names may be mixed case and, like commands and switches, are case sensitive.

To restore a file or files, the tar command is used with the extract switch (-x):

tar -xpf /dev/st0 -C /

The -f switch again points to our file, and -p indicates that we want to restore archived permissions. The -x switch indicates an extraction of the archive. The -C / indicates that we want the restore to occur from /. tar normally restores to the directory from which the command is run. The -C switch makes our current directory irrelevant.
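The effect of -C is easiest to see by restoring into a directory other than the original one. The following is a minimal sketch using scratch paths invented for the example; it assumes GNU tar and GNU stat.

```shell
#!/bin/sh
# Sketch: back up a small tree, then restore it somewhere else with -C.
# All paths are scratch locations invented for this example.
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/restore"
echo "nameserver 192.0.2.1" > "$work/src/etc/resolv.conf"
chmod 640 "$work/src/etc/resolv.conf"

tar -cpf "$work/backup.tar" -C "$work/src" etc

# -x extract, -p restore the archived permissions,
# -C choose the directory that the files are restored into.
tar -xpf "$work/backup.tar" -C "$work/restore"

# Show the restored file with its permissions.
ls -l "$work/restore/etc/resolv.conf"
```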

The two other tar switches that you will probably use often are -t and -d. The -t switch lists the contents of an archive. The -d switch compares the contents of the archive to the current files on a system.
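A short sketch of listing and comparing follows. It relies on GNU tar returning a non-zero exit status from -d when the archive and the files on disk differ; the paths and the variable names before and after are invented for the example.

```shell
#!/bin/sh
# Sketch: list an archive with -t, then compare it to disk with -d.
# Paths and variable names are invented for this example.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "version 1" > "$work/data/notes.txt"
tar -cpf "$work/backup.tar" -C "$work" data

# -t lists the members; adding -v shows permissions, owner, size and date.
tar -tvf "$work/backup.tar"

# -d compares archive members with the files under the -C directory.
if tar -df "$work/backup.tar" -C "$work"; then
    before=same
else
    before=changed
fi

# Change the file after the backup and compare again.
echo "version 2, edited after the backup" > "$work/data/notes.txt"
if tar -df "$work/backup.tar" -C "$work" >/dev/null 2>&1; then
    after=same
else
    after=changed
fi

echo "before edit: $before, after edit: $after"
```

A comparison like this is a cheap way to confirm that a backup is readable before you need it.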

For ease of operation and editing, you can put the files and directories that you want to archive in a text file, which you reference with the -T switch. These can be combined with other directories listed on the command line. The following line backs up all the files and directories listed in MyFiles, the /root directory, and all of the iso files in the /tmp directory:

tar -cpf /dev/st0 -T MyFiles /root /tmp/*.iso

The file list is simply a text file with the list of files or directories. Here's an example:

/etc
/var
/home
/usr/local
/opt

Please note that the tar -T (or --files-from) switch cannot accept wildcards. Files must be listed explicitly. The example above shows one way to reference files separately. You could also execute a script to search the system and then build a list. Here is an example of such a script:

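#!/bin/sh
# Start the list from the hand-maintained MyFiles
cat MyFiles > TempList
# Quote the patterns so that find, not the shell, expands them
find /usr/share -iname '*.png' >> TempList
find /tmp -iname '*.iso' >> TempList
# GNU tar cannot compress a multi-volume archive, so -z is not combined with -M
tar -cpMf /dev/st0 -T TempList

The above script first copies all of our existing file list from MyFiles to TempList. Then it executes a couple of find commands to search the file system for files that match a pattern and to append them to the TempList. The first search is for all files in the /usr/share directory tree that end in .png. The second search is for all files in the /tmp directory tree that end in .iso. The patterns are quoted so that the shell passes them to find unchanged instead of expanding them against the current directory. Once the list is built, tar is run to create a new archive on the file device /dev/st0 (the first SCSI tape drive), retaining all of the file permissions. The archive will span Multiple volumes. The file names to be archived will be Taken from the file TempList. Compression (-z) is left out because GNU tar refuses to compress a multi-volume archive; if the backup fits on one tape, you can use -z in place of -M.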

Scripting can also be used to perform much more elaborate actions such as incremental backups. An excellent script is listed by Gerhard Mourani in his book Securing and Optimizing Linux, which you will find listed in the Resources section at the end of this article.
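The script from the book is not reproduced here. As a stand-in, the sketch below shows one way an incremental scheme can be built on GNU tar's --listed-incremental switch, which keeps a snapshot file describing what has already been saved. The paths and the .snar file names are invented for the example.

```shell
#!/bin/sh
# Sketch: full plus incremental backup with GNU tar's --listed-incremental.
# This is not the script from the book; all paths are invented.
work=$(mktemp -d)
mkdir -p "$work/home"
echo "first" > "$work/home/a.txt"

# Level 0: the snapshot file does not exist yet, so everything is stored
# and the snapshot is created as a side effect.
tar -cpf "$work/full.tar" --listed-incremental="$work/home.snar" -C "$work" home

# Make sure the next file is clearly newer than the snapshot.
sleep 1
echo "second" > "$work/home/b.txt"

# Incremental: work on a copy of the snapshot so the level 0 state is kept
# and later incrementals can again be taken relative to the full backup.
cp "$work/home.snar" "$work/home.snar.1"
tar -cpf "$work/incr.tar" --listed-incremental="$work/home.snar.1" -C "$work" home

# Only the directory entry and the new file should be listed.
tar -tf "$work/incr.tar"
```

To restore, the full archive is extracted first and each incremental archive is extracted on top of it in order.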

Scripts can also be written to restore files, though restoration is often done manually. As mentioned above, the -x switch for extract replaces the -c switch. Entire archives can be restored, or individual files or directories can be specified. Wildcards can be used to reference files in the archive, although recent versions of GNU tar expect the --wildcards switch for this when extracting. An alternative to tar for both backing up and restoring is the dump and restore pair of utilities.
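Selective restoration might look like the following sketch, which pulls one named file and then a pattern out of an archive. GNU tar is assumed (for --wildcards), and the paths are invented for the example.

```shell
#!/bin/sh
# Sketch: restore individual members from an archive. Paths are invented.
work=$(mktemp -d)
mkdir -p "$work/src/etc/ssh" "$work/out"
echo "Port 22"   > "$work/src/etc/ssh/sshd_config"
echo "demo-host" > "$work/src/etc/hostname"
echo "#!/bin/sh" > "$work/src/etc/rc.local"
tar -cpf "$work/backup.tar" -C "$work/src" etc

# Restore a single member by the exact name shown by "tar -t".
tar -xpf "$work/backup.tar" -C "$work/out" etc/hostname

# Restore every member matching a pattern. The pattern is quoted so the
# shell leaves it alone, and --wildcards tells GNU tar to treat it as one.
tar -xpf "$work/backup.tar" -C "$work/out" --wildcards 'etc/ssh/*'

# Show what was restored; rc.local was not requested and is not there.
find "$work/out" -type f | sort
```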


dump and restore

dump can perform functions similar to tar. However, dump tends to look at file systems rather than individual files. Quoting from the dump man page: "dump examines files on an ext2 filesystem and determines which files need to be backed up. These files are copied to the given disk, tape, or other storage medium for safe keeping.... A dump that is larger than the output medium is broken into multiple volumes. On most media, the size is determined by writing until an end-of-media indication is returned."

The companion program to dump is restore, which is used to restore files from a dump image.

The restore command performs the inverse function of dump. A full backup of a file system may be restored and subsequent incremental backups layered on top of it. Single files and directory subtrees may be restored from full or partial backups.

Both dump and restore can be run across the network, so you can back up or restore from remote devices. dump and restore work with tape drives and file devices providing a wide range of options. However, both are limited to the ext2 and ext3 file systems. If you are working with JFS, Reiser, or other file systems, you will need to use a different utility, such as tar.


Backing up with dump

Running a backup with dump is fairly straightforward. The following commands do a full backup of a Linux system with two ext2 or ext3 file systems to a SCSI tape device:

dump 0f /dev/nst0 /boot
dump 0f /dev/nst0 /

In this example, our system has two file systems, one for /boot and another for / -- a common configuration. They must be referenced individually when a backup is executed. The /dev/nst0 refers to the first SCSI tape, but in a non-rewind mode. This ensures that the volumes are put back-to-back on the tape.

An interesting feature of dump is its built-in incremental backup functionality. In the example above, the 0 indicates a level 0, or base-level, backup. This is the full system backup that you would do periodically to capture the entire system. On subsequent backups you can use other numbers (1-9) in place of the 0 to change the level of the backup. A level 1 backup would save all of the files that had changed since the level 0 backup was done. Level 2 would back up everything that had changed since level 1, and so on. More precisely, each level saves the files changed since the most recent backup at a lower level. The same function can be done with tar, using scripting, but it requires the script creator to have a mechanism to determine when the last backup was done. dump has its own mechanism: when run with the u switch (for example, dump 0uf /dev/nst0 /), it records each successful backup in an update file (/etc/dumpdates). The update file is reset whenever a level 0 backup is run. Subsequent levels leave their mark until another level 0 is done. If you are doing a tape-based backup, dump will automatically track multiple volumes.
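One possible way to use the levels is a weekly rotation: a full backup on Sunday and a higher level each following day, so that every daily run saves only what changed since the day before. The sketch below only prints the dump command it would run, because dump itself needs root and an ext2 or ext3 file system. The function name dump_level_for_day and the schedule are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: choose a dump level from the day of the week.
# dump_level_for_day and the weekly schedule are illustrative choices.
dump_level_for_day() {
    # $1 is the day as printed by "date +%u" (1 = Monday ... 7 = Sunday)
    case $1 in
        7) echo 0 ;;                  # Sunday: full (level 0) backup
        1|2|3|4|5|6) echo "$1" ;;     # Monday to Saturday: levels 1 to 6
        *) echo "unknown day: $1" >&2; return 1 ;;
    esac
}

level=$(dump_level_for_day "$(date +%u)")

# Print the command rather than running it. The u switch asks dump to
# record the backup in /etc/dumpdates so later levels know what to skip.
echo "dump ${level}uf /dev/nst0 /"
```

With this rotation, a full restore late in the week needs the level 0 tape plus each daily tape since, which is the trade-off for short nightly runs.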

Skipping files

It is possible to mark files and directories to be skipped by dump. The command for this is chattr, which changes the extended attributes on ext2 and ext3 file systems.

chattr +d <filename>

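The above command adds a flag to a file to tell dump to skip it when doing a backup. By default, dump honors this flag only for incremental backups (level 1 and above), so a level 0 backup still includes the file unless you lower the honor level with the -h switch.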


Restoring with restore

To restore information saved with dump, the restore command is used. Like tar, restore has the ability to list archives (-t) and to compare them to current files (-C). Where you must be careful with restore is in restoring data. There are two very different approaches, and you must use the correct one to have predictable results.

Rebuild (-r)

Remember that dump is designed with file systems in mind more than individual files. Therefore, there are two different styles of restoring files. To rebuild a file system, use the -r switch. Rebuild is designed to work on an empty file system and restore it back to the saved state. Before running rebuild, you should have created, formatted, and mounted the file system. You should not run rebuild on a file system that contains files.

Here is an example of doing a full rebuild from the dump that we executed above.

restore -rf /dev/nst0

The above command needs to be run for each file system being restored.

This process could be repeated to add the incremental backups if required.

Extract (-x)

If you need to work with individual files, rather than full file systems, you must use the -x switch to extract them. For example, to extract only the /etc directory from our tape backup, use the following command:

restore -xf /dev/nst0 /etc

Interactive restore (-i)

One more feature that restore provides is an interactive mode. Using the command:

restore -if /dev/nst0

will place you in an interactive shell, showing the items contained in the archive. Typing "help" will give you a list of commands. You can then browse and select the items you wish to be extracted. Bear in mind that any files that you extract will go into your current directory.


dump vs. tar

Both dump and tar have their followings. Both have advantages and disadvantages. If you are running anything but an ext2 or ext3 file system, then dump is not available to you. However, if this is not the case, dump can be run with a minimum of scripting, and has interactive modes available to assist with restoration.

I tend to use tar, because I am fond of scripting for that extra level of control. There are also multi-platform tools for working with .tar files.


Other tools

Virtually any program that can copy files can be used to perform some sort of backup in Linux. There are references to people using cpio and dd for backups. cpio is another packaging utility along the lines of tar. It is much less common. dd is a low-level copy utility that makes byte-for-byte copies of devices and file systems. dd might be used to make an image of a hard drive, similar to using a product like Symantec's Ghost. However, dd is not file based, so an image can only be restored as a whole, to a partition at least as large as the original.
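The image-and-restore cycle with dd can be tried safely on ordinary files. In the sketch below, a file of random bytes stands in for a partition; on a real system the input would be a device such as a disk partition, and the command would need root. All file names are invented for the example.

```shell
#!/bin/sh
# Sketch: image and restore with dd, using files in place of real devices.
# All file names are invented for this example.
work=$(mktemp -d)

# A 1 MiB file of random bytes stands in for a partition.
dd if=/dev/urandom of="$work/partition.bin" bs=1024 count=1024 2>/dev/null

# Take the image: dd copies raw bytes and knows nothing about files.
dd if="$work/partition.bin" of="$work/partition.img" bs=64k 2>/dev/null

# Restore the image to a blank target of the same size.
dd if="$work/partition.img" of="$work/restored.bin" bs=64k 2>/dev/null

# Verify that the restored copy matches the original byte for byte.
if cmp -s "$work/partition.bin" "$work/restored.bin"; then
    echo "restored copy is identical to the original"
fi
```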


Commercial backup products

There are several commercial backup products available for Linux. Commercial products generally provide a convenient interface and reporting system, whereas with tools such as dump and tar, you have to roll your own. The commercial offerings are broad and offer a range of features. The biggest benefit you will gain from using a commercial package is a pre-built strategy for handling backups that you can just put to work. Commercial developers have already made many of the mistakes that you are about to, and the cost of their wisdom is cheap compared to the loss of your precious data.

Tivoli Storage Manager

Probably the best commercial backup and storage management utility available now for Linux is the Tivoli Storage Manager. Tivoli Storage Manager Server runs on several platforms, including Linux, and the client runs on many more platforms.

Essentially a Storage Manager Server is configured with the devices appropriate to back up the environment. Any system that is to participate in the backups loads a client that communicates with the server. Backups can be scheduled, performed manually from the Tivoli Storage Manager client interface, or performed remotely using a Web-based interface.

The policy-based nature of TSM means that central rules can be defined for backup behavior without having to constantly adjust a file list. Additionally, IBM Tivoli Storage Resource Manager can identify, evaluate, control, and predict the utilization of enterprise storage assets, and can detect potential problems and automatically apply self-healing adjustments. See the Tivoli Web site (see the link in the Resources section) for more details.

Figure 1. Tivoli Storage Manager menu

Backups and restores are then handled through the remote device.

Figure 2. Tivoli Storage Manager interface

Go forth and back up

The first step to having a good backup is to have a plan. Know the data that you need to preserve and what your recovery strategy needs to be. Then use the tools that best meet that strategy.

Linux comes with some useful backup tools right out of the box. The two most common are tar and dump/restore. Both are capable of doing full system backups. Using creative scripting, you can design a custom backup scheme to back up systems both locally and remotely.

However, writing your own backup scripts can be a large responsibility, especially in a complicated enterprise environment. Commercial software, such as the Tivoli Storage Manager, shortens the learning curve and lets you take immediate control of your backups, but you may have to adjust your strategy to fit what the tools can do.

Resources
