
System Administration Toolkit: Backing up key information

Martin Brown (mc@mcslp.com), Freelance Writer, Freelance Developer
Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more -- as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.

Summary:  Most UNIX® administrators have processes in place to back up the data and information on their UNIX machines, but what about the configuration files and other elements that provide the configuration data your machines need to operate? This article provides detailed information on techniques for achieving an effective and efficient backup system for these key files.


Date:  15 Aug 2006
Level:  Intermediate

About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.


Identifying key files

Your top priority when organizing backups of your UNIX systems is the data that they contain. Whether it is databases, development source files, or any other type of user-generated information, it is vital to back up this data to ensure that, if there is a failure or other problem, you can restore the data and get back to work.

There is, however, a large number of files and other information on your system that technically isn't user data but that might take a considerable amount of time to recreate or reconfigure. How long, for example, would it take you to reconfigure a server or recreate the Domain Name System (DNS) files for your domain?

A full backup -- in other words, one where you copy all of the files from your system -- would obviously capture everything, but it can be an expensive way of backing up your information. You should be able to create an effective backup by picking and choosing specific files that configure, generate, or support the information and applications.

On a UNIX or Linux® system, the vast majority of configuration files for the system are located in the /etc directory, but you should consider the full list of potential files (and probably locations) for backup, including:

  • Main configuration directory (/etc)
  • DNS domain information (/var/bind)
  • NIS/NIS+ files and configuration (/var/yp)
  • Apache or other Web server configuration (/var/apache, /etc/apache or /usr/local/apache)
  • Mail files or folders (/var/mail and /usr/mail)
  • Lightweight Directory Access Protocol (LDAP) server data (/var/ldap or /usr/local/ldap)
  • Security certificates
  • Custom kernel drivers
  • Kernel configuration or build configuration and parameters
  • License keys and serial numbers
  • Custom scripts and applications
  • User/root login scripts
  • Mail configuration; particularly if you use a solution, such as Cyrus Internet Message Access Protocol (IMAP), where user mail folders are specially recorded and indexed

Other files and sources will be dependent on the system and environment, but it shouldn't take too long to develop a list of the key configuration files that would seriously affect your company, or system, if they were lost.
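
One way to start that list is to check which of the candidate locations actually exist on a given host. The following is a minimal sketch rather than a definitive tool: the function name existing_paths is hypothetical, and the paths in the example call are just some of those suggested above.

```shell
#!/bin/sh
# Print each argument that exists on this host, one per line.
# existing_paths is a hypothetical helper name, not a standard tool.
existing_paths() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
}

# Example: report which of the suggested locations are present here.
existing_paths /etc /var/bind /var/yp /var/mail /var/ldap
```

The output can serve as the first draft of the list of directories that the backup script needs to cover on that host.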


Collating data for storage

Although it is tempting to simply back up the data directly from its source location, copying the backup data to a separate directory before it is backed up enables you to be more selective about the files that are copied, and it also gives you the flexibility to choose a suitable backup method. With the files in one location, you can back up to tape, disk, or copy the contents to another machine without having to reorganize the source files.

Reconfiguration of the files that are backed up to any destination only requires changing the script that collates and copies the files to the backup preparation directory. Because you have a local and immediate copy of the information, restoring data in the event of a failure can be quick and straightforward, and you still retain the ability to back up the information to tape, disk, or another system.
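
The collation step could be sketched as follows. This is an illustration under assumptions, not a prescribed method: the function name stage_files is hypothetical, and it uses a tar pipe so that the selected files keep their relative paths, permissions, and modification times in the preparation directory.

```shell
#!/bin/sh
# stage_files SRC_DIR PREP_DIR PATH...
# Copy the named files or directories (given relative to SRC_DIR)
# into PREP_DIR, keeping their relative paths.
# stage_files is a hypothetical helper name.
stage_files() {
    stage_src=$1
    stage_dest=$2
    shift 2
    mkdir -p "$stage_dest" || return 1
    # The first tar writes the selection to stdout; the second unpacks
    # it in the preparation directory, preserving permissions (p).
    (cd "$stage_src" && tar cf - "$@") | (cd "$stage_dest" && tar xpf -)
}

# Example (not run here):
#   stage_files /etc /mnt/backupprepare/etc passwd group ssh
```

Changing what is backed up then means changing only the list of paths passed to the function.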

How information is collated is important, because different solutions imply different storage requirements, techniques, and the facilities available for restoration.


Methods for recording the information

You have a vast array of methods for actually backing up and storing your information. Obvious choices are to back up the files to a traditional medium, such as tape. A simpler solution, one that offers a number of benefits and pitfalls, is to copy the relevant information to another machine on your network. The critical element to any successful backup solution is to have a copy of the important information in another location. It is largely irrelevant whether that is another physical device, a removable storage solution, or another machine altogether.

The removable storage solution (tape, disk, or even USB) is the most reliable backup from a disaster recovery standpoint, in that the storage can be kept offsite in a different location. This protects you from a catastrophe, such as fire or theft, at the location where your computers are kept.

Using storage on another machine means that the backup data is online and immediately available. Restoring from a backup in this situation can be as straightforward as copying the files back to your server if files are lost or damaged, or copying them to a replacement system if the whole machine fails.

When using a second computer to store your backups, it's a good idea to keep not only a local copy of the files but also, if possible, a copy at a directly accessible offsite location. That location can be another computer on the Internet, a machine on your WAN, or a machine at another site. This provides the necessary redundancy and safety.

Using a professional or commercial backup solution requires you to reinstall the software before restoring from a backup, and certain configuration information and key files might be required or useful before that software is installed.


Using tar to store backup data

Among the most straightforward methods for storing the information is an archive file created with tar, cpio, or a similar tool. When using this method, it is a good idea to include the date in the filenames and to create a simple backup script that generates suitably named files. You also need a method of deleting backups that you no longer need (for example, those older than a specific period).

Listing 1 shows a simple script that creates tarred and compressed (using bzip2) backups of individual directories. The backup files are created on a Network File System (NFS) share exported by a remote system, so that system holds a copy of the backup.


Listing 1. Creating tarred and compressed backups of individual directories

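#!/bin/bash
# Create dated, compressed archives of key directories in the backup
# preparation directory and remove archives more than five days old.

DATE=$(date +%Y%m%d.%H%M)
HOST=$(hostname)

# Backup preparation directory
TEMP=/mnt/backupprepare

echo "Preparing backup..."

# Stop if the preparation directory is unavailable, so that the rm
# below never runs in the wrong directory
cd "$TEMP" || exit 1

# List this host's archives that are more than five days old
files=$(/usr/local/mcslp/filesbydate.pl notlast5days "$HOST"*)
if [ -n "$files" ]
then
    echo "Deleting old files: $files"
    # Left unquoted on purpose: $files is a space-separated list
    rm $files
fi

# Archive each directory using paths relative to that directory.
# Note that ./* skips top-level names beginning with a dot.
cd /etc &&
tar cf - ./* | bzip2 -9 - >"$TEMP/$HOST-etc.$DATE.tar.bz2"

cd /var/bind &&
tar cf - ./* | bzip2 -9 - >"$TEMP/$HOST-bind.$DATE.tar.bz2"

cd /export/home/webs &&
tar cf - ./* | bzip2 -9 - >"$TEMP/$HOST-webs.$DATE.tar.bz2"

cd /etc/apache2 &&
tar cf - ./* | bzip2 -9 - >"$TEMP/$HOST-webconfig.$DATE.tar.bz2"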

The DATE variable is generated using the date command and produces a stamp of the form 20060627.2200 (10PM on the 27th of June 2006) that is embedded in each filename. To make the backup script portable, every file is created with a prefix containing the name of the host on which it was created, so you can easily back up multiple hosts to the same location. The TEMP directory is the destination for each backup.
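
The naming scheme can be illustrated with two small helpers. This is a sketch only: the function names backup_name and backup_stamp are hypothetical and are not part of Listing 1; they simply build and take apart names of the form HOST-LABEL.STAMP.tar.bz2.

```shell
#!/bin/sh
# backup_name HOST LABEL STAMP
# Build a file name in the form used by Listing 1.
backup_name() {
    printf '%s-%s.%s.tar.bz2\n' "$1" "$2" "$3"
}

# backup_stamp FILE
# Recover the YYYYMMDD.HHMM stamp from such a name.
# Prints nothing if the name does not follow the scheme.
backup_stamp() {
    printf '%s\n' "$1" |
        sed -n 's/.*\.\([0-9]\{8\}\.[0-9]\{4\}\)\.tar\.bz2$/\1/p'
}
```

For example, backup_name atuin etc 20060627.2200 prints atuin-etc.20060627.2200.tar.bz2, and passing that name to backup_stamp prints 20060627.2200. Because the stamp sorts in date order, a plain directory listing also shows the backups oldest first for each host and label.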

A separate Perl script is used to determine what files in the backup preparation directory can be deleted. You will examine that script shortly. In this script, you specify that you want to keep files for the last five days -- in other words, the script selects files not created in the last five days, based on the date specification you use in the filenames for the backup files.

The actual backup process is a simple tar command, combined with bzip2 to compress the files. Because the files generated could be quite large, you might want to adapt this to only choose files that might have changed within a certain period. You can enable this by using find to select the files you want (see Listing 2).


Listing 2. tar command with bzip2 to compress files

tar cf - $(find . -type f -mtime -1) | bzip2 -9 - \
    >"$TEMP/$HOST-webconfig.$DATE.tar.bz2"
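
The command substitution in Listing 2 splits the output of find on whitespace, so a filename containing a space would reach tar as two separate names. One possible variant passes the names null-delimited instead. It is a sketch that assumes GNU tar (for the --null and -T options) and a find that supports -print0; the function name archive_recent is hypothetical.

```shell
#!/bin/sh
# archive_recent SRC_DIR
# Write a tar archive of the regular files under SRC_DIR that were
# modified within the last day to standard output.
# Assumes GNU tar (--null, -T) and find with -print0.
archive_recent() {
    (
        cd "$1" || exit 1
        # -print0 and --null keep names with spaces or newlines intact
        find . -type f -mtime -1 -print0 | tar --null -T - -cf -
    )
}

# Example (not run here), compressing as in Listing 2:
#   archive_recent /etc/apache2 | bzip2 -9 - \
#       >"$TEMP/$HOST-webconfig.$DATE.tar.bz2"
```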

The find statement selects only files (-type f). If a directory were selected because its own modification time had changed, tar would archive everything beneath it, including files that have not changed. Note also that every tar command in these examples names its files relative to the current directory, as shown in Listing 3.


Listing 3. Referencing the current directory

$ cd /etc
$ tar cf etc.tar ./*

This avoids specifying the directory explicitly, as shown in Listing 4.


Listing 4. Avoid specifying the directory explicitly

$ tar cf etc.tar /etc

Using relative paths ensures that recovered files can be unpacked into a spare directory and checked there, rather than being written straight over the live location.
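
A restore into a spare directory might look like the following sketch. The function name restore_to_spare is hypothetical, and the archive is read from standard input so that the caller can decompress it first.

```shell
#!/bin/sh
# restore_to_spare SPARE_DIR
# Unpack a tar archive read from standard input into SPARE_DIR.
# Works as intended only for archives that store relative paths.
restore_to_spare() {
    mkdir -p "$1" || return 1
    tar xf - -C "$1"
}

# Example (not run here), for an archive created by Listing 1:
#   bzip2 -dc "$HOST-etc.$DATE.tar.bz2" | restore_to_spare /tmp/restore-etc
```

Once unpacked, the restored files can be compared with the live ones (for example, with diff -r) and copied into place individually.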

The script for deleting old files uses the filename, extracts the embedded date and time, and works out whether the files are within a specified limit, for example, within or not within a specific number of days (see Listing 5).


Listing 5. Script for deleting old files

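#!/usr/local/bin/perl
# Usage: filesbydate.pl SELECTOR FILE...
# Prints the FILEs whose embedded YYYYMMDD date matches SELECTOR.

use strict;
use warnings;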

my $choice = shift;
my @files = @ARGV;

my @selection;

if ($choice =~ /thismonth/)
{
    my ($day,$mon,$year) = dateaslist();
    my $match = sprintf('%04d%02d',$year,$mon);
    foreach my $file (@files)
    {
        if ($file =~ m/$match/ && $choice eq 'thismonth')
        {
            push @selection,$file;
        }
        elsif ($file !~ m/$match/ && $choice eq 'notthismonth')
        {
            push @selection,$file;
        }
    }
}
elsif ($choice =~ /today/)
{
    my ($day,$mon,$year) = dateaslist();
    my $match = sprintf('%04d%02d%02d',$year,$mon,$day);
    foreach my $file (@files)
    {
        if ($file =~ m/$match/ && $choice eq 'today')
        {
            push @selection,$file;
        }
        elsif ($file !~ m/$match/ && $choice eq 'nottoday')
        {
            push @selection,$file;
        }
    }
}
elsif ($choice =~ /last(\d+)days/)
{
    my $days = $1;
    my ($day,$mon,$year) = dateaslist(time()-($1*24*3600));
    my $match = sprintf('%04d%02d%02d',$year,$mon,$day);
    my $spec = sprintf('last%ddays',$days);
    my $notspec = sprintf('notlast%ddays',$days);
    foreach my $file (@files)
    {
        my ($date) = ($file =~ m/(\d{8})/);
        # Skip names with no embedded date so they are never selected
        next unless defined $date;
        push @selection,$file if ($date >= $match && $choice eq $spec);
        push @selection,$file if ($date < $match && $choice eq $notspec);
    }
}

print join ' ',@selection;

sub dateaslist
{
    my ($time) = @_;
    $time = time() unless defined($time);
    my ($day,$mon,$year) = (localtime($time))[3..5];
    $mon++;
    $year+= 1900;
    return($day,$mon,$year);
}

Using the script, you can pick backup files by a variety of methods. The first argument is the selection keyword, and the remaining arguments are the filenames to test (see Listing 6).


Listing 6. Picking backup files

$ filesbydate.pl last5days *       # Files dated within the last 5 days
$ filesbydate.pl notlast14days *   # Files dated 15 or more days ago
$ filesbydate.pl notthismonth *    # Files not dated this month

Remember that the comparison works on the filename, not the file system creation or modification date, and so the script is able to work with files that might have been created overnight.
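
The same filename-based comparison can be sketched in shell for the "older than" case. This is an illustrative alternative, not the script from Listing 5: the function name select_older_than is hypothetical, and the cutoff date is passed in explicitly as YYYYMMDD so that the sketch does not depend on any particular date command for date arithmetic.

```shell
#!/bin/sh
# select_older_than CUTOFF FILE...
# Print the files whose embedded YYYYMMDD stamp is earlier than CUTOFF.
# Names without a YYYYMMDD.HHMM stamp are skipped, never selected.
select_older_than() {
    cutoff=$1
    shift
    for name in "$@"; do
        # Pull the date part out of HOST-LABEL.YYYYMMDD.HHMM.tar.bz2
        stamp=$(printf '%s\n' "$name" |
            sed -n 's/.*\.\([0-9]\{8\}\)\.[0-9]\{4\}\..*/\1/p')
        [ -n "$stamp" ] || continue
        if [ "$stamp" -lt "$cutoff" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Because the date is written year first, comparing the stamps as plain numbers gives the same result as comparing the dates themselves.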


Using rsync to store backup data

The rsync tool copies an entire directory structure between locations, or machines, using a special algorithm that only transfers changes in the files. This makes it a very efficient method of copying files, particularly between machines, and that means that the backup process completes quickly.

There are two ways you can use rsync, either as a simple synchronization method, in which you can copy all of the critical files to a new drive or system, or as a backup method by copying entire directory trees on a date-by-date basis, in the same way as you created tar backup files in the previous examples.

The former method is quick and easy, but you cannot go back to a specific date in the event of a failure. The latter method provides a date-by-date option, but it requires more management (especially because you need to delete older versions that you no longer need) and obviously a significant amount of space, because the files are not compressed. You do, however, get easier and more direct access.
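
The date-by-date approach could be sketched as follows. This is only an illustration under stated assumptions: the function name rsync_snapshot and the STAMP/LABEL directory layout are hypothetical, the destination is assumed to be a local or mounted directory, and the rsync options are ones used elsewhere in this article.

```shell
#!/bin/sh
# rsync_snapshot SRC_DIR DEST_BASE LABEL [STAMP]
# Copy SRC_DIR into DEST_BASE/STAMP/LABEL so that each run keeps its
# own dated tree. STAMP defaults to today's date as YYYYMMDD.
# rsync_snapshot is a hypothetical helper name.
rsync_snapshot() {
    snap_stamp=${4:-$(date +%Y%m%d)}
    mkdir -p "$2/$snap_stamp" || return 1
    # The trailing slash on the source copies its contents, not the
    # directory itself, into the dated destination.
    rsync --recursive --times --links -og "$1/" "$2/$snap_stamp/$3/"
}

# Example (not run here):
#   rsync_snapshot /var/bind /mnt/backupprepare/atuin bind
```

Each dated tree is a complete, uncompressed copy, which is why this method needs both the extra space and a cleanup step for old dates.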

Setting up rsync is beyond the scope of this article, but once configured, transferring and synchronizing the information is straightforward. Listing 7 shows a script to synchronize files using rsync.


Listing 7. Script to synchronize files using rsync

#!/bin/bash
# Synchronize key directories to a per-host tree on the backup server.

DESTBASE=admin@atuin:/mnt/backupprepare
HOST=$(hostname)

# Each rsync runs only if the cd succeeds. With --delete, running
# rsync from the wrong directory would remove the good copy.
cd /export/data/svn &&
rsync --stats --rsh=/usr/bin/ssh --delete --recursive --times -og \
    --links . "$DESTBASE/$HOST/svn"

cd /export/home/webs &&
rsync --stats --rsh=/usr/bin/ssh --delete --recursive --times -og \
    --links . "$DESTBASE/$HOST/webs"

cd /var/bind &&
rsync --stats --rsh=/usr/bin/ssh --delete --recursive --times -og \
    --links . "$DESTBASE/$HOST/bind"

cd /etc &&
rsync --stats --rsh=/usr/bin/ssh --delete --recursive --times -og \
    --links . "$DESTBASE/$HOST/etc"

The rsync command options specified are as follows:

  • --stats prints a summary of the transfer statistics.
  • --rsh=/usr/bin/ssh tells rsync to use Secure Shell (SSH) as the transport for copying the files (for security).
  • --delete deletes files on the destination that no longer exist in the source directory.
  • --recursive copies the directory and everything below it.
  • --times preserves file modification times.
  • -og preserves owner and group information (preserving the owner generally requires the receiving side to run with root privileges).
  • --links copies links as links, instead of the files they link to.

The DESTBASE variable specifies the base location (in this case, a file system on a remote host), and the HOST variable holds the hostname information so that you can use the same script across multiple hosts for backup.


Backing up the collated data

In the previous sections, you've used tar and rsync to create backups in a separate folder. You could use this collated information as your main backup, especially if the files were located on another machine. Ideally, however, you should also back up these files to another location.

The above scripts, whether tar or rsync, collate the information from multiple directories on multiple hosts into a single location. From this base, you can back up the information further using whatever techniques you prefer, including copying to another machine or device, or by copying to tape or disk.


Keeping a longer-term record

Most backup solutions, however well managed, often rely on rotating and recycling the media, or destination, used to store the data. There are, however, some types of key data where you might want to keep a longer record of the information, and might even want to be able to record information on changes and modifications to the data as part of the backup process.

The period over which you record the information is really limited only by the amount of disk space, or storage, that you have available. Using the date methods, particularly with tar or cpio, means you can keep a longer-term record with little impact on storage. By combining a regular full backup (in other words, all the files) with differential backups, in which you back up only the files that have changed, you can extend the duration still further.
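
A full-plus-differential scheme could be sketched with tar and find as follows. The function names and the stamp file are hypothetical, and the sketch assumes GNU tar (for --null and -T) and a find that supports -print0; it illustrates the idea rather than providing a complete backup tool.

```shell
#!/bin/sh
# full_backup SRC_DIR ARCHIVE STAMP_FILE
# Archive everything under SRC_DIR and record when the backup started.
# ARCHIVE should live outside SRC_DIR.
full_backup() {
    # Touch the stamp first so files changed during the backup are
    # picked up by the next differential run.
    touch "$3" || return 1
    tar cf "$2" -C "$1" .
}

# diff_backup SRC_DIR ARCHIVE STAMP_FILE
# Archive only regular files modified since the last full backup.
# ARCHIVE and STAMP_FILE must be absolute paths, because the
# function changes directory before using them.
diff_backup() {
    (
        cd "$1" || exit 1
        find . -type f -newer "$3" -print0 | tar --null -T - -cf "$2"
    )
}

# Example (not run here):
#   full_backup /etc /mnt/backupprepare/etc-full.tar /mnt/backupprepare/etc.stamp
#   diff_backup /etc /mnt/backupprepare/etc-diff.tar /mnt/backupprepare/etc.stamp
```

Restoring then means unpacking the most recent full archive followed by the most recent differential archive.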


Summary

Backing up key files is about a combination of identifying the files and finding a suitable method for storing and backing up that information. There are many locations for files to be stored, and you should take great care to ensure you cover important, but often forgotten, areas such as kernel drivers, libraries, and configuration.

Making an effective backup of that information can then be achieved by whatever method is appropriate. Unlike user data, key files often need to be recovered more urgently to get the machine into the correct configuration before the rest of the restoration continues. Having ready access to that information, by using tar or rsync, can often be more effective than relying on a full backup alone.


