Systems Administration Toolkit: Testing system validity

Martin Brown (mc@mcslp.com), Freelance Writer, Consultant
Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms—Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more—as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.

Summary:  Examine methods of storing and later checking the validity of your configuration files. Despite all the security systems you have in place, it is still possible that somebody has accessed your system and changed your configuration or security settings.

Date:  11 Sep 2007
Level:  Intermediate

About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.

Verifying file content

Establishing that the configuration files (and, if necessary, the application files) that make up your UNIX system are the ones you expect goes a long way toward ensuring that the system runs without problems. It can also form a key part of your security process: it confirms not only that the system is valid, but also that its security configuration has not been tampered with.

Your systems should be secure enough that nobody can get in and modify your configuration files. Even so, you need a way to confirm, if someone did get in, whether the configuration files (or indeed any files) have been altered by an unauthorized individual.

So how do you verify file content?

Well, there are a number of different parameters that you need to consider. All of the following items have the potential to cause problems if they are changed in a working configuration:

  • File contents
  • File owner
  • Group owner
  • File permissions
  • Modification time
  • Creation time

One simple way would be to keep a copy of the configuration files (and their associated parameters) on a different machine, and then run a regular comparison of the local and remote files. The problem with this method is that it requires a considerable amount of space and, more importantly, a considerable amount of time to record and compare that much information, which makes it hard to run the comparison often enough to be useful.

Another often-proposed technique is to simply record the file size, modification time, mode, and ownership information. Because the information is much shorter, it is much easier to store and quicker to check and verify compared to the full file contents.

The problem is that the file size provides no real indication that the contents have changed. Consider this one-line file:

location:/home/mcbrown

And the same file, with a line changed:

location:/home/slbrown

Both lines are 22 characters long, so the two files are the same size, but the content is different. Such a seemingly innocent change could have serious consequences, yet a check on the file size would report no difference.
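The point can be sketched in a few lines of Python (used here only as a convenient stand-in for the command-line tools shown later in the article). The sketch assumes the two one-line files are held in memory as byte strings, each ending in a newline; the variable names are illustrative.

```python
import hashlib

# The two one-line files from the example above, each ending in a newline.
old = b"location:/home/mcbrown\n"
new = b"location:/home/slbrown\n"

# A size check cannot tell the two files apart.
same_size = len(old) == len(new)

# A digest of the contents can.
old_digest = hashlib.md5(old).hexdigest()
new_digest = hashlib.md5(new).hexdigest()

print("same size:  ", same_size)
print("same digest:", old_digest == new_digest)
```

The size comparison reports a match while the digest comparison does not, which is why the rest of this article records a checksum rather than relying on the file size.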

As for the other parameters, the modification time, file mode, and ownership can all be changed. You can alter modification times using the touch command. Even file creation times can be faked by changing the time on the machine and recreating the file.
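As a minimal illustration of how little a recorded modification time proves, the following Python sketch sets the timestamp of a scratch file to an arbitrary moment in the past, much as touch can. The scratch file and the chosen timestamp are assumptions made for the example.

```python
import os
import tempfile

# Create a scratch file so the example touches nothing real.
fd, path = tempfile.mkstemp()
os.close(fd)

# Set the access and modification times to an arbitrary moment in the
# past, expressed in seconds since the epoch.
fake_time = 1_000_000_000  # 9 September 2001 (UTC)
os.utime(path, (fake_time, fake_time))

# stat() now reports the faked time, not the time the file was written.
reported_mtime = int(os.stat(path).st_mtime)
print("reported mtime:", reported_mtime)

os.remove(path)
```

Anyone with write access to a file can do the same, so a matching modification time on its own is not evidence that a file is unchanged.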

One item that is far more difficult to change on a UNIX system is the file inode number. The inode number is an ID, unique within a file system, that is given to a file when the file is first created. The underlying file system drivers use it to identify the file on the file system. The inode often changes when a file has been edited, because most editors create a new file and write the new contents to that file before removing the old file and renaming the new file to the original name. This makes inode comparison a good indicator of whether the file has been edited, although a file that is modified in place keeps its inode number, so an unchanged inode does not prove that the file is untouched.

Recording these pieces of information is still not enough; you need an efficient method of comparing the file content. Probably the best way to do this is with a file checksum.
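The following Python sketch imitates both styles of edit on a scratch file and records the inode number each time. It assumes a POSIX file system; the file name app.conf and the ".tmp" suffix are hypothetical.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "app.conf")   # hypothetical file name

with open(target, "w") as fh:
    fh.write("location:/home/mcbrown\n")
inode_before = os.stat(target).st_ino

# Imitate an editor's save: write a new file, then rename it over
# the original. The path now refers to a different inode.
scratch = target + ".tmp"
with open(scratch, "w") as fh:
    fh.write("location:/home/slbrown\n")
os.replace(scratch, target)
inode_after = os.stat(target).st_ino

# An in-place write, by contrast, keeps the same inode.
with open(target, "a") as fh:
    fh.write("# appended in place\n")
inode_in_place = os.stat(target).st_ino

print(inode_before, inode_after, inode_in_place)

shutil.rmtree(workdir)
```

The first two numbers differ and the last two match, which shows both why the inode is worth recording and why it cannot be the only check.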

The file checksum

Creating a checksum for a file is a classic way of determining whether the content of a file has changed without having to compare every byte of each file.

A checksum works by running an algorithm over the file contents. The algorithm generates an almost unique fingerprint of the file contents. You can accomplish this task in many different ways. For example, you can add the value of each byte together, or you can use an algorithm that applies a more complex calculation to the individual bits or bit groups of a given file. The exact methods used are beyond the scope of this article and depend on which checksum tool you use.
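To make the difference between the two approaches concrete, here is a small Python sketch. The additive_checksum() function is a deliberately naive illustration of "adding the value of each byte together"; it is not the algorithm used by the sum command, and the sample strings are invented.

```python
import hashlib

def additive_checksum(data: bytes) -> int:
    """Add the value of every byte together, keeping the low 16 bits."""
    return sum(data) & 0xFFFF

a = b"allow root\n"
b = b"allow toor\n"   # the same bytes in a different order

# Simple addition ignores byte order, so the two values collide.
print(additive_checksum(a) == additive_checksum(b))   # True

# A digest that mixes the bits of the whole input tells them apart.
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())   # False
```

A checksum that cannot notice reordered bytes is easy to fool, which is the motivation for the more complex calculations used by tools such as md5.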

UNIX includes a simple checksum command, sum. This command is very basic, but it provides a number that is usually enough to reveal a difference between two versions of a file. However, there are limits to this algorithm: its checksums are short, so different files can easily share the same value. Many modern systems provide the md5 command (md5sum on most Linux systems). It produces a 128-bit fingerprint for a file of any size. The fingerprint is not guaranteed to be unique, but two different files are extremely unlikely to share one by accident. Be aware, though, that MD5 is known to be vulnerable to deliberately constructed collisions.

The MD5 algorithm was originally developed to generate a fingerprint of a file before the file was encrypted, so that the validity of the decrypted file could be checked. Checksums generated by md5 can be represented as a binary string, a hex string, or a base64-encoded string. The last of these formats is used in MIME email messages, where the Content-MD5 header carries a base64-encoded MD5 digest as an integrity check on the message content.
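The three representations are different encodings of the same 128 bits. A minimal Python sketch, using the one-line file from earlier as sample input (the variable names are illustrative):

```python
import base64
import hashlib

data = b"location:/home/mcbrown\n"
digest = hashlib.md5(data)

raw = digest.digest()                      # 16 raw bytes (128 bits)
as_hex = digest.hexdigest()                # 32 hexadecimal characters
as_b64 = base64.b64encode(raw).decode()    # 24 characters, including padding

print(len(raw), len(as_hex), len(as_b64))  # 16 32 24
```

Any of the three forms can be stored; the hex string is the one produced by the md5 command and used throughout the rest of this article.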

Creating checksums for files

Because there are command-line solutions for creating checksum information, you can create a checksum of any file right on the command line. A good demonstration of how distinctive the checksum information can be is to use the sample files shown earlier, which have the same length and differ by only two characters.

You can obtain the checksum for both files in one command, as shown here in Listing 1.


Listing 1. Obtaining the checksum for both files in one command
                
$ sum old new
50093 1 old
62381 1 new

Even though only two characters are different in Listing 1, you still get significantly different checksum figures. Listing 2 shows the same files, this time checked with md5.


Listing 2. Checking the files with md5
                
$ md5 old new
MD5 (old) = 602f604720d3b57925e99bcaa7d931a4
MD5 (new) = c3f06c217a0f26c16f8d030837d8718b

Here the checksums are significantly different and there should be no doubt that the files in question differ in some way.

Another solution for creating checksums is to use Perl to generate the checksum information. There is a module available for Perl, Digest::MD5, which can generate MD5 checksums from any string of data or from a supplied file.

Listing 3 shows a simple script that returns the MD5 checksum for a file supplied on the command line as a hex string (identical to the format shown in Listing 2).


Listing 3. Script that returns the MD5 checksum
                
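#!/usr/local/bin/perl
# Print the MD5 checksum (as a hex string) of each file named
# on the command line.

use strict;
use warnings;
use Digest::MD5;
use IO::File;

my $chk = Digest::MD5->new();

foreach my $file (@ARGV)
{
    my $fh = IO::File->new($file) or die "Cannot open $file: $!\n";
    binmode($fh);    # read raw bytes rather than text
    $chk->addfile($fh);

    # hexdigest() also resets the object, ready for the next file
    print "$file -> ",$chk->hexdigest,"\n";
}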

You can run the script on the same files as before, and you should get identical information, as shown here in Listing 4.


Listing 4. Running Digest::MD5 on the same files
                
$ simpmd5.pl old new
old -> 602f604720d3b57925e99bcaa7d931a4
new -> c3f06c217a0f26c16f8d030837d8718b

To make this process useful, you need to record the information into a file so that you can compare the information again later. Before you do that, add the other information you want to compare (modification times, file sizes, ownership, inodes, and so forth) into the stored data.

Adding other data to your report

The Perl stat() function can obtain a whole heap of information from a given file, most of which you can use. The list of information that you can obtain from the file is shown here in Listing 5.


Listing 5. Perl stat() function
                
 0 dev      device number of filesystem
 1 ino      inode number
 2 mode     file mode  (type and permissions)
 3 nlink    number of (hard) links to the file
 4 uid      numeric user ID of file's owner
 5 gid      numeric group ID of file's owner
 6 rdev     the device identifier (special files only)
 7 size     total size of file, in bytes
 8 atime    last access time in seconds since the epoch
 9 mtime    last modify time in seconds since the epoch
10 ctime    inode change time in seconds since the epoch (*)
11 blksize  preferred block size for file system I/O
12 blocks   actual number of blocks allocated

You can record nearly all of this information, but some of it is not useful because it either changes too regularly or is not consistent across reboots. The following fields should probably be ignored:

  • rdev—Because this is unique to special files only (usually devices or pipes), it can probably be ignored.
  • atime—The last access time of the file changes each time the file is accessed. This means that the value is likely to change even though the file has never been modified in any way. Recording that information could lead to false positives in the difference report.
  • blksize—The block size used for file system I/O. Although this is unlikely to change, other factors than a file modification could lead to a change in this value, so recording it on a file-by-file basis is pointless.
  • blocks—The number of blocks allocated for the file on the file system. This information is specific to a file, but if you are also recording the file size, then recording both is probably overkill.

These fields are useful to record for some specific reasons:

  • dev—The device number of the file system should be consistent across reboots, providing you do not regularly mount and unmount the file systems. If the file systems are mounted in the same order on each reboot, then the device number should be consistent.
  • nlink—The number of hard links to a file can help to identify whether someone has created a hard link to the file in a location where they can overwrite the file and bypass the permissions of the original. You cannot have a hard link to a file with different ownership and permissions than the original.
  • ctime—The inode change time is updated when the file is created, when its contents are modified, and when its ownership or mode information is altered. If this value has changed, it might indicate that the ownership or mode was altered, even if it was later returned to its normal value.
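The selection described in the two lists above can be summarized in a short Python sketch. It uses Python's os.stat(), whose attribute names differ from the Perl field names in Listing 5; the RECORDED_FIELDS mapping and the recorded_stat() helper are illustrative names, not part of the scripts in this article.

```python
import os

# Fields worth recording, named as in Listing 5 and mapped to the
# attribute names Python uses on the result of os.stat().
# rdev, atime, blksize, and blocks are deliberately left out.
RECORDED_FIELDS = {
    "dev": "st_dev",
    "ino": "st_ino",
    "mode": "st_mode",
    "nlink": "st_nlink",
    "uid": "st_uid",
    "gid": "st_gid",
    "size": "st_size",
    "mtime": "st_mtime",
    "ctime": "st_ctime",
}

def recorded_stat(path):
    """Return the selected stat fields for path as a dict of integers."""
    info = os.stat(path)
    return {name: int(getattr(info, attr))
            for name, attr in RECORDED_FIELDS.items()}

# Show the recorded fields for the current directory.
for name, value in recorded_stat(os.curdir).items():
    print(f"{name:6} {value}")
```

The nine names correspond to indices 0 to 5, 7, 9, and 10 of the list returned by the Perl stat() function.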

Listing 6 shows a script that writes out the file path, checksum, and other data to the standard output, separating each field of information with a colon. For the checksum, you not only checksum the file content, but you also add the other information into the checksum data so that just by comparing the checksum alone you can determine if there was a difference.


Listing 6. Writing out the file path, checksum and other data to standard output
                
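#!/usr/local/bin/perl
# Print one colon-separated record (path, checksum, stat fields) for
# every regular file below the directory given on the command line.

use strict;
use warnings;
use Digest::MD5;
use IO::File;
use File::Find ();

use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

die "Usage: $0 directory\n" unless (@ARGV);

File::Find::find({wanted => \&wanted}, $ARGV[0]);

# Called by File::Find for every entry it visits
sub wanted {
    # Skip anything that is not a regular file; use return, not next,
    # because this is a subroutine rather than a loop body
    return unless (-f $name);

    my $fileinfo = genchksuminfo($name);

    printf ("%s\n",$fileinfo);
}

# Build the record for one file
sub genchksuminfo
{
    my ($file) = @_;

    my $chk = Digest::MD5->new();

    my (@statinfo) = stat($file);
    my $fh = IO::File->new($file) or die "Cannot open $file: $!\n";

    # The checksum covers the stat fields (including size, field 7)
    # as well as the file contents
    $chk->add(@statinfo[0,1,2,3,4,5,7,9,10]);
    $chk->addfile($fh);

    # Fields written: dev, ino, mode, nlink, uid, gid, mtime, ctime
    return sprintf("%s:%s:%s",
                   $file,$chk->hexdigest,
                   join(':',@statinfo[0,1,2,3,4,5,9,10]));
}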

The script uses the File::Find module in Perl, which traverses a directory and finds every file and directory from the base point. For each file, the wanted() function is called and, in that function for each file, the genchksuminfo() function is called. That gets the information with stat() and creates the file path, checksum, and other information as a line and returns it. In this script, that information is just printed out to the standard output.

The script accepts the directory to be scanned as its argument and generates the checksum information for every file below it. For the /etc directory, you would use the command shown in Listing 7.


Listing 7. Scanning /etc
                
$ perl savemd5.pl /etc
/private/etc/6to4.conf:e6b1ba3e7683a0df9be21c9e9f5d1f6a:234881026:46788:33188:1:0:0:1152674600:1155914028
/private/etc/afpovertcp.cfg:dc7c89b0626d6e603131902d387816f7:234881026:30152:33188:1:0:0:1151780398:1166194017
/private/etc/aliases:de483c306c03f35dcbd45d609f8e68ce:234881026:47440:33188:1:0:0:1151828538:1155914028
/private/etc/aliases.db:aa95ae673dcb6ba89684a6f4bbe3dba5:234881026:47437:33188:1:0:0:1151828588:1155914028
/private/etc/authorization:39f7938ae1df629d422b27ec1a17f3dd:234881026:950752:33188:1:0:0:1162503594:1162503594
/private/etc/auto.mnt:3da7579cdc03c529059a42de51c6679e:234881026:1013554:33188:1:0:0:1162728759:1162728759
/private/etc/auto.mnt~:54d856aa344d03a6084d63c9dd7e1d9c:234881026:1013530:33188:1:0:0:1162728576:1162728576
/private/etc/bashrc:fb23bdcacf23f69f1ce92e3b910c03b9:234881026:42880:33188:1:0:0:1151805563:1155914028
/private/etc/compilers:363c62792a79df85cd0c8d71ff274495:234881026:821586:33188:1:0:0:1159026690:1162503150
/private/etc/crontab:b9af1eb506bd68a43465789174bfe5e1:234881026:29678:33188:1:0:0:1151800085:1166193736
...

The final stage to the process is to store the information and provide a way of comparing the current situation with the stored one.

Validating checksum information

The final script builds on the script in Listing 6. The script expands significantly from the original script, incorporating a number of new features:

  • A command-line interface parsed with the Getopt::Long module. This lets you specify the chksumfile (where the checksums and other calculated information are stored), whether to compare the new information with the old (by reading the contents of the chksumfile), and the base directory to be searched. When you compare, the stored data is left unchanged and only the differences are reported.
  • A loadchksumdata() function to load and parse an existing data file in a way that allows you to easily compare the new information with the old.
  • A gendiffreport() function that compares the individual fields of the stored information and the current information to tell you what has changed. This function is only called if the script determines that there has been a difference of some kind.


Listing 8. Final script

#!/usr/local/bin/perl

use strict;
use warnings;
use Digest::MD5;
use IO::File;
use File::Find ();
use Getopt::Long;

# Defaults, overridden by the command-line options below
my $chksumfile = 'chksums.dat';
my $compare = 0;
my $basedir = '/etc';

use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

GetOptions("chksumfile=s" => \$chksumfile,
           "compare" => \$compare,
           "basedir=s" => \$basedir);

# Stored records, keyed by file path (only filled when comparing)
my $chksumdata = {};

if ($compare)
{
    loadchksumdata($chksumfile);
}

my $outfile;

if (!$compare)
{
    $outfile = IO::File->new($chksumfile,"w")
        or die "Cannot write check sum file $chksumfile: $!\n";
}

File::Find::find({wanted => \&wanted}, $basedir);

if ($compare)
{
    # Anything still in the hash was on record but not found on disk
    foreach my $file (sort keys %{$chksumdata})
    {
        print STDERR "Couldn't find $file, but have the info on record\n";
    }
}

# Load the stored records into the $chksumdata hash
sub loadchksumdata
{
    my ($file) = @_;

    open(my $data,'<',$file) or die "Cannot open check sum file $file: $!\n";
    while(<$data>)
    {
        chomp;
        my ($filename,$rest) = split(/:/,$_,2);
        $chksumdata->{$filename} = $_;
    }
    close($data);
}

# Called by File::Find for every entry it visits
sub wanted {
    # Skip anything that is not a regular file; use return, not next,
    # because this is a subroutine rather than a loop body
    return unless (-f $name);

    my $fileinfo = genchksuminfo($name);

    if ($compare)
    {
        if (exists($chksumdata->{$name}))
        {
            if ($chksumdata->{$name} ne $fileinfo)
            {
                print STDERR "Warning: $name differs from that on record\n";
                gendiffreport($chksumdata->{$name}, $fileinfo);
            }
            # Remove the entry so that only missing files remain at the end
            delete($chksumdata->{$name});
        }
        else
        {
            print STDERR "Warning: Couldn't find $name in existing records\n";
        }
    }
    else
    {
        printf $outfile ("%s\n",$fileinfo);
    }
}

# Report which fields differ between a stored and a current record
sub gendiffreport
{
    my ($orig,$curr) = @_;

    # Names match the order written by genchksuminfo(); the size is part
    # of the checksum but is not stored as a field of its own
    my @fields = qw/filename chksum device inode mode nlink uid gid mtime ctime/;

    my @origfields = split(/:/,$orig);
    my @currfields = split(/:/,$curr);

    for(my $i=0;$i<scalar @origfields;$i++)
    {
        if ($origfields[$i] ne $currfields[$i])
        {
            print STDERR "\t$fields[$i] differ; was $origfields[$i], ",
                         "now $currfields[$i]\n";
        }
    }

}

# Build the record for one file
sub genchksuminfo
{
    my ($file) = @_;

    my $chk = Digest::MD5->new();

    my (@statinfo) = stat($file);
    my $fh = IO::File->new($file) or die "Cannot open $file: $!\n";

    # The checksum covers the stat fields (including size, field 7)
    # as well as the file contents
    $chk->add(@statinfo[0,1,2,3,4,5,7,9,10]);
    $chk->addfile($fh);

    # Fields written: dev, ino, mode, nlink, uid, gid, mtime, ctime
    return sprintf("%s:%s:%s",
                   $file,$chk->hexdigest,
                   join(':',@statinfo[0,1,2,3,4,5,9,10]));
}
    

To use the script, you first need to generate a file with the base checksum and other data that will act as your base comparison file. For example, to create a checksum data file for the /etc directory, you might use the following command line:

$ genmd5.pl --basedir=/etc --chksumfile=etc-chksum.dat

Now that you have the information, if you edit a file and then re-run the script with the --compare option, you should get a report of the differences. In Listing 9, you can see the results when the /etc/hosts file is edited.


Listing 9. Results when the /etc/hosts file is edited
                
$ genmd5.pl --basedir /private/etc --compare
Warning: /private/etc/hosts differs from that on record
        chksum differ; was d4a23fcdaa835d98ede1875503273ce6, now beb50782b3fd998f35786b1e6f503d1b
        inode differ; was 4879566, now 4879581
        mtime differ; was 1186929905, now 1186930065
        ctime differ; was 1186929905, now 1186930065
Couldn't find /private/etc/hosts~, but have the info on record

Note that the script reports both the differences in an individual file and the fact that a file has been deleted. If a new file had been created, that would have been reported as well.
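The three kinds of report (changed, missing, and new files) can be sketched independently of the Perl script. The Python below is a simplified illustration: it assumes records with the ten colon-separated values shown in Listing 7, the parse() and compare() names are hypothetical, and the sample records are invented for the example.

```python
# Names for the ten colon-separated values in each record.
FIELDS = ["filename", "chksum", "device", "inode", "mode",
          "nlink", "uid", "gid", "mtime", "ctime"]

def parse(lines):
    """Map each file path to its list of colon-separated fields."""
    records = {}
    for line in lines:
        fields = line.rstrip("\n").split(":")
        records[fields[0]] = fields
    return records

def compare(stored_lines, current_lines):
    """Return (changed, missing, added) between two sets of records."""
    stored, current = parse(stored_lines), parse(current_lines)
    changed = {}
    for path in stored.keys() & current.keys():
        diffs = [(name, before, after)
                 for name, before, after
                 in zip(FIELDS, stored[path], current[path])
                 if before != after]
        if diffs:
            changed[path] = diffs
    missing = sorted(stored.keys() - current.keys())
    added = sorted(current.keys() - stored.keys())
    return changed, missing, added

# Invented sample records: hosts was edited, hosts~ removed, motd added.
stored = ["/etc/hosts:aaaa:1:100:33188:1:0:0:1000:1000",
          "/etc/hosts~:bbbb:1:101:33188:1:0:0:900:900"]
current = ["/etc/hosts:cccc:1:102:33188:1:0:0:2000:2000",
           "/etc/motd:dddd:1:103:33188:1:0:0:2000:2000"]

changed, missing, added = compare(stored, current)
print(changed)
print(missing, added)
```

Like the Perl script, this sketch splits each record on colons, so it assumes that file paths do not themselves contain a colon.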

Using the checksum data

Running the script in Listing 8 generates a file that you can use to test and verify the validity of a system. Of course, because that file exists, you have to store it securely; otherwise anybody could update it, including an unauthorized individual who gains access to your machine and alters the files you want to protect.

There is no hard and fast rule for this information, but it should be clear that storing the file you create on the same machine that you generated it on is probably a bad idea—once it has been located, it could be altered. The same is true of storing the file on another machine on the same network. Once the file has been located, it could be subverted and altered. The best solution is probably to write the file to a CD or DVD that can be kept completely separate from the machine.

The problem with this solution is that you must keep the information up to date. Each time you legitimately update or alter the files that you are monitoring, you must update the checksum file.

Although this makes the process somewhat more difficult, the benefits of the security information that the file provides can be incalculable.

Summary

In this article, you developed a script that you can use to generate information that checks the validity of a file or directory full of files. The recorded information includes the file path, a checksum of the file so that you can compare the file contents, and unique information about the file (inode, permissions, ownership information) so that you can identify differences should they occur.

How you make use of this script is up to you. You could store the information and run the script regularly to identify problems as soon as they happened, or you could use the file as a post-mortem tool in the event of a problem to find out which files had been changed so that you have a list of files to be examined.

