The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.
Establishing that the configuration files (and, if necessary, the application files) that make up your UNIX system are the ones you expect can go a long way toward ensuring that your system runs without problems. It can also form a key part of your security process, not only by confirming that your system is valid, but also by making certain that the security configuration itself has not been changed.
Although your systems should be secure enough that nobody can get in and modify your configuration files, you still need to protect yourself, or at least be able to verify, if someone did get in, whether the configuration files (or indeed any file) were altered by an unauthorized individual.
So how do you verify file content?
Well, there are a number of different parameters that you need to consider. All of the following items have the potential to cause problems if they are changed in a working configuration:
- File contents
- File owner
- Group owner
- File permissions
- Modification time
- Creation time
One simple way would be to keep a copy of the configuration files (and their associated parameters) on a different machine, and then run a regular comparison on the local and remote files. The problem with this method is that it requires a considerable amount of space and, more importantly, a considerable amount of time to record and compare that much information, which makes it hard to run the comparison often enough to catch changes promptly.
Another often proposed technique is to simply record the file size, modification time, mode, and ownership information. Because the information is much shorter, it is much easier to store and quicker to check and verify compared to the full file contents.
The problem is that the file size provides no real indication that the contents have changed. Consider this one-line file:
location:/home/mcbrown
And the same file, with the line changed:
location:/home/slbrown
Both lines are 22 characters long, but the content is different. Such a relatively innocent change could have serious consequences, yet a check on the file size would report no difference.
As to the other parameters, modification times, file mode, and other information can all be modified. You can alter modification times using the touch command. Even the file creation times can be faked by changing the time on the machine and recreating the file.
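To see how little effort it takes to fake a modification time, the following minimal sketch does from Perl what touch does from the shell. The scratch file and the chosen timestamp are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file to experiment on (hypothetical example data)
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "location:/home/mcbrown\n";
close($fh);

# An arbitrary timestamp in the past: 1 January 2000 00:00:00 UTC
my $fake = 946684800;

# utime() sets the access and modification times, just as touch can
utime($fake, $fake, $path) or die "Cannot set times on $path: $!\n";

my ($mtime, $ctime) = (stat($path))[9, 10];
print "mtime is now $mtime\n";
print "ctime is $ctime\n";
```

The inode change time cannot be set through utime(); the call itself moves it to the current time, so the two values printed here differ.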
One item that is far more difficult to change on a UNIX system is the file inode number. The inode number is an ID, unique within its file system, that is given to a file when the file is first created. The inode number is used by the underlying file system drivers to identify the file on the file system. The inode often changes when a file has been edited, because most editors create a new file and write the new contents to that file before removing the old file and renaming the new file to the original name. This makes inode comparison a good indicator of whether the file has been edited.
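The write-then-rename behavior described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the file name demo.conf and the write_file() helper are hypothetical, and real editors differ in how they save files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory that is removed on exit
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.conf";   # hypothetical file name
my $tmp  = "$file.new";

# Helper: write $text to $path, replacing any existing contents
sub write_file {
    my ($path, $text) = @_;
    open(my $out, '>', $path) or die "Cannot write $path: $!\n";
    print $out $text;
    close($out) or die "Cannot close $path: $!\n";
}

write_file($file, "location:/home/mcbrown\n");
my $before = (stat($file))[1];   # field 1 of stat() is the inode number

# Mimic an editor: write a new file, then rename it over the original
write_file($tmp, "location:/home/slbrown\n");
rename($tmp, $file) or die "Cannot rename $tmp: $!\n";
my $after = (stat($file))[1];

print "inode before: $before, after: $after\n";
print "inode changed\n" if $before != $after;
```

A program that opens the existing file and overwrites it in place keeps the same inode, so an unchanged inode number is not proof that the contents are untouched.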
Recording these pieces of information is still not enough; you need an efficient method of comparing the file content. Probably the best method for this is to use the file checksum.
Creating a checksum for a file is a classic way of checking whether the content of the file has changed without having to physically compare every byte of each file.
A checksum works by running an algorithm over the file contents. The algorithm generates an almost unique fingerprint of those contents. You can accomplish this task in many different ways. For example, you can simply add the value of each byte together, or you can use an algorithm that applies a complex calculation to the individual bits or bit groups of a given file. The exact methods used are beyond the scope of this article and are dependent on which checksum tool you use.
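As a rough illustration of the simplest approach, adding the byte values together, here is a minimal sketch. The byte_sum() function is hypothetical and is not the algorithm used by any particular checksum command; the third sample string is an assumption added to show the weakness of so simple a scheme.

```perl
use strict;
use warnings;

# Simplest possible checksum: add up the value of every byte,
# keeping the running total within 16 bits
sub byte_sum {
    my ($data) = @_;
    my $sum = 0;
    $sum = ($sum + ord($_)) % 65536 for split(//, $data);
    return $sum;
}

my $old     = "location:/home/mcbrown\n";
my $new     = "location:/home/slbrown\n";
my $swapped = "location:/home/cmbrown\n";   # same bytes as $old, reordered

printf "old:     %d\n", byte_sum($old);
printf "new:     %d\n", byte_sum($new);
printf "swapped: %d\n", byte_sum($swapped);
```

The first two values differ, so the change is detected, but the reordered string produces the same total as the original. That is the kind of limitation that more complex algorithms are designed to avoid.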
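UNIX includes a simple checksum command, sum. This command is very basic, but it provides a reasonably unique number that you can use to identify the difference in most files. However, there are limits to this algorithm: the checksum it produces is only 16 bits long, so different files can easily share the same value. Many modern systems provide the md5 command. It produces a 128-bit fingerprint of a file of any size. Because there are more possible files than 128-bit values, the fingerprint cannot be strictly unique, but the chance of two different files sharing one by accident is negligible. Be aware that deliberately constructed MD5 collisions have been demonstrated, so MD5 guards better against accidental change than against a determined attacker.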
The md5 algorithm for generating checksum information was originally developed to generate a unique fingerprint for a file before the file was encrypted so that the validity of the decrypted file could be guaranteed. Checksums generated by md5 can be represented either as a binary string, hex string, or base64 encoded string. The latter format is used in MIME email messages to ensure that different attachments in a file are uniquely identified.
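The three representations can be compared using the functional interface of the Perl Digest::MD5 module. This is a small sketch; the sample string is an assumption and is not necessarily byte-for-byte identical to the files used in the listings that follow.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex md5_base64);

my $data = "location:/home/mcbrown\n";

my $binary = md5($data);          # 16 raw bytes
my $hex    = md5_hex($data);      # 32 hexadecimal characters
my $base64 = md5_base64($data);   # 22 characters, no trailing padding

printf "binary: %d bytes\n", length($binary);
print  "hex:    $hex\n";
print  "base64: $base64\n";
```

All three values describe the same 128 bits; the hex form is the most convenient for storing in a text file and comparing by eye.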
Because there are command-line solutions for creating checksum information, you can create a checksum of any file right on the command line. A good demonstration of how distinctive the checksum information can be is to use the sample files shown earlier, which had the same length and differed by only two characters.
You can obtain the checksum for both files in one command, as shown here in Listing 1.
Listing 1. Obtaining the checksum for both files in one command
$ sum old new
50093 1 old
62381 1 new
Even though only two characters are different in Listing 1, you still get significantly different checksum figures. Listing 2 shows the same files, this time checked with md5.
Listing 2. Checking the files with md5
$ md5 old new
MD5 (old) = 602f604720d3b57925e99bcaa7d931a4
MD5 (new) = c3f06c217a0f26c16f8d030837d8718b
Here the checksums are significantly different and there should be no doubt that the files in question differ in some way.
Another solution for creating checksums is to use Perl to generate the checksum information. There is a module available for Perl, Digest::MD5, which can generate MD5 checksums from any string of data or from a supplied file.
Listing 3 shows a simple script that returns the MD5 checksum for a file supplied on the command line as a hex string (identical to the format shown in Listing 2).
Listing 3. Script that returns the MD5 checksum
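use strict;
use warnings;
use Digest::MD5;
use IO::File;
# hexdigest() resets the object, so one instance can be reused for every file
my $chk = Digest::MD5->new();
foreach my $file (@ARGV)
{
    my $fh = IO::File->new($file, 'r')
        or die "Cannot open $file: $!\n";
    binmode($fh);    # checksum the raw bytes
    $chk->addfile($fh);
    print "$file -> ",$chk->hexdigest,"\n";
}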
You can run the script on the same files as before, and you should get identical information, as shown here in Listing 4.
Listing 4. Running Digest::MD5 on the same files
$ simpmd5.pl old new
old -> 602f604720d3b57925e99bcaa7d931a4
new -> c3f06c217a0f26c16f8d030837d8718b
To make this process useful, you need to record the information into a file so that you can compare the information again later. Before you do that, add the other information you want to compare (modification times, file sizes, ownership, inodes, and so forth) into the stored data.
Adding other data to your report
The Perl stat() function can obtain a whole heap of information from a given file, most of which you can use. The list of information that you can obtain from the file is shown here in Listing 5.
Listing 5. Perl stat() function
 0 dev      device number of filesystem
 1 ino      inode number
 2 mode     file mode (type and permissions)
 3 nlink    number of (hard) links to the file
 4 uid      numeric user ID of file's owner
 5 gid      numeric group ID of file's owner
 6 rdev     the device identifier (special files only)
 7 size     total size of file, in bytes
 8 atime    last access time in seconds since the epoch
 9 mtime    last modify time in seconds since the epoch
10 ctime    inode change time in seconds since the epoch
11 blksize  preferred block size for file system I/O
12 blocks   actual number of blocks allocated
You can record nearly all of this information, but some of it is not worth recording because it either changes too regularly or is not consistent across reboots. The following fields should probably be ignored:
- rdev: Because this applies to special files only (usually devices or pipes), it can probably be ignored.
- atime: The last access time changes each time a file is accessed. This means that the value is likely to change even though the file has never been modified in any way. Recording it could lead to false positives in the difference report.
- blksize: The block size used for file system I/O. Although this is unlikely to change, factors other than a file modification could lead to a change in this value, so recording it on a file-by-file basis is pointless.
- blocks: The number of blocks allocated for the file on the file system. This information is specific to a file, but if you are also recording the file size, then recording both is probably overkill.
These fields are useful to record for some specific reasons:
- dev: The device number of the file system should be consistent across reboots, provided that you do not regularly mount and unmount the file systems. If the file systems are mounted in the same order on each reboot, then the device number should be consistent.
- nlink: The number of hard links to a file can help to identify whether someone has created a hard link to the file in a location where they can overwrite the file and bypass the permissions of the original. You cannot have a hard link to a file with different ownership and permissions than the original.
- ctime: The inode change time is updated when the file is created and whenever its inode data, such as the ownership or mode, is altered. If this value has changed, then it might indicate that these values were altered, even if they were later returned to their normal values.
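Selecting only the useful fields is a matter of slicing the list that stat() returns. The sketch below is one way to do it, assuming a hypothetical file_info() helper and a scratch file created just for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The stat() positions worth keeping, and a name for each of them
my @keep  = (0, 1, 2, 3, 4, 5, 7, 9, 10);
my @names = qw(dev ino mode nlink uid gid size mtime ctime);

# Return a hash of the chosen stat() fields for one file
sub file_info {
    my ($path) = @_;
    my @st = stat($path) or die "Cannot stat $path: $!\n";
    my %info;
    @info{@names} = @st[@keep];
    return %info;
}

# Demonstrate on a scratch file (hypothetical contents)
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "location:/home/mcbrown\n";
close($fh);

my %info = file_info($path);
printf "%-5s %s\n", $_, $info{$_} for @names;
printf "permissions: %04o\n", $info{mode} & 07777;
```

The mode field combines the file type and the permission bits, which is why the last line masks it before printing the permissions in octal.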
Listing 6 shows a script that writes out the file path, checksum, and other data to the standard output, separating each field of information with a colon. For the checksum, you not only checksum the file content, but you also add the other information into the checksum data so that just by comparing the checksum alone you can determine if there was a difference.
Listing 6. Writing out the file path, checksum and other data to standard output
#!/usr/local/bin/perl
use strict;
use warnings;
use Digest::MD5;
use IO::File;
use File::Find ();

# Aliases for the File::Find variables used in wanted()
use vars qw/*name *dir *prune/;
*name = *File::Find::name;
*dir = *File::Find::dir;
*prune = *File::Find::prune;

File::Find::find({wanted => \&wanted}, $ARGV[0]);

# Called by File::Find for every entry below the base directory
sub wanted {
    return unless (-f $name);    # plain files only
    my $fileinfo = genchksuminfo($name);
    printf ("%s\n",$fileinfo);
}

# Build "path:checksum:dev:ino:mode:nlink:uid:gid:mtime:ctime"
sub genchksuminfo
{
    my ($file) = @_;
    my $chk = Digest::MD5->new();
    my (@statinfo) = stat($file);
    # The checksum covers the stat data (including the size) and the contents
    $chk->add(@statinfo[0,1,2,3,4,5,7,9,10]);
    my $fh = IO::File->new($file) or die "Cannot open $file: $!\n";
    $chk->addfile($fh);
    return sprintf("%s:%s:%s",
                   $file,$chk->hexdigest,
                   join(':',@statinfo[0,1,2,3,4,5,9,10]));
}
The script uses the File::Find module in Perl, which traverses a directory and finds every file and directory from the base point. For each file, the wanted() function is called and, in that function for each file, the genchksuminfo() function is called. That gets the information with stat() and creates the file path, checksum, and other information as a line and returns it. In this script, that information is just printed out to the standard output.
The command accepts the directory to be scanned as its argument and generates the checksum information for every file below it. For the /etc directory, you would use the command shown in Listing 7.
Listing 7. Scanning /etc
$ perl savemd5.pl /etc
/private/etc/6to4.conf:e6b1ba3e7683a0df9be21c9e9f5d1f6a:234881026:46788:
33188:1:0:0:1152674600:1155914028
/private/etc/afpovertcp.cfg:dc7c89b0626d6e603131902d387816f7:234881026:30152:
33188:1:0:0:1151780398:1166194017
/private/etc/aliases:de483c306c03f35dcbd45d609f8e68ce:234881026:47440:
33188:1:0:0:1151828538:1155914028
/private/etc/aliases.db:aa95ae673dcb6ba89684a6f4bbe3dba5:234881026:47437:
33188:1:0:0:1151828588:1155914028
/private/etc/authorization:39f7938ae1df629d422b27ec1a17f3dd:234881026:950752:
33188:1:0:0:1162503594:1162503594
/private/etc/auto.mnt:3da7579cdc03c529059a42de51c6679e:234881026:1013554:
33188:1:0:0:1162728759:1162728759
/private/etc/auto.mnt~:54d856aa344d03a6084d63c9dd7e1d9c:234881026:1013530:
33188:1:0:0:1162728576:1162728576
/private/etc/bashrc:fb23bdcacf23f69f1ce92e3b910c03b9:234881026:42880:
33188:1:0:0:1151805563:1155914028
/private/etc/compilers:363c62792a79df85cd0c8d71ff274495:234881026:821586:
33188:1:0:0:1159026690:1162503150
/private/etc/crontab:b9af1eb506bd68a43465789174bfe5e1:234881026:29678:
33188:1:0:0:1151800085:1166193736
...
The final stage to the process is to store the information and provide a way of comparing the current situation with the stored one.
Validating checksum information
The final script builds on the script in Listing 6. The script expands significantly from the original script, incorporating a number of new features:
- Command-line option parsing using the Getopt::Long module. This enables you to specify the chksumfile (where the checksums and other information you calculate are stored), whether or not to compare the new information with the old (by reading the contents of the chksumfile), and the base directory to be searched. If you compare, the stored data is not updated and only the differences are reported.
- A loadchksumdata() function to load and parse an existing data file in a way that allows you to easily compare the new information with the old.
- A gendiffreport() function that actually compares the individual fields of the stored information and the current information to tell you what has changed. This function is only called if you determine that there has been a difference of some kind.
Listing 8. Final script
#!/usr/local/bin/perl
use strict;
use warnings;
use Digest::MD5;
use IO::File;
use File::Find ();
use Getopt::Long;

my $chksumfile = 'chksums.dat';
my $compare = 0;
my $basedir = '/etc';

use vars qw/*name *dir *prune/;
*name = *File::Find::name;
*dir = *File::Find::dir;
*prune = *File::Find::prune;

GetOptions("chksumfile=s" => \$chksumfile,
           "compare" => \$compare,
           "basedir=s" => \$basedir);

my $chksumdata = {};

if ($compare)
{
    loadchksumdata($chksumfile);
}

my $outfile = '';

if (!$compare)
{
    $outfile = IO::File->new($chksumfile,"w")
        or die "Cannot write check sum file $chksumfile: $!\n";
}

File::Find::find({wanted => \&wanted}, $basedir);

# Anything still in the hash was on record but not seen during the scan
if ($compare)
{
    foreach my $file (keys %{$chksumdata})
    {
        print STDERR "Couldn't find $file, but have the info on record\n";
    }
}

# Load the stored records into a hash keyed on the file path
sub loadchksumdata
{
    my ($file) = @_;
    open(DATA,$file) or die "Cannot open check sum file $file: $!\n";
    while(<DATA>)
    {
        chomp;
        my ($filename,$rest) = split(/:/,$_,2);
        $chksumdata->{$filename} = $_;
    }
    close(DATA);
}

# Called by File::Find for every entry below the base directory
sub wanted {
    return unless (-f $name);    # plain files only
    my $fileinfo = genchksuminfo($name);
    if ($compare)
    {
        if (exists($chksumdata->{$name}))
        {
            if ($chksumdata->{$name} ne $fileinfo)
            {
                print STDERR "Warning: $name differs from that on record\n";
                gendiffreport($chksumdata->{$name}, $fileinfo);
            }
            delete($chksumdata->{$name});
        }
        else
        {
            print STDERR "Warning: Couldn't find $name in existing records\n";
        }
    }
    else
    {
        printf $outfile ("%s\n",$fileinfo);
    }
}

# Report which individual fields differ between two records
sub gendiffreport
{
    my ($orig,$curr) = @_;
    # The names must match the fields written by genchksuminfo(); the size
    # is folded into the checksum but is not stored as a field of its own
    my @fields = qw/filename chksum device inode mode nlink uid gid mtime ctime/;
    my @origfields = split(/:/,$orig);
    my @currfields = split(/:/,$curr);
    for(my $i=0;$i<scalar @origfields;$i++)
    {
        if ($origfields[$i] ne $currfields[$i])
        {
            print STDERR "\t$fields[$i] differ; was $origfields[$i], now $currfields[$i]\n";
        }
    }
}

# Build "path:checksum:dev:ino:mode:nlink:uid:gid:mtime:ctime"
sub genchksuminfo
{
    my ($file) = @_;
    my $chk = Digest::MD5->new();
    my (@statinfo) = stat($file);
    $chk->add(@statinfo[0,1,2,3,4,5,7,9,10]);
    my $fh = IO::File->new($file) or die "Cannot open $file: $!\n";
    $chk->addfile($fh);
    return sprintf("%s:%s:%s",
                   $file,$chk->hexdigest,
                   join(':',@statinfo[0,1,2,3,4,5,9,10]));
}
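The field-by-field comparison can be tried in isolation, without scanning a directory. The following sketch assumes the ten-field record layout that genchksuminfo() writes (path, checksum, then eight stat() values); the changed_fields() function and the two sample records are hypothetical, assembled from values seen in the listings purely for illustration.

```perl
use strict;
use warnings;

# Field names, in the order in which the records store them
my @fields = qw(filename chksum device inode mode nlink uid gid mtime ctime);

# Return the names of the fields that differ between two records
sub changed_fields {
    my ($orig, $curr) = @_;
    my @o = split(/:/, $orig);
    my @c = split(/:/, $curr);
    my @changed;
    for my $i (0 .. $#fields) {
        # Treat a missing field as empty so short records still compare
        my $was = defined $o[$i] ? $o[$i] : '';
        my $now = defined $c[$i] ? $c[$i] : '';
        push @changed, $fields[$i] if $was ne $now;
    }
    return @changed;
}

# Two made-up records for the same path, before and after an edit
my $orig = '/etc/hosts:d4a23fcdaa835d98ede1875503273ce6:'
         . '234881026:4879566:33188:1:0:0:1186929905:1186929905';
my $curr = '/etc/hosts:beb50782b3fd998f35786b1e6f503d1b:'
         . '234881026:4879581:33188:1:0:0:1186930065:1186930065';

print "$_ differs\n" for changed_fields($orig, $curr);
```

Returning the list of names rather than printing inside the function is a design choice made here so that the result can be tested or reused; the script in Listing 8 prints directly to standard error instead.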
To use the script, you first need to generate a file with the base checksum and other data that will act as your base comparison file. For example, to create a checksum data file for the /etc directory, you might use the following command line:
$ genmd5.pl --basedir=/etc --chksumfile=etc-chksum.dat
Now that you have the information, if you edit a file and then re-run the script, you should get a report of the differences. In Listing 9, you can see the results when the /etc/hosts file is edited.
Listing 9. Results when the /etc/hosts file is edited
$ genmd5.pl --basedir /private/etc --compare
Warning: /private/etc/hosts differs from that on record
chksum differ; was d4a23fcdaa835d98ede1875503273ce6,
now beb50782b3fd998f35786b1e6f503d1b
inode differ; was 4879566, now 4879581
mtime differ; was 1186929905, now 1186930065
ctime differ; was 1186929905, now 1186930065
Couldn't find /private/etc/hosts~, but have the info on record
Note that the script reports both the differences in an individual file and the fact that a file has been deleted. If a new file had been created, that would have been reported as well.
Using the script in Listing 8 generates a file that you can use to test and verify the validity of a system. Of course, the very fact that the file exists means you have to store it securely; otherwise anybody could update the information, including any unauthorized individual who happens to use your machine and alter the files you want to secure.
There is no hard and fast rule for this information, but it should be clear that storing the file you create on the same machine that you generated it on is probably a bad idea—once it has been located, it could be altered. The same is true of storing the file on another machine on the same network. Once the file has been located, it could be subverted and altered. The best solution is probably to write the file to a CD or DVD that can be kept completely separate from the machine.
The problem with this solution is that you must keep the information up to date. Each time you legitimately update or alter the files that you are monitoring, you must update the checksum file.
Although this makes the process somewhat more difficult, the benefits of the security information that the file provides can be incalculable.
In this article, you developed a script that you can use to generate information that checks the validity of a file or directory full of files. The recorded information includes the file path, a checksum of the file so that you can compare the file contents, and unique information about the file (inode, permissions, ownership information) so that you can identify differences should they occur.
How you make use of this script is up to you. You could store the information and run the script regularly to identify problems as soon as they happened, or you could use the file as a post-mortem tool in the event of a problem to find out which files had been changed so that you have a list of files to be examined.
Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms—Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more—as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.





