The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.
There are many issues around executing unattended scripts, that is, scripts that run automatically through a service such as cron or the at command. By default, for example, cron and at capture the output of the script and email it to the user that ran the script. You don't always want that user to get the email (especially if everything ran fine), and sometimes the user who ran the script and the person actually responsible for monitoring its output are different.
Therefore, you need better methods for trapping and identifying errors within the script, and better methods for communicating problems (and, optionally, successes) to the appropriate person.
Getting the scripts set up correctly is vital; you need to ensure that the script is configured in such a way that it's easy to maintain and that the script runs effectively. You also need to be able to trap errors and output from programs and ensure the security and validity of the environment in which the script executes. Read along to find out how to do all of this.
Before getting into the uses of unattended scripts, you need to make sure that you have set up your environment properly. There are various elements that need to be explicitly configured as part of your script, and taking the time to do this not only ensures that your script runs properly, but it also makes the script easier to maintain.
Some things you might need to think about include:
- Search path for applications
- Search path for libraries
- Directory locations
- Creating directories or paths
- Common files
Some of these elements are straightforward enough to organize. For example, you can set the path using the following in most Bourne-compatible shells (sh, Bash, ksh, and zsh):
PATH=/usr/bin:/bin:/usr/sbin
For directory and file locations, just set a variable at the header of the script. You can then use the variable in each place where you would have used the filename. For example, when writing to a log file, you might use Listing 1.
Listing 1. Writing a log file
LOGFILE=/tmp/output.log
do_something >>"$LOGFILE"
do_another >>"$LOGFILE"
By setting the name once and then using the variable, you ensure that you don't get the filename wrong, and if you need to change the filename, you only need to change it in one place.
Using a single filename and variable also makes it very easy to create a complex
filename. For example, adding a date to your log filename is made easier by using
the date command with a format specification:
DATE=`date +%Y%m%d.%H%M`
The above command creates a string containing the date in the format YYYYMMDD.HHMM, for example, 20070524.2359. You can insert that date variable into a filename so that your log file is tagged according to the date it was created.
If you are not using a date/time unique identifier in the log filename, it's a good idea to insert some other unique identifier in case two scripts are run simultaneously. If your script is writing to the same file from two different processes, you will end up either with corrupted information or missing information.
All shells provide a unique identifier based on the shell's process ID, accessible through the special $$ variable. By using a global log variable, you can easily create a unique file to be used for logging:
LOGFILE=/tmp/$$.err
You can also apply the same global variable principles to directories:
LOGDIR=/var/log/my_app
To ensure that the directories are created, use the -p
option for mkdir to create the entire path of the directory you want to use:
mkdir -p "$LOGDIR"
Fortunately, this format won't complain if the directories already exist, which makes it ideal for running in an unattended script.
Finally, it is generally a good idea to use full path names rather than localized paths in your unattended scripts so that you can use the previous principles together.
Listing 2. Using full path names in unattended scripts
DATE=`date +%Y%m%d.%H%M`
LOGDIR=/usr/local/mcslp/logs/rsynclog
mkdir -p "$LOGDIR"
# LOGFILE rather than LOGNAME, which conventionally holds the login name
LOGFILE="$LOGDIR/$DATE.log"
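The naming techniques above can be combined. The following is a minimal sketch rather than a definitive layout: the my_app_logs directory under the temporary directory is a placeholder chosen for illustration, and the process ID is appended to the timestamp so that two runs started in the same minute do not share a file.

```shell
# Placeholder location for illustration; a real script would use its own directory.
LOGDIR=${TMPDIR:-/tmp}/my_app_logs
mkdir -p "$LOGDIR"

# Timestamp in YYYYMMDD.HHMM form plus the shell's process ID, so that two
# runs started in the same minute still get separate files.
DATE=`date +%Y%m%d.%H%M`
LOGFILE="$LOGDIR/$DATE.$$.log"

echo "run started" >>"$LOGFILE"
echo "logging to $LOGFILE"
```

Every later command in the script can then append to "$LOGFILE" without repeating any of the naming logic.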
Now that you've set up the environment, let's look at how you can use these principles in unattended scripts in general.
Probably the simplest improvement you can make to your scripts is to write the output from your script to a log file. You might not think this is necessary, but the default operation of cron is to save the output from the script or command that was executed, and then email it to the user who owned the crontab or at job.
This is less than perfect for a number of reasons. First of all, the user configured to run the script might not be the person who needs to handle the output. You might be running the script as root, even though the output of the script or command needs to go to somebody else. Setting up a general filter or redirection won't work if you want to send the output of different commands to different users.
The second reason is a more fundamental one. Unless something goes wrong, it's not necessary to receive the output from a script. The cron daemon sends you the output from both stdout and stderr, which means that you get a copy of the output even if the script executed successfully.
The final reason is about the management and organization of the information and output generated. Email is not always an efficient way of recording and tracking the output from the scripts that are run automatically. Maybe you just want to keep an archive of the log file that was a success or email a copy of the error log in the event of a problem.
Writing out to a log file can be handled in a number of different ways. The most straightforward way is to redirect output to a file for each command (see Listing 3).
Listing 3. Redirecting output to a file
cd /shared
rsync --delete --recursive . /backups/shared >"$LOGFILE"
If you want to combine error and standard output into a single file, use numbered redirection (see Listing 4).
Listing 4. Combining error and standard output into a single file
cd /shared
rsync --delete --recursive . /backups/shared >"$LOGFILE" 2>&1
Listing 4 writes out the information to the same log file.
You might also want to write out the information to separate files (see Listing 5).
Listing 5. Writing out information to separate files
cd /shared
rsync --delete --recursive . /backups/shared >"$LOGFILE" 2>"$ERRFILE"
For multiple commands, the redirections can get complex and repetitive. You must ensure, for example, that you are appending, not overwriting, information to the log file (see Listing 6).
Listing 6. Appending information to the log file
cd /etc
rsync --delete --recursive . /backups/etc >>"$LOGFILE" 2>>"$ERRFILE"
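The difference between overwriting and appending is easy to check with a small experiment. In this sketch, do_something is a hypothetical function standing in for a real command such as rsync, and mktemp is assumed to be available to provide a scratch directory.

```shell
# Scratch directory for the demonstration (mktemp is assumed to be available).
WORKDIR=`mktemp -d`
LOGFILE="$WORKDIR/output.log"
ERRFILE="$WORKDIR/error.log"

# Hypothetical command that writes one line to stdout and one to stderr.
do_something() {
    echo "output from $1"
    echo "warning from $1" >&2
}

# First command: > and 2> create (or truncate) the two files.
do_something first >"$LOGFILE" 2>"$ERRFILE"

# Later commands: >> and 2>> append, so the earlier lines survive.
do_something second >>"$LOGFILE" 2>>"$ERRFILE"

echo "lines in log file:   `grep -c output "$LOGFILE"`"
echo "lines in error file: `grep -c warning "$ERRFILE"`"
```

Had the second call used > and 2> as well, each file would hold only the lines from the second command.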
A simpler solution, if your shell supports it, is to use an inline block for a group of commands, and then to redirect the output from the block as a whole. The result is that you can rewrite the lines in Listing 7 using the structure in Listing 8.
Listing 7. Logging in long form
cd /shared
rsync --delete --recursive . /backups/shared >"$LOGFILE" 2>"$ERRFILE"
cd /etc
rsync --delete --recursive . /backups/etc >>"$LOGFILE" 2>>"$ERRFILE"
Listing 8 shows an inline block for grouping commands.
Listing 8. Logging using a block
{
    cd /shared
    rsync --delete --recursive . /backups/shared
    cd /etc
    rsync --delete --recursive . /backups/etc
} >"$LOGFILE" 2>"$ERRFILE"
The enclosing braces group the commands so that they are treated as a single unit. No secondary shell is created; the block still runs in the current shell, so a cd or a shell option set inside it remains in effect afterwards. If you need the commands to run as if part of a separate process, enclose them in parentheses instead, which creates a true subshell. With either form, you can collectively redirect standard and error output for the entire block instead of for each individual command.
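Whether a block runs in the current shell or in a separate one matters as soon as the block changes directory or sets options. The following sketch compares a brace group with a parenthesized subshell; the variable names are chosen purely for illustration.

```shell
START=`pwd`

# Braces group commands in the current shell, so the cd persists afterwards.
{ cd /; echo "inside braces: `pwd`"; } >/dev/null
AFTER_BRACES=`pwd`
cd "$START"

# Parentheses run the commands in a subshell, so the cd is discarded.
( cd /; echo "inside parentheses: `pwd`" ) >/dev/null
AFTER_PARENS=`pwd`

echo "after braces:      $AFTER_BRACES"
echo "after parentheses: $AFTER_PARENS"
```

After the brace group the script is left in /, whereas after the parenthesized block it is still in the directory it started from.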
Trapping errors and reporting them
One of the main advantages of the subshell is that you can place a wrapper around the main content of the script, redirect the errors, and then send a formatted email with the status of the script execution.
For example, Listing 9 shows a more complete script that sets up the environment, executes the actual commands and bulk of the process, traps the output, and then sends an email with the output and error information.
Listing 9. Using a subshell for emailing a more useful log
LOGFILE=/tmp/$$.log
ERRFILE=/tmp/$$.err
ERRORFMT=/tmp/$$.fmt
# Parentheses run the block in a real subshell, so that set -e ends
# only the block and the report below is still sent.
(
    set -e
    cd /shared
    rsync --delete --recursive . /backups/shared
    cd /etc
    rsync --delete --recursive . /backups/etc
) >"$LOGFILE" 2>"$ERRFILE"
{
    echo "Reported output"
    echo
    cat "$LOGFILE"
    echo "Error output"
    echo
    cat "$ERRFILE"
} >"$ERRORFMT" 2>&1
mailx -s 'Log output for backup' root <"$ERRORFMT"
rm -f "$LOGFILE" "$ERRFILE" "$ERRORFMT"
If you use the subshell trick and your shell supports shell options (Bash, ksh, and zsh), you might want to set some of them to ensure that the block is terminated correctly on an error. For example, the -e (errexit) option within Bash ensures that the shell terminates immediately when a simple command (for example, any external command called through the script) exits with a non-zero status. Note that errexit ends the shell in which it is set, so the block must run in a real subshell (parentheses) if the rest of the script is to carry on and report the failure.
Without errexit, if the first rsync in Listing 9 failed, the subshell would just continue and run the next command. However, there are times when you want to stop the moment a command fails because continuing could be more damaging. By setting errexit, the subshell terminates as soon as a command fails.
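The effect of errexit can be seen in a small sketch. It assumes the block is written with parentheses so that it runs in a real subshell, uses false as a stand-in for a failing rsync, and relies on mktemp for a scratch directory.

```shell
WORKDIR=`mktemp -d`
LOGFILE="$WORKDIR/run.log"

set +e    # errexit stays off in the outer script so that it can report failures

(
    set -e
    echo "step 1"
    false             # stands in for a failing rsync
    echo "step 2"     # never reached
) >"$LOGFILE" 2>&1
STATUS=$?

echo "block exit status: $STATUS"
cat "$LOGFILE"
```

The outer script keeps running after the block stops, so it can inspect the status and decide whether to send a report.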
Setting options and ensuring security
Another issue with automated scripts is ensuring the security of the script and, in particular, ensuring that the script does not fail because of bad configuration. You can use shell options for this process.
There are other options you might want to set, and how many are available depends on the shell (as a rule, the richer the shell, the better it is at trapping these instances). In the Bash shell, for example, -u ensures that any unset variables are treated as an error. This can be useful to ensure that an unattended script does not try to execute when a required variable has not been configured correctly. The -C option (noclobber) ensures that files are not overwritten by redirection if they already exist, and it can prevent the script from overwriting files it shouldn't have access to (for example, the system files), unless the script has the correct commands to delete the original file first.
Each of these options can be set using the set command
(see Listing 10).
Listing 10. Using the set command to set options
set -e
set -u
set -C
Use a plus sign in place of the minus sign (for example, set +e) to disable an option.
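A short experiment shows what the noclobber and unset-variable options protect against. This is a sketch under stated assumptions: important.conf and REQUIRED_BUT_UNSET_SETTING are hypothetical names, mktemp is assumed to be available, and each option is set inside a subshell so that it does not affect the rest of the script.

```shell
WORKDIR=`mktemp -d`
TARGET="$WORKDIR/important.conf"
echo "original" >"$TARGET"

# noclobber: a plain > refuses to overwrite an existing file.
CLOBBER_STATUS=0
( set -C; echo "replacement" >"$TARGET" ) 2>/dev/null || CLOBBER_STATUS=$?

# Unset variables: expanding a variable that was never set becomes an error.
UNSET_STATUS=0
( set -u; echo "$REQUIRED_BUT_UNSET_SETTING" ) 2>/dev/null || UNSET_STATUS=$?

CONTENT=`cat "$TARGET"`
echo "noclobber status: $CLOBBER_STATUS (file still contains: $CONTENT)"
echo "unset variable status: $UNSET_STATUS"
```

Both subshells finish with a non-zero status, and the file keeps its original content.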
Another area where you might want to improve the security and environment of your script is resource limits. Resource limits are set with the ulimit command, which is generally specific to the shell, and enable you to limit the size of files, cores, memory use, and even the CPU time the script can consume, to ensure that the script does not run away with itself. For example, you can set the CPU time limit in seconds using the following command:
ulimit -t 600
Although ulimit does not offer complete protection, it helps in those scripts where the potential for the script to run away with itself, or a program to suddenly use a large amount of memory, might become a problem.
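A limit applies to the shell that sets it and to the processes it starts, so setting it inside a subshell confines it to one part of the script. The sketch below uses the core file size limit (-c) only because lowering it to zero is always permitted; it assumes a shell whose ulimit accepts -c, as Bash does, and the same pattern would apply to -t.

```shell
BEFORE=`ulimit -c`

# The limit is lowered inside the parentheses only; commands started there inherit it.
INSIDE=`( ulimit -c 0; ulimit -c )`

AFTER=`ulimit -c`

echo "core size limit before: $BEFORE, inside: $INSIDE, after: $AFTER"
```

The value reported after the subshell matches the value before it, which shows that the rest of the script is unaffected.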
You have already seen how to trap errors, output, and create logs that can be emailed to the appropriate person when they occur, but what if you want to be more specific about the errors and responses?
Two tools are useful here. The first is the return status from a command, and the
second is the trap command within your shell.
The return status from a command can be used to identify whether a particular command ran correctly, or whether it generated some sort of error. The exact meaning of a specific return status code is unique to a particular command (check the man pages), but a generally accepted principle is that a return status of zero means that the command executed correctly, and any non-zero value indicates an error.
For example, imagine that you want to trap an error when trying to create a directory. You can check the $? variable after mkdir and then email the output, as shown in Listing 11.
Listing 11. Trapping return status
ERRLOG=/tmp/$$.err
mkdir /tmp 2>>"$ERRLOG"
if [ $? -ne 0 ]
then
    mailx -s "Script failed when making directory" admin <"$ERRLOG"
    exit 1
fi
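Because $? is overwritten by every command, it is safest to copy it into a variable straight away. The following sketch wraps that habit in a helper; run_logged and STATUSLOG are hypothetical names introduced for illustration, and mktemp is assumed to be available.

```shell
WORKDIR=`mktemp -d`
STATUSLOG="$WORKDIR/status.log"

# Hypothetical helper: run a command, record its return status, and pass it on.
run_logged() {
    "$@"
    rc=$?             # save immediately; the next command overwrites $?
    echo "exit=$rc command=$*" >>"$STATUSLOG"
    return $rc
}

run_logged true
# mkdir fails here because the directory already exists.
run_logged mkdir "$WORKDIR" 2>/dev/null || echo "mkdir failed with status $?"

cat "$STATUSLOG"
```

The status log then holds one line per command, which is easier to scan than the raw output when many commands are run.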
Incidentally, you can use the return status code information inline by chaining commands with the && or || symbols, which act as and and or statements. For example, say you want to create a directory and then run a command, but only if the directory was created successfully. You could do that using an if statement (see Listing 12).
Listing 12. Ensuring that a directory is created before executing a command
mkdir /tmp/out
if [ $? -eq 0 ]
then
    do_something
fi
You can modify Listing 12 into a single line:
mkdir /tmp/out && do_something
The above statement basically reads, "Make a directory and, if it completes successfully, also run the command." In essence, only do the second command if the first completes correctly.
The || symbol works in the opposite way; if the first command does not complete successfully, then execute the second. This can be useful for handling situations where a command would otherwise raise an error, by providing an alternative action. For example, when changing to a directory, you might use the line:
cd /tmp/out || mkdir /tmp/out
This line of code tries to change the directory and, if that fails (probably because the directory does not exist), makes it. Furthermore, you can combine these statements together. In the previous example, of course, what you want to do is change to the directory, or create it and then change to that directory if it doesn't already exist. You can write that in one line as:
cd /tmp/out || mkdir /tmp/out && cd /tmp/out
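In the shell, && and || have equal precedence and are evaluated from left to right, so the one-line form above runs the final cd even when the first one succeeded, and the first cd prints an error when the directory is missing. One possible refinement, sketched here with a hypothetical directory under a scratch location created by mktemp, groups the fallback in braces and silences the first attempt.

```shell
WORKDIR=`mktemp -d`
OUTDIR="$WORKDIR/out"      # hypothetical directory that does not exist yet
START=`pwd`

# Try to change directory quietly; only if that fails, create it and then enter it.
cd "$OUTDIR" 2>/dev/null || { mkdir -p "$OUTDIR" && cd "$OUTDIR"; }
FIRST=`pwd`

# Second attempt: the directory now exists, so the braces are skipped entirely.
cd "$START"
cd "$OUTDIR" 2>/dev/null || { mkdir -p "$OUTDIR" && cd "$OUTDIR"; }
SECOND=`pwd`
cd "$START"

echo "first attempt ended in:  $FIRST"
echo "second attempt ended in: $SECOND"
```

Both attempts end in the same directory, but only the first one needs to create it.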
The trap command is a more generalized solution for trapping more serious errors. It acts on the signals that the shell running the script receives, such as those raised for a memory error or when the script is forcibly terminated by a kill command.
To use trap, you specify the command or function to be executed when the signal is trapped, and the signal number or numbers that you want to trap, as shown here in Listing 13.
Listing 13. Trapping signals
function catch_trap
{
    echo "killed" | mailx -s "Signal trapped" admin
}
# Signal 9 (KILL) cannot be trapped, so it is omitted; 15 is TERM, the default sent by kill
trap catch_trap 1 2 3 4 5 6 7 8 10 11 15
sleep 9000
You can trap most signals in this way (KILL and STOP can never be caught), and it can be a good way of ensuring that a script that is interrupted or terminated is caught and reported.
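To see a handler run without waiting for a real failure, a script can send a signal to itself. This is a minimal sketch, not a reporting solution: the handler only sets a variable where a real script might send mail, it is written in the portable name() form, and signals are given by name because the numbers differ between platforms.

```shell
CAUGHT=no

# Handler: record that the signal arrived. A real script might mail a report here.
catch_trap() {
    CAUGHT=yes
}

# Trap by name; KILL and STOP are left out because they cannot be trapped.
trap catch_trap HUP INT TERM USR1

# Send a signal to this very shell to show the handler running.
kill -USR1 $$

echo "signal caught: $CAUGHT"

# Restore the default behaviour for those signals.
trap - HUP INT TERM USR1
```

The handler runs as soon as the kill command returns, so the variable has already changed by the time it is printed.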
Throughout this article, you've looked at ways of trapping errors, saving the output, and recording issues so that they can be dealt with and reported. However, what if the script or commands that you are using naturally output error information that you want to be able to use and report on but that you don't always want to know about?
There is no easy solution to this problem, but you can use a combination of the techniques shown in this article to log errors and information, read or filter the information, and mail and report or display it accordingly.
A simple way to do this is to choose which parts of a command's output you report to the logs. Alternatively, you can post-process the logs to select or filter out the output that you need.
For example, say you have a script that builds a document in the background using the Formatting Objects Processor (FOP) system from Apache to generate a PDF version of the document. Unfortunately in the process, a number of errors are generated about hyphenation. These are errors that you know about, but they don't affect the output quality. In the script that generates the file, just filter out these lines from the error log:
sed -e '/hyphenation/d' <error.log >mailerror.log
If there were no other errors, the mailerror.log file will be empty and there is nothing to send; otherwise, email is sent with the remaining error information.
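The filter and the decision to send mail can be put together as follows. This is a sketch only: the sample log lines are invented for illustration, send_report is a hypothetical stand-in for mailx so that the example runs without a mail system, and mktemp is assumed to be available.

```shell
WORKDIR=`mktemp -d`

# Sample error log for illustration; the messages are invented.
cat >"$WORKDIR/error.log" <<'EOF'
WARNING: hyphenation pattern not found for language en
WARNING: hyphenation pattern not found for language fr
EOF

# Hypothetical stand-in for mailx so that the sketch runs without a mail system.
send_report() {
    echo "would mail $1 to admin"
}

# Drop the known, harmless hyphenation messages.
sed -e '/hyphenation/d' <"$WORKDIR/error.log" >"$WORKDIR/mailerror.log"

# -s is true only when the file exists and is not empty.
if [ -s "$WORKDIR/mailerror.log" ]
then
    RESULT=`send_report "$WORKDIR/mailerror.log"`
else
    RESULT="nothing to report"
fi
echo "$RESULT"
```

With only hyphenation messages in the log, the filtered file is empty and no report is produced.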
In this article, you've looked at how to run commands in an unattended script, capture their output, and monitor the execution of different commands in the script. You can log the information in many ways, for example, on a command-by-command or global basis, and check and report on the progress.
For error trapping, you can monitor output and result codes, and you can even set up global traps that identify problems and trap them during execution for reporting purposes. The result is a range of options that handle and report problems for scripts that are running on their own and where their ability to recover from errors and problems is critical.
Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms—Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more—as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.





