System Administration Toolkit: Build intelligent, unattended scripts

Look at how to create scripts that record their output, trap and identify errors, and recover from problems so that they either run correctly or fail with a suitable error message and report. Building scripts and running them automatically is a task that every good administrator has to handle, but how do you handle the error output and make intelligent decisions about how the script should respond? This article addresses these issues.


Martin Brown (mc@mcslp.com), Freelance Writer, Consultant

Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms—Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more—as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.



03 July 2007


About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.

The unattended script problem

There are many issues around executing unattended scripts—that is, scripts that you run automatically, either through a service such as cron or through the at command.

The default mode of cron and at, for example, is for the output of the script to be captured and then emailed to the user who ran the script. You don't always want the user to get the email that cron sends by default (especially if everything ran fine); sometimes the user who ran the script and the person actually responsible for monitoring the output are different people.

Therefore, you need better methods for trapping and identifying errors within the script, and better methods for communicating problems, and optionally successes, to the appropriate person.

Getting the scripts set up correctly is vital; you need to ensure that the script is configured in such a way that it's easy to maintain and that the script runs effectively. You also need to be able to trap errors and output from programs and ensure the security and validity of the environment in which the script executes. Read along to find out how to do all of this.

Setting up the environment

Before getting into the uses of unattended scripts, you need to make sure that you have set up your environment properly. There are various elements that need to be explicitly configured as part of your script, and taking the time to do this not only ensures that your script runs properly, but it also makes the script easier to maintain.

Some things you might need to think about include:

  • Search path for applications
  • Search path for libraries
  • Directory locations
  • Creating directories or paths
  • Common files

Some of these elements are straightforward enough to organize. For example, you can set the path using the following in most Bourne-compatible shells (sh, Bash, ksh, and zsh):

PATH=/usr/bin:/bin:/usr/sbin
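If your script also depends on shared libraries in non-standard locations, you can set the library search path in the same way. A minimal sketch, assuming a platform where the relevant variable is LD_LIBRARY_PATH (Linux and Solaris, for example):

# Export the paths so that programs run by the script inherit them
export PATH

LD_LIBRARY_PATH=/usr/lib:/usr/local/lib
export LD_LIBRARY_PATH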

For directory and file locations, just set a variable at the header of the script. You can then use the variable in each place where you would have used the filename. For example, when writing to a log file, you might use Listing 1.

Listing 1. Writing a log file
LOGFILE=/tmp/output.log

do_something >>$LOGFILE
do_another >>$LOGFILE

By setting the name once and then using the variable, you ensure that you don't get the filename wrong, and if you need to change the filename, you only need to change it in one place.

Using a single filename and variable also makes it very easy to create a complex filename. For example, adding a date to your log filename is made easier by using the date command with a format specification:

DATE=`date +%Y%m%d.%H%M`

The above command creates a string containing the date in the format YYYYMMDD.HHMM, for example, 20070524.2359. You can insert that date variable into a filename so that your log file is tagged according to the date it was created.
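For example, here is a minimal sketch using that $DATE variable (backup is just a placeholder name):

DATE=`date +%Y%m%d.%H%M`
LOGFILE=/tmp/backup.$DATE.log

do_something >>$LOGFILE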

If you are not using a date/time unique identifier in the log filename, it's a good idea to insert some other unique identifier in case two scripts are run simultaneously. If your script is writing to the same file from two different processes, you will end up either with corrupted information or missing information.

All shells support a unique shell ID, based on the shell process ID, which is accessible through the special $$ variable. By using a global log variable, you can easily create a unique file to be used for logging:

LOGFILE=/tmp/$$.err

You can also apply the same global variable principles to directories:

LOGDIR=/var/log/my_app

To ensure that the directories are created, use the -p option for mkdir to create the entire path of the directory you want to use:

mkdir -p $LOGDIR

Fortunately, mkdir -p won't complain if the directories already exist, which makes it ideal for use in an unattended script.

Finally, it is generally a good idea to use full path names rather than localized paths in your unattended scripts so that you can use the previous principles together.

Listing 2. Using full path names in unattended scripts
DATE=`date +%Y%m%d.%H%M`
LOGDIR=/usr/local/mcslp/logs/rsynclog
mkdir -p $LOGDIR
LOGNAME=$LOGDIR/$DATE.log

Now that you've set up the environment, let's look at how you can use these principles to help with the general, unattended scripts.

Writing a log file

Probably the simplest improvement you can make to your scripts is to write the output from your script to a log file. You might not think this is necessary, but the default operation of cron is to save the output from the script or command that was executed, and then email it to the user who owned the crontab or at job.

This is less than perfect for a number of reasons. First of all, the user configured to run the script might not be the same as the real person who needs to handle the output. You might be running the script as root, even though the output of the script or command needs to go to somebody else. Setting up a general filter or redirection won't work if you want to send the output of different commands to different users.
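One common workaround is to redirect each job's output yourself within the crontab so that cron has nothing left to email. A sketch, using hypothetical script and log names:

# Each job logs to its own file, so cron has nothing to mail
0 2 * * * /usr/local/bin/backup_shared.sh >/var/log/backup_shared.log 2>&1
30 2 * * * /usr/local/bin/backup_etc.sh >/var/log/backup_etc.log 2>&1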

The second reason is a more fundamental one. Unless something goes wrong, you don't need to receive the output from a script at all. The cron daemon sends you the output from stdout and stderr, which means that you get a copy of the output even if the script executed successfully.

The final reason is about the management and organization of the information and output generated. Email is not always an efficient way of recording and tracking the output from scripts that run automatically. Maybe you just want to keep an archive of the log files from successful runs, or email a copy of the error log only in the event of a problem.

Writing out to a log file can be handled in a number of different ways. The most straightforward way is to redirect output to a file for each command (see Listing 3).

Listing 3. Redirecting output to a file
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE

If you want to combine error and standard output into a single file, use numbered redirection (see Listing 4).

Listing 4. Combining error and standard output into a single file
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE 2>&1

Listing 4 writes out the information to the same log file.
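One caution: the order of the redirections matters. A sketch of a common mistake:

# Wrong order: 2>&1 duplicates stderr to wherever stdout currently
# points (the terminal) *before* stdout is redirected to the file,
# so the errors never reach $LOGFILE
rsync --delete --recursive . /backups/shared 2>&1 >$LOGFILE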

You might also want to write out the information to separate files (see Listing 5).

Listing 5. Writing out information to separate files
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE 2>$ERRFILE

For multiple commands, the redirections can get complex and repetitive. You must ensure, for example, that you are appending, not overwriting, information to the log file (see Listing 6).

Listing 6. Appending information to the log file
cd /etc
rsync --delete --recursive . /backups/etc >>$LOGFILE 2>>$ERRFILE

A simpler solution, if your shell supports it, is to use an inline block for a group of commands, and then to redirect the output from the block as a whole. The result is that you can rewrite the lines in Listing 7 using the structure in Listing 8.

Listing 7. Logging in long form
cd /shared
rsync --delete --recursive . /backups/shared >$LOGFILE 2>$ERRFILE

cd /etc
rsync --delete --recursive . /backups/etc >>$LOGFILE 2>>$ERRFILE

Listing 8 shows an inline block for grouping commands.

Listing 8. Logging using a block
{
    cd /shared
    rsync --delete --recursive . /backups/shared 

    cd /etc
    rsync --delete --recursive . /backups/etc

} >$LOGFILE 2>$ERRFILE

The enclosing braces act as a logical block: all the commands in the block are executed as if they were part of a separate process (no secondary shell is actually created; the block is just treated as a different logical environment). Using the block, you can collectively redirect the standard and error output for the entire group instead of for each individual command.
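If you genuinely need a separate process, for example so that a cd inside the block does not affect the rest of the script, use parentheses instead of braces. A quick sketch of the difference:

# Braces group commands in the current shell; the cd persists
{ cd /etc; ls; } >$LOGFILE
pwd    # now /etc

# Parentheses create a true subshell; the cd does not persist
( cd /etc; ls ) >$LOGFILE
pwd    # unchanged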

Trapping errors and reporting them

One of the main advantages of the subshell is that you can place a wrapper around the main content of the script, redirect the errors, and then send a formatted email with the status of the script execution.

For example, Listing 9 shows a more complete script that sets up the environment, executes the actual commands and bulk of the process, traps the output, and then sends an email with the output and error information.

Listing 9. Using a subshell for emailing a more useful log
LOGFILE=/tmp/$$.log
ERRFILE=/tmp/$$.err
ERRORFMT=/tmp/$$.fmt

{
    set -e

    cd /shared
    rsync --delete --recursive . /backups/shared

    cd /etc
    rsync --delete --recursive . /backups/etc

} >$LOGFILE 2>$ERRFILE

{
    echo "Reported output"
    echo
    cat /tmp/$$.log
    echo "Error output"
    echo
    cat /tmp/$$.err
} >$ERRORFMT 2>&1

mailx -s 'Log output for backup' root <$ERRORFMT

rm -f $LOGFILE $ERRFILE $ERRORFMT

If you use the subshell trick and your shell supports shell options (Bash, ksh, and zsh do), then you might want to set some shell options to ensure that the block is terminated correctly on an error. For example, setting the -e (errexit) option within Bash causes the shell to terminate immediately when any simple command (for example, any external command called through the script) exits with a nonzero status.

In Listing 9, for example, if the first rsync failed, the subshell would just continue and run the next command. However, there are times when you want to stop the moment a command fails, because continuing could be more damaging. With errexit set, the subshell terminates as soon as the first command fails.
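A quick sketch of the effect:

set -e

# If rsync exits with a nonzero status, the shell terminates here,
# and the echo is never reached
rsync --delete --recursive /shared/ /backups/shared
echo "backup complete"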

Setting options and ensuring security

Another issue with automated scripts is ensuring the security of the script and, in particular, ensuring that the script does not fail because of bad configuration. You can use shell options for this process, too.

There are other options you might want to set as well and, as a rule, the richer the shell, the better it is at trapping these conditions. In the Bash shell, for example, -u ensures that any unset variables are treated as an error. This can be useful for ensuring that an unattended script does not try to execute when a required variable has not been configured correctly.

The -C (noclobber) option ensures that files are not overwritten if they already exist, which can prevent the script from overwriting files it shouldn't have access to (for example, the system files), unless the script explicitly deletes the original file first.

Each of these options can be set using the set command (see Listing 10).

Listing 10. Using the set command to set options
set -e
set -C

You can use a plus sign before the option to disable it.
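For example, here is a quick sketch of noclobber in action (/tmp/app.log is just a hypothetical file; the >| operator, supported by Bash, ksh, and zsh, explicitly overrides the option):

set -C                      # enable noclobber
date > /tmp/app.log         # fails if /tmp/app.log already exists
date >| /tmp/app.log        # >| explicitly overrides noclobber
set +C                      # the plus sign disables the option again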

Another area where you might want to improve the security and environment of your script is to use resource limits. Resource limits can be set by the ulimit command, which is generally specific to the shell, and enable you to limit the size of files, cores, memory use, and even the duration of the script to ensure that the script does not run away with itself.

For example, you can set CPU time in seconds using the following command:

ulimit -t 600

Although ulimit does not offer complete protection, it helps in those scripts where the potential for the script to run away with itself, or a program to suddenly use a large amount of memory, might become a problem.
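For example, a sketch of a typical set of limits for an unattended script (the exact units for each limit vary between shells, so check your shell's documentation):

ulimit -t 600       # maximum CPU time, in seconds
ulimit -c 0         # disable core dumps
ulimit -f 102400    # maximum file size (in Bash, 1024-byte blocks)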

Capturing faults

You have already seen how to trap errors, output, and create logs that can be emailed to the appropriate person when they occur, but what if you want to be more specific about the errors and responses?

Two tools are useful here. The first is the return status from a command, and the second is the trap command within your shell.

The return status from a command can be used to identify whether a particular command ran correctly or whether it generated some sort of error. The exact meaning of a specific return status code is unique to a particular command (check the man pages), but the generally accepted convention is that a return status of zero means the command executed correctly, and a nonzero status indicates an error.

For example, imagine that you want to trap an error when trying to create a directory. You can check the $? variable after mkdir and then email the output, as shown in Listing 11.

Listing 11. Trapping return status
ERRLOG=/tmp/$$.err

mkdir /tmp/out 2>>$ERRLOG
if [ $? -ne 0 ]
then
    mailx -s "Script failed when making directory" admin <$ERRLOG
    exit 1
fi

Incidentally, you can use the return status information inline by chaining commands with the && or || operators, which act as logical 'and' and 'or' statements. For example, say you want to ensure that the directory gets created and the command gets executed but, if the directory is not created, the command does not get executed. You could do that using an if statement (see Listing 12).

Listing 12. Ensuring that a directory is created before executing a command
mkdir /tmp/out
if [ $? -eq 0 ]
then
    do_something
fi

You can modify Listing 12 into a single line:

mkdir /tmp/out && do_something

The above statement basically reads, "Make a directory and, if it completes successfully, also run the command." In essence, only do the second command if the first completes correctly.

The || symbol works in the opposite way; if the first command does not complete successfully, then the second is executed. This is useful for trapping situations where a command would raise an error, and providing an alternative solution instead. For example, when changing to a directory, you might use the line:

cd /tmp/out || mkdir /tmp/out

This line of code tries to change to the directory and, if that fails (probably because the directory does not exist), you make it. Furthermore, you can combine these statements. In the previous example, of course, what you really want is to change to the directory, or create it and then change to it if it doesn't already exist. You can write that in one line as:

cd /tmp/out || mkdir /tmp/out && cd /tmp/out

The trap command is a more generalized solution for trapping more serious errors based on the signals raised when a command fails, such as a core dump or a memory error, or when a command has been forcibly terminated by the kill command.

To use trap, you specify the command or function to be executed when the signal is trapped, and the signal number or numbers that you want to trap, as shown here in Listing 13.

Listing 13. Trapping signals
function catch_trap
{
    echo "killed" mailx -s "Signal trapped" admin
}

trap catch_trap 1 2 3 4 5 6 7 8 10 11

sleep 9000

You can trap almost any signal in this way (SIGKILL, signal 9, is the notable exception that cannot be caught), and it can be a good way of ensuring that a program that crashes out is caught, trapped, and reported effectively.
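A related technique, supported by Bash, ksh, and zsh, is to trap the special EXIT condition (signal number 0 in the Bourne shell), which runs a handler whenever the script terminates and makes a convenient place for cleanup:

# Remove the temporary logs however the script terminates
trap 'rm -f $LOGFILE $ERRFILE' EXIT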

Identifying reportable errors

Throughout this article, you've looked at ways of trapping errors, saving the output, and recording issues so that they can be dealt with and reported. However, what if the script or commands that you are using naturally output error information that you sometimes want to report on, but don't always need to know about?

There is no easy solution to this problem, but you can use a combination of the techniques shown in this article to log errors and information, read or filter the information, and mail and report or display it accordingly.

A simple way to do this is to choose which parts of the command output you write and report to the logs. Alternatively, you can post-process the logs to select or filter out the output that you need.

For example, say you have a script that builds a document in the background using the Formatting Objects Processor (FOP) system from Apache to generate a PDF version of the document. Unfortunately in the process, a number of errors are generated about hyphenation. These are errors that you know about, but they don't affect the output quality. In the script that generates the file, just filter out these lines from the error log:

sed -e '/hyphenation/d' <error.log >mailerror.log

If there were no other errors, the mailerror.log file will be empty, and no email needs to be sent.
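You can then test whether the filtered log actually contains anything before deciding to send the mail; a minimal sketch using the -s (non-empty file) test:

sed -e '/hyphenation/d' <error.log >mailerror.log

# Only send the error report if the filtered log is non-empty
if [ -s mailerror.log ]
then
    mailx -s "Errors during document build" admin <mailerror.log
fi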

Summary

In this article, you've looked at how to run commands in an unattended script, capture their output, and monitor the execution of the different commands in the script. You can log the information in many ways, for example, on a command-by-command or global basis, and check and report on the progress.

For error trapping, you can monitor output and result codes, and you can even set up global traps that identify problems and trap them during execution for reporting purposes. The result is a range of options that handle and report problems for scripts that are running on their own and where their ability to recover from errors and problems is critical.
