Creating new response scripts

The predefined response scripts we provide are general-purpose ways of notifying users about an event, or else logging the event information to a file. In addition to these general-purpose scripts, you might want to write your own scripts that provide more specific responses to events.

One reason to do this is to create an automatic recovery script that enables RMC to solve a simple problem without operator intervention. For example, when the /tmp directory is over 90 percent full, you could have RMC run a script that deletes the oldest unnecessary files in the /tmp directory. Another reason to create your own scripts is to tailor system responses to better suit your particular organization. For example, you might want to create a script that calls your pager when a particular event occurs.
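The /tmp recovery idea above can be sketched as follows. This is a minimal illustration, not a script that RSCT provides: the function name clean_old_files and the CLEAN_DIR and CLEAN_DAYS variables are hypothetical. It treats "oldest unnecessary files" as "regular files not modified for a given number of days", which is an assumption you would adapt to your site.

```shell
#!/bin/sh
# Hypothetical recovery sketch: remove stale files from a directory.
# clean_old_files, CLEAN_DIR, and CLEAN_DAYS are illustrative names,
# not part of RSCT.

# clean_old_files DIR DAYS
# Deletes regular files under DIR (including subdirectories) whose
# modification time is more than DAYS days in the past.
clean_old_files() {
    dir=$1
    days=$2
    # Refuse to run when the directory argument is empty or missing.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# ERRM sets ERRM_TYPE to "Event" for the condition event. Clean up only
# for that event, not for the rearm event that follows recovery.
if [ "${ERRM_TYPE:-}" = "Event" ]; then
    clean_old_files "${CLEAN_DIR:-/tmp}" "${CLEAN_DAYS:-7}"
fi
```

When the script is run by hand, ERRM_TYPE is not set, so nothing is deleted; that makes it safer to try the function against a scratch directory first.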

If you want to create your own response scripts, it pays to examine the existing scripts that RSCT provides (as described in Table 1 in Creating a response). These scripts are located in the directory /usr/bin/rsct/bin. They can be useful as templates for your new scripts, and they illustrate how a script can use ERRM environment variables to obtain information about the event that triggered its execution. For example, suppose that you wanted to create a script that called your pager when particular events occur. You might want to use our predefined script wallevent as a template. This predefined script uses the wall command to write a message to all users who are logged in. You could make a copy of this script and replace the wall command with a program that contacts your pager.
Note: Because our predefined responses use the predefined response scripts, do not modify the original scripts in /usr/bin/rsct/bin. If you want to use an existing script as a template for a new script, copy the file to a new name before making your modifications.

After a condition event occurs, but before the response script executes, ERRM sets a number of environment variables that contain information about the event. The script can check the values of these variables in order to provide the event information to the user. Using the ERRM environment variables, the script can ascertain such information as whether it was triggered by the condition event or the rearm event, the time the event occurred, the host on which the event occurred, and so on.
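As a rough sketch of how a custom script might read these variables, the following builds a one-line summary and hands it to a notification program. The function name build_event_message and the NOTIFY_CMD variable are hypothetical; NOTIFY_CMD stands in for whatever pager or mail program your site uses, and is assumed to read the message on standard input.

```shell
#!/bin/sh
# Hypothetical sketch: summarize an event from the ERRM variables.
# build_event_message and NOTIFY_CMD are illustrative names, not RSCT ones.

# Compose a one-line summary. Unset variables fall back to "unknown" so
# that the script can also be tried by hand, outside of ERRM.
build_event_message() {
    case "${ERRM_TYPE:-}" in
        "Event")       kind="condition event" ;;
        "Rearm Event") kind="rearm event" ;;
        "Error Event") kind="error event" ;;
        *)             kind="unrecognized event type" ;;
    esac
    printf '%s: %s for condition "%s" on resource "%s" (host %s)\n' \
        "${ERRM_COND_SEVERITY:-unknown}" "$kind" \
        "${ERRM_COND_NAME:-unknown}" "${ERRM_RSRC_NAME:-unknown}" \
        "${ERRM_NODE_NAME:-unknown}"
}

# Pass the summary to the notifier; by default it is simply printed.
build_event_message | ${NOTIFY_CMD:-cat}
```

Keeping the message construction in its own function makes it easy to swap the delivery program without touching the logic that reads the ERRM variables.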

Example: The following is the predefined script wallevent, a shell script that calls Perl to format the event time. It illustrates the use of the ERRM environment variables, whose names all begin with "ERRM_".

# main()

PERL=/opt/rsct/perl5/bin/perl

CTMSG=/opt/rsct/bin/ctdspmsg
MSGMAPPATH=/opt/rsct/msgmaps
export MSGMAPPATH

Usage=`$CTMSG script IBM.ERrm.cat MSG_SH_USAGE`

while getopts ":h" opt
do
  case $opt in

    h ) print "Usage: `basename $0` [-h] "
        exit 0;;

    ? ) print "Usage: `basename $0` [-h] "
        exit 3;;
  esac
done

# convert time string
seconds=${ERRM_TIME%,*}

EventTime=$(seconds=$seconds $PERL -e \
'
use POSIX qw(strftime);
print strftime("  

'
)

WallMsg=`$CTMSG script IBM.ERrm.cat MSG_SH_WALLN "$ERRM_COND_SEVERITY" \
"$ERRM_TYPE" "$ERRM_COND_NAME" "$ERRM_RSRC_NAME" \
"$ERRM_RSRC_CLASS_NAME" "$EventTime" "$ERRM_NODE_NAME" \
"$ERRM_NODE_NAMELIST"`

wall "${WallMsg}"


#wall "$ERRM_COND_SEVERITY $ERRM_TYPE occurred for the condition $ERRM_COND_NAME on the resource $ERRM_RSRC_NAME of the resource class $ERRM_RSRC_CLASS_NAME at $EventTime on $ERRM_NODE_NAME"

The preceding script uses the ERRM_TIME environment variable to ascertain the time that the event occurred, the ERRM_COND_SEVERITY environment variable to learn the severity of the event, the ERRM_TYPE environment variable to determine if it was the condition event or rearm event that triggered the script's execution, and so on. This information is all included in the message sent to online users.
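The way the script extracts the seconds from ERRM_TIME can be tried in isolation. The sketch below assumes only the documented format of ERRM_TIME (seconds, a comma, then microseconds); the function name split_errm_time and the two result variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split an ERRM_TIME value into its two parts.
# split_errm_time, event_seconds, and event_micros are illustrative names.

# split_errm_time "SECONDS,MICROSECONDS"
# Sets event_seconds to the text before the last comma and event_micros
# to the text after the first comma.
split_errm_time() {
    event_seconds=${1%,*}
    event_micros=${1#*,}
}

# Inside a response script, ERRM_TIME is set by ERRM.
if [ -n "${ERRM_TIME:-}" ]; then
    split_errm_time "$ERRM_TIME"
    echo "Event at $event_seconds seconds, $event_micros microseconds"
fi
```

The seconds value is what wallevent passes to Perl for formatting; the microseconds are usually needed only when events must be ordered precisely.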

Table 1 describes the ERRM environment variables that you can use in response scripts. Unless otherwise specified, these environment variables are available for ERRM commands in non-batched event responses, batched event responses, and batched event files.

Table 1. ERRM environment variables
This environment variable... Contains...
ERRM_ATTR_NAME

ERRM_ATTR_NAME_n

The display name of the dynamic attribute used in the expression that caused this event to occur.

This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_ATTR_NUM The number of attributes that are used in the event expression.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_ATTR_PNAME

ERRM_ATTR_PNAME_n

The programmatic name of the attribute used in the expression that caused this event to occur.

This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_BATCH_REASON The reason why the batched event was triggered. The valid values are: 1 (the event batch interval expired), 2 (the maximum number of batching events was reached), 3 (monitoring stopped), 4 (the association between the condition and event response was removed), 5 (the event response was deleted), and 6 (the condition was deleted).

This environment variable is not available for ERRM commands in non-batched event responses or batched event files.

ERRM_COND_BATCH The indication of whether the condition is batching events. The valid values are: 0 (no) and 1 (yes).
ERRM_COND_BATCH_NUM The number of events in the batched event file.

This environment variable is not available for ERRM commands in non-batched event responses or batched event files.

ERRM_COND_HANDLE The resource handle of the condition that caused the event. The format of this value is six hexadecimal integers that are separated by spaces and written as a string, for example:

0x6004 0xffff 0x180031ae 0xe300b8db 0x10f4de7b 0x40a5c5c9
ERRM_COND_MAX_BATCH The maximum number of events that can be batched together, if the condition is batching events. If the value is 0, there is no limit.

This environment variable is not available for ERRM commands in non-batched event responses.

ERRM_COND_NAME The name of the condition that caused the event.
ERRM_COND_SEVERITY The severity of the condition that caused the event. For the severity attribute values of 0, 1, and 2, this environment variable has the following values, respectively: Informational, Warning, and Critical. All other severity attribute values are represented in this environment variable as a decimal string.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_COND_SEVERITYID The severity value of the condition that caused the event. The valid values are: 0 (Informational), 1 (Warning), and 2 (Critical).

This environment variable is not available for ERRM commands in batched event responses.

ERRM_DATA_TYPE

ERRM_DATA_TYPE_n

The RMC ct_data_type_t type of the attribute that changed to cause this event. The valid values are: CT_BINARY_PTR, CT_CHAR_PTR, CT_FLOAT32, CT_FLOAT64, CT_INT32, CT_INT64, CT_SD_PTR, CT_UINT32, and CT_UINT64. The actual value of the attribute is stored in the ERRM_VALUE environment variable (except for attributes with a data type of CT_NONE).

This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_ER_HANDLE
The event response resource handle for this event. The format of this value is six hexadecimal integers that are separated by spaces and written as a string, for example:

0x6006 0xffff 0x180031ae 0xe300b8db 0x10ffa192 0xdd39d86b

This environment variable is not available for ERRM commands in batched event files.

ERRM_ER_NAME The name of the event that triggered this event response script.

This environment variable is not available for ERRM commands in batched event files.

ERRM_ERROR_MSG The descriptive message for ERRM_ERROR_NUM, if the value of ERRM_ERROR_NUM is not 0.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_ERROR_NUM The error code from the RMC subsystem for an error event. If the value is 0, an error did not occur.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_EVENT_DETAIL_FILE The name of the file in which the batched events can be found, if the condition is batching events. This environment variable does not appear in the batched event file.

This environment variable is not available for ERRM commands in non-batched event responses or batched event files.

ERRM_EXPR The condition event expression or rearm event expression that tested True, thus triggering this linked response. The type of event that triggered the linked response is stored in the ERRM_TYPE environment variable.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_NODE_NAME The host name on which this event or rearm event occurred.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_NODE_NAMELIST A list of host names. These are the hosts on which the monitored resource resided when the event occurred.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_RSRC_CLASS_NAME The display name of the resource class containing the attribute that changed, thus causing the event to occur.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_RSRC_CLASS_PNAME The programmatic name of the resource class containing the attribute that changed, thus causing the event to occur.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_RSRC_HANDLE
The resource handle of the resource with the state change that caused this event to be generated. The format of this value is six hexadecimal integers that are separated by spaces and written as a string, for example:

0x6009 0xffff 0x180031ae 0xe300b8db 0x10bee2df 0x33b20837

This environment variable is not available for ERRM commands in batched event responses.

ERRM_RSRC_NAME The name of the resource whose attribute changed, thus causing this event.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_RSRC_TYPE The type of resource that caused the event to occur. The valid values are: 0 (an existing resource), 1 (a new resource), and 2 (a deleted resource).

This environment variable is not available for ERRM commands in batched event responses.

ERRM_SD_DATA_TYPES

ERRM_SD_DATA_TYPES_n

The data type for each element within the structured data (SD) variable, separated by commas. This environment variable is only defined when ERRM_DATA_TYPE is CT_SD_PTR. For example: CT_CHAR_PTR, CT_UINT32_ARRAY, CT_UINT32_ARRAY, CT_UINT32_ARRAY.

This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM.

The ERRM_SD_DATA_TYPES_n environment variable is only defined when the value of ERRM_DATA_TYPE_n is CT_SD_PTR.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_TIME The time the event occurred. The time is written as a decimal string representing the time since midnight January 1, 1970 in seconds, followed by a comma and the number of microseconds.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_TYPE The type of event that occurred. For conditions, the valid values are: Event and Rearm Event. For responses, the valid values are: Event, Rearm Event, and Error Event.

This environment variable is not available for ERRM commands in batched event responses.

ERRM_TYPEID The value of ERRM_TYPE. For conditions, the valid values are: 0 (event) and 1 (rearm event). For responses, the valid values are: 0 (event), 1 (rearm event), and 2 (error event).

This environment variable is not available for ERRM commands in batched event responses.

ERRM_VALUE

ERRM_VALUE_n

The value of the attribute that caused the event to occur for all attributes except those with a data type of CT_NONE.

The following data types are represented with this environment variable as a decimal string: CT_INT32, CT_UINT32, CT_INT64, CT_UINT64, CT_FLOAT32, and CT_FLOAT64.

CT_CHAR_PTR is represented as a string for this environment variable.

CT_BINARY_PTR is represented as a hexadecimal string separated by spaces.

CT_SD_PTR is enclosed in square brackets and has individual entries within the SD that are separated by commas. Arrays within an SD are enclosed within braces {}. For example, ["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}] See the definition of ERRM_SD_DATA_TYPES for an explanation of the data types that these values represent.

This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM.

This environment variable is not available for ERRM commands in batched event responses.
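The numbered variants in Table 1 (for example ERRM_ATTR_NAME_n and ERRM_VALUE_n) can be read in a loop driven by ERRM_ATTR_NUM. The following is a minimal sketch for a non-batched event response; the function name print_event_attributes is hypothetical, and eval is used only because plain sh has no portable syntax for looking up a variable by a computed name.

```shell
#!/bin/sh
# Hypothetical sketch: list every attribute name and value for an event.
# print_event_attributes is an illustrative name, not part of RSCT.

# The first attribute uses the unsuffixed variables (ERRM_ATTR_NAME and
# ERRM_VALUE). Attributes 2 through ERRM_ATTR_NUM use the _n suffixes.
print_event_attributes() {
    count=${ERRM_ATTR_NUM:-1}
    echo "${ERRM_ATTR_NAME:-} = ${ERRM_VALUE:-}"
    n=2
    while [ "$n" -le "$count" ]; do
        # n is a counter controlled by this loop, never external input,
        # so building the variable name with eval is safe here.
        eval "name=\${ERRM_ATTR_NAME_$n:-}"
        eval "value=\${ERRM_VALUE_$n:-}"
        echo "$name = $value"
        n=$((n + 1))
    done
}
```

A response script would call print_event_attributes from its main body and send the output wherever the event details are needed, for example into the message passed to wall.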

Note:
In addition to these ERRM environment variables, when you define a response action using either the mkresponse or chresponse command, you can specify additional environment variables for RMC to set before it triggers the event response script. This enables you to write a more general-purpose script that behaves differently based on the environment variable settings associated with the response action. To specify such user-defined environment variables, use the -E flag of either the mkresponse or chresponse command. For example:
mkresponse -n "Page Admins" -s /opt/rsct/bin/pageevent \
-d 1+7 -t 0000-2400 -e a -E 'ENV1="PAGE ALL"' "contact system administrators"
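Inside the response script, a user-defined variable such as ENV1 is read like any other environment variable. The sketch below shows one way a script might branch on it; the function name choose_recipients and the recipient group names are hypothetical, and only the value "PAGE ALL" comes from the example above.

```shell
#!/bin/sh
# Hypothetical sketch: vary behavior based on a user-defined variable.
# choose_recipients and the group names are illustrative, not RSCT ones.

# Print the recipient group to notify, depending on ENV1 as set with
# the -E flag of mkresponse or chresponse.
choose_recipients() {
    case "${ENV1:-}" in
        "PAGE ALL") echo "all-admins" ;;
        *)          echo "on-call" ;;
    esac
}
```

Because the default branch covers an unset ENV1, the same script can back several response actions, with only those that pass -E 'ENV1="PAGE ALL"' reaching the wider group.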

If you do create your own response scripts, test them before using them as response actions in a production environment. The -o flag of the mkresponse and chresponse commands is useful when debugging new actions. When it is specified, all standard output from the script is directed to the audit log. This is useful because, while standard error is always directed to the audit log, standard output is not.

For more information about the predefined response scripts (as well as information on the -E and -o flags of the mkresponse and chresponse commands), see the Technical Reference: RSCT for AIX® and Technical Reference: RSCT for Multiplatforms guides.