Creating new response scripts
The predefined response scripts we provide are general-purpose ways of notifying users about an event or logging the event information to a file. In addition to these general-purpose scripts, you might want to write your own scripts that provide more specific responses to events.
You might want to do this to create an automatic recovery script that enables RMC to solve a simple problem automatically. For example, when the /tmp directory is over 90 percent full, you could have RMC run a script to automatically delete the oldest unnecessary files in the /tmp directory. Another reason to create your own scripts is to tailor system responses to better suit your particular organization. For example, you might want to create a script that calls your pager when a particular event occurs.
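The /tmp recovery idea above can be sketched as a small shell function. This is a minimal sketch, not a predefined RSCT script: the helper name purge_old_files is hypothetical, and it assumes that "unnecessary" can be approximated by "regular files not modified for a given number of days", which you would need to adapt to your site.

```shell
#!/bin/sh
# Sketch of an automatic recovery action for a "/tmp is filling up" condition.
# purge_old_files is a hypothetical helper; the directory and the age
# threshold are assumptions to adapt to your environment.
purge_old_files() {
    dir=$1        # directory to clean, for example /tmp
    min_days=$2   # remove regular files last modified more than this many days ago

    # Refuse to run without both arguments or on a missing directory.
    if [ -z "$dir" ] || [ -z "$min_days" ] || [ ! -d "$dir" ]; then
        return 1
    fi

    # -xdev keeps find on the same file system; -type f skips directories,
    # sockets, and other special files.
    find "$dir" -xdev -type f -mtime +"$min_days" -exec rm -f {} +
}
```

A response script built on this sketch might call purge_old_files /tmp 7 only when the script was triggered by the condition event rather than the rearm event.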
After a condition event occurs, but before the response script executes, ERRM sets a number of environment variables that contain information about the event. The script can check the values of these variables in order to provide the event information to the user. Using the ERRM environment variables, the script can ascertain such information as whether it was triggered by the condition event or the rearm event, the time the event occurred, the host on which the event occurred, and so on.
Example: The following is a predefined script called wallevent, which illustrates the use of the ERRM environment variables. It is a shell script that calls Perl only to format the event time. The ERRM environment variable names begin with "ERRM_".
# main()
PERL=/opt/rsct/perl5/bin/perl
CTMSG=/opt/rsct/bin/ctdspmsg
MSGMAPPATH=/opt/rsct/msgmaps
export MSGMAPPATH
Usage=`$CTMSG script IBM.ERrm.cat MSG_SH_USAGE`
while getopts ":h" opt
do
case $opt in
h ) print "Usage: `basename $0` [-h] "
exit 0;;
? ) print "Usage: `basename $0` [-h] "
exit 3;;
esac
done
# convert time string
seconds=${ERRM_TIME%,*}
EventTime=$(seconds=$seconds $PERL -e \
'
use POSIX qw(strftime);
print strftime("
'
)
WallMsg=`$CTMSG script IBM.ERrm.cat MSG_SH_WALLN "$ERRM_COND_SEVERITY" \
"$ERRM_TYPE" "$ERRM_COND_NAME" "$ERRM_RSRC_NAME" \
"$ERRM_RSRC_CLASS_NAME" "$EventTime" "$ERRM_NODE_NAME" \
"$ERRM_NODE_NAMELIST"`
wall "${WallMsg}"
#wall "$ERRM_COND_SEVERITY $ERRM_TYPE occurred for the condition $ERRM_COND_NAME
# on the resource $ERRM_RSRC_NAME of the resource class $ERRM_RSRC_CLASS_NAME at
# $EventTime on $ERRM_NODE_NAME"
The preceding script uses the ERRM_TIME environment variable to ascertain the time that the event occurred, the ERRM_COND_SEVERITY environment variable to learn the severity of the event, the ERRM_TYPE environment variable to determine whether it was the condition event or the rearm event that triggered the script's execution, and so on. This information is all included in the message sent to online users.
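As a smaller illustration of the same ideas, the following sketch builds a one-line summary from a handful of ERRM environment variables. The helper name describe_event and the wording of the summary are hypothetical; the variable names and the ERRM_TYPE values (Event, Rearm Event, Error Event) are the ones described in this topic.

```shell
#!/bin/sh
# Sketch: summarize an event from the ERRM_* environment variables.
# describe_event is a hypothetical helper, not part of RSCT.
describe_event() {
    # ERRM_TIME has the form "seconds,microseconds"; split it on the comma.
    seconds=${ERRM_TIME%,*}
    microseconds=${ERRM_TIME#*,}

    # ERRM_TYPE tells the script which kind of event triggered it.
    case "$ERRM_TYPE" in
        "Rearm Event") action="cleared" ;;
        "Error Event") action="errored" ;;
        *)             action="raised" ;;
    esac

    printf '%s: condition "%s" %s on %s at %s s (+%s us)\n' \
        "$ERRM_COND_SEVERITY" "$ERRM_COND_NAME" "$action" \
        "$ERRM_NODE_NAME" "$seconds" "$microseconds"
}
```

Keeping the formatting in a function makes it easy to try the script outside ERRM by setting the variables by hand before calling it.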
Table 1 describes the ERRM environment variables that you can use in response scripts. Unless otherwise specified, these environment variables are available for ERRM commands in non-batched event responses, batched event responses, and batched event files.
| This environment variable... | Contains... |
|---|---|
| ERRM_ATTR_NAME ERRM_ATTR_NAME_n | The display name of the dynamic attribute used in the expression that caused this event to occur. This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_ATTR_NUM | The number of attributes that are used in the event expression. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_ATTR_PNAME ERRM_ATTR_PNAME_n | The programmatic name of the attribute used in the expression that caused this event to occur. This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_BATCH_REASON | The reason why the batched event was triggered. The valid values are: 1 (the event batch interval expired), 2 (the maximum number of batching events was reached), 3 (monitoring stopped), 4 (the association between the condition and event response was removed), 5 (the event response was deleted), and 6 (the condition was deleted). This environment variable is not available for ERRM commands in non-batched event responses or batched event files. |
| ERRM_COND_BATCH | The indication of whether the condition is batching events. The valid values are: 0 (no) and 1 (yes). |
| ERRM_COND_BATCH_NUM | The number of events in the batched event file. This environment variable is not available for ERRM commands in non-batched event responses or batched event files. |
| ERRM_COND_HANDLE | The resource handle of the condition that caused the event. The format of this value is six hexadecimal integers that are separated by spaces and written as a string, for example: |
| ERRM_COND_MAX_BATCH | The maximum number of events that can be batched together, if the condition is batching events. If the value is 0, there is no limit. This environment variable is not available for ERRM commands in non-batched event responses. |
| ERRM_COND_NAME | The name of the condition that caused the event. |
| ERRM_COND_SEVERITY | The severity of the condition that caused the event. For the severity attribute values of 0, 1, and 2, this environment variable has the following values, respectively: Informational, Warning, and Critical. All other severity attribute values are represented in this environment variable as a decimal string. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_COND_SEVERITYID | The severity value of the condition that caused the event. The valid values are: 0 (Informational), 1 (Warning), and 2 (Critical). This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_DATA_TYPE ERRM_DATA_TYPE_n | The RMC ct_data_type_t type of the attribute that changed to cause this event. The valid values are: CT_BINARY_PTR, CT_CHAR_PTR, CT_FLOAT32, CT_FLOAT64, CT_INT32, CT_INT64, CT_SD_PTR, CT_UINT32, and CT_UINT64. The actual value of the attribute is stored in the ERRM_VALUE environment variable (except for attributes with a data type of CT_NONE). This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_ER_HANDLE | The event response resource handle for this event. The format of this value is six hexadecimal integers that are separated by spaces and written as a string, for example: This environment variable is not available for ERRM commands in batched event files. |
| ERRM_ER_NAME | The name of the event that triggered this event response script. This environment variable is not available for ERRM commands in batched event files. |
| ERRM_ERROR_MSG | The descriptive message for ERRM_ERROR_NUM, if the value of ERRM_ERROR_NUM is not 0. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_ERROR_NUM | The error code from the RMC subsystem for an error event. If the value is 0, an error did not occur. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_EVENT_DETAIL_FILE | The file name of where the batched events can be found, if the condition is batching events. This environment variable does not appear in the batched event file. This environment variable is not available for ERRM commands in non-batched event responses or batched event files. |
| ERRM_EXPR | The condition event expression or rearm event expression that tested True, thus triggering this linked response. The type of event that triggered the linked response is stored in the ERRM_TYPE environment variable. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_NODE_NAME | The host name on which this event or rearm event occurred. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_NODE_NAMELIST | A list of host names. These are the hosts on which the monitored resource resided when the event occurred. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_RSRC_CLASS_NAME | The display name of the resource class containing the attribute that changed, thus causing the event to occur. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_RSRC_CLASS_PNAME | The programmatic name of the resource class containing the attribute that changed, thus causing the event to occur. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_RSRC_HANDLE | The resource handle of the resource with the state change that caused this event to be generated. The format of this value is six hexadecimal integers that are separated by spaces and written as a string, for example: This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_RSRC_NAME | The name of the resource whose attribute changed, thus causing this event. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_RSRC_TYPE | The type of resource that caused the event to occur. The valid values are: 0 (an existing resource), 1 (a new resource), and 2 (a deleted resource). This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_SD_DATA_TYPES ERRM_SD_DATA_TYPES_n | The data type for each element within the structured data (SD) variable, separated by commas. This environment variable is only defined when ERRM_DATA_TYPE is CT_SD_PTR. For example: CT_CHAR_PTR, CT_UINT32_ARRAY, CT_UINT32_ARRAY, CT_UINT32_ARRAY. This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM. The ERRM_SD_DATA_TYPES_n environment variable is only defined when the value of ERRM_DATA_TYPE_n is CT_SD_PTR. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_TIME | The time the event occurred. The time is written as a decimal string representing the time since midnight January 1, 1970 in seconds, followed by a comma and the number of microseconds. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_TYPE | The type of event that occurred. For conditions, the valid values are: Event and Rearm Event. For responses, the valid values are: Event, Rearm Event, and Error Event. This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_TYPEID | The value of ERRM_TYPE. For conditions, the valid values are: 0 (event) and 1 (rearm event). For responses, the valid values are: 0 (event), 1 (rearm event), and 2 (error event). This environment variable is not available for ERRM commands in batched event responses. |
| ERRM_VALUE ERRM_VALUE_n | The value of the attribute that caused the event to occur for all attributes except those with a data type of CT_NONE. The following data types are represented with this environment variable as a decimal string: CT_INT32, CT_UINT32, CT_INT64, CT_UINT64, CT_FLOAT32, and CT_FLOAT64. CT_CHAR_PTR is represented as a string for this environment variable. CT_BINARY_PTR is represented as a hexadecimal string separated by spaces. CT_SD_PTR is enclosed in square brackets and has individual entries within the SD that are separated by commas. Arrays within an SD are enclosed within braces {}. For example: ["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}]. See the definition of ERRM_SD_DATA_TYPES for an explanation of the data types that these values represent. This environment variable is repeated if the value of ERRM_ATTR_NUM is greater than 1. The value of n is from 2 to ERRM_ATTR_NUM. This environment variable is not available for ERRM commands in batched event responses. |
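Several variables in Table 1 are repeated with an _n suffix when ERRM_ATTR_NUM is greater than 1. The following is a minimal sketch of how a response script might walk through all of them; the helper name list_event_attributes and the output format are hypothetical, and the sketch assumes it runs in a non-batched event response, where these variables are available.

```shell
#!/bin/sh
# Sketch: list every attribute that took part in the event expression.
# list_event_attributes is a hypothetical helper, not part of RSCT.
list_event_attributes() {
    count=${ERRM_ATTR_NUM:-1}

    # The first attribute uses the unsuffixed variable names.
    printf '%s (%s) = %s\n' "$ERRM_ATTR_PNAME" "$ERRM_DATA_TYPE" "$ERRM_VALUE"

    # Attributes 2 through ERRM_ATTR_NUM use the _n suffix.
    n=2
    while [ "$n" -le "$count" ]; do
        # n is a locally generated integer, so building the name is safe here.
        eval "pname=\${ERRM_ATTR_PNAME_$n}"
        eval "dtype=\${ERRM_DATA_TYPE_$n}"
        eval "value=\${ERRM_VALUE_$n}"
        printf '%s (%s) = %s\n' "$pname" "$dtype" "$value"
        n=$((n + 1))
    done
}
```

The eval indirection is used because plain POSIX shell has no direct way to expand a variable whose name is computed at run time.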
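You can also pass your own environment variables to a response script by using the -E flag of the mkresponse command when you define the response. For example:
mkresponse -n "Page Admins" -s /opt/rsct/bin/pageevent \
-d 1+7 -t 0000-2400 -e a -E 'ENV1="PAGE ALL"' "contact system administrators"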
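The mkresponse command above sets ENV1 to "PAGE ALL" through the -E flag. As a rough sketch of how a response script could act on such a variable, the function below chooses who to notify. The helper name page_targets and the recipient names are hypothetical; only ENV1 and its value come from the command above.

```shell
#!/bin/sh
# Sketch: act on a variable that was set with the -E flag of mkresponse.
# ENV1 and "PAGE ALL" come from the mkresponse example; page_targets and
# the recipient names are hypothetical.
page_targets() {
    case "${ENV1:-}" in
        "PAGE ALL") echo "primary-oncall secondary-oncall" ;;
        *)          echo "primary-oncall" ;;
    esac
}
```

Using ${ENV1:-} keeps the script well behaved when it is run by a response that does not set the variable.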
If you do create your own response scripts, test them before using them as response actions in a production environment. The -o flag of the mkresponse and chresponse commands is useful when debugging new actions. When it is specified, all standard output from the script is directed to the audit log. This is useful because standard error is always directed to the audit log, but standard output is not.
For more information about the predefined response scripts (as well as information on the -E and -o flags of the mkresponse and chresponse commands), see the Technical Reference: RSCT for AIX® and Technical Reference: RSCT for Multiplatforms guides.