Configuring probes for self monitoring
As a self-monitoring mechanism, you can configure a probe to collect statistical data about the amount of memory used for various processing operations, and the number of events received, discarded, and generated.
About this task
To configure a probe to collect and process statistical data:
Procedure
- Go to the $NCHOME/omnibus/extensions/roi directory.
- Copy the probestats.sql file to the $NCHOME/omnibus/etc directory, or another preferred location.
- Apply the ProbeWatch Heartbeat customization to the ObjectServer schema by running the command for your operating system from the SQL interactive interface:
On UNIX:
$NCHOME/omnibus/bin/nco_sql -user username -password password -server servername < directory_path/probestats.sql
On Windows:
"%NCHOME%\omnibus\bin\isql" -U username -P password -S servername -i directory_path\probestats.sql
In these commands, username is a valid user name, password is the corresponding password, servername is the name of the ObjectServer, and directory_path is the fully-qualified directory path to the .sql file.
The probestats.sql file adds a set of tables and triggers to the ObjectServer.
- Copy the $NCHOME/omnibus/extensions/roi/probewatch.include file to a preferred local or remote directory where the main rules file or any secondary rules files for the probe are stored. This file is designed to replace the logic in the ProbeWatch section of your main rules file, which is typically coded as follows:
if( match( @Manager, "ProbeWatch" ) )
{
    switch(@Summary)
    {
        case "Running ...":
            @Severity = 1
            @AlertGroup = "probestat"
            @Type = 2
        case "Going Down ...":
            @Severity = 5
            @AlertGroup = "probestat"
            @Type = 1
        default:
            @Severity = 1
    }
    @AlertKey = @Agent
    @Summary = @Agent + " probe on " + @Node + ": " + @Summary
}
else
{
    ...probe specific rules...
}
The body of the ProbeWatch if block (the switch statement and the @AlertKey and @Summary assignments that follow it) needs to be replaced with an include statement that embeds the contents of the probewatch.include file, as instructed in step 5.
- Remove the default read-only permissions from your copy of the probewatch.include file and review the file to familiarize yourself with its contents. Then edit the file as follows:
- Update any of the elements at the top of the file to define how a ProbeWatch Heartbeat event should be processed. Use the number sign (#) to comment out any elements that you do not require. The processing logic for these elements is coded within the case "Heartbeat ..." statement in the ProbeWatch section of the file.
Table 1. Elements for ProbeWatch Heartbeat events
Element: $OplHeartbeat_discard
Action: Set this value to 1 if you want to discard the ProbeWatch Heartbeat event. Set this value to 0 if you want to forward the ProbeWatch Heartbeat event to the ObjectServer.
Element: $OplHeartbeat_populate_master_probestats
Action: Set this value to 1 to enable a new probe metrics event to be generated by using the genevent function, which is defined within the case "Heartbeat ..." statement. The event data consists of a set of OplStats probe metrics, which are forwarded to the master.probestats table that was created when you ran the probestats.sql script. Set this value to 0 if you do not want to generate this event for insertion into the master.probestats table.
Element: $OplHeartbeat_write_to_probe_log
Action: Set this value to 1 if you want to record the OplStats metrics in the probe log file. Details are logged at the INFO level. The metric details that are logged are defined in the case "Heartbeat ..." statement. Set this value to 0 if you do not want to record the metrics in the log file.
Element: $OplHeartbeat_generate_threshold_events
Action: Set this value to 1 if you want to generate threshold events that indicate when a particular probe metric violates a defined threshold. By default, no code is provided for threshold events within this rules file because individual preferences can vary widely. If you require threshold events, you must first decide which thresholds you want to monitor. Then, within the case "Heartbeat ..." statement, provide the code for generating threshold events.
- In addition to the standard CASE statements, the file includes the following two CASE statements, which contain the logic for two new ProbeWatch events that provide feedback when a probe re-reads its rules files on receipt of a SIGHUP signal. The first CASE statement applies when the re-read was successful:
case "Rules file reread upon SIGHUP successful ...":
    @Severity = 1
    @AlertGroup = "rules"
    @Type = 2
The second CASE statement applies when the re-read was unsuccessful. This section of code includes two elements ($msg and $file), where $msg is the error message as reported in the probe log file, and $file is the name of the file where the error exists.
case "Rules file reread upon SIGHUP failed ...":
    @Severity = 4
    @AlertGroup = "rules"
    @Type = 1
    if( exists( $msg ) )
    {
        @Summary = @Summary + "("+$msg+")"
    }
    if( exists( $file ) )
    {
        @Summary = @Summary + " in file "+$file
    }
If you do not require these events, use the discard function to prevent them from being sent to the ObjectServer.
- The final CASE statement (case "Heartbeat ...") contains a set of conditional statements for calculating the probe metrics and processing the data. IF statements are provided with the logic to discard events and to write the probe metrics to a log file. Some user input is also required:
Table 2. case "Heartbeat ..." sections that require user input
Locate the section of code that begins with the following lines:
if( int( $OplHeartbeat_populate_master_probestats ) == 1 )
{
    log( DEBUG, "HEARTBEAT - SENDING PROBESTATS TO MASTER.PROBESTATS" )
    ...
This section of code contains a genevent statement with a DefaultOS placeholder that identifies a target, registered ObjectServer. This target must be defined in a registertarget statement in the main rules file. Replace this placeholder with the target ObjectServer to which you want to send events.
Locate the section of code that begins with the following lines:
if( int( $OplHeartbeat_generate_threshold_events ) == 1 )
{
    #
    # Area to generate user defined threshold events using genevent
    ...
If you set the $OplHeartbeat_generate_threshold_events element to 1 at the top of the file, you must enter the code for the type of threshold events that you want to monitor. You can ignore this section if you do not require threshold events.
- If you have modified the ProbeWatch section of your main rules file (typically $NCHOME/omnibus/probes/arch/probename.rules), you must make the same modifications to the probewatch.include file.
- If the main rules file includes additional ProbeWatch sections that contain code for different ProbeWatch messages that are not covered in the probewatch.include file, copy this additional code into the probewatch.include file.
Tip: After making all the changes to the probewatch.include file, run the Probe Rules Syntax Checker (nco_p_syntax) to test the syntax of the rules file.
- Embed the updated probewatch.include file in your main probe rules file by using an include statement. Ensure that the path in the include statement points to the location where the updated probewatch.include file is stored.
if( match( @Manager, "ProbeWatch" ) )
{
    include "directory_path/probewatch.include"
}
else
{
    ...probe specific rules...
}
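Once the include statement is in place, the syntax check mentioned in the tip for step 4 can be run against the main rules file. The following shell wrapper is a minimal sketch only. It assumes that the checker is installed as $NCHOME/omnibus/probes/nco_p_syntax and accepts a -rulesfile option; the function name check_rules is hypothetical. Confirm the location and options against your own installation before relying on it.

```shell
# Sketch: run the Probe Rules Syntax Checker against a main rules file.
# Assumption: the checker is installed at $NCHOME/omnibus/probes/nco_p_syntax.
check_rules() {
    rules_file=$1
    nchome=${NCHOME:-}
    checker="$nchome/omnibus/probes/nco_p_syntax"

    # Fail early with a clear message if the checker cannot be found.
    if [ -z "$nchome" ] || [ ! -x "$checker" ]; then
        echo "nco_p_syntax not found; check that NCHOME is set" >&2
        return 127
    fi
    if [ ! -r "$rules_file" ]; then
        echo "cannot read rules file: $rules_file" >&2
        return 2
    fi

    # Point the checker at the main rules file that contains the include
    # statement for probewatch.include.
    "$checker" -rulesfile "$rules_file"
}
```

For example, check_rules "$NCHOME/omnibus/probes/arch/probename.rules" would check the main rules file named in step 4, where arch and probename are the placeholders used there.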
- Specify the interval, in seconds, at which probe heartbeat messages are generated, by setting the ProbeWatchHeartbeatInterval property in the probe properties file.
- Set a positive number to generate heartbeat events at that interval.
- Set 0 (zero) or a negative number to generate no heartbeat events.
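The property can be edited by hand, but a scripted update is sometimes convenient when several probes are configured the same way. The sketch below assumes that the properties file uses the "Name : value" form and that any existing uncommented ProbeWatchHeartbeatInterval line should be replaced; the function name set_heartbeat_interval is hypothetical.

```shell
# Sketch: set ProbeWatchHeartbeatInterval in a probe properties file.
# Assumption: properties are written one per line as "Name : value".
set_heartbeat_interval() {
    props_file=$1
    interval=$2
    tmp="${props_file}.tmp.$$"

    # Keep every line except an existing, uncommented setting of the property.
    grep -v '^[[:space:]]*ProbeWatchHeartbeatInterval[[:space:]]*:' "$props_file" > "$tmp" || true
    # Append the new value.
    printf 'ProbeWatchHeartbeatInterval : %s\n' "$interval" >> "$tmp"
    # Write back in place so that the original file keeps its permissions.
    cat "$tmp" > "$props_file" && rm -f "$tmp"
}
```

For example, set_heartbeat_interval /path/to/probename.props 60 requests a heartbeat every 60 seconds, where the path and file name are placeholders for your own properties file.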
- Ensure that the stats_triggers trigger group is enabled.
The triggers that are added by the probestats.sql file are assigned to this trigger group, which must be enabled for
the triggers to run. You can enable the trigger group by using Netcool/OMNIbus Administrator or the ALTER TRIGGER GROUP command.
- Enable the probe_statistics_cleanup trigger, which by default is set to delete probe statistics that are over an hour old. You can change this default period to increase the length of time for which statistics are stored.
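If you prefer the SQL interactive interface to Netcool/OMNIbus Administrator, the two enabling steps above can be sketched as follows. The helper only prints the statements; the function name enable_probe_stats_sql is hypothetical, and the ALTER TRIGGER form used for the single trigger is assumed to mirror the ALTER TRIGGER GROUP command, so verify both statements against your ObjectServer SQL reference.

```shell
# Sketch: emit ObjectServer SQL that enables the trigger group and the
# cleanup trigger added by probestats.sql. The final "go" line tells the
# SQL interactive interface to run the statements.
enable_probe_stats_sql() {
    printf '%s\n' \
        'ALTER TRIGGER GROUP stats_triggers SET ENABLED TRUE;' \
        'ALTER TRIGGER probe_statistics_cleanup SET ENABLED TRUE;' \
        'go'
}
```

On UNIX, the output could be piped to the same command that is used in step 3, for example: enable_probe_stats_sql | $NCHOME/omnibus/bin/nco_sql -user username -password password -server servername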
- Start the probe.
The probe metrics that are collected are recorded in the log file $NCHOME/omnibus/log/server_name_probestats.log, where server_name is the ObjectServer name.
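To confirm that metrics are arriving after the probe starts, you can inspect that log file. The helper below is a small sketch that builds the log path from the ObjectServer name, following the naming pattern given above; the function name probestats_log and the ObjectServer name NCOMS in the usage note are illustrative assumptions.

```shell
# Sketch: print the path of the probe statistics log for an ObjectServer,
# following the pattern $NCHOME/omnibus/log/server_name_probestats.log.
probestats_log() {
    server_name=$1
    printf '%s/omnibus/log/%s_probestats.log\n' "${NCHOME:-}" "$server_name"
}
```

For example, tail -n 20 "$(probestats_log NCOMS)" shows the most recent entries for an ObjectServer named NCOMS.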