How can I generate a server down alert only after the server has been down for more than 5 minutes?

To generate an alert only when a server is down for more than 5 minutes, you must create a linked rule. The linked rule detects when the server stops and takes no action unless the server fails to restart within 5 minutes.

About this task

When IBM® Sterling Control Center Monitor can no longer communicate with a monitored server, it assumes that the server has stopped and generates an event of type Server Status with a message ID of CCTR034E. Similarly, when IBM Sterling Control Center Monitor establishes communication with a monitored server, it assumes that the server has started and generates an event of type Server Status with a message ID of CCTR033E.

You can use these events with a linked rule to generate a server down alert only if the server stays down for more than a set period, for example 5 minutes. This approach gives a system administrator a chance to restart the server before any alert is generated. It is useful when brief server outages are of minor consequence and you want an alert only when a longer outage occurs.
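The filtering idea behind this rule can be sketched in Python. This is a minimal illustration, not part of IBM Sterling Control Center Monitor: it assumes a hypothetical, time-ordered list of (minute, message ID) pairs, pairs each CCTR034E (server stopped) event with the next CCTR033E (server started) event, and reports only outages that lasted longer than the threshold. The function name `long_outages`, the event format, and the strict "longer than" comparison are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Message IDs described above (the pairing logic below is a hypothetical sketch).
SERVER_STOPPED = "CCTR034E"
SERVER_STARTED = "CCTR033E"


def long_outages(
    events: List[Tuple[float, str]],
    threshold_min: float = 5.0,
    now: Optional[float] = None,
) -> List[Tuple[float, Optional[float]]]:
    """Return (stop_time, start_time) pairs for outages longer than threshold_min.

    events: time-ordered (minute, message_id) tuples (assumed format).
    start_time is None when the server had not restarted by `now`.
    """
    outages: List[Tuple[float, Optional[float]]] = []
    down_since: Optional[float] = None
    for ts, msg_id in events:
        if msg_id == SERVER_STOPPED and down_since is None:
            down_since = ts  # start of an outage
        elif msg_id == SERVER_STARTED and down_since is not None:
            if ts - down_since > threshold_min:
                outages.append((down_since, ts))  # long outage: would alert
            down_since = None  # brief outages are ignored
    # A server that is still down counts once the threshold has elapsed.
    if down_since is not None and now is not None and now - down_since > threshold_min:
        outages.append((down_since, None))
    return outages
```

For example, with events `[(0, "CCTR034E"), (2, "CCTR033E"), (10, "CCTR034E"), (18, "CCTR033E"), (30, "CCTR034E")]` and `now=40`, the 2-minute outage is ignored, while the 8-minute outage and the still-open outage that began at minute 30 are reported.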

A linked rule is a normal rule with additional attributes: a following event to wait for, a timeout, and Resolution and Non-Resolution actions.

To construct a rule that alerts you when a server is down for more than 5 minutes:

Procedure

  1. Build a rule that monitors for the message that IBM Sterling Control Center Monitor generates when it determines that a server has stopped.
  2. Specify a Key of Message ID, an Operator of Matches, and a Value of CCTR034E. You can specify additional parameters to limit when this rule is triggered.
  3. Select the No Operation action, because you do not want to generate an alert when the server is first discovered to be down. You want to generate an alert only after the server has been down for more than 5 minutes.
  4. Select Enabled on the Linked Rules wizard that appears next.
  5. Specify a following parameter Key of Message ID, an Operator of Matches, and a Value of CCTR033E. CCTR033E is the event that the IBM Sterling Control Center Monitor engine generates when it finds that a managed server has started. You can also specify other parameters to refine the criteria under which the linked part of the rule matches CCTR033E events.
  6. Because you selected No Operation for the initial action, choose No Operation for the Resolution action as well: nothing needs to be done if the server comes back up within 5 minutes. For the Non-Resolution action, choose an appropriate action, such as alert1.
    Important: If you selected an alert action for the initial action, you must choose alert0 for the Resolution action, which tells IBM Sterling Control Center Monitor to clear the first alert, and again choose an appropriate action, such as alert1, for the Non-Resolution action. With this configuration, a lower-severity alert is generated when the server is first discovered to be down, and a higher-severity alert is generated if the server remains down for longer than the allowed amount of time.
  7. Set the timeout value to the length of time that the IBM Sterling Control Center Monitor engine waits for the following event before taking either the Resolution or the Non-Resolution action; in this case, 5 minutes.
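To see how the attributes set in this procedure interact, the following Python sketch models a linked rule as a plain data structure and replays a stream of events through it. It is a hypothetical model for illustration only, not the product's rule engine or API: the `LinkedRule` class, the `evaluate` function, the (minute, message ID) event format, and the choice to treat a following event that arrives exactly at the timeout as a resolution are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LinkedRule:
    # Hypothetical model of the rule attributes from the procedure above.
    initial_message_id: str     # step 2: Message ID Matches CCTR034E
    initial_action: str         # step 3: "No Operation" (or a lower-severity alert)
    following_message_id: str   # step 5: Message ID Matches CCTR033E
    resolution_action: str      # step 6: taken if the following event arrives in time
    non_resolution_action: str  # step 6: taken if the timeout expires first
    timeout_min: float          # step 7: 5 minutes


def evaluate(
    rule: LinkedRule, events: List[Tuple[float, str]], now: float
) -> List[Tuple[float, str]]:
    """Replay time-ordered (minute, message_id) events; return (time, action) pairs."""
    actions: List[Tuple[float, str]] = []
    deadline: Optional[float] = None  # set while waiting for the following event
    for ts, msg_id in events:
        # If the timeout expired before this event, take the Non-Resolution action.
        if deadline is not None and ts > deadline:
            actions.append((deadline, rule.non_resolution_action))
            deadline = None
        if deadline is None and msg_id == rule.initial_message_id:
            actions.append((ts, rule.initial_action))
            deadline = ts + rule.timeout_min  # start waiting for the following event
        elif deadline is not None and msg_id == rule.following_message_id:
            actions.append((ts, rule.resolution_action))
            deadline = None  # resolved within the timeout
    # Still waiting at `now` and the timeout has passed: Non-Resolution.
    if deadline is not None and now > deadline:
        actions.append((deadline, rule.non_resolution_action))
    return actions
```

With `LinkedRule("CCTR034E", "No Operation", "CCTR033E", "No Operation", "alert1", 5)`, a server that stops at minute 0 and restarts at minute 2 produces only No Operation actions, while a server that stops at minute 10 and does not restart until minute 18 triggers alert1 at minute 15. The escalating variant from the Important note fits the same model by setting the initial action to a lower-severity alert (written as `alert3` in the test, which is an assumed name) and the Resolution action to alert0.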