Enabling predictive eventing and predictive analytics

In an integrated Tivoli Netcool/OMNIbus and IBM Tivoli Monitoring environment, you can use the built-in predictive analytics capability to identify potential performance and capacity problems that could result in performance degradation or service outages. Within this environment, you can generate events that represent predictions for systems that are in danger of an impending threshold violation, and which require attention.

Note: This section assumes that you have a working knowledge of IBM Tivoli Monitoring.

Predictive events

IBM Tivoli Monitoring and the Probe for Tivoli EIF can be configured to forward predictive events to Tivoli Netcool/OMNIbus, and the resulting alerts can then be monitored in the Active Event List or the desktop event list.

Depending on which predictive eventing and analytics functions you configure, the following types of predictive event can be generated:

  • Predictive events based on the event rate: If the linear trending function is configured, these predictive events display predictions of problems with the event rate of a specific device. For example, a monitored device might show an increasing error event rate that is predicted to exceed a defined threshold within seven days.
  • Predictive events based on deviations from the baseline: If the baselining function is configured, these predictive events display deviations from a defined average event rate, which is calculated from archived data.
  • Predictive events based on historical agent data: These predictive events are generated by linear trending that uses historical data collected from monitoring agents in the IBM Tivoli Monitoring environment.

The following usage scenario outlines possible actions to take when predictive events are forwarded to Tivoli Netcool/OMNIbus for display in the event list and Active Event List:

  • When alert data for a predictive event is displayed in the event list or Active Event List, begin to gather initial information on the predicted problem, such as the location of the problem and the number of days before the threshold is violated.
  • Investigate the reasons for the prediction by looking through the actual events and other predictive events that are generated for the managed entity or node.
  • When sufficiently satisfied with the validity of the prediction, begin to take corrective actions on the managed entity before the problem actually occurs.
  • Alternatively, temporarily ignore the predictive event while monitoring for follow-up predictions and actual events that occur on the managed entity in the period between the generation of the predictive event and the threshold violation.
  • For multiple predictive events, prioritize your response based on the order in which the events are displayed in the event list or Active Event List.

Linear trending for device event rates

Linear trending for device event rates uses IBM Tivoli Monitoring predictive analytics functionality to determine whether the event rates received by the ObjectServer are likely to exceed upper thresholds within a defined period of time. Tivoli Performance Analyzer uses the event rates received from the ObjectServer to produce trends. If a trend indicates that a threshold will be violated within a defined time frame, a predictive event is sent to the ObjectServer. Support is provided for calculations over 7, 30, or 90 days. You can define critical and warning thresholds. When one of these thresholds is predicted to be exceeded, the resulting predictive event has the corresponding severity.

The linear trending calculations use the Least Squares Regression method. This method approximates a linear pattern of use, over time, for selected attributes based on their values in the past.

Note: The Least Squares Regression method provides approximate data. You should apply a margin of approximately plus or minus 12 hours to the predictive events that are generated.
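
The following Python sketch illustrates the general idea of least squares trending against warning and critical thresholds. It is not the Tivoli Performance Analyzer implementation. The function names, the one-sample-per-day input, and the rule that the critical threshold is checked before the warning threshold are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def fit_line(days: Sequence[float], rates: Sequence[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(days)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all samples share the same day value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, rates))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(
    days: Sequence[float], rates: Sequence[float], threshold: float
) -> Optional[float]:
    """Days from the last sample until the fitted line reaches the threshold.

    Returns 0.0 if the fitted value is already at or above the threshold,
    and None if the trend is flat or falling.
    """
    slope, intercept = fit_line(days, rates)
    last_day = days[-1]
    if slope * last_day + intercept >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - last_day


def predicted_severity(
    days: Sequence[float],
    rates: Sequence[float],
    warning: float,
    critical: float,
    horizon_days: float = 7,
) -> Optional[str]:
    """Hypothetical rule: report the most severe threshold reached in the horizon."""
    for name, threshold in (("critical", critical), ("warning", warning)):
        remaining = days_until_threshold(days, rates, threshold)
        if remaining is not None and remaining <= horizon_days:
            return name
    return None


# Example: the event rate grows by 10 events per day from a starting rate of 100.
sample_days = [0, 1, 2, 3, 4]
sample_rates = [100, 110, 120, 130, 140]
severity = predicted_severity(sample_days, sample_rates, warning=170, critical=300)
```

In this example the fitted line reaches the warning threshold three days after the last sample, so a seven-day horizon yields a warning-level prediction. Widening the horizon to 30 days would also bring the critical threshold into range.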

Baselining for device event rates

Baselining of the event rate per client device calculates the average event rate, per device, over a defined period of time. Event rate data, archived in Tivoli Data Warehouse, is used to build a corridor of normality. Current event rates are measured against the baseline average, which is taken over a defined period of weeks for the current period of time. You define the upper and lower deviations from the average rate, that is, the thresholds that the event rate must exceed. If a threshold is exceeded, IBM Tivoli Monitoring generates a situation, which is received by the Probe for Tivoli EIF. The Probe for Tivoli EIF in turn generates an event in Tivoli Netcool/OMNIbus. At least seven days of data must accumulate before a baseline can be built. Ideally, allow for 14 days of data.
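
The corridor of normality can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the product's algorithm: the function names are hypothetical, the history is taken to be one average event rate per day, and the upper and lower deviations are expressed as percentages of the baseline average.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_DAYS = 7  # minimum amount of archived data stated in the text


def corridor(
    history: Sequence[float], lower_pct: float, upper_pct: float
) -> tuple[float, float]:
    """Return the (low, high) bounds around the baseline average.

    Assumption: deviations are percentages of the average event rate.
    """
    if len(history) < MIN_DAYS:
        raise ValueError(f"need at least {MIN_DAYS} days of archived data")
    baseline = mean(history)
    return baseline * (1 - lower_pct / 100), baseline * (1 + upper_pct / 100)


def check_rate(
    current: float, history: Sequence[float], lower_pct: float, upper_pct: float
) -> Optional[str]:
    """Return "above" or "below" if the current rate leaves the corridor, else None."""
    low, high = corridor(history, lower_pct, upper_pct)
    if current > high:
        return "above"
    if current < low:
        return "below"
    return None


# Example: seven daily averages for one device, with a baseline of 100 events.
daily_rates = [100, 110, 90, 105, 95, 100, 100]
result = check_rate(160, daily_rates, lower_pct=20, upper_pct=50)
```

With these sample values the corridor runs from 80 to 150 events, so a current rate of 160 falls above it. In the product, that condition corresponds to the point at which IBM Tivoli Monitoring generates a situation.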