Configuring anomaly-based alerts and exceptions

Anomaly detection is specified at the monitoring profile level, with each profile monitoring a specific workload type.

About this task

When anomaly-based exceptions are enabled, the following occurs when a query workload matches the profile:

  • Db2® Query Monitor calculates the mean and variance for the specified attributes (CPU, Elapsed, Get Pages).
  • The mean is maintained both across intervals and per interval.
  • More recent executions are weighted more heavily.
  • A user-specified sensitivity value controls how far a value must deviate from normal before an alert or exception is triggered.
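
The behavior described above can be illustrated with a minimal Python sketch. It is not the Db2 Query Monitor implementation, which this topic does not describe. It assumes an exponentially weighted mean and variance as one common way to weight recent executions more heavily. The class name, the alpha smoothing factor, and the one-sided (above the mean only) comparison are all hypothetical choices made for illustration.

```python
import math


class WeightedStats:
    """Running mean and variance in which recent values weigh more.

    Illustration only: 'alpha' is a hypothetical smoothing factor,
    not a Db2 Query Monitor parameter.
    """

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.mean = None
        self.var = 0.0

    def update(self, value):
        """Fold one observed value (for example, CPU time) into the statistics."""
        if self.mean is None:
            self.mean = float(value)
            return
        diff = value - self.mean
        incr = self.alpha * diff
        self.mean += incr
        # Standard exponentially weighted variance update.
        self.var = (1 - self.alpha) * (self.var + diff * incr)

    def is_anomaly(self, value, sensitivity):
        """True when value lies more than 'sensitivity' standard deviations
        above the mean (assumed one-sided check)."""
        std_dev = math.sqrt(self.var)
        if self.mean is None or std_dev == 0:
            return False
        return (value - self.mean) / std_dev > sensitivity


stats = WeightedStats(alpha=0.2)
for observed in [10, 12, 8, 11, 9]:
    stats.update(observed)

print(stats.is_anomaly(100, sensitivity=6.0))
print(stats.is_anomaly(11, sensitivity=6.0))
```

In this sketch, a larger sensitivity value means that a value must be farther from the mean before it is flagged, so fewer alerts or exceptions are produced.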

Procedure

To configure anomaly-based alerts and exceptions, set up an INCLUDE monitoring profile that:
  • Identifies the workload (by information such as the subsystem, plan, and program).
  • Excludes any SQL codes from exception reporting for the workload.
  • Defines an exception limit greater than 0.
  • Disables exception thresholds:

    In ISPF, set the Threshold Exceptions field to N.

    In CAE, on the Exceptions and Alerts tab, deselect the Threshold Exceptions checkbox.

  • Disables threshold alerts:

    In ISPF, set the Threshold Alerts field to N.

    In CAE, on the Exceptions and Alerts tab, deselect the Threshold Alerts checkbox.

  • Specifies anomaly-based exceptions:

    In ISPF, set the Anomaly-based Exceptions fields CPU Anomalies, Elapsed Anomalies, and Getpage Anomalies to Y as needed. For each selected field, set a value in the corresponding Toleration Level field.

    In CAE, on the Exceptions and Alerts tab, select the Anomaly-based Exceptions checkboxes Cpu, Elapsed, and Getpages as needed. For each selected checkbox, set a value in the corresponding Sensitivity field.

  • Specifies anomaly-based alerts:

    In ISPF, set the Anomaly-based Alerts fields CPU Anomalies, Elapsed Anomalies, and Getpage Anomalies to Y as needed. For each selected field, set a value in the corresponding Toleration Level field.

    In CAE, on the Exceptions and Alerts tab, select the Anomaly-based Alerts checkboxes Cpu, Elapsed, and Getpages as needed. For each selected checkbox, set a value in the corresponding Sensitivity field.

  • Specifies discard levels:

    In ISPF, specify the discard values in the CPU Discard Level, Elapsed Discard Level, and Getpage Discard Level fields.

    In CAE, specify the discard values on the Exceptions and Alerts tab in the Discard CPU data above, Discard Elapsed Time data above, and Discard Getpages data above fields.

Tips for setting tolerance levels

When setting tolerance levels for anomaly-based alerts and exceptions, use the following guidance:

Procedure

  1. Start with the default values of Tolerance Level (6.0) and Discard Tolerance (12.0).
  2. Start with no anomalies configured for alerts.
  3. Set an Exception Count Limit in the exception profile.
  4. Run with these values for a time period.
  5. Analyze the anomaly exceptions produced.
  6. Sort the exceptions by the Standard Deviation Factor of their Exception Values.
    The Standard Deviation Factor is the number of standard deviations that the Exception Value is from the mean.
  7. Determine which values are anomalous and choose an acceptable value for the Standard Deviation Factor.
  8. Review the sorted exception list to see how many exceptions would be generated at different Tolerance Levels.
  9. Set the Tolerance Level based on the chosen Standard Deviation Factor.
  10. If too few exceptions are generated in the given time frame, lower the Tolerance Level and repeat.
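
The analysis in steps 6 through 9 can be sketched in Python as follows. This is an illustration under stated assumptions, not output from or an interface to Db2 Query Monitor. The exception records are made-up sample data, the function names are hypothetical, and the sketch assumes that an exception is generated when the Standard Deviation Factor is strictly greater than the Tolerance Level.

```python
def std_dev_factor(value, mean, std_dev):
    """Number of standard deviations that 'value' is from 'mean'."""
    return (value - mean) / std_dev


def count_at_levels(factors, levels):
    """For each candidate Tolerance Level, count the exceptions whose
    Standard Deviation Factor exceeds it (assumed strict comparison)."""
    return {level: sum(1 for f in factors if f > level) for level in levels}


# Hypothetical sample data: (exception value, mean, standard deviation).
exceptions = [
    (4.5, 1.0, 0.5),
    (9.5, 1.0, 0.5),
    (3.5, 1.0, 0.25),
    (30.0, 10.0, 2.5),
]

# Step 6: compute the factors and sort, largest first.
factors = sorted((std_dev_factor(*e) for e in exceptions), reverse=True)

# Step 8: see how many exceptions each candidate level would generate.
counts = count_at_levels(factors, [6.0, 7.5, 12.0])

for level, count in counts.items():
    print(f"Tolerance Level {level}: {count} exception(s)")
```

Comparing the counts across candidate levels shows how raising or lowering the Tolerance Level changes the number of exceptions for the same workload.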