Using Anomaly Detection

In performance monitoring, you can place thresholds on metrics to determine whether a Db2 thread is using an excessive amount of resources. For example, you might set thresholds on metrics such as CPU Time, Elapsed Time, and Get Pages. The difficulty with such thresholds is determining what is normal and what is truly a problem that needs attention.

For a CICS transaction, it is reasonable to expect less than 5 seconds of Db2 Elapsed Time, so setting a threshold to detect larger values makes sense. But what about a batch program? Its CPU Time, Elapsed Time, and Get Pages will probably be much larger. You want to trigger an exception if a specific CICS transaction is running too long, but you do not want an exception to trigger for a batch program that is expected to run for a long time.

This is where machine learning and artificial intelligence for Anomaly Detection are valuable. OMEGAMON for Db2® PE uses an initial learning period during which it records metrics of thread executions, grouped by specific Thread Identity fields.
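The grouping idea can be illustrated with a minimal Python sketch. This is not the product's implementation. The field names (plan, connection_type, elapsed_seconds), the choice of identity fields, and the sample values are all hypothetical and exist only to show how samples for a CICS transaction and a batch program end up in separate groups.

```python
from collections import defaultdict

# Hypothetical Thread Identity fields that define an execution group.
GROUP_FIELDS = ("plan", "connection_type")


def group_key(thread):
    """Build the execution-group key from the chosen identity fields."""
    return tuple(thread[field] for field in GROUP_FIELDS)


def record_learning_samples(threads):
    """Collect elapsed-time samples per execution group during learning."""
    samples = defaultdict(list)
    for thread in threads:
        samples[group_key(thread)].append(thread["elapsed_seconds"])
    return samples


# Made-up thread executions observed during the learning period.
threads = [
    {"plan": "CICSPAY", "connection_type": "CICS", "elapsed_seconds": 0.8},
    {"plan": "CICSPAY", "connection_type": "CICS", "elapsed_seconds": 1.2},
    {"plan": "NIGHTLY", "connection_type": "BATCH", "elapsed_seconds": 1800.0},
]

learned = record_learning_samples(threads)
for key, values in learned.items():
    # Each group gets its own observed range, so batch does not skew CICS.
    print(key, min(values), max(values))
```

Because each group keeps its own samples, a long-running batch program does not widen the range that a short CICS transaction is later compared against.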

After the initial learning period is complete, Db2 threads that match the execution group are measured against the previously learned metrics to look for anomalies. An anomaly is a thread whose metric value falls outside the learned range by more than the tolerance value. Learning continues with a new value if that value is within a reasonable range, as determined by the discard tolerance value.
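The following Python sketch shows one way the tolerance and discard tolerance could interact. It rests on assumptions that the text above does not confirm: the learned range is a simple low/high pair, both tolerances are percentages applied to the range boundaries, and values on either side of the range are checked. The names LearnedRange and evaluate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LearnedRange:
    """Assumed representation of a learned range: lowest and highest value."""
    low: float
    high: float


def evaluate(rng, value, tolerance_pct, discard_pct):
    """Check one metric value against a learned range.

    Returns (is_anomaly, was_learned). Percent-based tolerances are an
    assumption made for this sketch.
    """
    tol = tolerance_pct / 100.0
    discard = discard_pct / 100.0

    # Anomaly: the value lies outside the range by more than the tolerance.
    is_anomaly = value > rng.high * (1 + tol) or value < rng.low * (1 - tol)

    # Learning continues only if the value is inside the discard bounds.
    within_discard = rng.low * (1 - discard) <= value <= rng.high * (1 + discard)
    if within_discard:
        rng.low = min(rng.low, value)
        rng.high = max(rng.high, value)

    return is_anomaly, within_discard


# Made-up range learned for a CICS transaction (elapsed seconds).
cics_range = LearnedRange(low=0.5, high=2.0)
for elapsed in (2.2, 2.9, 60.0):
    print(elapsed, evaluate(cics_range, elapsed, tolerance_pct=20, discard_pct=50))
```

In this sketch, a value far outside the range, such as 60 seconds, is flagged as an anomaly but is not folded into the learned range, so a single extreme execution does not redefine what is normal for the group.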