Incident Creation based on Persistence and Golden Signal
Incident prioritization uses the golden signal policy and the alert suppression policy together, so that incidents are created only when the observed alerts contain a golden signal and are not suppressed. The alert suppression policy suppresses alerts that do not need attention.
An incident is created only when the following criteria are met:
- Events persist for a significant duration, leading to an unsuppressed alert.
- The golden signal that is associated with the alert is of the type EFFECT.
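Read together, the two criteria form a simple conjunction. The following Python sketch is illustrative only: the `Alert` class and its field names (`suppressed`, `golden_signal_type`) are hypothetical and do not reflect the product's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record; the field names are assumptions for illustration."""
    summary: str
    suppressed: bool                   # True if a suppression policy matched the alert
    golden_signal_type: Optional[str]  # for example "EFFECT", or None if no golden signal


def qualifies_for_incident(alert: Alert) -> bool:
    """Return True only when both criteria from the list above hold."""
    unsuppressed = not alert.suppressed
    is_effect = alert.golden_signal_type == "EFFECT"
    return unsuppressed and is_effect
```

An alert that is suppressed, or that carries no golden signal of type EFFECT, would not be promoted under this reading.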
Prerequisites
Configuration prerequisites
To enable the X in Y algorithm, use the following steps.
- Log in as an administrator to the Red Hat OpenShift Container Platform cluster where Cloud Pak for AIOps is installed.
- Create or update the ConfigMap with the IBM_IR_AI_X_IN_Y_PREVIEW_ENABLED field. Set the value of this field to true. A sample YAML file can resemble the following example:

    apiVersion: v1
    data:
      IBM_IR_AI_X_IN_Y_PREVIEW_ENABLED: "true"
    kind: ConfigMap
    metadata:
      name: feature-flag-configmap
      namespace: <namespace>

  Where <namespace> is the namespace where IBM Cloud Pak for AIOps is installed.
- If you are not running these steps before installation or upgrading, delete the ibm-ir-ai-operator-controller-manager pod. When the pod is deleted, OpenShift automatically creates and starts a new pod.
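If you prefer to generate the manifest rather than write it by hand, a minimal Python sketch such as the following could produce the same ConfigMap as JSON. The function name `build_feature_flag_configmap` and the placeholder namespace are assumptions made for this example; only the ConfigMap name and the flag come from the sample above.

```python
import json


def build_feature_flag_configmap(namespace: str) -> dict:
    """Build the ConfigMap manifest from the sample above for the given namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "feature-flag-configmap", "namespace": namespace},
        # ConfigMap data values must be strings, hence "true" rather than True.
        "data": {"IBM_IR_AI_X_IN_Y_PREVIEW_ENABLED": "true"},
    }


if __name__ == "__main__":
    # "my-aiops-namespace" is a placeholder; substitute your own namespace.
    print(json.dumps(build_feature_flag_configmap("my-aiops-namespace"), indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f <file>`, because `oc apply` accepts JSON manifests as well as YAML.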
Data prerequisites
Historical alert data is used in the training process to learn values of X and Y, which are later employed for creating XinY alert suppression policies.
- In the Cloud Pak for AIOps console, click Operate > AI model management and find the Alert suppression XinY policies tile. If this tile does not appear, delete the aimanager-aio-ai-platform-api-server pod using the OpenShift console. When the pod is deleted, OpenShift automatically creates and starts a new pod. For more information, see The Alert suppression XinY policies tile does not appear in AI model management.
- Click the tile and follow the steps to train the model and then deploy it. For more information, see training a model. This instantiates the policy and adds it to the policy list in Automations.
Note: To create effective Alert Suppression policies, a minimum of 7 days’ worth of historical alert data is needed. This data can be supplied either during the training phase or continuously gathered from the alerts that are received after integrating your application with IBM Cloud Pak for AIOps.
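Before training, it can help to confirm that an exported set of alerts covers the minimum period. The sketch below is an assumption-laden illustration: it treats "7 days' worth" as the span between the earliest and latest alert timestamps, which may not be how the product measures it, and `has_enough_history` is a hypothetical helper, not a product function.

```python
from datetime import datetime, timedelta
from typing import Iterable

MIN_HISTORY = timedelta(days=7)  # minimum stated in the note above


def has_enough_history(timestamps: Iterable[datetime]) -> bool:
    """Return True if the alert timestamps span at least seven days."""
    stamps = list(timestamps)
    if not stamps:
        return False
    return max(stamps) - min(stamps) >= MIN_HISTORY
```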
About this task
After you train your model with historical data, you can start using the incident creation based on persistence and golden signal policy.
Procedure
- Go to the Automations page and click the Policy tab to see all available policies.
- Click the Toggle filters icon and select Suppression from the list.
- Any policies that have suppressions are listed. Find the policy whose name contains the prefix X-in-Y-alert-suppression. If it is not listed, click the refresh icon to update the policy list.
- Click the X-in-Y-alert-suppression policy to open a sidebar information panel. In the panel, slide the Status switch to Enabled.
- Next, close the panel, click the Tag header column, clear Suppression, and check the Incident box.
- Among the incident policy listings is the Golden Signal and Persistence incident creation policy. Click this policy to open the sidebar panel.
- In the sidebar panel, set the Status switch to Enabled.
- The Golden Signal and Persistence incident creation policy must be the only Incident-tagged policy that is enabled. If other Incident policies are enabled, click each one in turn and disable it in the sidebar panel.
You now have a policy that combines the two policy values when creating incidents. Now when alerts come in, they must fulfill the conditions of the two policies to be promoted as incidents:
- They need to contain a Golden Signal Error.
- The X and Y values must be outside the allowed range set when the model was trained. For example, where the model has the value of 12 events within 60 minutes, an anomaly only surfaces when 12 or more events occur within 60 minutes.
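To make the X-in-Y condition concrete, the following Python sketch checks whether at least X events fall within any Y-minute window, using a sliding window over sorted timestamps. It is a simplified illustration and not the product's trained model: the function name `x_in_y_triggered` is hypothetical, and treating events exactly Y minutes apart as inside the same window is an assumption.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Iterable


def x_in_y_triggered(event_times: Iterable[datetime], x: int, y_minutes: int) -> bool:
    """Return True if at least x events fall within any window of y_minutes."""
    window = timedelta(minutes=y_minutes)
    recent = deque()
    for ts in sorted(event_times):
        recent.append(ts)
        # Drop events that are older than the window ending at ts.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= x:
            return True
    return False
```

With the example values of X = 12 and Y = 60, twelve events spaced five minutes apart would trigger the condition, while eleven such events, or twelve events spaced ten minutes apart, would not.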