Incident Creation based on Persistence and Golden Signal

Incident creation based on persistence and golden signal is available only as a technology preview and is not for production use.

Incident prioritization uses the golden signal policy together with the alert suppression policy, so that incidents are created only when the observed alerts contain a golden signal and are not suppressed. The alert suppression policy suppresses alerts that do not need attention.

An incident is created only when both of the following criteria are met:

  1. Events persist for a significant duration, leading to an unsuppressed alert.
  2. The golden signal that is associated with the alert is of the type EFFECT.
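
The two criteria can be read as a simple conjunction. The following Python sketch is illustrative only: the `Alert` class and its `suppressed` and `golden_signal_type` fields are hypothetical names chosen for this example, not the Cloud Pak for AIOps data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    # Hypothetical fields, named for illustration only.
    suppressed: bool                   # True if a suppression policy matched the alert
    golden_signal_type: Optional[str]  # for example "EFFECT", or None if no golden signal


def should_create_incident(alert: Alert) -> bool:
    """Return True only when both documented criteria hold."""
    persisted = not alert.suppressed                 # criterion 1: alert is unsuppressed
    is_effect = alert.golden_signal_type == "EFFECT"  # criterion 2: golden signal type
    return persisted and is_effect
```

Under these assumptions, a suppressed alert never becomes an incident, even when its golden signal is of the type EFFECT.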

Prerequisites

Configuration prerequisites

To enable the X in Y algorithm, use the following steps.

  1. Log in to your Red Hat OpenShift Container Platform cluster as an administrator where Cloud Pak for AIOps is installed.

  2. Create or update the ConfigMap with the IBM_IR_AI_X_IN_Y_PREVIEW_ENABLED field. Set the value of this field to true. A sample YAML file can resemble the following example:

     apiVersion: v1
     data:
       IBM_IR_AI_X_IN_Y_PREVIEW_ENABLED: "true"
     kind: ConfigMap
     metadata:
       name: feature-flag-configmap
       namespace: <namespace>
    

    Where <namespace> is the namespace where IBM Cloud Pak for AIOps is installed.

  3. If you are not running these steps before an installation or upgrade, delete the ibm-ir-ai-operator-controller-manager pod. When the pod is deleted, OpenShift automatically creates and starts a new pod.

Data prerequisites

Historical alert data is used in the training process to learn the values of X and Y, which are then used to create XinY alert suppression policies.

  1. In the Cloud Pak for AIOps console, click Operate > AI model management and find the Alert suppression XinY policies tile. If this tile does not appear, delete the aimanager-aio-ai-platform-api-server pod by using the OpenShift console. When the pod is deleted, OpenShift automatically creates and starts a new pod.

    For more information, see The Alert suppression XinY policies tile does not appear in AI model management.

  2. Click the tile and follow the steps to train the model and then deploy it. For more information, see training a model.

This instantiates the policy and adds it to the policy list in Automations.

Note: To create effective Alert Suppression policies, a minimum of 7 days’ worth of historical alert data is needed. This data can be supplied either during the training phase or continuously gathered from the alerts that are received after integrating your application with IBM Cloud Pak for AIOps.

About this task

After you train your model with historical data, you can enable the incident creation based on persistence and golden signal policy.

Procedure

  1. Go to the Automations page and click the Policy tab to see all available policies.
  2. Click the Toggle filters icon and select Suppression from the list.
  3. Any policies that have suppressions are listed. Find the policy whose name has the prefix X-in-Y-alert-suppression. If it is not listed, click the refresh icon to update the policy list.
  4. Click the X-in-Y-alert-suppression policy to open a sidebar information panel. In the panel, slide the Status switch to Enabled.
  5. Close the panel, click the Tag column header, clear Suppression, and select the Incident checkbox.
  6. Among the incident policy listings is the Golden Signal and Persistence incident creation policy. Click this policy to open the sidebar panel.
  7. In the sidebar panel, set the Status switch to Enabled.
  8. The Golden Signal and Persistence incident creation policy must be the only Incident-tagged policy that is enabled. If other Incident policies are enabled, open each one in turn and disable it in the sidebar panel.

You now have an incident creation policy that combines the two policies. Incoming alerts must fulfill the conditions of both policies to be promoted to incidents:

  • They need to contain a Golden Signal Error.
  • The X and Y values must be outside the allowed range that was set when the model was trained. For example, where the model has the value of 12 events within 60 minutes, an anomaly surfaces only when 12 or more events occur within that period.
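
As a rough illustration of the X in Y condition, the following Python sketch counts the events inside a time window and reports whether the count reaches the threshold. The function name, its parameters, and the use of plain timestamps are assumptions made for this example. The defaults mirror the example of 12 events within 60 minutes, and the sketch does not show how the product's trained model or policy engine works internally.

```python
from datetime import datetime, timedelta
from typing import Iterable


def meets_x_in_y(event_times: Iterable[datetime], now: datetime,
                 x: int = 12, y_minutes: int = 60) -> bool:
    """Return True when at least x events fall in the y_minutes window ending at now."""
    window_start = now - timedelta(minutes=y_minutes)
    # Count only events that are newer than the window start and not in the future.
    count = sum(1 for t in event_times if window_start < t <= now)
    return count >= x


# Usage: 12 events, one every 5 minutes, all inside the last hour.
now = datetime(2024, 1, 1, 12, 0)
recent = [now - timedelta(minutes=5 * i) for i in range(12)]
print(meets_x_in_y(recent, now))       # 12 events in the window, threshold reached
print(meets_x_in_y(recent[:11], now))  # 11 events in the window, threshold not reached
```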