IBM Support

AIOps 4.11.1/4.12.0 event analytics generates duplicate policies

Flashes (Alerts)


Abstract

AIOps 4.11.1/4.12.0 is creating multiple identical temporal policies for a group of events in situations where it should only create one. We notice that for a given amount of alerts the number of policies is inconsistent and excessive. This has been diagnosed as a defect.

Content

DIAGNOSING THE PROBLEM:

You can validate by running the following commands. The following commands will read out the number of policies and alerts in the Cassandra system used for training:

CLASSIFIER_POD=$(oc get pods | grep classifier | awk '{print $1}' | head -n 1)
READ_DATA="/opt/app/scripts/read_cassandra_agg_aiops.pyc"
READ_POLICIES="/opt/app/scripts/read_cassandra_policies_agg_aiops.pyc"
oc exec $CLASSIFIER_POD -- python3 $READ_DATA
oc exec $CLASSIFIER_POD -- python3 $READ_POLICIES


If the previous query timeouts then run the following command. This will give you the read out of the number of policies on the DB:

CASS_USER=$(oc get secret aiops-topology-cassandra-auth-secret -o jsonpath --template '{.data.username}' | base64 --decode; echo);

CASS_PASS=$(oc get secret aiops-topology-cassandra-auth-secret -o jsonpath --template '{.data.password}' | base64 --decode; echo);

oc exec -ti aiops-topology-cassandra-0 -- /opt/ibm/cassandra/bin/cqlsh --ssl -u $CASS_USER -p $CASS_PASS -e "copy aiops_policies.aiops_policies to '/dev/null'"


This defect has been confirmed for all 4.11.1 and 4.12.0 systems in Cloud Pak of AIOps.

 

RESOLVING THE PROBLEM:

For both AIOPS 4.11.1 and 4.12.0 the steps to solve the issue involves applying a hotfix.

4.11.1 image = cp.icr.io/cp/cp4waiops/training-service@sha256:cb9439548160f00f6a6d3dc6c4a005db9f7d2cab4ec6723018ef2b06557f09bf

 

4.12 image = cp.icr.io/cp/cp4waiops/training-service@sha256:b2f03aa2dcdf0515ed0bf3fb4fa2b49a8992317bbfe549ed4c706986862c6d81


To patch an existing instance, run the following commands, replacing <namespace> with the namespace where Cloud Pak for AIOps is installed.

Set the env NEW_IMG based on the version of AIOPS you currently have deployed

export NEW_IMG="cp.icr.io/cp/cp4waiops/training-service@sha256:b2f03aa2dcdf0515ed0bf3fb4fa2b49a8992317bbfe549ed4c706986862c6d81"
export NAMESPACE=cp4waiops
export CSV=$(oc get csv -n ${NAMESPACE} | grep ibm-aiops-ir-ai | awk '{print $1}')

oc patch csv $CSV -n ${NAMESPACE} --type='json' -p="[{'op': 'replace', 'path': '/spec/install/spec/deployments/0/spec/template/metadata/annotations/olm.relatedImage.aiops-spark-trainer-image', 'value': '${NEW_IMG}'}]"


Hint, you can watch the rollout succeeds using: 

oc get po -w | grep spark-pipeline-composer


Finally, validate there are no errors in the IRAI 

oc get irai -o yaml


Once the system has been patched with the new image and the spark pods have restarted, before a new training is started,  we must clean up the remaining temporal/seasonal policies on the system.

Note: Some users may want to save their custom policies and user defined policies and only truncate the temporal/seasonal policies.

The policy clean up tool can assist with the following - Download based on the version of Cloud Pak for AIOps deployed.

https://github.com/IBM/cp4waiops-samples/tree/main/utils

IMPORTANT: Please read the readme file for the scripts usage:

The tool performs the following operations:

  1. Exports default policies (with isDefault=true label)
  2. Exports user-defined policies (with isDefault=false and managed-by-analytics=false)
  3. Truncates Cassandra policy tables
  4. Runs policy registry upgrade
  5. Reloads policies via appropriate APIs


The command to execute policy clean up tool:

chmod +x clearAnalyticsPolicies.sh
./clearAnalyticsPolicies.sh -n <namespace>


During the invocation of the script clearAnalyticsPolicies.sh, you will be given an option to proceed to truncation and reload, you can select no at this stage to simply download the policies. This would be useful in a situation where you would rather manually truncate the policy tables, then later reload the policies to the system using the reload option.

This command create json files with the policies defined in them, cat the files to validate the user defined policies exist and everything that the client want to save is recorded.

Once the file is validated you can run: clearAnalyticsPolicies.sh -n <namespace> and select okay for the truncate option. The policies should be truncated and the user defined and custom policies would be reloaded into the policy registry.

NOTE:  Only once the truncation of the old temporal/seasonal policies are done, you should then proceed to run training.


Pre-Req before running the script clearAnalyticsPolicies.sh: 

  1. Authentication: Script must be executed with access to cpadmin credentials (via secrets)

  2. Kubernetes Access: User must be logged into the AIOps namespace

  3. CLI Tools: curl, kubectl, and jq must be installed

  4. API Access: Policy registry service must be functioning correctly

  5. Pod Access: Ability to exec into pods using kubectl

  6. Bash Version: Bash 3.2 or higher 

[{"Type":"MASTER","Line of Business":{"code":"LOB77","label":"Automation Platform"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSE9G0Q","label":"IBM Cloud Pak for AIOps"},"ARM Category":[{"code":"a8m3p000000PC50AAG","label":"Cloud Pak for AIOps-\u003EAI Manager-\u003EAI Modelling\/Algorithms-\u003EEvent Grouping-\u003ETemporal"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"4.11.1;4.12.0"}]

Product Synonym

CP4AIOps

Document Information

Modified date:
27 March 2026

UID

ibm17267624