Flashes (Alerts)
Abstract
Cloud Pak for AIOps 4.11.1 and 4.12.0 create multiple identical temporal policies for a group of events where only one policy should be created. For a given number of alerts, the number of policies is inconsistent and excessive. This has been diagnosed as a defect.
Content
DIAGNOSING THE PROBLEM:
To confirm the problem, run the following commands. They read out the number of policies and alerts in the Cassandra system used for training:
CLASSIFIER_POD=$(oc get pods | grep classifier | awk '{print $1}' | head -n 1)
READ_DATA="/opt/app/scripts/read_cassandra_agg_aiops.pyc"
READ_POLICIES="/opt/app/scripts/read_cassandra_policies_agg_aiops.pyc"
oc exec $CLASSIFIER_POD -- python3 $READ_DATA
oc exec $CLASSIFIER_POD -- python3 $READ_POLICIES
If the previous query times out, run the following commands instead. They give a readout of the number of policies in the database:
CASS_USER=$(oc get secret aiops-topology-cassandra-auth-secret -o jsonpath --template '{.data.username}' | base64 --decode; echo);
CASS_PASS=$(oc get secret aiops-topology-cassandra-auth-secret -o jsonpath --template '{.data.password}' | base64 --decode; echo);
oc exec -ti aiops-topology-cassandra-0 -- /opt/ibm/cassandra/bin/cqlsh --ssl -u $CASS_USER -p $CASS_PASS -e "copy aiops_policies.aiops_policies to '/dev/null'"
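The "try the script first, fall back if it times out" flow above can be sketched with a small wrapper. This is only an illustration: `try_with_timeout` is a hypothetical helper name, the 120-second limit in the usage comment is an arbitrary assumption, and the sketch assumes the `timeout` utility (GNU coreutils or compatible, which exits with status 124 on timeout) is available on the workstation.

```shell
# try_with_timeout SECONDS CMD [ARGS...]
# Hypothetical helper: run CMD under a time limit and report when the limit is hit.
# Assumes a coreutils-style `timeout`, which exits with status 124 on timeout.
try_with_timeout() {
  local limit="$1"
  shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}

# Example (not run here; the 120s limit is an assumption):
# try_with_timeout 120 oc exec "$CLASSIFIER_POD" -- python3 "$READ_POLICIES" ||
#   echo "script read failed or timed out; use the cqlsh copy command instead"
```

The wrapper returns the status of the wrapped command, so any non-zero result (not only a timeout) triggers the fallback branch in the usage example.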
This defect has been confirmed for all Cloud Pak for AIOps 4.11.1 and 4.12.0 systems.
RESOLVING THE PROBLEM:
For both Cloud Pak for AIOps 4.11.1 and 4.12.0, resolving the issue involves applying a hotfix.
4.11.1 image = cp.icr.io/cp/cp4waiops/training-service@sha256:cb9439548160f00f6a6d3dc6c4a005db9f7d2cab4ec6723018ef2b06557f09bf
4.12.0 image = cp.icr.io/cp/cp4waiops/training-service@sha256:b2f03aa2dcdf0515ed0bf3fb4fa2b49a8992317bbfe549ed4c706986862c6d81
To patch an existing instance, run the following commands, setting NAMESPACE to the namespace where Cloud Pak for AIOps is installed.
Set the environment variable NEW_IMG based on the version of Cloud Pak for AIOps that you currently have deployed. The export that follows uses the 4.12.0 image:
export NEW_IMG="cp.icr.io/cp/cp4waiops/training-service@sha256:b2f03aa2dcdf0515ed0bf3fb4fa2b49a8992317bbfe549ed4c706986862c6d81"
export NAMESPACE=cp4waiops
export CSV=$(oc get csv -n ${NAMESPACE} | grep ibm-aiops-ir-ai | awk '{print $1}')
oc patch csv $CSV -n ${NAMESPACE} --type='json' -p="[{'op': 'replace', 'path': '/spec/install/spec/deployments/0/spec/template/metadata/annotations/olm.relatedImage.aiops-spark-trainer-image', 'value': '${NEW_IMG}'}]"
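Because NEW_IMG must match the deployed version, the choice between the two hotfix images can be sketched as a lookup. The function name `select_hotfix_image` is hypothetical and not part of the product; the image references are copied from the list above, and treating "4.12" as equivalent to "4.12.0" is an assumption.

```shell
# select_hotfix_image VERSION
# Hypothetical helper: print the hotfix image listed in this document for VERSION.
select_hotfix_image() {
  case "$1" in
    4.11.1)
      echo "cp.icr.io/cp/cp4waiops/training-service@sha256:cb9439548160f00f6a6d3dc6c4a005db9f7d2cab4ec6723018ef2b06557f09bf"
      ;;
    4.12|4.12.0)  # assumption: "4.12" refers to the 4.12.0 release
      echo "cp.icr.io/cp/cp4waiops/training-service@sha256:b2f03aa2dcdf0515ed0bf3fb4fa2b49a8992317bbfe549ed4c706986862c6d81"
      ;;
    *)
      echo "no hotfix image listed for version: $1" >&2
      return 1
      ;;
  esac
}

# Example (not run here):
# NEW_IMG="$(select_hotfix_image 4.11.1)" && export NEW_IMG
```

Assigning first and exporting afterwards keeps a failed lookup from being masked by the exit status of `export`.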
Hint: you can watch the rollout using:
oc get po -w | grep spark-pipeline-composer
Finally, validate that there are no errors in the IRAI:
oc get irai -o yaml
Once the system has been patched with the new image and the spark pods have restarted, clean up the remaining temporal/seasonal policies on the system before starting a new training.
Note: Some users may want to save their custom and user-defined policies and truncate only the temporal/seasonal policies.
The policy cleanup tool can assist with this. Download it based on the version of Cloud Pak for AIOps deployed:
https://github.com/IBM/cp4waiops-samples/tree/main/utils
IMPORTANT: Please read the readme file for the script's usage.
The tool performs the following operations:
- Exports default policies (with isDefault=true label)
- Exports user-defined policies (with isDefault=false and managed-by-analytics=false)
- Truncates Cassandra policy tables
- Runs policy registry upgrade
- Reloads policies via appropriate APIs
The commands to execute the policy cleanup tool:
chmod +x clearAnalyticsPolicies.sh
./clearAnalyticsPolicies.sh -n <namespace>
While clearAnalyticsPolicies.sh runs, you are given the option to proceed to truncation and reload. Select no at this stage to only download the policies. This is useful if you would rather truncate the policy tables manually and later reload the policies to the system using the reload option.
This command creates JSON files with the policies defined in them. Use cat on the files to validate that the user-defined policies exist and that everything the client wants to save is recorded.
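In addition to reading the files by eye, a quick syntax check can catch an export that is empty or cut short. The following is a minimal sketch, assuming `jq` is installed (it is one of the script's listed CLI tools); `validate_policy_exports` is a hypothetical helper name, and the names of the exported files are not specified here, so pass whichever files the script wrote. It checks only that each file is non-empty and parses as JSON, not that the expected policies are present.

```shell
# validate_policy_exports FILE...
# Hypothetical helper: report whether each file is non-empty and parses as JSON.
# Returns non-zero if any file fails the check.
validate_policy_exports() {
  local rc=0
  local f
  for f in "$@"; do
    # -s: file exists and is non-empty; `jq empty` parses the input and prints nothing.
    if [ -s "$f" ] && jq empty "$f" >/dev/null 2>&1; then
      echo "OK      $f"
    else
      echo "INVALID $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example (not run here; file names are placeholders):
# validate_policy_exports <exported-file-1>.json <exported-file-2>.json
```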
Once the files are validated, you can run clearAnalyticsPolicies.sh -n <namespace> again and select okay for the truncate option. The policies are truncated, and the user-defined and custom policies are reloaded into the policy registry.
NOTE: Proceed to run training only after the truncation of the old temporal/seasonal policies is done.
Prerequisites before running the script clearAnalyticsPolicies.sh:
- Authentication: Script must be executed with access to cpadmin credentials (via secrets)
- Kubernetes Access: User must be logged into the AIOps namespace
- CLI Tools: curl, kubectl, and jq must be installed
- API Access: Policy registry service must be functioning correctly
- Pod Access: Ability to exec into pods using kubectl
- Bash Version: Bash 3.2 or higher
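The CLI tool and Bash version items in this list can be checked locally before running the script. The sketch below is illustrative only: `missing_tools` and `version_at_least` are hypothetical helper names, and the sketch does not verify credentials, namespace login, or the policy registry service.

```shell
# missing_tools CMD...
# Hypothetical helper: print the name of every listed command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# version_at_least VERSION MIN_MAJOR MIN_MINOR
# Hypothetical helper: succeed when VERSION (for example "5.1.16(1)-release")
# is at least MIN_MAJOR.MIN_MINOR.
version_at_least() {
  local ver="$1" min_major="$2" min_minor="$3"
  local major minor rest
  major="${ver%%.*}"
  case "$ver" in
    *.*) rest="${ver#*.}"; minor="${rest%%[!0-9]*}" ;;
    *)   minor=0 ;;
  esac
  # Reject an empty or non-numeric major component.
  case "$major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ -n "$minor" ] || minor=0
  [ "$major" -gt "$min_major" ] ||
    { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
}

# Example (not run here):
# missing="$(missing_tools curl kubectl jq)"
# [ -z "$missing" ] || echo "missing tools: $missing" >&2
# version_at_least "$BASH_VERSION" 3 2 || echo "Bash 3.2 or higher is required" >&2
```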
Product Synonym
CP4AIOps
Document Information
Modified date:
27 March 2026
UID
ibm17267624