Flashes (Alerts)
Abstract
The Problem:
Upgrades from 4.12 to 4.13 fail with exceptions for certain savepoints in the lifecycle component.
This has been diagnosed as a defect.
Content
DIAGNOSING THE PROBLEM:
The Problem:
Upgrades from 4.12 to 4.13 fail with the following exception for certain savepoints:
java.io.IOException: User defined function KeyedStateReaderFunction#readKey threw an exception
at org.apache.flink.state.api.input.KeyedStateInputFormat.nextRecord(KeyedStateInputFormat.java:225)
at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:98)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:113)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:71)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:338)
Caused by: java.lang.RuntimeException: java.lang.ClassCastException: com.ibm.aiops.lifecycle.sdk.util.RegistryUpdate incompatible with java.util.List
at com.ibm.aiops.flink.state.StateManager.lambda$serialize$10(StateManager.java:259)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1850)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:522)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:512)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
at com.ibm.aiops.flink.state.StateManager.serialize(StateManager.java:269)
at com.ibm.aiops.lifecycle.stateupgrade.state.StateManagerStateReader$1.collect(StateManagerStateReader.java:85)
at com.ibm.aiops.lifecycle.stateupgrade.state.StateManagerStateReader$1.collect(StateManagerStateReader.java:77)
at com.ibm.aiops.lifecycle.stateupgrade.strategies.v412.Version412ExecutorStateMapper.accept(Version412ExecutorStateMapper.java:141)
at com.ibm.aiops.lifecycle.stateupgrade.strategies.v412.Version412ExecutorStateMapper.accept(Version412ExecutorStateMapper.java:38)
at com.ibm.aiops.lifecycle.stateupgrade.state.StateManagerStateReader.readKey(StateManagerStateReader.java:75)
at com.ibm.aiops.lifecycle.stateupgrade.state.StateManagerStateReader.readKey(StateManagerStateReader.java:31)
at org.apache.flink.state.api.input.operator.KeyedStateReaderOperator.processElement(KeyedStateReaderOperator.java:76)
at org.apache.flink.state.api.input.operator.KeyedStateReaderOperator.processElement(KeyedStateReaderOperator.java:51)
at org.apache.flink.state.api.input.KeyedStateInputFormat.nextRecord(KeyedStateInputFormat.java:223)
... 4 more
Caused by: java.lang.ClassCastException: com.ibm.aiops.lifecycle.sdk.util.RegistryUpdate incompatible with java.util.List
at org.apache.flink.api.common.typeutils.base.ListSerializer.serialize(ListSerializer.java:42)
at com.ibm.aiops.flink.state.StateManager.lambda$serialize$5(StateManager.java:225)
at io.vavr.control.Try.run(Try.java:131)
at com.ibm.aiops.flink.state.StateManager.lambda$serialize$6(StateManager.java:225)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at com.ibm.aiops.flink.state.StateManager.lambda$serialize$10(StateManager.java:225)
... 21 more
This occurs when:
There are active policy executions at the time of the savepoint.
Previous policies in those executions have stored global variables.
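To check whether a failed upgrade matches this defect, the job output can be searched for the exception signature shown above. The following is a minimal sketch, not an official diagnostic: the helper name has_savepoint_defect is hypothetical, and it assumes that the stack trace is written to the logs of the pods created by the policy upgrade job. If the trace is logged elsewhere in your environment, pipe that log into the helper instead.

```shell
#!/usr/bin/env bash
# has_savepoint_defect is a hypothetical helper name, not part of the product.
# It reads log text on stdin and succeeds (exit 0) only if the log contains
# the ClassCastException signature from the stack trace in this document.
has_savepoint_defect() {
  grep -qF 'java.lang.ClassCastException: com.ibm.aiops.lifecycle.sdk.util.RegistryUpdate incompatible with java.util.List'
}

# Example usage (assumption: the upgrade job pods carry the trace in their logs):
#   oc logs -l app.kubernetes.io/component=policy-upgrade-job --tail=-1 \
#     | has_savepoint_defect && echo "Matches the known savepoint defect"
```

The --tail=-1 option is included because oc logs returns only the last few lines by default when a label selector is used.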
RESOLVING THE PROBLEM:
The hotfix applies only to the 4.13 release of AIOps and is delivered as the following image:
image = "cp.icr.io/cp/cp4waiops/lifecycle-trigger@sha256:fe77ee33558a08361dcf5fad8dbabc06ac3c19c915efe196c1e75ca24db4b66a"
This hotfix patches the code that manages policy state upgrade and resolves the defect.
Applying Patch
1. Set your project for AIOps:
oc project <AIOps project / namespace>
2. Back up the lifecycle CSV:
oc get clusterserviceversions.operators.coreos.com "$(oc get subscriptions.operators.coreos.com ibm-aiops-ir-lifecycle -o jsonpath='{.status.installedCSV}')" -o yaml > ibm-aiops-ir-lifecycle-csv-back.yaml
3. Set the new image to patch:
export PATCHED_AIOPS_IR_LIFECYCLE_IMAGE="cp.icr.io/cp/cp4waiops/lifecycle-trigger@sha256:fe77ee33558a08361dcf5fad8dbabc06ac3c19c915efe196c1e75ca24db4b66a"
4. Patch the CSV with the updated image:
oc patch clusterserviceversions.operators.coreos.com "$(oc get subscriptions.operators.coreos.com ibm-aiops-ir-lifecycle -o jsonpath='{.status.installedCSV}')" --type='json' -p="[{'op': 'replace', 'path': '/spec/install/spec/deployments/0/spec/template/metadata/annotations/olm.relatedImage.lifecycle-trigger', 'value': '${PATCHED_AIOPS_IR_LIFECYCLE_IMAGE}'}]"
5. Wait for the new operator pod to start:
oc get po --watch | grep lifecycle-operator
This is complete when the new ir-lifecycle-operator pod is shown in the Running state.
6. Delete the current failed upgrade job to trigger a new one to be scheduled:
oc delete job -l app.kubernetes.io/component=policy-upgrade-job
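After the patch is applied, it can be useful to confirm that the CSV annotation really points at the hotfix image. The sketch below reads back the same annotation that the patch step replaces and compares it with the expected value. The helper name check_patched_image is hypothetical, and the check assumes that the annotation sits at the same path used in the oc patch command above.

```shell
#!/usr/bin/env bash
# check_patched_image is a hypothetical helper name, not part of the product.
# It prints "patched" if the lifecycle-trigger annotation on the installed CSV
# equals the expected image; otherwise it prints the current value and returns 1.
check_patched_image() {
  local expected="$1"
  local csv current

  # Resolve the installed CSV name the same way the patch steps do.
  csv="$(oc get subscriptions.operators.coreos.com ibm-aiops-ir-lifecycle \
    -o jsonpath='{.status.installedCSV}')"

  # Dots inside the annotation key are escaped with a backslash for jsonpath.
  current="$(oc get clusterserviceversions.operators.coreos.com "$csv" \
    -o jsonpath='{.spec.install.spec.deployments[0].spec.template.metadata.annotations.olm\.relatedImage\.lifecycle-trigger}')"

  if [ "$current" = "$expected" ]; then
    echo "patched"
  else
    echo "not patched: $current"
    return 1
  fi
}

# Example usage, reusing the variable exported in step 3:
#   check_patched_image "$PATCHED_AIOPS_IR_LIFECYCLE_IMAGE"
```

If the helper reports that the image is not patched, the backup file ibm-aiops-ir-lifecycle-csv-back.yaml from step 2 shows the value the annotation held before the change.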
Product Synonym
CP4AIOps
Document Information
Modified date:
20 April 2026
UID
ibm17270091