After upgrade, the metric-spark pod constantly restarts

Learn how to stabilize the metric-spark pod.

Problem

If you upgrade from Netcool® Operations Insight® version 1.6.10 to a later version, the metric-spark pod might constantly restart.

Resolution

To stabilize the metric-spark pod, complete the following steps:

  1. Scale down the metric-spark service:
    oc scale deploy <release-name>-metric-spark-service-metricsparkservice --replicas=0
    Where <release-name> is the release name, for example, evtmanager.
  2. Remove the /workdir/data/metric-checkpoint checkpoint folder from each minio instance by running the following command:
    for P in $(oc get po | grep <release-name>-ibm-minio- | awk '{print $1}'); do oc rsh "$P" rm -rf /workdir/data/metric-checkpoint; echo -n "."; done
  3. Scale the metric-spark service back up:
    oc scale deploy <release-name>-metric-spark-service-metricsparkservice --replicas=1

After you complete these steps, the metric-spark pod becomes stable, and new data appears in the metric search within a few minutes.
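
The three steps in the procedure can be wrapped in a single shell function so that they run in order and stop at the first failure. The following is a minimal sketch, not part of the product: the function names run and stabilize_metric_spark, the DRY_RUN variable, and the placeholder pod name that is printed in dry-run mode are assumptions made for illustration. Only the oc commands themselves come from the procedure.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the three documented steps in one function.
# Assumptions (not from the documented procedure): the DRY_RUN variable,
# the function names, and the placeholder pod name used in dry-run mode.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

stabilize_metric_spark() {
  local release="${1:-}"
  if [ -z "$release" ]; then
    echo "usage: stabilize_metric_spark <release-name>" >&2
    return 2
  fi
  local deploy="${release}-metric-spark-service-metricsparkservice"
  local pod pods

  # Step 1: scale the metric-spark deployment down to zero replicas.
  run oc scale deploy "$deploy" --replicas=0 || return 1

  # Step 2: remove the checkpoint folder from each minio pod.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Hypothetical pod name, used only so that a dry run prints something.
    pods="${release}-ibm-minio-0"
  else
    pods=$(oc get po | grep "${release}-ibm-minio-" | awk '{print $1}')
  fi
  for pod in $pods; do
    run oc rsh "$pod" rm -rf /workdir/data/metric-checkpoint || return 1
  done

  # Step 3: scale the metric-spark deployment back up to one replica.
  run oc scale deploy "$deploy" --replicas=1 || return 1
}
```

To preview the commands without changing the cluster, set DRY_RUN=1 and call stabilize_metric_spark evtmanager. Without DRY_RUN, the function runs the same oc commands as the manual steps against the current project.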