Pod failures after upgrade or system restart

After an upgrade or a system restart, the Kafka pods do not start and the Cassandra pods might crash repeatedly.

Problem

After Netcool® Operations Insight® on OpenShift® is upgraded, or after a system restart, the Kafka pods do not start and the Cassandra pods might fail repeatedly.
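Before you apply the resolution, you might want to confirm which pods are affected. The following is a minimal sketch, not a product utility: the function name `list_unhealthy_pods` is hypothetical, and it assumes that the affected pod names contain `zoo`, `kafka`, or `cassandra` and that `oc get pods --no-headers` prints the NAME, READY, and STATUS columns first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product).
# Reads `oc get pods --no-headers` output on stdin and prints the name and
# status of every zookeeper, kafka, or cassandra pod that is not Running
# with all of its containers ready.
list_unhealthy_pods() {
  awk '$1 ~ /zoo|kafka|cassandra/ {
    split($2, ready, "/")   # READY column, for example 0/1
    if ($3 != "Running" || ready[1] != ready[2]) print $1 " " $3
  }'
}
```

For example, run `oc get pods --no-headers | list_unhealthy_pods` in the project where the product is deployed. A pod that alternates between `CrashLoopBackOff` and a `Running` status with `0/1` ready containers is reported in both states.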

Resolution

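  1. Restart the ZooKeeper and Kafka pods by deleting them. The following command selects the pods whose names contain zoo or kafka and deletes them:
    oc get pods --no-headers | awk '$1 ~ /zoo|kafka/ {print $1}' | xargs oc delete pod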
  2. If the Cassandra pods keep failing, restart all of them gracefully by following the procedure in the topic "Restart of all Cassandra pods causes errors for connecting services".
  3. If the topology-cassandra pods are failing, restart each failing pod with the following command:
    oc delete pod <release_name>-topology-cassandra-<number>

    Where <release_name> is the name of your deployment, as specified by the value used for name (Operator Lifecycle Manager UI Form view) or by name in the metadata section of the noi.ibm.com_noihybrids_cr.yaml or noi.ibm.com_nois_cr.yaml file (YAML view), and <number> is the number at the end of the pod name, for example 0.
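If several pods need attention, selecting and deleting them can be scripted. The following is a minimal sketch under stated assumptions, not part of the product: the function names `failing_topology_cassandra_pods` and `restart_pods` and the `DRY_RUN` variable are hypothetical. The filter assumes that `oc get pods --no-headers` prints NAME, READY, and STATUS as its first three columns and that topology Cassandra pods are named `<release_name>-topology-cassandra-<number>`. As in the steps above, deleting a pod restarts it only because its controller re-creates it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product).
# Reads `oc get pods --no-headers` output on stdin and prints the names of
# <release_name>-topology-cassandra-<number> pods that are not Running with
# all containers ready. Usage: failing_topology_cassandra_pods <release_name>
failing_topology_cassandra_pods() {
  if [ -z "$1" ]; then
    echo "usage: failing_topology_cassandra_pods <release_name>" >&2
    return 2
  fi
  awk -v prefix="$1-topology-cassandra-" '
    # Match only names of the form <prefix><digits>, for example noi-topology-cassandra-0
    index($1, prefix) == 1 && substr($1, length(prefix) + 1) ~ /^[0-9]+$/ {
      split($2, ready, "/")   # READY column, for example 0/1
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Hypothetical helper: deletes each pod named on stdin, one name per line.
# Set DRY_RUN=1 to print the oc commands instead of running them.
restart_pods() {
  while read -r pod; do
    [ -n "$pod" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "oc delete pod $pod"
    else
      # Redirect stdin so that oc cannot consume the remaining pod names.
      oc delete pod "$pod" < /dev/null
    fi
  done
}
```

For example, `oc get pods --no-headers | failing_topology_cassandra_pods <release_name> | DRY_RUN=1 restart_pods` previews the deletions for step 3; remove `DRY_RUN=1` to delete the pods. The same `restart_pods` function can preview step 1 when you pipe it the output of `oc get pods --no-headers | awk '$1 ~ /zoo|kafka/ {print $1}'`.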