Pod failures after upgrade or system restart
After an upgrade or a system restart, the kafka pods do not start, and the cassandra pods crash.
Problem
After Netcool® Operations Insight® on OpenShift® is upgraded or after a system restart, the kafka pods do not start, and the cassandra pods might repeatedly fail.
Resolution
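- Restart the zookeeper and kafka pods. The following command only prints a delete command for each matching pod. Run the printed commands, or pipe the output to sh to run them directly.
oc get pod | grep -E "zoo|kafka" | awk '{print "oc delete pod "$1}'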
- If the cassandra pods keep failing, gracefully restart all of them by following the procedure in "Restart of all Cassandra pods causes errors for connecting services".
- If the topology-cassandra pods are failing, restart these pods with the following command.
oc delete pod <release_name>-topology-cassandra-<number>
Where <release_name> is the name of your deployment, as specified by the value used for name (Operator Lifecycle Manager UI Form view), or name in the metadata section of the noi.ibm.com_noihybrids_cr.yaml or noi.ibm.com_nois_cr.yaml files (YAML view), and <number> is the numeric suffix of the failing pod, as shown in the output of oc get pod.
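The zookeeper and kafka restart in the first step can also be scripted so that the delete commands are previewed before they are run. The following is a minimal sketch, not a supported tool: the helper names filter_pods and restart_pods and the DRY_RUN variable are hypothetical, and the sketch assumes that the pod names contain "zoo" or "kafka", as in the command in that step.

```shell
#!/bin/sh
# Sketch only: filter_pods, restart_pods, and DRY_RUN are hypothetical
# names used for illustration; they are not part of the product.

# Read `oc get pod` output on stdin and print the names (first column)
# of pods whose name contains "zoo" or "kafka".
filter_pods() {
  awk '$1 ~ /zoo|kafka/ { print $1 }'
}

# Read pod names on stdin, one per line. With DRY_RUN=1 (the default in
# this sketch) only print the delete commands; with DRY_RUN=0 run them.
restart_pods() {
  while read -r pod; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "oc delete pod $pod"
    else
      # Redirect stdin so that oc cannot consume the remaining pod names.
      oc delete pod "$pod" </dev/null
    fi
  done
}
```

After the functions are loaded into the shell, oc get pod | filter_pods | restart_pods previews the commands, and oc get pod | filter_pods | DRY_RUN=0 restart_pods deletes the pods so that they are re-created. Defaulting to a preview is a design choice of this sketch, intended to avoid deleting pods that match the pattern unintentionally.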