IBM Support

Content Analyzer RabbitMQ pods do not return to Running/Ready state after system shutdown

Troubleshooting


Problem

After a Business Automation Content Analyzer (ACA) system is restarted, the RabbitMQ pods do not return to a Running and Ready state. The pod status might show as ImagePullBackOff, CrashLoopBackOff, ErrImagePull, Running but not Ready, or another status.

Cause

When all pods start simultaneously after a system restart, the system must perform many actions at once. On some systems, this strains the available resources and prevents pods from starting successfully. The RabbitMQ pods are particularly sensitive to this issue.

Diagnosing The Problem

Running oc get pods shows the RabbitMQ pods in an error state, or running but not ready.

Running oc describe pod (pod name) can provide a more specific error message in the Events section.
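
As a minimal sketch of the check described above, the following shell function filters oc get pods --no-headers output for pods that are not fully ready. The function name not_ready_pods and the assumption that the RabbitMQ pod names contain the string rabbitmq are illustrative only and are not taken from the product documentation. Adjust the pattern to match your deployment.

```shell
# Hypothetical helper: print the names of pods whose name contains $1
# and that are either not Running or not fully Ready.
# Reads `oc get pods --no-headers` output on stdin, which has the
# columns NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk -v pat="$1" '
    index($1, pat) {
      split($2, ready, "/")          # READY column looks like "0/1"
      if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}

# Example use against a live cluster (the "rabbitmq" pattern is an assumption):
#   oc get pods --no-headers | not_ready_pods rabbitmq
```

Each pod name that is printed can then be passed to oc describe pod to review its Events section.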

Resolving The Problem

  1. Wait at least 30 minutes before you take any corrective action. The pods continually try to restart, and the issue can resolve itself after some time. If the issue continues, complete the following steps one at a time. Wait a few minutes after each step, and continue to the next step only if the problem persists.
  2. In the Events section of the oc describe pod output, if 'unauthorized: authentication required' is displayed, the registry secret has expired and must be regenerated.
    • kubectl -n (namespace) delete secret (registry secret name)
    • kubectl -n (namespace) create secret docker-registry (registry secret name) --docker-server=(docker-registry) --docker-username=(username) --docker-password=$(oc whoami -t) --docker-email=' ' --dry-run -o yaml | kubectl apply -f -
  3. Delete the RabbitMQ pods so that they are re-created.
    • oc delete pods (pod name)
  4. Delete the RabbitMQ mnesia folder by using one of the following two methods.
    • Delete the mnesia folder from the file system that backs the data PVC.
      • cd (data folder as defined by data pvc)/rabbitmq
      • rm -rf mnesia
      • oc delete pods (pod name)
    • Delete the mnesia folder from within the RabbitMQ pod.
      • oc exec -ti (pod name) -- bash
      • rm -rf /var/lib/rabbitmq/mnesia
      • exit
      • oc delete pods (pod name)
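
The two decision points in the steps above (waiting before acting, and checking the Events section for an authentication error) can be sketched as small shell helpers. The function names wait_until and has_registry_auth_error, and the rabbitmq_ready check that is mentioned in the comments, are hypothetical and are not part of oc or kubectl. This is an illustration under those assumptions, not a supported script.

```shell
# Hypothetical helper: run a check command repeatedly until it succeeds
# or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = check command.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical helper: succeed if `oc describe pod` output on stdin contains
# the message that indicates an expired registry secret (step 2).
has_registry_auth_error() {
  grep -q 'unauthorized: authentication required'
}

# Example use (rabbitmq_ready is an assumed, user-supplied check):
#   wait_until 30 60 rabbitmq_ready      # up to 30 checks, 60 seconds apart
#   oc describe pod (pod name) | has_registry_auth_error && echo "regenerate the secret"
```

Thirty checks at 60-second intervals correspond to the 30-minute wait in step 1. A shorter interval between later steps matches the guidance to wait a few minutes after each one.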

Document Location

Worldwide

Document Information

Modified date:
19 March 2020

UID

ibm16091150