The vmware-infra-* pods are in a CrashLoopBackOff error state

You might encounter an error where several vmware-infra-* pods crash and restart repeatedly.

When this error occurs, the output of a pod status check can resemble the following example:

NAMESPACE                                               NAME                                                                  READY   STATUS              RESTARTS   AGE
management-infrastructure-management                    1-vmware-infra-event-catcher-30-65d7bfd967-tcsrg                      0/1     CrashLoopBackOff    399        44h
management-infrastructure-management                    1-vmware-infra-operations-30-6dc9f87b75-mqfpz                         0/1     CrashLoopBackOff    402        44h
management-infrastructure-management                    1-vmware-infra-refresh-30-84fcd6cbd4-fc68w                            0/1     Error               412        44h
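On a cluster with many pods, you might want to pick out only the affected workers. The following is a minimal sketch that filters saved pod-status text for vmware-infra-* pods in the CrashLoopBackOff or Error state. It assumes the text has the same column layout as the listing above (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name find_crashing_vmware_pods is hypothetical.

```python
from typing import List, Tuple

# Statuses that indicate the pod's container keeps failing.
FAILING_STATUSES = {"CrashLoopBackOff", "Error"}


def find_crashing_vmware_pods(pod_listing: str) -> List[Tuple[str, str, int]]:
    """Return (pod name, status, restart count) for failing vmware-infra-* pods.

    Assumes the columns NAMESPACE NAME READY STATUS RESTARTS AGE, separated by
    whitespace, as in the example listing.
    """
    results = []
    for line in pod_listing.splitlines():
        fields = line.split()
        # Skip the header row and any blank or truncated lines.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        name, status, restarts = fields[1], fields[3], fields[4]
        if "vmware-infra-" in name and status in FAILING_STATUSES:
            results.append((name, status, int(restarts)))
    return results
```

For the example listing, the sketch returns all three pods, each with its status and restart count (for example, 399 restarts for the event catcher pod).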

When you check the pod logs, the pods appear to be failing with an authentication error:

{"@timestamp":"2020-08-02T21:13:00.050303 ","hostname":"1-vmware-infra-refresh-30-84fcd6cbd4-fc68w","pid":7,"tid":"2ae9ee5f997c","level":"err","message":"MIQ(ManageIQ::Providers::Vmware::InfraManager::RefreshWorker::Runner) ID [2241] PID [7] GUID [403658a2-96a2-4823-b996-94cd6472dffd] EMS id [30] failed authentication check. Worker exiting."}
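In this example, the EMS id in the log message (30) also appears in each of the failing pod names (for example, 1-vmware-infra-refresh-30-...), which suggests that all of the crashing workers belong to the same provider. The following minimal sketch extracts that EMS id from a JSON log line so that you can tell which provider is failing. It assumes that log lines have the JSON shape shown above with "level" and "message" keys; the function name auth_failure_ems_id is hypothetical.

```python
import json
import re
from typing import Optional

# Matches the "EMS id [30] failed authentication check" fragment of the message.
AUTH_FAILURE = re.compile(r"EMS id \[(\d+)\] failed authentication check")


def auth_failure_ems_id(log_line: str) -> Optional[int]:
    """Return the EMS id if the log line reports a failed authentication check."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # Not a JSON log line.
    if not isinstance(entry, dict) or entry.get("level") != "err":
        return None
    match = AUTH_FAILURE.search(entry.get("message", ""))
    return int(match.group(1)) if match else None
```

Applied to the example log line, the sketch returns 30. Lines that are not JSON, or that do not report an authentication failure, return None.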

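This error occurs when the VMware provider cannot be reached, for example because of invalid credentials, a network problem, or another connection failure. While the connection fails, each worker fails its check and exits ("Worker exiting." in the log), and Kubernetes keeps restarting the container, which puts the pods into the CrashLoopBackOff state.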
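The high restart counts (around 400 in 44 hours) follow from how Kubernetes restarts a failing container: by default, the delay between restarts starts at 10 seconds, doubles after each crash, and is capped at five minutes. The following sketch computes that schedule under those default values; the function name crashloop_delays is hypothetical, and the kubelet's actual timing can vary.

```python
from typing import List


def crashloop_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> List[float]:
    """Approximate restart delays in seconds: start at `base`, double after
    each crash, and never exceed `cap` (defaults: 10 s and 5 minutes)."""
    return [min(base * 2 ** n, cap) for n in range(restarts)]
```

For example, crashloop_delays(7) gives delays of 10, 20, 40, 80, 160, 300, and 300 seconds. After the first few crashes, every restart waits the full five minutes, so a worker that can never authenticate is retried roughly every five minutes plus its startup time, indefinitely, until the provider is reachable again or removed.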

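Solution: To resolve this error, use one of the following options:

  1. Fix the credentials or the network issue so that the provider can be reached again. The pods then recover on their own.
  2. If you cannot fix the connection or no longer need the provider, delete the provider to stop the CrashLoopBackOff.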