Pods displaying ImagePullBackOff status

Problem

Pods display ImagePullBackOff for their status and never reach a Running state.

Note: In some instances of this error, pods run initially and then, after a certain amount of time (for example, 24 hours or so), the dockerconfigjson in the cnmon-pullsecret-deployable expires, causing the pod status to change to ImagePullBackOff for some pods. If the ImagePullBackOff error is due to dockerconfigjson expiration, see step 3 in the Resolving the problem section.

For example:

  kubectl get pods -n cp4mcm-cloud-native-monitoring

  NAME                                               READY   STATUS              RESTARTS   AGE
  ibm-dc-autoconfig-operator-85d5885995-dv2jh        1/1     Running             0          5m53s
  ibm-dc-autoconfig-operator-hw569                   0/1     ContainerCreating   0          4m33s
  job-ua-operator-jvxk4                              0/1     ImagePullBackOff    0          4m33s
  k8sdc-operator-868c8bcb55-q8fvm                    0/1     ImagePullBackOff    0          4m33s
  reloader-56f4d5dc76-8ds2c                          0/1     ImagePullBackOff    0          4m33s
  ua-operator-555db94dfb-zzjxf                       0/1     ImagePullBackOff    0          4m33s
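
As a quick triage sketch, you can also list the recent image pull failures in the namespace. This assumes the standard reason (Failed) that the kubelet records for failed pulls:

  kubectl get events -n cp4mcm-cloud-native-monitoring --field-selector reason=Failed --sort-by=.lastTimestamp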

Cause

This ImagePullBackOff status can have one of the following causes:

  1. You are using a self-signed certificate for your docker registry. The docker daemon that Kubernetes uses on the managed cluster does not trust the self-signed certificate. As a result, an x509: certificate signed by unknown authority error is returned when you run the docker login command.

  2. An incorrect or expired docker login credential exists in the cnmon-pullsecret-deployable, which results in an authentication required error.
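
To confirm which cause applies, you can run checks like the following. This is a sketch: it assumes that you can run docker against your registry from a host that has docker installed, and that the pull secret delivered by the cnmon-pullsecret-deployable is named cnmon-pullsecret (the actual secret name on your cluster might differ).

  # Cause 1 check: an "x509: certificate signed by unknown authority" error
  # here confirms that docker does not trust the registry certificate.
  docker login <your_registry_host_name>:<your_registry_host_port>

  # Cause 2 check: decode the credential that the pods use and inspect it for
  # incorrect or expired entries (the secret name is an assumption).
  kubectl get secret cnmon-pullsecret -n cp4mcm-cloud-native-monitoring \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d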

Resolving the problem

To resolve the problem, complete the following steps:

  1. Get a detailed description of the pod that has the ImagePullBackOff status. For example:

    kubectl describe pod ibm-dc-autoconfig-operator-mc4vm -n cp4mcm-cloud-native-monitoring
    
  2. If the error is the x509 certificate error, for example:

    ...
    Warning  Failed     <invalid>                      kubelet, worker1.stubbly.cp.fyre.ibm.com  Failed to pull image "bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator:APM_202010271234": rpc error: code = Unknown desc = error pinging docker registry bordure-inf.fyre.ibm.com:5555: Get https://bordure-inf.fyre.ibm.com:5555/v2/: x509: certificate signed by unknown authority
    

    Instruct docker to trust the self-signed certificate that is used by your docker registry. To do this, copy the self-signed certificate to /etc/docker/certs.d/<your_registry_host_name>:<your_registry_host_port>/ca.crt on the managed cluster where you are deploying Cloud Native Monitoring.
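
    As a minimal sketch, assuming the registry host and port from the example error above (bordure-inf.fyre.ibm.com:5555) and that the registry's CA certificate is available locally as ca.crt, run the following on the managed cluster:

      # Create the per-registry certificate directory that docker checks at pull time
      mkdir -p /etc/docker/certs.d/bordure-inf.fyre.ibm.com:5555

      # Install the self-signed certificate as the trusted CA for this registry
      cp ca.crt /etc/docker/certs.d/bordure-inf.fyre.ibm.com:5555/ca.crt

      # Verify: a docker login that previously failed with an x509 error should now succeed
      docker login bordure-inf.fyre.ibm.com:5555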

  3. If the error is unauthorized: authentication required, for example:

    ...
    Warning  Failed     <invalid> (x4 over 2s)  kubelet, worker1.stubbly.cp.fyre.ibm.com  Failed to pull image "bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator:APM_202010271234": rpc error: code = Unknown desc = Error reading manifest APM_202010271234 in bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator: unauthorized: authentication required
    

    This error might be caused by an invalid or expired docker-registry secret that you created when you configured Cloud Native Monitoring in the Monitoring operator. To try to resolve the error, re-create the secret with valid docker login credentials.
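
    The following is a minimal sketch of that approach, not the product's documented procedure. It assumes the secret is a standard docker-registry secret named cnmon-pullsecret in the cp4mcm-cloud-native-monitoring namespace; substitute the secret name, namespace, and registry details that you used when you configured Cloud Native Monitoring.

      # Remove the stale secret (the name is an assumption; use your own)
      kubectl delete secret cnmon-pullsecret -n cp4mcm-cloud-native-monitoring

      # Re-create it with current, valid docker login credentials
      kubectl create secret docker-registry cnmon-pullsecret \
        -n cp4mcm-cloud-native-monitoring \
        --docker-server=<your_registry_host_name>:<your_registry_host_port> \
        --docker-username=<your_user> \
        --docker-password=<your_password>

      # Pods in ImagePullBackOff retry the pull automatically, so they should
      # return to Running once the new credential is valid.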