Pods display an ImagePullBackOff status and never reach the Running state.
Note: In some instances of this error, pods run initially but, after a certain amount of time (for example, 24 hours or so), the dockerconfigjson in the cnmon-pullsecret-deployable expires, causing the pod status to change to ImagePullBackOff for some pods. Refer to step 3 in the Resolving the problem section if the ImagePullBackOff error is due to dockerconfigjson expiration.
For example:
kubectl get pods -n cp4mcm-cloud-native-monitoring
NAME                                          READY   STATUS              RESTARTS   AGE
ibm-dc-autoconfig-operator-85d5885995-dv2jh   1/1     Running             0          5m53s
ibm-dc-autoconfig-operator-hw569              0/1     ContainerCreating   0          4m33s
job-ua-operator-jvxk4                         0/1     ImagePullBackOff    0          4m33s
k8sdc-operator-868c8bcb55-q8fvm               0/1     ImagePullBackOff    0          4m33s
reloader-56f4d5dc76-8ds2c                     0/1     ImagePullBackOff    0          4m33s
ua-operator-555db94dfb-zzjxf                  0/1     ImagePullBackOff    0          4m33s
The ImagePullBackOff status can have one of the following causes (a quick check that distinguishes them follows the list):
- You are using a self-signed certificate for your docker registry. The docker daemon that Kubernetes uses on the managed cluster does not trust the self-signed certificate. As a result, an x509: certificate signed by unknown authority error is returned when you run the docker login command.
- An incorrect or expired docker login credential exists in the cnmon-pullsecret-deployable, which results in an unauthorized: authentication required error.
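To see which cause applies, you can try a manual docker login from a worker node. This check is not part of the documented procedure, and the registry host and port are the example values used throughout this page.
docker login bordure-inf.fyre.ibm.com:5555
An x509: certificate signed by unknown authority response points to the untrusted self-signed certificate; an unauthorized: authentication required response points to an invalid or expired credential.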
You must instruct docker to trust the self-signed certificate that is used by your docker registry. To do this, copy the self-signed certificate to /etc/docker/certs.d/<your_registry_host_name>:<your_registry_host_port>/ca.crt on the managed cluster where you are deploying Cloud Native Monitoring.
First, get a detailed description of the pod with the ImagePullBackOff status. For example:
kubectl describe pod ibm-dc-autoconfig-operator-mc4vm -n cp4mcm-cloud-native-monitoring
If the error is the x509 certificate error, for example:
...
Warning Failed <invalid> kubelet, worker1.stubbly.cp.fyre.ibm.com Failed to pull image "bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator:APM_202010271234": rpc error: code = Unknown desc = error pinging docker registry bordure-inf.fyre.ibm.com:5555: Get https://bordure-inf.fyre.ibm.com:5555/v2/: x509: certificate signed by unknown authority
Complete the following steps:
Note: If you are using a self-signed certificate for your docker registry, you must complete these steps on the managed cluster where you are deploying Cloud Native Monitoring.
For a pure Kubernetes cluster or OpenShift Container Platform version 3.x, run the following commands on all worker nodes, one by one. These commands create the /etc/docker/certs.d/ directory and copy your docker registry certificate file from your docker registry host to the managed cluster where you are deploying Cloud Native Monitoring, saving it as /etc/docker/certs.d/<your_registry_host_name>:<your_registry_host_port>/ca.crt.
mkdir -p /etc/docker/certs.d/<your_registry_host_name>:<your_registry_host_port>
scp <your_registry_host_name>:/opt/registry/certs/domain.crt /etc/docker/certs.d/<your_registry_host_name>:<your_registry_host_port>/ca.crt
For example:
mkdir -p /etc/docker/certs.d/bordure-inf.fyre.ibm.com:5555
scp bordure-inf.fyre.ibm.com:/opt/registry/certs/domain.crt /etc/docker/certs.d/bordure-inf.fyre.ibm.com:5555/ca.crt
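As an optional check, assuming the docker CLI is available on the worker node, you can confirm that the daemon now trusts the registry; the host, port, and image tag reuse the example values from this page.
docker login bordure-inf.fyre.ibm.com:5555
docker pull bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator:APM_202010271234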
For OpenShift Container Platform version 4.x, ensure that you are logged in to the cluster. Run the following commands to copy your docker registry certificate file from your docker registry host to the cluster, create a ConfigMap to store the certificate, and patch the OpenShift image configuration.
scp <your-registry-server>:/opt/registry/certs/domain.crt <path>/ca.crt
oc create configmap registry-config --from-file=<your_docker_registry_host>..<your_docker_registry_port>=<path>/ca.crt -n openshift-config
oc patch image.config.openshift.io/cluster --patch '{"spec":{"additionalTrustedCA":{"name":"registry-config"}}}' --type=merge
For example:
scp bordure-inf.fyre.ibm.com:/opt/registry/certs/domain.crt $HOME/ca.crt
oc create configmap registry-config --from-file=bordure-inf.fyre.ibm.com..5555=$HOME/ca.crt -n openshift-config
oc patch image.config.openshift.io/cluster --patch '{"spec":{"additionalTrustedCA":{"name":"registry-config"}}}' --type=merge
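As an optional verification (not part of the documented procedure), you can confirm that the ConfigMap exists and that the cluster image configuration references it:
oc get configmap registry-config -n openshift-config
oc get image.config.openshift.io/cluster -o jsonpath='{.spec.additionalTrustedCA.name}'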
If the error is unauthorized: authentication required, for example:
...
Warning Failed <invalid> (x4 over 2s) kubelet, worker1.stubbly.cp.fyre.ibm.com Failed to pull image "bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator:APM_202010271234": rpc error: code = Unknown desc = Error reading manifest APM_202010271234 in bordure-inf.fyre.ibm.com:5555/ibm-dc-autoconfig-operator: unauthorized: authentication required
This error might be caused by an invalid or expired docker-registry secret. To resolve it, re-create the cnmon-pull-secret that you created in Configure Cloud Native Monitoring in the Monitoring operator. After you re-create the secret, the cnmon-pullsecret-deployable is synchronized automatically, which might take up to fifteen minutes to complete.