Operator upgrade fails - OLM known issue
Operator upgrade fails because the service account is deleted.
This is a known issue in Operator Lifecycle Manager (OLM). For more information about the issue, see Red Hat Bugzilla.
Symptom
You usually see this issue during upgrade of IBM Cloud Pak foundational services in your cluster.
The operator status shows as Installing, but the installation never completes. The operator pods show the CrashLoopBackOff status.
Verify the operator status by using these commands:
- Check the status of the operator ClusterServiceVersion (CSV).
oc get csv -n <your-foundational-services-namespace>
Following is a sample output:
... ibm-platform-api-operator.v3.8.1 IBM Platform API 3.8.1 ibm-platform-api-operator.v3.7.2 Installing ...
- Check the status of the operator pod.
oc get pod -n <your-foundational-services-namespace>
Following is a sample output:
... ibm-platform-api-operator-65f89cd85b-vz6t6 0/1 CrashLoopBackOff 8 18m ...
- Check the events of the operator pod.
oc describe pod <pod-name> -n <your-foundational-services-namespace>
Following is an example command and output:
oc describe pod ibm-platform-api-operator-65f89cd85b-vz6t6 -n <your-foundational-services-namespace>
Events:
  Type     Reason       Age                From     Message
  ----     ------       ----               ----     -------
  ...
  Normal   Started      33m (x3 over 34m)  kubelet  Started container ibm-platform-api-operator
  Warning  FailedMount  33m (x7 over 34m)  kubelet  MountVolume.SetUp failed for volume "ibm-platform-api-operator-token-tgvqx" : secret "ibm-platform-api-operator-token-tgvqx" not found
  ...
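The pod check above can be narrowed to the affected pods with a small filter. The following is a minimal sketch, not part of the product: the function name `crashloop_pods` and the `NS` variable are hypothetical, and the filter assumes the default `oc get pod` column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pod --no-headers` output on stdin and
# prints the name of every pod whose STATUS column is CrashLoopBackOff.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
crashloop_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Offline example that uses the sample row from this page:
#   printf '%s\n' 'ibm-platform-api-operator-65f89cd85b-vz6t6 0/1 CrashLoopBackOff 8 18m' | crashloop_pods
#
# Against a live cluster (NS is a placeholder for your namespace):
#   oc get pod -n "$NS" --no-headers | crashloop_pods
```

Each name that the filter prints can be passed to `oc describe pod` to inspect the pod events.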
Cause
The issue is caused by an intermittent race condition during the operator upgrade.
The OLM accidentally deletes the service account of the operator.
Operator upgrade is managed by two operators: the OLM operator and the Catalog operator.
During the upgrade, the OLM operator erroneously concludes that the upgrade finished successfully. However, the Catalog operator does not update the owner reference of the service account, so the OLM operator deletes the service account. Without the service account, its token secret no longer exists, so the operator pod cannot mount the token volume and fails to start.
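One way to confirm this cause is to check whether the service account behind the missing token secret still exists. The following is a rough sketch under two assumptions: the token secret follows the `<service-account>-token-<suffix>` naming pattern that the sample event suggests, and the helper names `missing_token_secrets` and `service_account_for_secret` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc describe pod` output on stdin and prints the
# name of each token secret that a FailedMount event reports as not found.
missing_token_secrets() {
  sed -n 's/.*FailedMount.*secret "\([^"]*-token-[^"]*\)" not found.*/\1/p' | sort -u
}

# Hypothetical helper: derives the service account name from a token secret
# name by stripping the trailing "-token-<suffix>" part.
# Assumption: the secret is named <service-account>-token-<suffix>.
service_account_for_secret() {
  printf '%s\n' "${1%-token-*}"
}

# Against a live cluster (POD and NS are placeholders):
#   secret=$(oc describe pod "$POD" -n "$NS" | missing_token_secrets)
#   oc get serviceaccount "$(service_account_for_secret "$secret")" -n "$NS"
```

If `oc get serviceaccount` reports that the service account is not found, that result is consistent with the cause described here.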
Resolving the problem
Delete the pod that is in the CrashLoopBackOff status.
oc delete pod <pod-name> -n <your-foundational-services-namespace>
Following is an example command:
oc delete pod ibm-platform-api-operator-65f89cd85b-vz6t6 -n <your-foundational-services-namespace>
After you delete the pod, the pod is re-created. The operator then successfully upgrades.
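To confirm that the upgrade completed after the pod is re-created, you can list any CSV that has not reached the Succeeded phase. The following is a minimal sketch: the function name `unfinished_csvs` and the `NS` variable are hypothetical, and the check assumes that a completed upgrade is reported as the `Succeeded` phase in the CSV `.status.phase` field.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME PHASE" rows on stdin and prints every row
# whose phase is not Succeeded. Prints nothing when every CSV has succeeded.
unfinished_csvs() {
  awk '$2 != "Succeeded" { print $1 " " $2 }'
}

# Against a live cluster (NS is a placeholder for your namespace):
#   oc get csv -n "$NS" --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | unfinished_csvs
```

An empty result suggests that no CSV in the namespace is still installing. If the operator CSV is still listed after several minutes, repeat the pod check.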