Operator installation or upgrade fails with DeadlineExceeded error
The IBM Cloud Pak foundational services operator ClusterServiceVersion (CSV) status shows Failed and its InstallPlan status shows Failed after the subscription is created.
Symptom
The foundational services operator CSV status shows as Failed. On the Red Hat® OpenShift® Container Platform cluster console, you see an error message similar to the following message:
Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
The InstallPlan status also shows as Failed.
-
Get the InstallPlan that is referenced by the subscription of the failed operator.
oc get subscription <failed-operator-subscription> -n <operator-namespace> -o jsonpath='{.status.installPlanRef}'
See the following sample output:
{"apiVersion":"operators.coreos.com/v1alpha1","kind":"InstallPlan","name":"install-cpqk2","namespace":"cloudpak-control","resourceVersion":"98650091","uid":"9f0210dd-a44f-454a-8b66-722b7520c838"}
-
Confirm the status of the InstallPlan that you got in the previous command output.
oc get installplan install-cpqk2 -n cloudpak-control -o jsonpath='{.status.phase}'
See the following sample output:
Failed
-
Inspect the InstallPlan.
oc get installplan install-cpqk2 -n cloudpak-control -o yaml | grep "unpack job not completed"
See the following sample output:
bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
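The first two checks, plus a search for the DeadlineExceeded message, can be combined into one helper. The following is a minimal sketch and not part of the product: the function name `check_installplan` is hypothetical, and the sketch assumes that `oc` is already logged in to the cluster and that the InstallPlan name is available in the `status.installPlanRef.name` field of the subscription, as in the earlier sample output.

```shell
# Hypothetical helper: report the phase of the InstallPlan that a subscription references.
# Usage: check_installplan <failed-operator-subscription> <operator-namespace>
check_installplan() {
  local sub="$1" ns="$2" plan phase

  # Step 1: resolve the InstallPlan name from the subscription.
  plan=$(oc get subscription "$sub" -n "$ns" \
    -o jsonpath='{.status.installPlanRef.name}') || return 1
  if [ -z "$plan" ]; then
    echo "No InstallPlan is referenced by subscription ${sub}" >&2
    return 1
  fi

  # Step 2: read the InstallPlan phase (for example, Complete or Failed).
  phase=$(oc get installplan "$plan" -n "$ns" \
    -o jsonpath='{.status.phase}') || return 1
  echo "InstallPlan ${plan}: ${phase}"

  # Step 3: for a failed plan, print any line that mentions DeadlineExceeded.
  if [ "$phase" = "Failed" ]; then
    oc get installplan "$plan" -n "$ns" -o yaml | grep "DeadlineExceeded"
  fi
  return 0
}
```

For example, `check_installplan <failed-operator-subscription> <operator-namespace>` prints the InstallPlan name and phase, followed by the matching message lines if the plan failed.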
Cause
Operator Lifecycle Manager (OLM) fails to unpack the operator bundle because the extract job failed, for example because the remote bundle image could not be accessed, and left a corrupted configmap behind. The operator manifest that is stored in that configmap is therefore most likely corrupted as well. After that, every installation attempt that reuses the same job and configmap fails.
In an air-gapped environment, the issue most often occurs when the bundle image is not available in the private registry.
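One way to test for this cause is to check whether the bundle images that the InstallPlan references can be read from the registry. The following is a sketch under stated assumptions, not a documented procedure: it assumes that the bundle image references are recorded under `status.bundleLookups[*].path` of the InstallPlan, and that `oc image info` can reach the registry with your current credentials. The function name `check_bundle_images` is hypothetical. A failure might also point to a credential problem or to an image that `oc image info` cannot inspect without extra options, so treat the result as a hint only.

```shell
# Hypothetical helper: check that each bundle image of an InstallPlan is readable.
# Usage: check_bundle_images <operator-installplan-name> <operator-namespace>
check_bundle_images() {
  local plan="$1" ns="$2" images image rc=0

  # Assumption: bundle image references are listed in .status.bundleLookups[*].path.
  images=$(oc get installplan "$plan" -n "$ns" \
    -o jsonpath='{.status.bundleLookups[*].path}') || return 1

  # Image references contain no spaces, so word splitting is intended here.
  for image in $images; do
    # "oc image info" returns a non-zero status when it cannot read the image.
    if oc image info "$image" >/dev/null 2>&1; then
      echo "OK ${image}"
    else
      echo "NOT READABLE ${image}"
      rc=1
    fi
  done
  return "$rc"
}
```

The function returns a non-zero status if at least one image cannot be read, so it can be used in a conditional before you retry the installation.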
Resolution
For more information about how to resolve the issue, see Operator installation or upgrade fails with DeadlineExceeded in Red Hat OpenShift Container Platform 4.
Complete the following steps to resolve the issue:
-
Find the corresponding job and configmap in the namespace where the CatalogSource is deployed. Narrow down your search by using the operator name or any other keyword.
oc get job -n <catalog-namespace> -o json | jq -r '.items[] | select(any(.spec.template.spec.containers[].env[]?; (.value // "") | contains("<failed-operator-name>"))) | .metadata.name'
-
Delete the job and the corresponding configmap that you got in the previous command. In most cases, the job and configmap have the same name.
oc delete job <job-name> -n <catalog-namespace>
oc delete configmap <job-name> -n <catalog-namespace>
-
Delete the Failed InstallPlan.
oc delete installplan <operator-installplan-name> -n <operator-namespace>
-
Delete the subscription and CSV of the Failed operator.
oc delete subscription <name-of-the-operator-subscription> -n <operator-namespace>
oc delete csv <name-of-the-corresponding-CSV> -n <operator-namespace>
-
Delete the Operand Deployment Lifecycle Manager pod.
oc delete pod -l name=operand-deployment-lifecycle-manager -n <your-foundational-services-namespace>
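The deletion steps above can be sketched as a single helper that runs them in order. This is a minimal sketch, not an official script: the function name `cleanup_failed_operator` and its argument order are hypothetical, it assumes that the job and the configmap share the same name (which the steps describe as the usual case), and it adds the `--ignore-not-found` option so that a rerun does not stop on resources that are already deleted.

```shell
# Hypothetical helper: run the cleanup steps for a failed operator in order.
# Usage: cleanup_failed_operator <catalog-namespace> <job-name> <operator-namespace> \
#          <operator-installplan-name> <subscription-name> <csv-name> \
#          <your-foundational-services-namespace>
cleanup_failed_operator() {
  if [ "$#" -ne 7 ]; then
    echo "usage: cleanup_failed_operator <catalog-ns> <job> <operator-ns> <installplan> <subscription> <csv> <services-ns>" >&2
    return 2
  fi
  local catalog_ns="$1" job="$2" op_ns="$3" plan="$4" sub="$5" csv="$6" fs_ns="$7"

  # Delete the unpack job and the configmap (assumed to share the job name).
  oc delete job "$job" -n "$catalog_ns" --ignore-not-found || return 1
  oc delete configmap "$job" -n "$catalog_ns" --ignore-not-found || return 1

  # Delete the failed InstallPlan.
  oc delete installplan "$plan" -n "$op_ns" --ignore-not-found || return 1

  # Delete the subscription and CSV of the failed operator.
  oc delete subscription "$sub" -n "$op_ns" --ignore-not-found || return 1
  oc delete csv "$csv" -n "$op_ns" --ignore-not-found || return 1

  # Delete the Operand Deployment Lifecycle Manager pod.
  oc delete pod -l name=operand-deployment-lifecycle-manager -n "$fs_ns" || return 1
}
```

Find the job name with the search command from the first step before you call the function, and verify each name, because the deletions cannot be undone.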