Operator installation or upgrade fails with DeadlineExceeded error
After the subscription is created, the ClusterServiceVersion (CSV) of an IBM Cloud Paks foundational services operator shows the Failed status, and its InstallPlan also shows the Failed status.
Symptom
The foundational services operator CSV status shows as Failed. On the Red Hat® OpenShift® Container Platform cluster console, you see an error message similar to the following message:
Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
The InstallPlan status also shows as Failed. To confirm the failure, complete the following steps:
- Get the InstallPlan that the subscription of the failed operator references.
oc get subscription <failed-operator-subscription> -n <operator-namespace> -o jsonpath='{.status.installPlanRef}'
See the following sample output:
{"apiVersion":"operators.coreos.com/v1alpha1","kind":"InstallPlan","name":"install-cpqk2","namespace":"cloudpak-control","resourceVersion":"98650091","uid":"9f0210dd-a44f-454a-8b66-722b7520c838"}
- Confirm the status of the InstallPlan that you got in the previous command output. The status is reported in the phase field of the InstallPlan.
oc get installplan install-cpqk2 -n cloudpak-control -o jsonpath='{.status.phase}'
See the following sample output:
Failed
- Inspect the InstallPlan for the bundle unpacking error.
oc get installplan install-cpqk2 -n cloudpak-control -o yaml | grep -i "bundle unpacking failed"
See the following sample output:
bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
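The three checks above can be combined into one helper. The following is a minimal sketch, not an official tool: the function name `check_installplan` is hypothetical, and it assumes that the subscription exposes the InstallPlan name at `.status.installPlanRef.name` and that the InstallPlan exposes its status at `.status.phase`.

```shell
#!/usr/bin/env bash
# Sketch: report the InstallPlan phase and any DeadlineExceeded message
# for a given subscription. check_installplan is a hypothetical helper name.
check_installplan() {
  local sub="$1"
  local ns="$2"
  local plan phase

  # Name of the InstallPlan that the subscription references.
  plan=$(oc get subscription "$sub" -n "$ns" -o jsonpath='{.status.installPlanRef.name}')
  if [ -z "$plan" ]; then
    echo "No InstallPlan is referenced by subscription $sub" >&2
    return 1
  fi

  # Phase of that InstallPlan, for example Complete or Failed.
  phase=$(oc get installplan "$plan" -n "$ns" -o jsonpath='{.status.phase}')
  echo "InstallPlan $plan phase: $phase"

  if [ "$phase" = "Failed" ]; then
    # Print every line of the InstallPlan that mentions the deadline error.
    oc get installplan "$plan" -n "$ns" -o yaml | grep -i "DeadlineExceeded" \
      || echo "No DeadlineExceeded message found in InstallPlan $plan"
  fi
}
```

For example, `check_installplan <failed-operator-subscription> <operator-namespace>` prints the phase and, when the phase is Failed, the lines that contain the deadline error.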
Cause
Operator Lifecycle Manager (OLM) fails to unpack the operator bundle because the extract job failed, probably because the remote bundle image could not be accessed or for another reason, and left a corrupted configmap behind. The operator manifest that is stored in that configmap is therefore most likely corrupted as well. While the failed job and the corrupted configmap exist, every repeated installation attempt that reuses them also fails.
In an air-gapped environment, the issue most often occurs because the bundle image is not available in the private registry.
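To check whether a bundle image is missing, it can help to list the image references that the failed InstallPlan tried to unpack. The following is a minimal sketch: the function name `list_bundle_images` is hypothetical, and it assumes that the InstallPlan records the bundle image references at `.status.bundleLookups[*].path`.

```shell
#!/usr/bin/env bash
# Sketch: list the bundle images that an InstallPlan tried to unpack.
# list_bundle_images is a hypothetical helper name.
list_bundle_images() {
  local plan="$1"
  local ns="$2"

  # Assumption: each entry in .status.bundleLookups has a path field
  # that holds the bundle image reference. Print one reference per line.
  oc get installplan "$plan" -n "$ns" \
    -o jsonpath='{range .status.bundleLookups[*]}{.path}{"\n"}{end}'
}
```

For example, `list_bundle_images install-cpqk2 cloudpak-control` prints one image reference per line. In an air-gapped environment, you can then compare each reference against the content that is mirrored to your private registry.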
Resolution
For more information about how to resolve the issue, see Operator installation or upgrade fails with DeadlineExceeded in Red Hat OpenShift Container Platform.
Complete the following steps to resolve the issue:
- Find the corresponding job and configmap in the namespace where the CatalogSource is deployed. Narrow down your search by using the operator name or any other keyword.
oc get job -n <catalog-namespace> -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("<failed-operator-name>")) | .metadata.name'
- Delete the job and the corresponding configmap that you got in the previous command. In most cases, the job and configmap have the same name.
oc delete job <job-name> -n <catalog-namespace>
oc delete configmap <job-name> -n <catalog-namespace>
- Delete the Failed InstallPlan.
oc delete installplan <operator-installplan-name> -n <operator-namespace>
- Delete the subscription and CSV of the Failed operator.
oc delete subscription <name-of-the-operator-subscription> -n <operator-namespace>
oc delete csv <name-of-the-corresponding-CSV> -n <operator-namespace>
- Delete the Operand Deployment Lifecycle Manager pod.
oc delete pod -l name=operand-deployment-lifecycle-manager -n <your-foundational-services-namespace>
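The delete steps above can be sketched as one helper that runs them in the same order. This is an illustrative sketch only: the names `cleanup_failed_unpack`, `run_or_print`, and the `DRY_RUN` variable are hypothetical, and the helper assumes that the configmap has the same name as the job, which is true in most cases but not guaranteed. By default the commands are only printed so that you can review them first.

```shell
#!/usr/bin/env bash
# Sketch: clean up a failed bundle unpack so that the operator can be reinstalled.
# cleanup_failed_unpack, run_or_print, and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN is 1 (the default); run it when DRY_RUN is 0.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

cleanup_failed_unpack() {
  if [ "$#" -ne 7 ]; then
    echo "usage: cleanup_failed_unpack <job-name> <catalog-namespace> <installplan> <subscription> <csv> <operator-namespace> <foundational-services-namespace>" >&2
    return 2
  fi
  local job="$1" catalog_ns="$2" plan="$3" sub="$4" csv="$5"
  local operator_ns="$6" services_ns="$7"

  # Assumption: the configmap has the same name as the job.
  run_or_print oc delete job "$job" -n "$catalog_ns"
  run_or_print oc delete configmap "$job" -n "$catalog_ns"

  # Remove the Failed InstallPlan, then the subscription and CSV.
  run_or_print oc delete installplan "$plan" -n "$operator_ns"
  run_or_print oc delete subscription "$sub" -n "$operator_ns"
  run_or_print oc delete csv "$csv" -n "$operator_ns"

  # Delete the Operand Deployment Lifecycle Manager pod.
  run_or_print oc delete pod -l name=operand-deployment-lifecycle-manager -n "$services_ns"
}
```

Calling the function with all seven arguments prints the six delete commands. After you review the printed commands, set `DRY_RUN=0` in the environment and call the function again to run them.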