# Troubleshooting the apply-olm command during installation or upgrade
When you run the `cpd-cli manage apply-olm` command to install or upgrade operators, the command might fail for various reasons.

If the `apply-olm` command fails, Red Hat® OpenShift® Operator Lifecycle Manager (OLM) encountered a problem. A slow Kubernetes API server, sequencing issues, or timing issues can cause problems with OLM. Some of these problems might be caused by inconsistent operator metadata or by defects in older OLM versions.
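For reference, a typical `apply-olm` invocation looks like the following sketch. The options shown (`--release`, `--cpd_operator_ns`, `--components`) follow the Cloud Pak for Data installation documentation, but verify them against your release; the environment variables are assumed to be set in your session.

```
# Hedged sketch of a typical apply-olm run; confirm the options against
# the installation documentation for your Cloud Pak for Data release.
cpd-cli manage apply-olm \
--release=${VERSION} \
--cpd_operator_ns=${PROJECT_CPD_INST_OPERATORS} \
--components=${COMPONENTS}
```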
The following sections provide guidance to help you diagnose and resolve problems with OLM so that you can successfully install or upgrade Cloud Pak for Data operators. Follow the recommended order of tasks. If you complete these tasks out of order, you might create more problems with your cluster or deployments.

## The apply-olm command fails

If the `apply-olm` command fails, see what type of error is returned:

| Error | Action |
|---|---|
| The error is a problem with a catalog source. | Complete the steps in Check the catalog sources. |
| The error is a problem with a subscription. | Complete the steps in Check the OLM operator logs. |
## Check the catalog sources
1. Inspect the catalog sources (see the sample output after this list):

   ```
   for catsrc in $(oc get catalogsource -n ${PROJECT_CPD_INST_OPERATORS} \
   --sort-by=.metadata.creationTimestamp -o name); \
   do \
   oc get $catsrc -n ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.metadata.name},{.status.connectionState.lastObservedState}{"\n"}'; \
   done
   ```

2. Complete the appropriate step based on the output of the preceding step:

   | Error | Action |
   |---|---|
   | Catalog sources are `READY`. | Complete the steps in Check the OLM operator logs. |
   | Catalog sources are not `READY`. | Complete the steps in Inspect failed pod logs in the operators project for the instance. |
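The output lists each catalog source name with its last observed connection state. The following output is illustrative only (the catalog source names are hypothetical and depend on which components you installed); a healthy catalog source reports `READY`:

```
ibm-cpd-platform-operator-catalog,READY
ibm-cloud-native-postgresql-catalog,READY
```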
## Check the OLM operator logs
Inspect the logs for the OLM operator:

```
oc logs -n openshift-operator-lifecycle-manager \
$(oc get pods -n openshift-operator-lifecycle-manager -l app=catalog-operator -o name) | grep ${PROJECT_CPD_INST_OPERATORS}
```
After you run this command, review messages that begin with `ResolutionFailed` and mention `constraints not satisfiable`. Focus on the most recent messages in the logs. These messages might provide helpful information to identify problems with your OLM configuration.

If you're unable to identify and correct the problem based on the information in the logs, complete the steps in Check for orphaned CSVs and unbound subscriptions.
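To narrow the output to the most recent resolution failures, you can extend the command with standard shell filters; a minimal sketch:

```
# Show only the newest resolution-failure messages for the instance
# operators project (the tail length is arbitrary; adjust as needed).
oc logs -n openshift-operator-lifecycle-manager \
$(oc get pods -n openshift-operator-lifecycle-manager -l app=catalog-operator -o name) \
| grep ${PROJECT_CPD_INST_OPERATORS} | grep -i 'constraints not satisfiable' | tail -n 20
```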
## Inspect failed pod logs in the operators project for the instance
1. Check whether pods are failing in the project where the Cloud Pak for Data operators are installed (`PROJECT_CPD_INST_OPERATORS`):

   ```
   oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | egrep -v -e "(.+)/\1" -e Completed
   ```

   Pods are considered failing if they are in any state other than `Running` or `Completed`.

2. Inspect each failed pod that was returned in the previous step. Replace `<pod-name>` with the name of the pod to inspect (an optional command for viewing the pod's container logs follows this list):

   ```
   oc describe pod <pod-name> -n ${PROJECT_CPD_INST_OPERATORS}
   ```

3. Complete the appropriate step based on the output of the preceding step:

   | Error | Action |
   |---|---|
   | The error contains `Bundle unpacking failed`. | Complete the steps in Follow Red Hat guidance for resolving bundle unpacking errors. |
   | The error is a problem with pulling the image. | Fix the error, delete the failed pods with `oc delete pod <pod-name> -n ${PROJECT_CPD_INST_OPERATORS}`, and rerun the `apply-olm` command. |
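In addition to `oc describe`, the container logs of a failed pod often show the underlying error. A minimal sketch using standard `oc logs` options:

```
# View logs from all containers in the failed pod; add --previous to see
# logs from a prior restart if the pod is crash-looping.
oc logs <pod-name> -n ${PROJECT_CPD_INST_OPERATORS} --all-containers
```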
## Check for orphaned CSVs and unbound subscriptions
1. Determine which subscriptions do not have a corresponding CSV installed (a helper sketch after this list prints only the unbound subscriptions):

   ```
   for sub in $(oc get sub -n ${PROJECT_CPD_INST_OPERATORS} \
   --sort-by=.metadata.creationTimestamp -o name); \
   do \
   echo $sub = \
   $(oc get $sub -n ${PROJECT_CPD_INST_OPERATORS} \
   -o jsonpath='{.metadata.creationTimestamp}{"\t"}{.status.installedCSV}{"\n"}'); \
   done
   ```

2. Review the output of the preceding command:

   - Subscriptions that are bound to a CSV have the following format:

     ```
     subscription.operators.coreos.com/<operator-name> = <timestamp> <csv-name>
     ```

   - Subscriptions that are not bound to a CSV have the following format:

     ```
     subscription.operators.coreos.com/<operator-name> = <timestamp>
     ```
3. For each unbound subscription returned in the preceding step, check whether there are any unbound CSVs and delete them:

   1. Check for a CSV:

      ```
      oc get -n ${PROJECT_CPD_INST_OPERATORS} \
      --ignore-not-found -o name csv \
      $(oc get -n ${PROJECT_CPD_INST_OPERATORS} packagemanifest \
      $(oc get subscription <subscription-name> -n ${PROJECT_CPD_INST_OPERATORS} \
      -o jsonpath='{.spec.name}') \
      -o jsonpath='{.status.channels[*].currentCSV}')
      ```

   2. Delete the CSV if it exists:

      ```
      oc delete csv <csv-name> -n ${PROJECT_CPD_INST_OPERATORS}
      ```

4. For each unbound subscription returned in step 2, run the following command to delete the subscription:

   ```
   oc delete subscription <subscription-name> -n ${PROJECT_CPD_INST_OPERATORS}
   ```

5. Restart the following pods in the `openshift-operator-lifecycle-manager` project:

   1. Restart the `catalog-operator` pods:

      ```
      oc delete pods -n openshift-operator-lifecycle-manager -l app=catalog-operator
      ```

   2. Restart the `olm-operator` pods:

      ```
      oc delete pods -n openshift-operator-lifecycle-manager -l app=olm-operator
      ```

6. Confirm that the `olm-operator` pods are `Running`:

   ```
   oc get pods -n openshift-operator-lifecycle-manager -l app=olm-operator
   ```

7. Complete the steps in Find and inspect remaining failed CSVs.
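As referenced in step 1, the following is a minimal sketch that prints only the subscriptions with no installed CSV, using the same `oc` queries as above:

```
# Print only subscriptions whose .status.installedCSV is empty (unbound).
for sub in $(oc get sub -n ${PROJECT_CPD_INST_OPERATORS} -o name); do
  csv=$(oc get $sub -n ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.status.installedCSV}')
  if [ -z "$csv" ]; then
    echo "unbound: $sub"
  fi
done
```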
## Find and inspect remaining failed CSVs
1. Determine which CSVs are failing:

   ```
   for csv in $(oc get csv -n ${PROJECT_CPD_INST_OPERATORS} \
   --sort-by=.metadata.creationTimestamp -o name); \
   do \
   echo -ne '.'; \
   csv_status=$(oc get $csv -n ${PROJECT_CPD_INST_OPERATORS} -o jsonpath='{.status.phase}'); \
   if [ "X${csv_status}" != "XSucceeded" ]; \
   then \
   echo; \
   echo "CSV did not succeed: ${csv} status: ${csv_status}"; \
   fi; \
   done
   ```

2. Inspect any CSV that is not in the `Succeeded` phase:

   ```
   oc get csv <csv-name> -n ${PROJECT_CPD_INST_OPERATORS} -o yaml
   ```

3. In the `status` section of the CSV YAML, review the most recent messages for any obvious errors.

   For example, the following message does not include `phase: Succeeded`:

   ```
   lastTransitionTime: "2023-03-28T16:48:58Z"
   lastUpdateTime: "2023-03-28T16:48:59Z"
   message: 'installing: waiting for deployment ibm-cpd-ws-runtimes-operator to become ready: deployment "ibm-cpd-ws-runtimes-operator" not available: Deployment does not have minimum availability.'
   ```

   In this case, the deployment did not come up, which can occur when the cluster does not have sufficient resources. You must investigate why the problem occurred.

4. If you don't find any errors in the preceding step, find the `InstallPlan` that introduced the failed CSV:

   ```
   for ip in $(oc get ip -n ${PROJECT_CPD_INST_OPERATORS} -o name); \
   do \
   echo $ip: $(oc get $ip -n ${PROJECT_CPD_INST_OPERATORS} -o yaml | grep <csv-name>); \
   done
   ```

5. In the `InstallPlan`, search for messages that contain `reason: InstallComponentFailed` or `phase: Failed`. These messages might contain information that can help you identify the reason that the `apply-olm` command failed. Some common error messages you might see are described here (example remediation commands for both cases follow this list):

   **Missing required status field**

   If you see a `status...Required value` message, a custom resource is missing a required `status` field. For example, you might see a message similar to the following example:

   ```
   message: 'error validating existing CRs against new CRD''s schema for "fdbclusters.foundationdb.opencontent.ibm.com": error validating custom resource against new schema for FdbCluster zen/mdm-foundationdb-1655706402438170: [].status.stage_mirror: Required value'
   ```

   This message indicates that the `status.stage_mirror` field is missing from the `mdm-foundationdb-1655706402438170` custom resource in the `zen` namespace. To resolve this problem, add the appropriate value to the `status.stage_mirror` field in the indicated custom resource. Then, retry the `apply-olm` command.

   **Missing required spec field**

   If you see a `spec...Required value` message, a custom resource is missing a required `spec` field. For example, you might see a message similar to the following example:

   ```
   message: 'error validating existing CRs against new CRD''s schema for "paservices.pa.cpd.ibm.com": error validating custom resource against new schema for PAService zen/ibm-planning-analytics-service: [].spec.version: Required value'
   ```

   This message indicates that the `spec.version` field is missing from the `ibm-planning-analytics-service` custom resource in the `zen` namespace. To resolve this problem, update the `version` field to include the release version that you are upgrading from. Then, retry the `apply-olm` command.

6. Complete the steps in Recheck the OLM operator logs.
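The remediation in step 5 comes down to editing the custom resource named in the message. The following is a hedged sketch only: the resource names come from the example messages above, the field values are placeholders that you must determine for your environment, and `--subresource=status` requires a recent `oc`/`kubectl` client (1.24 or later):

```
# Missing spec field (from the PAService example): set spec.version to the
# release you are upgrading from ("4.6.0" is a placeholder value).
oc patch paservice ibm-planning-analytics-service -n zen \
  --type=merge -p '{"spec":{"version":"4.6.0"}}'

# Missing status field (from the FdbCluster example): status is typically a
# subresource, so patch it explicitly; <value> is environment-specific.
oc patch fdbcluster mdm-foundationdb-1655706402438170 -n zen \
  --subresource=status --type=merge -p '{"status":{"stage_mirror":"<value>"}}'
```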
## Recheck the OLM operator logs
1. Inspect the logs for the OLM operator:

   ```
   oc logs -n openshift-operator-lifecycle-manager \
   $(oc get pods -n openshift-operator-lifecycle-manager -l app=catalog-operator -o name) | grep ${PROJECT_CPD_INST_OPERATORS}
   ```

2. Complete the appropriate step based on the output of the preceding step:

   | Error | Action |
   |---|---|
   | There are no recent errors. | Rerun the `apply-olm` command. |
   | The error contains `Bundle unpacking failed`. | Complete the steps in Follow Red Hat guidance for resolving bundle unpacking errors. |
   | Any other error was returned. | Complete the steps in Contact IBM Support to clean up OLM artifacts. |
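To determine whether errors are recent, you can limit the log window with the standard `--since` option; a minimal sketch (the 15-minute window is arbitrary):

```
# Show only log entries from the last 15 minutes for the catalog operator.
oc logs --since=15m -n openshift-operator-lifecycle-manager \
$(oc get pods -n openshift-operator-lifecycle-manager -l app=catalog-operator -o name) \
| grep ${PROJECT_CPD_INST_OPERATORS}
```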
## Follow Red Hat guidance for resolving bundle unpacking errors
1. Complete the steps in the Red Hat OpenShift known issue documentation.
2. Rerun the `apply-olm` command.
## Contact IBM Support to clean up OLM artifacts
You can clean up Cloud Pak for Data OLM artifacts to ensure that all stale subscriptions and CSVs are removed. The `apply-olm` command performs these cleanup actions, but it's possible that some OLM artifacts can be removed only by an explicit cleanup.

Run the following command to collect information about the state of your environment:

```
cpd-cli manage collect-state \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
```

After you run the command, send the output to IBM Support. If necessary, your IBM Support team will help you clean up Cloud Pak for Data OLM artifacts.