Upgrade troubleshooting

Issues can occur when you upgrade watsonx Orchestrate. The following troubleshooting topics can help you resolve them and proceed.

Kafka issue

Symptom
The Kafka resource fails verification checks and does not become ready.
Root Cause
The Kafka resource might be in a corrupted or inconsistent state, preventing proper initialization.
Solution
Delete and re-create the Kafka resource
  1. Delete the existing Kafka resource so that it can be re-created:
    oc delete kafka wo-watson-orchestrate-kafkaibm
  2. Wait for automatic re-creation: The Kafka operator automatically re-creates the resource. Monitor the process:
    oc get kafka -w
    Wait until the Kafka resource shows as Ready.
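
The wait in step 2 can also be scripted instead of watched by hand. The following is a minimal sketch, assuming that the Kafka resource reports a Ready condition in its status and that it is in the current project. The function name and the 600s timeout are illustrative and are not part of the product.

```shell
# Hypothetical helper: block until the re-created Kafka resource is Ready.
# Assumes the resource exposes a "Ready" status condition.
wait_for_kafka_ready() {
  local name="${1:-wo-watson-orchestrate-kafkaibm}"
  local timeout="${2:-600s}"   # illustrative default, adjust as needed
  if oc wait "kafka/${name}" --for=condition=Ready --timeout="${timeout}"; then
    echo "Kafka resource ${name} is Ready"
  else
    echo "Kafka resource ${name} did not become Ready within ${timeout}" >&2
    return 1
  fi
}
```

Call wait_for_kafka_ready after the delete command. Because oc wait returns an error when the resource does not exist yet, you might need to retry the call if the operator has not re-created the resource.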

Role-binding conflicts during upgrade

Symptom
The upgrade process becomes stuck at a specific component (wxdengine wo-milvus or UAB).
Root Cause
Existing role bindings or roles conflict with the upgrade process, preventing proper component initialization.
Solution
  1. Delete the role binding if stuck at wxdengine wo-milvus: If the upgrade is stuck at the wxdengine wo-milvus component, delete the conflicting role binding:
    oc delete rolebinding ibm-lakehouse-leader-election-rolebinding -n ${PROJECT_CPD_INST_OPERATORS}
  2. Delete the role if stuck at UAB component: If the upgrade is stuck at the UAB component, delete the conflicting role:
    oc delete role ibm-uab-ads-operator-role -n ${PROJECT_CPD_INST_OPERATORS}
  3. Monitor upgrade progress: After you delete the conflicting resource, monitor the upgrade to confirm that it proceeds:
    oc get pods -n ${PROJECT_CPD_INST_OPERATORS} -w
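
The choice between steps 1 and 2 can be wrapped in a small helper so that only the resource that matches the stuck component is deleted. This is a sketch under stated assumptions: the function name and the component labels wxdengine and uab are hypothetical shorthand for this example, and PROJECT_CPD_INST_OPERATORS must already be set.

```shell
# Hypothetical helper: remove the conflicting resource for a stuck component.
# The labels "wxdengine" and "uab" are illustrative names used only here.
delete_upgrade_conflict() {
  local component="$1"
  local ns="${PROJECT_CPD_INST_OPERATORS:?PROJECT_CPD_INST_OPERATORS is not set}"
  case "${component}" in
    wxdengine)
      # Upgrade stuck at wxdengine wo-milvus: delete the role binding
      oc delete rolebinding ibm-lakehouse-leader-election-rolebinding -n "${ns}" ;;
    uab)
      # Upgrade stuck at the UAB component: delete the role
      oc delete role ibm-uab-ads-operator-role -n "${ns}" ;;
    *)
      echo "unknown component: ${component} (expected wxdengine or uab)" >&2
      return 2 ;;
  esac
}
```

For example, delete_upgrade_conflict uab runs only the role deletion from step 2.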

Image pull errors

Symptom
Pods fail to start with ImagePullBackOff or ErrImagePull errors.
Root Cause
Entitlement key secrets are missing, incorrect, or not properly configured in the required namespaces.
Solution
Verify the entitlement key secret
  1. Verify the entitlement key secret is present in the namespace:
    oc get secret ibm-entitlement-key -n ${PROJECT_CPD_INST_OPERATORS}
  2. Verify secret content: Check that the secret contains the correct entitlement key:
    oc get secret ibm-entitlement-key -n ${PROJECT_CPD_INST_OPERATORS} -o yaml
  3. Re-create the secret if necessary: If the secret is missing or incorrect, re-create it with your valid entitlement key:
    oc create secret docker-registry ibm-entitlement-key \
      --docker-server=cp.icr.io \
      --docker-username=cp \
      --docker-password=<your-entitlement-key> \
      -n ${PROJECT_CPD_INST_OPERATORS}
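
The YAML output in step 2 shows the secret data in base64-encoded form, so the key is not directly readable. The following sketch decodes it, assuming that the secret is of type kubernetes.io/dockerconfigjson, which is the type that oc create secret docker-registry produces. The function name is hypothetical. The output contains the entitlement key in clear text, so avoid running it where the terminal output is logged or shared.

```shell
# Hypothetical helper: print the decoded registry config stored in the secret.
# Assumes a kubernetes.io/dockerconfigjson secret with a .dockerconfigjson key.
show_entitlement_config() {
  local ns="${1:-${PROJECT_CPD_INST_OPERATORS}}"
  oc get secret ibm-entitlement-key -n "${ns}" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
}
```

In the decoded JSON, check that the entry for cp.icr.io carries the user name cp and the expected key.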

Upgrade failure on cpd-cli

Symptom
The cpd-cli upgrade command fails with constraint satisfaction errors that are related to the events operator or Watson Assistant operator.
Error message
'constraints not satisfiable: bundle ibm-watson-assistant-operator.v5.8.0 
requires an operator with package: ibm-elasticsearch-operator and with version 
in range: >=1.1.1474 2.0.0, subscription ibm-watson-assistant-operator-subscription 
exists, subscription ibm-watson-assistant-operator-subscription requires 
@existing/cpd-operators//ibm-watson-assistant-operator.v5.8.0'
reason: ConstraintsNotSatisfiable
Root Cause
Version conflicts between operator subscriptions prevent the upgrade from proceeding. Different upgrade paths require different cleanup steps.
Solution
Check and clean up events operator subscription
  1. Check the events operator subscription: Verify the current state of the events operator subscription:
    oc get subs ibm-events-operator -n ${PROJECT_CPD_INST_OPERATORS} -o yaml
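
The full YAML output can be long. As a hedged sketch, the following helper prints only the installedCSV and currentCSV fields, which are standard Operator Lifecycle Manager Subscription status fields. The function name is hypothetical, and which values indicate a conflict depends on your upgrade path.

```shell
# Hypothetical helper: print the installed and current CSV of a subscription,
# separated by a tab. Both are standard OLM Subscription status fields.
show_subscription_csv() {
  local sub="${1:-ibm-events-operator}"
  local ns="${PROJECT_CPD_INST_OPERATORS:?PROJECT_CPD_INST_OPERATORS is not set}"
  oc get subs "${sub}" -n "${ns}" \
    -o jsonpath='{.status.installedCSV}{"\t"}{.status.currentCSV}{"\n"}'
}
```

Passing ibm-watson-assistant-operator-subscription as the first argument shows the same fields for the subscription that is named in the error message.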

Bootstrap job failure

Symptom
The bootstrap job fails with a backoff limit error, preventing watsonx Orchestrate from initializing properly.
Root Cause
The bootstrap job has exceeded its retry limit because of persistent failures and requires manual intervention to reset.
Solution
Delete and re-create the bootstrap job
  1. Check bootstrap job status: Verify the bootstrap job status and check for backoff limit errors:
    oc get job wo-watson-orchestrate-bootstrap-job -n ${PROJECT_CPD_INST_OPERANDS}
    oc describe job wo-watson-orchestrate-bootstrap-job -n ${PROJECT_CPD_INST_OPERANDS}
  2. Delete the failed bootstrap job: If the job failed with a backoff limit error, delete it so that it can be re-created:
    oc delete job wo-watson-orchestrate-bootstrap-job -n ${PROJECT_CPD_INST_OPERANDS}
  3. Wait for automatic re-creation: The operator automatically re-creates the bootstrap job. Monitor its progress:
    oc get job wo-watson-orchestrate-bootstrap-job -n ${PROJECT_CPD_INST_OPERANDS} -w
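
Steps 1 and 2 can be combined so that the job is deleted only when it actually failed with a backoff limit error. The following is a minimal sketch, assuming that the job reports a Failed condition with the reason BackoffLimitExceeded, which is the reason that Kubernetes sets when a job runs out of retries. The function name is hypothetical, and PROJECT_CPD_INST_OPERANDS must already be set.

```shell
# Hypothetical helper: delete the bootstrap job only when its Failed condition
# has the reason BackoffLimitExceeded; otherwise leave the job untouched.
reset_bootstrap_job_if_backoff() {
  local job="wo-watson-orchestrate-bootstrap-job"
  local ns="${PROJECT_CPD_INST_OPERANDS:?PROJECT_CPD_INST_OPERANDS is not set}"
  local reason
  # Read the reason of the Failed condition (empty if the job has not failed)
  reason="$(oc get job "${job}" -n "${ns}" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}')"
  if [ "${reason}" = "BackoffLimitExceeded" ]; then
    oc delete job "${job}" -n "${ns}"
  else
    echo "Job ${job} not deleted (Failed reason: ${reason:-none})"
  fi
}
```

After the helper deletes the job, continue with the watch command in step 3 to follow the re-created job.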