Known issues and limitations for Watson Discovery
The following known issues and limitations apply to the Watson Discovery service.
- The Watson Discovery CR (wd) gets stuck with 23/24 status when applying patch1 for the zen component
- Default noobaa resources might cause the OOMKilled error
- The Watson Discovery operator pod goes to CrashLoopBackOff status when a tethered project exists
- The Watson Discovery operator pod goes to CrashLoopBackOff status in GitOps with Argo CD install
- Error is displayed after applying the temporary patch
- During shutdown the DATASTOREQUIESCE field does not update
- Upgrade fails due to existing Elasticsearch 6.x indices
- UpgradeError is shown after resizing PVC
- Disruption of service after upgrading, restarting, or scaling by updating scaleConfig
The Watson Discovery CR (wd) gets stuck with 23/24 status when applying patch1 for the zen component
Applies to: 5.3.1
- Error
- When you apply patch1 for the zen component to the base version 5.3.1, Watson Discovery CR (wd) gets stuck with 23/24 status.
    oc get wd -n zen
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     5.3.1     False   InProgress    True       VerifyWait       24/24      23/24      NOT_QUIESCED   NOT_QUIESCED       13h

    oc get wd -n zen -o yaml
    ...
    failedComponents: []
    unverifiedComponents:
    - wire
    verified: 23/24
- Cause
- The zen component updates its secret with patch1. Certain Watson Discovery secrets that depend on the zen secret are not updated, which causes the ranker pods to crash.
    oc get secret -n zen --sort-by=.metadata.creationTimestamp | grep -E 'wd-discovery|zen-ca' | tail -r
    zen-ca-cert-secret               kubernetes.io/tls          3    6h46m
    wd-discovery-jks-secret          Opaque                     3    10m
    wd-discovery-cert-manager-tls    kubernetes.io/tls          3    7h10m
    ...
    wd-discovery-jks-secret          Opaque                     3    8h
    ...
    wd-discovery-cn-postgres16-ca    Opaque                     2    9h
    wd-discovery-cn-postgres16-app   kubernetes.io/basic-auth   11   9h
    wd-discovery-cn-postgres16-su    kubernetes.io/basic-auth   2    9h

    oc get pods -n zen | grep -E 'ranker-master|ranker-rest|training-crud'
    wd-discovery-ranker-master-67c8f665cd-njxqn   0/1   Running            1 (51s ago)   2m16s
    wd-discovery-ranker-master-67c8f665cd-pfggl   0/1   Running            1 (42s ago)   2m16s
    wd-discovery-ranker-rest-f49f897f-qkhkn       0/1   CrashLoopBackOff   2 (9s ago)    2m15s
    wd-discovery-ranker-rest-f49f897f-qrd28       0/1   CrashLoopBackOff   2 (6s ago)    2m15s
    ...
- Solution
- To resolve this issue, complete the following steps:
  - Manually delete the old secrets to refresh them, then restart the pods with errors.
      oc delete secret -n zen \
        wd-discovery-jks-secret \
        wd-discovery-cn-postgres16-ca \
        wd-discovery-cn-postgres16-wd \
        wd-discovery-cn-postgres16-replication
      secret "wd-discovery-jks-secret" deleted
      secret "wd-discovery-cn-postgres16-ca" deleted
      secret "wd-discovery-cn-postgres16-wd" deleted
      secret "wd-discovery-cn-postgres16-replication" deleted
    These secrets are automatically recreated after a while.
      oc get secret -n zen | grep 'wd-discovery-cn-postgres16'
      wd-discovery-cn-postgres16-app               kubernetes.io/basic-auth   11   14h
      wd-discovery-cn-postgres16-ca                Opaque                     2    45s
      wd-discovery-cn-postgres16-dockercfg-4tzt4   kubernetes.io/dockercfg    1    14h
      wd-discovery-cn-postgres16-replication       kubernetes.io/tls          2    45s
      wd-discovery-cn-postgres16-su                kubernetes.io/basic-auth   2    14h
      wd-discovery-cn-postgres16-wd                Opaque                     4    45s
  - Restart the ranker pods.
      oc delete pod -n zen -l 'app.kubernetes.io/component in (ranker-master,ranker-rest,training-crud)'
      pod "wd-discovery-ranker-master-674d455cb9-7h75d" deleted
      pod "wd-discovery-ranker-master-674d455cb9-lwf6h" deleted
      pod "wd-discovery-ranker-rest-f7b9b979d-nkvdw" deleted
      pod "wd-discovery-ranker-rest-f7b9b979d-w668r" deleted
      pod "wd-discovery-training-crud-6577558dd4-wdstf" deleted
    Wait for the new ranker pods to start running.
      oc get pods -n zen | grep -E 'ranker-master|ranker-rest|training-crud'
      wd-discovery-ranker-master-674d455cb9-lgjm6   1/1   Running   0   6m12s
      wd-discovery-ranker-master-674d455cb9-xwqfp   1/1   Running   0   6m11s
      wd-discovery-ranker-rest-f7b9b979d-9fz87      1/1   Running   0   5m26s
      wd-discovery-ranker-rest-f7b9b979d-bmjts      1/1   Running   0   5m25s
      wd-discovery-training-crud-6577558dd4-qkgvv   1/1   Running   0   3m33s
      wd-discovery-training-crud-6577558dd4-zlc9k   1/1   Running   0   6m25s
    The wd CR also gets ready after a while.
      oc get wd -n zen
      NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
      wd     5.3.1     True    Stable        False      Stable           24/24      24/24      NOT_QUIESCED   NOT_QUIESCED       14h
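To check the result from a script instead of reading the table by eye, a small helper can compare the two numbers in the VERIFIED column. The following is a minimal sketch, not part of the product: the function name wd_verified_complete is hypothetical, and it assumes that the oc get wd table keeps the column headers shown in this section and that no cell is empty or contains spaces.

```shell
# Hypothetical helper: reads `oc get wd` table output on stdin and succeeds
# only if every data row shows a fully verified count (for example, 24/24).
# The VERIFIED column is located by its header name, not by a fixed position.
wd_verified_complete() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "VERIFIED") col = i
      next
    }
    col > 0 && NF >= col {
      seen = 1
      split($col, parts, "/")
      if (parts[1] != parts[2]) bad = 1
    }
    END {
      # Fail when no data row was seen or any row is not fully verified.
      if (seen && !bad) exit 0
      exit 1
    }
  '
}
```

For example, oc get wd -n zen | wd_verified_complete && echo "wd is fully verified" prints the message only when the VERIFIED column reaches its full count.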
Default noobaa resources might cause the OOMKilled error
Applies to: 5.3.0
- Error
- The noobaa resources might cause the OOMKilled error due to insufficient memory. This error is triggered especially during the Watson Discovery installation or upgrade, as these operations require significant access to the noobaa storage.
  If you are using the noobaa backing store, run the following commands to verify and patch its resources. For other types of backing stores, refer to their respective documentation to adjust resource sizes.
    oc get pods -n openshift-storage
    noobaa-default-backing-store-noobaa-pod-2a4a1886   0/1   CrashLoopBackOff   104 (62s ago)   32h

    oc get pods -n openshift-storage noobaa-default-backing-store-noobaa-pod-2a4a1886 -o yaml
    ...
    status:
      ...
        lastState:
          terminated:
            containerID: ...
            exitCode: 137
            finishedAt: "2026-01-10T23:44:45Z"
            reason: OOMKilled
            startedAt: "2026-01-10T23:41:42Z"
        name: noobaa-agent
        ready: false
        restartCount: 102
        started: false
        state:
          waiting:
            message: back-off 1m20s restarting failed container=noobaa-agent pod=noobaa-default-backing-store-noobaa-pod-2a4a1886_openshift-storage(68abc3f6-25d9-4ac6-87fa-c39f80f6e1af)
            reason: CrashLoopBackOff
  As a result, the Watson Discovery pods that check noobaa contents might encounter download or access errors.
    oc logs wd-discovery-orchestrator-setup-r9brl -c verify-resources
    Verifying common-zen-wd/mt/__built-in-tenant__/fileResource/701db916-fc83-57ab-0000-000000000010.zip
    Check if common-zen-wd exists
    ...
    Check if object exists
    Read timeout on endpoint URL: "https://s3.openshift-storage.svc:443/common-zen-wd?list-type=2&prefix=mt%2F__built-in-tenant__%2FfileResource%2F701db916-fc83-57ab-0000-000000000010.zip&delimiter=%2F&encoding-type=url"
    Object does not exist
    Retry after 60 seconds
- Cause
- The default memory requests and limits for the noobaa resources are too low.
- Solution
- Increase memory size by patching the following resources.
    oc patch -n openshift-storage backingStore/noobaa-default-backing-store --type merge --patch '{
      "spec": {
        "pvPool": {
          "resources": {
            "requests": { "memory": "1Gi" },
            "limits": { "memory": "1Gi" }
          }
        }
      }
    }'

    oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{
      "spec": {
        "resources": {
          "noobaa-endpoint": {
            "limits": { "memory": "4Gi" },
            "requests": { "memory": "4Gi" }
          }
        }
      }
    }'
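Before you patch the resources, it can help to confirm that the restarts are really caused by memory exhaustion and not by another failure. The following is a minimal sketch under stated assumptions: the function name pod_was_oom_killed is hypothetical, and it does a plain text match on the pod YAML rather than parsing it, so it only looks for a line of the form reason: OOMKilled like the one shown in the status output in this section.

```shell
# Hypothetical helper: reads pod YAML (for example, from
# `oc get pod <pod_name> -o yaml`) on stdin and succeeds when any line
# reports `reason: OOMKilled`. This is a text match, not a YAML parser.
pod_was_oom_killed() {
  grep -Eq '^[[:space:]]*reason:[[:space:]]*OOMKilled[[:space:]]*$'
}
```

For example, oc get pods -n openshift-storage noobaa-default-backing-store-noobaa-pod-2a4a1886 -o yaml | pod_was_oom_killed && echo "OOMKilled detected" prints the message only when the pod status contains that reason.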
The Watson Discovery operator pod goes to CrashLoopBackOff status when a tethered project exists
Applies to: 5.3.0
Fixed in: 5.3.1
- Error
- When you install Watson Discovery in an environment with a tethered namespace, the operator pod goes to a CrashLoopBackOff state.
    oc get pod -l icpdsupport/addOnId=discovery,icpdsupport/app=operator -n <operator_namespace>
    wd-discovery-operator-7df77755d4-647dp   0/1   CrashLoopBackOff   21 (3m1s ago)   100m
- Solution
- Apply the following patch to the Watson Discovery operator deployment. Replace every occurrence of <operator_namespace> and <operand_namespace> with the appropriate values for your environment:
    oc patch deployment wd-discovery-operator -n <operator_namespace> --type='strategic' -p='{"spec":{"template":{"spec":{"containers":[{"name":"manager","env":[{"name":"WATCH_NAMESPACE","value":"<operator_namespace>,<operand_namespace>","valueFrom":null}]}]}}}}'
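Because the operator namespace appears more than once in the command, editing the placeholders by hand is error-prone. The following is a minimal sketch of one way to generate the patch body from two arguments. The function name watch_namespace_patch and the namespace values in the example are hypothetical; the JSON structure is copied from the patch above.

```shell
# Hypothetical helper: prints the strategic merge patch shown above with
# WATCH_NAMESPACE set to "<operator_namespace>,<operand_namespace>".
# $1 = operator namespace, $2 = operand namespace.
watch_namespace_patch() {
  operator_ns=$1
  operand_ns=$2
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"manager","env":[{"name":"WATCH_NAMESPACE","value":"%s,%s","valueFrom":null}]}]}}}}\n' \
    "$operator_ns" "$operand_ns"
}
```

For example, with the hypothetical namespaces cpd-operators and cpd-instance, the patch can be applied with oc patch deployment wd-discovery-operator -n cpd-operators --type='strategic' -p="$(watch_namespace_patch cpd-operators cpd-instance)".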
The Watson Discovery operator pod goes to CrashLoopBackOff status in GitOps with Argo CD install
Applies to: 5.3.0
- Error
- When you install Watson Discovery in GitOps with Argo CD, the operator pod goes to CrashLoopBackOff.
    oc get pod -l icpdsupport/addOnId=discovery,icpdsupport/app=operator -n ${operatorNS}
    wd-discovery-operator-7df77755d4-647dp   0/1   CrashLoopBackOff   21 (3m1s ago)   100m
- Solution
- To resolve this issue, complete the following steps:
  - Apply the following patch to the namespaced Watson Discovery application.
      oc patch application.argoproj.io "watson-discovery${appSuffix}" -n ${argocdNS} --type=merge -p '{"spec":{"source":{"helm":{"valuesObject":{"watsonDiscovery":{"operator":{"watchNamespaces":["'"${operatorNS}"'","'"${instanceNS}"'"]}}}}}}}'
  - Sync the Watson Discovery application in the Argo CD UI or run the following command:
      argocd app sync "watson-discovery${appSuffix}"
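The patch in the first step switches between single and double quotes to splice the shell variables into the JSON, which is easy to mistype. The following is a minimal sketch of an alternative that builds the same merge patch with printf. The function names json_string_array and watch_namespaces_patch are hypothetical, and the sketch assumes that the values need no JSON escaping, which holds for valid namespace names.

```shell
# Hypothetical helper: prints its arguments as a JSON array of strings.
# No JSON escaping is done; valid namespace names do not need any.
json_string_array() {
  out=''
  for item in "$@"; do
    out="${out:+$out,}\"$item\""
  done
  printf '[%s]\n' "$out"
}

# Hypothetical helper: prints the merge patch from the first step with
# watchNamespaces set to the given namespaces.
watch_namespaces_patch() {
  printf '{"spec":{"source":{"helm":{"valuesObject":{"watsonDiscovery":{"operator":{"watchNamespaces":%s}}}}}}}\n' \
    "$(json_string_array "$@")"
}
```

With the variables from the steps above, the patch can then be applied with oc patch application.argoproj.io "watson-discovery${appSuffix}" -n "${argocdNS}" --type=merge -p "$(watch_namespaces_patch "${operatorNS}" "${instanceNS}")".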
Error is displayed after applying the temporary patch
Applies to: 5.3.0
- Error
- After applying the temporary patch, an error is displayed.
    oc get temporarypatches.oppy.ibm.com
    NAME              READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   AGE
    temporary-patch   False   Errored       False      Errored          1/1        1/1        4s
  After a while, an error is displayed in the status as well.
    oc get wd
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     5.2.1     False   ConfigError   False      Errored          24/24      24/24      NOT_QUIESCED   NOT_QUIESCED       11h
- Cause
- This issue occurs when the Watson Discovery operator deletes the spec field of the temporary patch.
- Solution
- Apply the same patch again to fill the spec field. Then, delete the Watson Discovery operator pod to restart it.
    oc delete pod -l icpdsupport/addOnId=discovery,icpdsupport/app=operator -n ${PROJECT_CPD_INST_OPERATORS}
During shutdown the DATASTOREQUIESCE field does not update
- Error
- After you run the cpd-cli manage shutdown command, the DATASTOREQUIESCE state in the Watson Discovery resource is stuck in QUIESCING.
  The shutdown command completes successfully. However, when you check the status of the WatsonDiscovery wd custom resource (oc get WatsonDiscovery wd -n "${PROJECT_CPD_INST_OPERANDS}"), the command returns:
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE    DATASTOREQUIESCE   AGE
    wd     4.7.3     True    Stable        False      Stable           24/24      24/24      QUIESCED   QUIESCING          16h
- Cause
- Because of the way that quiescing Postgres works, the Postgres pods are still running in the background. As a result, the metadata is not updated in the Watson Discovery resource.
- Solution
- There is no fix for this. However, the state being stuck in QUIESCING does not affect the Watson Discovery operator.
Upgrade fails due to existing Elasticsearch 6.x indices
Applies to: 5.3.0
- Error
- If the existing Elasticsearch cluster has indices created with Elasticsearch 6.x, then upgrading Watson Discovery to Version 5.0.0 and later fails.
    oc get wd wd
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     4.8.0     False   InProgress    True       VerifyWait       2/24       1/24       NOT_QUIESCED   NOT_QUIESCED       63m
- Cause
- When upgrading to Version 5.0.0 and later, Watson Discovery checks whether the Elasticsearch cluster contains indices that were created with a deprecated Elasticsearch version.
- Solution
- To determine whether existing Elasticsearch 6.x indices are the cause of the upgrade failure, verify the log of the wd-discovery-es-detect-index pod using the following command:
UpgradeError is shown after resizing PVC
Applies to: 5.3.0
- Error
- After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
- Cause
- You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size of the PVC on the persistent volume claim node after it is created.
- Solution
- To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.
Disruption of service after upgrading, restarting, or scaling by updating scaleConfig
Applies to: 5.3.0
- Error
- After upgrading, restarting, or scaling Watson Discovery by updating the scaleConfig parameter, the Elasticsearch component might become non-functional, resulting in disruption of service and data loss.
- Cause
- The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
- Solution
- To determine if confusion about the quorum leader pod is the cause of the issue, complete the following steps:
  - Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
  - Check each Elasticsearch pod with the role of master to see which pod it identifies as the quorum leader. Each pod must list the same pod as the leader.
      oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read -r i; do echo "$i"; oc exec "$i" \
        -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
    For example, in the following result, two different leaders are identified. Pods 1 and 2 identify pod 2 as the leader. However, pod 0 identifies itself as the leader.
      wd-ibm-elasticsearch-es-server-master-0
      id                     host      ip        node
      7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0

      wd-ibm-elasticsearch-es-server-master-1
      id                     host      ip        node
      L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2

      wd-ibm-elasticsearch-es-server-master-2
      id                     host      ip        node
      L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
If you find that more than one pod is identified as the leader, contact IBM Support.
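With more than a few master pods, comparing the reported leaders by eye becomes tedious. The following is a minimal sketch that counts the distinct leader names in the combined output of the check. The function name leader_count is hypothetical, and the sketch assumes the four-column id, host, ip, node layout shown in the example result; any other four-field line, such as an unexpected error message, would be miscounted.

```shell
# Hypothetical helper: reads the combined `_cat/master?v` output of all
# master pods on stdin and prints how many distinct nodes are named as leader.
# Data rows are recognized as four-field lines that are not the header row.
leader_count() {
  awk 'NF == 4 && $1 != "id" && !seen[$4]++ { n++ } END { print n + 0 }'
}
```

Piping the output of the loop from the previous step into leader_count prints 1 when all pods agree on the leader. Any larger number corresponds to the situation described above.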
Limitations
- Formulas that are embedded as images, especially those containing division bars (horizontal fractions) or other complex notations, are not reliably recognized or extracted by Watson Discovery. As a result, these formulas might be omitted, misinterpreted, or rendered incorrectly in the extracted output. This limitation stems from how the SDU pipeline handles embedded images, and currently affects all versions of Watson Discovery that use SDU.
- The service supports single-zone deployments; it does not support multi-zone deployments.
- You cannot upgrade the Watson Discovery service by using the service-instance upgrade command from the Cloud Pak for Data command-line interface.
- You cannot use the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility to do an offline backup and restore of the Watson Discovery service. Online backup and restore with OADP is available.