Known issues and limitations for Watson Discovery
The following known issues and limitations apply to the Watson Discovery service.
- Watson Discovery installation or upgrade does not complete because certain pods fail
- Unable to add documents during upgrade of Watson Discovery
- Gateway pods in a crash loop after upgrading Watson Discovery
- An etcd operator script fails while upgrading to Watson Discovery 4.8.4
- Watson Discovery orchestrator pods not starting because ResourceQuota is applied to the namespace
- Dictionary and Part of Speech facets are not shown in Content Mining projects
- Upgrade fails due to existing Elasticsearch 6.x indices
- During shutdown the DATASTOREQUIESCE field does not update
- UpgradeError is shown after resizing PVC
- Disruption of service after upgrading, restarting, or scaling by updating scaleConfig
- MinIO gets stuck in a loop after several installation attempts
Watson Discovery installation or upgrade does not complete because certain pods fail
Applies to: 4.8.2 and 4.8.5
- Error
- The Watson Discovery installation or upgrade process does not complete because certain pods fail:

  ```
  NAME                                              READY   STATUS             RESTARTS         AGE
  wd-discovery-entity-suggestion-74dbf8764f-f4xbw   0/1     Running            33 (5m34s ago)   153m
  wd-discovery-wd-indexer-59c7d968d9-rrt4b          0/1     Running            7 (2m40s ago)    150m
  wd-discovery-hdp-worker-1                         1/2     CrashLoopBackOff   32 (97s ago)     150m
  wd-discovery-hdp-worker-0                         1/2     CrashLoopBackOff   32 (77s ago)     150m
  wd-discovery-converter-94788d69c-76qlk            0/1     Running            24 (5m39s ago)   149m
  wd-discovery-orchestrator-576bfbd4b7-r5xt4        0/1     CrashLoopBackOff   25 (3m4s ago)
  ```
- Cause
- Certain pods fall into a `CrashLoopBackOff` state.
- Solution
- For assistance with this issue, you can contact IBM® Support.
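  Before you contact Support, it can help to capture the state and recent logs of the failing pods. A minimal sketch; the pod name is a placeholder that you replace with a failing pod from your own output:

  ```
  # List the Watson Discovery pods that are failing or not ready
  oc get pods -n "${PROJECT_CPD_INST_OPERANDS}" | grep wd-discovery

  # Save the logs of the current and previously crashed containers of one pod
  oc logs wd-discovery-orchestrator-576bfbd4b7-r5xt4 -n "${PROJECT_CPD_INST_OPERANDS}" > pod.log
  oc logs wd-discovery-orchestrator-576bfbd4b7-r5xt4 -n "${PROJECT_CPD_INST_OPERANDS}" --previous > pod-previous.log
  ```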
Unable to add documents during upgrade of Watson Discovery
Applies to: 4.8.3 or earlier
Fixed in: 4.8.4
- Error
- While upgrading Watson Discovery from version 4.8.3 or earlier, Watson Discovery cannot ingest documents because certain APIs return a 500 error. In addition, the `wd-discovery-crawler` pods fall into a `CrashLoopBackOff` state until the upgrade is completed.
- Cause
- This error occurs because certain APIs related to document ingestion are unable to communicate with Postgres during an upgrade.
- Solution
- Ingest documents after completion of the upgrade.
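  To confirm that the upgrade is complete before you resume ingestion, check the custom resource status, as shown elsewhere on this page:

  ```
  # READY shows True and READYREASON shows Stable when the upgrade is finished
  oc get wd -n "${PROJECT_CPD_INST_OPERANDS}"
  ```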
Gateway pods in a crash loop after upgrading Watson Discovery
Applies to: 4.8.4
- Error
- After upgrading to Watson Discovery 4.8.4, you might observe that the Gateway pod is in a crash loop. Watson Discovery might also not report the updated version as expected.
- Cause
- This error occurs as a result of an Out of Memory (OOM) issue.
- Solution
- Attempt to increase the memory resources for the gateway operator:

  ```
  # Find the gateway operator CSV name
  oc get csv | grep gateway

  # Optionally inspect or edit the CSV directly
  oc edit csv

  # Raise the memory limit and request to 2Gi
  oc patch csv/ibm-watson-gateway-operator.v1.0.26 --type json -p '[{ "op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory","value":"2Gi" }]'
  oc patch csv/ibm-watson-gateway-operator.v1.0.26 --type json -p '[{ "op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory","value":"2Gi" }]'
  ```

  Adjust the CSV name (`ibm-watson-gateway-operator.v1.0.26`) according to your environment.
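  After the patch, you can confirm that the new limit was applied and that the gateway pods recover. A minimal sketch that reads the value back from the CSV; adjust the CSV name as noted above:

  ```
  # Read back the patched memory limit from the operator CSV
  oc get csv ibm-watson-gateway-operator.v1.0.26 \
    -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'

  # Confirm that the gateway pods leave the CrashLoopBackOff state
  oc get pods | grep gateway
  ```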
An etcd operator script fails while upgrading to Watson Discovery 4.8.4
Applies to: 4.8.4 and 4.8.5
- Error
- During a Watson Discovery upgrade from version 4.8.0 to version 4.8.4, the `READY` status shows `False` and `READYREASON` shows `InProgress` for a long time:

  ```
  # oc get wd -n zen
  NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
  wd     4.8.4     False   InProgress    True       VerifyWait       11/23      10/23      NOT_QUIESCED   NOT_QUIESCED       2d17h
  ```

  You can verify that `etcd` is listed in the `unverifiedComponents` field of the Watson Discovery CR:

  ```
  oc get wd -n <ns> -o yaml
  ```

  ```
  unverifiedComponents:
  - etcd
  ```

  Also, an error message similar to one of the following is displayed in the `ibm-etcd-operator` pod logs:

  ```
  "msg": "An unhandled exception occurred while templating '{{ q('etcd_member', cluster_host= etcd_cluster_name + '-client.' + etcd_namespace + '.svc', cluster_port=etcd_client_port, ca_cert=tls_directory + '/etcd-ca.crt', cert_cert=tls_directory + '/etcd-client.crt', cert_key=tls_directory + '/etcd-client.key') }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while running the lookup plugin 'etcd_member'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unable to fetch members. Error: 'Client' object has no attribute 'server_version_sem'. Unable to fetch members. Error: 'Client' object has no attribute 'server_version_sem'"
  ```

  ```
  TASK [etcdcluster : Enable authentication when secure client] ******************
  task path: /opt/ansible/roles/etcdcluster/tasks/reconcile_pods.yaml:246
  /usr/local/lib/python3.8/site-packages/etcd3/baseclient.py:97: Etcd3Warning: cannot detect etcd server version
  1. maybe is a network problem, please check your network connection
  2. maybe your etcd server version is too low, required: 3.2.2+
  warnings.warn(Etcd3Warning("cannot detect etcd server version\n"
  fatal: [localhost]: FAILED! => {
      "msg": "An unhandled exception occurred while running the lookup plugin 'etcd_auth'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Enabling authentication failed. Error: 'Client' object has no attribute 'server_version_sem'"
  }
  ```
- Cause
- A script in the etcd operator that sets authentication might fail. When it fails, the etcd operator does not deploy with `authentication: enabled` in the `etcdcluster` CR. This failure stops other components in the service from being upgraded and verified.
- Solution
- Attempt to re-execute the etcd operator by deleting the `etcdcluster` CR so that it is re-created.
  - Get the name of the `etcdcluster` service:

    ```
    oc get etcdcluster | grep etcd
    ```

    If your deployment uses a different name for the etcd cluster, search for that name instead.
  - Delete the CR to allow the etcd operator to re-execute its tasks:

    ```
    oc delete etcdcluster <cluster>
    ```
  - Wait until the `etcdcluster` and `etcd` pods are re-created.
  - Check the status of `Ready`, `Deployed`, and `Verified` to make sure that the upgrade is successful:

    ```
    # oc get wd
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     4.8.4     True    Stable        False      Stable           23/23      23/23      NOT_QUIESCED   NOT_QUIESCED       3d6h
    ```
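  While you wait for the `etcdcluster` and `etcd` pods to be re-created, you can poll the pod list. A minimal sketch; the `etcd` name filter is an assumption about how the pods are named in your deployment:

  ```
  # Poll until the etcd pods are re-created and report Running
  oc get pods | grep etcd
  ```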
Watson Discovery orchestrator pods not starting because ResourceQuota is applied to the namespace
Applies to: 4.8.2 and 4.8.3
Fixed in: 4.8.4
- Error
- The `wd-discovery-orchestrator-setup` job fails to run because of an error similar to the following:

  ```
  Error creating: pods "wd-discovery-orchestrator-setup-m5r5s" is forbidden: failed quota: cpd-quota: must specify limits.cpu for: verify-resources; limits.memory for: verify-resources; requests.cpu for: verify-resources; requests.memory for: verify-resources'
  ```
- Cause
- The `wd-discovery-orchestrator-setup` job does not run when a `ResourceQuota` is applied to the namespace where Watson Discovery is installed without a `LimitRange` that sets `limits.cpu`, `limits.memory`, `requests.cpu`, and `requests.memory` for the `verify-resources` container.
- Solution
- Fix the error by setting a `LimitRange` for limits and requests.

  To set the `LimitRange`, complete the following steps:

  - Create a new YAML file by copying the following text. Save the YAML file in a location from which you can access it in the next step.

    ```
    apiVersion: oppy.ibm.com/v1
    kind: TemporaryPatch
    metadata:
      name: wd-orchestrator-setup-resource-patch
    spec:
      apiVersion: discovery.watson.ibm.com/v1
      kind: WatsonDiscoveryOrchestrator
      name: wd
      patchType: patchStrategicMerge
      patch:
        orchestrator:
          job:
            spec:
              template:
                spec:
                  containers:
                  - name: verify-resources
                    resources:
                      limits:
                        cpu: "1"
                        ephemeral-storage: 1Gi
                        memory: 512Mi
                      requests:
                        cpu: "0.2"
                        ephemeral-storage: 1Mi
                        memory: 256Mi
    ```
  - Run the following command in the namespace where Watson Discovery is installed:

    ```
    oc apply -f <yaml-file> -n "${PROJECT_CPD_INST_OPERANDS}"
    ```
  - Wait until the following message appears in the Watson Discovery pod logs:

    ```
    "msg": "Starting reconciliation of TemporaryPatch/wd-orchestrator-setup-resource-patch"
    ```
  - Delete the `wd-discovery-orchestrator-setup` job:

    ```
    oc delete job/wd-discovery-orchestrator-setup
    ```

    The operator restarts the job with the `LimitRange` for the limits and requests.
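  To confirm that a `ResourceQuota` is what is blocking the job, you can list the quotas in the instance namespace. A minimal sketch; the `cpd-quota` name is taken from the example error message above and might differ in your environment:

  ```
  # List quotas that apply to the Watson Discovery namespace
  oc get resourcequota -n "${PROJECT_CPD_INST_OPERANDS}"

  # Show the limits and requests that a specific quota enforces
  oc describe resourcequota cpd-quota -n "${PROJECT_CPD_INST_OPERANDS}"
  ```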
Dictionary and Part of Speech facets are not shown in Content Mining projects
Applies to: 4.8.0 and 4.8.2
- Error
- In Content Mining projects, when you apply a dictionary annotator and one or more of the following enrichments to a collection, the dictionary and Part of Speech facets are not shown or appear empty:
  - Entities v2
  - Keywords
  - Sentiment of Document
  - Entity extractor
  - Document classifier
- Cause
- Dictionary and Part of Speech facets were unexpectedly removed from collections in Content Mining projects, resulting in this error.
- Solution
- Fix the error by applying a temporary patch.

  To apply the patch, complete the following steps:

  - Run the following command:

    ```
    cat << EOF | oc apply -f -
    apiVersion: oppy.ibm.com/v1
    kind: TemporaryPatch
    metadata:
      name: drop-annotations-patch
    spec:
      apiVersion: discovery.watson.ibm.com/v1
      kind: WatsonDiscoveryEnrichment
      name: wd
      patchType: patchStrategicMerge
      patch:
        enrichment-service:
          deployment:
            spec:
              template:
                spec:
                  containers:
                  - name: annotator-manager
                    env:
                    - name: DROP_POS_ANNOTATIONS
                      value: "false"
    EOF
    ```
  - Wait for a few minutes until the `wd-discovery-enrichment-service` pods restart.
  - Run **Rebuild index** for the collection.

  If you want to remove the temporary patch, run the following command:

  ```
  oc delete temporarypatch drop-annotations-patch
  ```
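  To watch for the restart of the `wd-discovery-enrichment-service` pods, you can filter the pod list by name:

  ```
  # New wd-discovery-enrichment-service pods must report Running and ready
  oc get pods | grep wd-discovery-enrichment-service
  ```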
Upgrade fails due to existing Elasticsearch 6.x indices
Applies to: 4.8.0 and later
- Error
- If the existing Elasticsearch cluster has indices that were created with Elasticsearch 6.x, upgrading Watson Discovery to version 4.8.0 or later fails:

  ```
  > oc get wd wd
  NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
  wd     4.8.0     False   InProgress    True       VerifyWait       2/24       1/24       NOT_QUIESCED   NOT_QUIESCED       63m
  ```
- Cause
- Watson Discovery checks for the existence of deprecated index versions in the Elasticsearch cluster when upgrading to version 4.8.0 or later.
- Solution
- To determine whether existing Elasticsearch 6.x indices are the cause of the upgrade failure, verify the log of the `wd-discovery-es-detect-index` pod by using the following command:
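  The command itself is not shown on this page; a plausible reconstruction, assuming the pod name begins with the `wd-discovery-es-detect-index` prefix, is:

  ```
  # Hypothetical reconstruction: print the logs of the index-detection pod
  oc logs $(oc get pods -o name | grep wd-discovery-es-detect-index)
  ```

  If the log reports deprecated Elasticsearch 6.x indices, those indices are the likely cause of the failure.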
During shutdown the DATASTOREQUIESCE field does not update
Applies to: 4.7.0 and later
- Error
- After the `cpd-cli manage shutdown` command runs successfully, the `DATASTOREQUIESCE` state in the Watson Discovery resource is stuck in `QUIESCING`:

  ```
  # oc get WatsonDiscovery wd -n "${PROJECT_CPD_INST_OPERANDS}"
  NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE    DATASTOREQUIESCE   AGE
  wd     4.7.3     True    Stable        False      Stable           24/24      24/24      QUIESCED   QUIESCING          16h
  ```
- Cause
- Because of the way that quiescing Postgres works, the Postgres pods are still running in the background. As a result, the metadata in the Watson Discovery resource is not updated.
- Solution
- There is no fix for this issue. However, the state being stuck in `QUIESCING` does not affect the Watson Discovery operator.
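  To confirm the cause, you can check that the Postgres pods are still running after the shutdown. A minimal sketch; the `postgres` name filter is an assumption about the pod naming:

  ```
  # Postgres pods that are still Running after quiesce explain the stuck state
  oc get pods -n "${PROJECT_CPD_INST_OPERANDS}" | grep postgres
  ```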
UpgradeError is shown after resizing PVC
- Error
- After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
- Cause
- You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size directly on the persistent volume claim after it is created.
- Solution
- To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.
Disruption of service after upgrading, restarting, or scaling by updating scaleConfig
- Error
- After upgrading, restarting, or scaling Watson Discovery by updating the `scaleConfig` parameter, the Elasticsearch component might become non-functional, resulting in disruption of service and data loss.
- Cause
- The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
- Solution
- To determine whether confusion about the quorum leader pod is the cause of the issue, complete the following steps:
  - Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
  - Check each of the Elasticsearch pods with the role of `master` to see which pod it identifies as the quorum leader. Each pod must list the same pod as the leader.

    ```
    oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read i; do echo $i; oc exec $i \
      -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
    ```

    For example, in the following result, two different leaders are identified. Pods `1` and `2` identify pod `2` as the leader. However, pod `0` identifies itself as the leader.

    ```
    wd-ibm-elasticsearch-es-server-master-0
    id                     host      ip        node
    7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0

    wd-ibm-elasticsearch-es-server-master-1
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2

    wd-ibm-elasticsearch-es-server-master-2
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
    ```
  - If you find that more than one pod is identified as the leader, contact IBM Support.
MinIO gets stuck in a loop after several installation attempts
- Error
- The message `Cannot find volume "export" to mount into container "ibm-minio"` is displayed during an upgrade of Watson Discovery from version 4.6 or earlier.

  Check the status of the MinIO pods by using the following command:

  ```
  oc get pods -l release=wd-minio -o wide
  ```

  Then, check the MinIO operator logs by using the following commands:

  ```
  oc get pods -A | grep ibm-minio-operator
  oc logs -n <namespace> ibm-minio-operator-XXXXX
  ```

  You see an error that is similar to either of the following messages in the logs:

  ```
  ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-bucket" already exists) and failed rollback: failed to replace object"
  ```

  ```
  ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-pvc" already exists) and failed rollback: failed to replace object"
  ```
- Cause
- A job that creates a storage bucket or PVC for MinIO, and that is normally deleted after it completes, is not deleted properly.
- Solution
- Complete the following steps to check whether an incomplete `create-bucket` or `create-pvc` job for MinIO exists. If so, delete the incomplete jobs so that they can be re-created and then run successfully.
  - Check for the MinIO jobs by using the following commands:

    ```
    oc get jobs | grep 'wd-minio-discovery-create-bucket'
    oc get jobs | grep 'wd-minio-discovery-create-pvc'
    ```
  - If an existing `create-bucket` job is listed in the response, delete it by using the following command:

    ```
    oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-bucket')
    ```
  - If an existing `create-pvc` job is listed in the response, delete it by using the following command:

    ```
    oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-pvc')
    ```
  - Verify that all of the MinIO pods start successfully by using the following command:

    ```
    oc get pods -l release=wd-minio -o wide
    ```
Limitations
- The service supports single-zone deployments; it does not support multi-zone deployments.
- You cannot upgrade the Watson Discovery service by using the `service-instance upgrade` command from the Cloud Pak for Data command-line interface.
- You cannot use the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility to do an offline backup and restore of the Watson Discovery service. Online backup and restore with OADP is available.