Limitations and known issues in Watson Discovery
The following limitations and known issues apply to the Watson Discovery service.
- Unable to apply a user-trained SDU model to a collection with documents from external data sources
- Mirroring Watson service images fails with an Insufficient Scope error
- Error installing Watson Discovery when pulling images from a Version 4.6.1 private container registry
- Inaccurate status message from command line after upgrade
- UpgradeError is shown after resizing PVC
- Errored state is shown after upgrade
- Disruption of service after upgrade or restart
- RabbitMQ gets stuck in a loop after several installation attempts
- MinIO gets stuck in a loop after several installation attempts
- Attempted upgrade from early 4.0.x versions without quiescing
- Unable to upgrade from 4.0.x to 4.6 successfully
- Cannot update operators with a dependency on etcd, MinIO, or RabbitMQ
- Unable to modify the resources of the Postgres pods associated with Watson Discovery
- Watson Discovery installation receives a wd-discovery-haywire error due to an NSX plugin being installed as a CNI plugin on OpenShift
- Watson Discovery MinIO pods not starting because quota is applied to the namespace
- ETCD error when upgrading Watson Discovery from 4.5 to 4.6
- Retrieving Watson Discovery ElasticSearch PVCs during uninstallation
Limitations
- You cannot use the Cloud Pak for Data OpenShift APIs for Data Protection (OADP) backup and restore utility to back up and restore the Watson Discovery service. Instead, use the backup and restore process that is described in the product documentation on the IBM Cloud Docs site.
- The service supports single-zone deployments; it does not support multi-zone deployments.
- Watson Discovery cannot always reconcile temporary software patches. If you are asked by IBM Support to apply a patch, you must complete an additional step to make sure that the patch gets applied properly. For more information, see Applying a temporary patch.
- You cannot upgrade the Watson Discovery service by using the service-instance upgrade command from the Cloud Pak for Data command-line interface.
Unable to apply a user-trained SDU model to a collection with documents from external data sources
Applies to: 4.6.0 only
- Problem
-
When you create a collection that crawls an external data source, and then choose to create a user-trained Smart Document Understanding (SDU) model from the Identifying fields page, the SDU tool is not displayed. Instead, a message is displayed that says, “Come back later”.
- Resolving the problem
-
For a 4.6.0 deployment only, you can apply a patch that adds an updated version of the wd-ingestion operator to your deployment. To do so, run the following command:
oc patch wd/wd --type=merge \
  --patch='{"spec":{"ingestion":{"image":{"digest":"sha256:eef24fa7d8a43a23adb2db64121d397f985c6994629f0c0a853643f04cf0420a","name":"wd-ingestion","tag":"14.6.0-11038"}}}}'
Mirroring Watson service images fails with an Insufficient Scope error
Applies to: 4.6.0 - 4.6.2
Fixed in: 4.6.3
- Problem
- When you run the cpd-cli manage mirror-images command, the command fails with an Insufficient Scope error. This problem occurs for the following services:
- Watson Assistant
- Watson Discovery
- Watson Knowledge Studio
- Watson Speech services
This problem occurs because the command is trying to mirror the images for EDB Postgres Enterprise but you do not have a license for EDB Postgres Enterprise.
- Resolving the problem
- To mirror the service images to the private container registry:
- Run the cpd-cli manage list-images command to download the EDB Postgres CASE package:
Mirroring images directly to the private container registry:
cpd-cli manage list-images \
  --components=edb_cp4d \
  --target_registry=${PRIVATE_REGISTRY_LOCATION}
Mirroring images using an intermediary container registry:
cpd-cli manage list-images \
  --components=edb_cp4d \
  --target_registry=127.0.0.1:12443
- Replace the EDB Postgres Enterprise images with the EDB Postgres images:
If the cpd-cli uses the default location for the work directory:
sed -i -e '/edb-postgres-advanced/d' \
  ./cpd-cli-workspace/olm-utils-workspace/work/offline/$VERSION/{component_name}/ibm-cloud-native-postgresql-*-images.csv
Change component_name to the appropriate component name from the following options:
watson_assistant
watson_discovery
watson_ks
watson_speech
For example, to update the files for multiple components in one command:
sed -i -e '/edb-postgres-advanced/d' \
  ./cpd-cli-workspace/olm-utils-workspace/work/offline/$VERSION/{watson_assistant,watson_discovery}/ibm-cloud-native-postgresql-*-images.csv
If the cpd-cli uses the CPD_CLI_MANAGE_WORKSPACE environment variable to determine the location of the work directory:
sed -i -e '/edb-postgres-advanced/d' \
  ${CPD_CLI_MANAGE_WORKSPACE}/work/offline/$VERSION/{component_name}/ibm-cloud-native-postgresql-*-images.csv
Change component_name to the appropriate component name from the following options:
watson_assistant
watson_discovery
watson_ks
watson_speech
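The sed filter above simply deletes every CSV row that references the Enterprise image. The following local sketch shows the effect on a made-up manifest; the file path and image rows are illustrative stand-ins, not the real CASE package contents:

```shell
# Build a sample CSV that loosely resembles an images manifest (contents are made up).
cat > /tmp/sample-images.csv <<'EOF'
registry,image_name,tag
icr.io/cpopen,edb-postgres-advanced,14.5
icr.io/cpopen,edb-postgres,14.5
EOF

# Same filter as the documented command: delete, in place, every row that
# mentions the Enterprise (edb-postgres-advanced) image.
sed -i -e '/edb-postgres-advanced/d' /tmp/sample-images.csv

cat /tmp/sample-images.csv
```

After the edit, only the standard edb-postgres rows remain, so the subsequent mirroring step skips the images that require an EDB Postgres Enterprise license.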
Error installing Watson Discovery when pulling images from a Version 4.6.1 private container registry
- Error
- When you install Watson Discovery by pulling images from a private container registry on Cloud Pak for Data Version 4.6.1, the installation does not complete successfully. When you check the pods and get the etcd pod with a command such as oc -n ${PROJECT_CPD_INSTANCE} get pod | grep wd-discovery-etcd, an ImagePullBackOff error is returned.
- Cause
- Watson Discovery did not release a 4.6.1 version of the software. Therefore, when you install Watson Discovery on Cloud Pak for Data Version 4.6.1, a 4.6.0 version of the Watson Discovery software is installed. The service defines the etcd operator to use in its custom resource (because the etcd operator doesn't always provide a default image). With the 4.6.1 release, a newer version of etcd is specified. As a result, the older version of the etcd image that is specified for Discovery is not mirrored to the private registry. This missing etcd image results in an error.
- Solution
- Mirror the etcd image to the private registry by completing the following steps:
- Set the following variables:
export VERSION=4.6.0
export COMPONENTS=opencontent_etcd
- Mirror etcd images to the registry by following the steps in the Mirroring images to a private container registry procedure.
- Delete the failed wd-discovery-etcd pods.
One or more new etcd pods, depending on your deployment type, are started.
- Confirm that the etcd pods are running successfully.
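Steps 3 and 4 can be scripted. In the sketch below, the pod listing is a fabricated stand-in for real oc -n ${PROJECT_CPD_INSTANCE} get pod --no-headers output; on a cluster, the selected names would be piped into oc delete pod:

```shell
# Sample "oc get pod --no-headers" output (made up for illustration).
listing='wd-discovery-etcd-0   0/1   ImagePullBackOff   0   5m
wd-discovery-etcd-1   1/1   Running            0   5m
wd-discovery-etcd-2   0/1   ImagePullBackOff   0   5m'

# Select only the failed etcd pods. On a cluster, pipe the result into
# "xargs oc -n ${PROJECT_CPD_INSTANCE} delete pod" and then re-run the
# listing to confirm that the replacement pods reach the Running state.
failed=$(printf '%s\n' "$listing" | awk '/wd-discovery-etcd/ && /ImagePullBackOff/ {print $1}')
printf '%s\n' "$failed"
```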
Inaccurate status message from command line after upgrade
- Problem
- If you run the cpd-cli service-instance upgrade command from the Cloud Pak for Data command-line interface, and then use the service-instance list command to check the status of each service, the provision status for the service is listed as UPGRADE_FAILED.
- Cause of the problem
- When you upgrade the service, only the cpd-cli manage apply-cr command is supported. You cannot use the cpd-cli service-instance upgrade command to upgrade the service. After you upgrade the service with the apply-cr method, the change in version and status is not recognized by the service-instance command. However, the correct version is displayed in the Cloud Pak for Data web client.
- Resolving the problem
- No action is required. As long as you use the cpd-cli manage apply-cr method to upgrade the service as documented, the upgrade is successful, and you can ignore the version and status information that is generated by the cpd-cli service-instance list command.
UpgradeError is shown after resizing PVC
- Error
- After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
- Cause
- You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size directly on the persistent volume claim after it is created.
- Solution
- To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.
Errored state is shown after upgrade
This issue applies only when you upgrade to versions 4.6.0 and 4.6.2.
- Error
- After you run the cpd-cli manage apply-olm --upgrade=true command to upgrade the service to version 4.6, the ready state shows as Failed with the reason Errored.
- Cause
- Changes that are specific to the operator between minor versions cause errors during future reconciliation loops. The instance is operational but the operator is unable to complete successfully.
- Solution
- Complete the upgrade by using the command that updates the operator and operand at the same time, which is the cpd-cli manage apply-cr command.
Disruption of service after upgrade or restart
- Error
- After an upgrade or restart, one or more pods in the cluster are in an Init state, or are intermittently in a CrashLoopBackOff or Running state.
- Cause
- The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
- Solution
- To determine if confusion about the quorum leader pod is the cause of the issue, complete the following steps:
- Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
- Check each of the Elasticsearch pods with the role of master to see which pod it identifies as the quorum leader. Each pod must list the same pod as the leader.
oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read i; do echo $i; oc exec $i \
  -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
For example, in the following result, two different leaders are identified. Pods 1 and 2 identify pod 2 as the leader. However, pod 0 identifies itself as the leader.
wd-ibm-elasticsearch-es-server-master-0
id host ip node
7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0
wd-ibm-elasticsearch-es-server-master-1
id host ip node
L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
wd-ibm-elasticsearch-es-server-master-2
id host ip node
L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
If you find that more than one pod is identified as the leader, complete the following steps to fix the problem:
- Delete all but one of the master Elasticsearch pods, and then wait until new pods are started and become available.
- Repeat the check described earlier to find out whether all Elasticsearch pods with the master role identify the same pod as the leader.
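The agreement rule can also be checked mechanically. In this sketch the pod-to-leader reports are hard-coded stand-ins for the output of the oc/curl loop shown above:

```shell
# One reported leader per master pod (sample values mirroring the example above,
# where pod 0 disagrees with pods 1 and 2).
reported_leaders='wd-ibm-elasticsearch-es-server-master-0
wd-ibm-elasticsearch-es-server-master-2
wd-ibm-elasticsearch-es-server-master-2'

# The quorum is healthy only if every master pod reports the same leader.
unique=$(printf '%s\n' "$reported_leaders" | sort -u | wc -l)
if [ "$unique" -gt 1 ]; then
  echo "split leadership: $unique distinct leaders reported"
else
  echo "quorum agrees on a single leader"
fi
```

A count greater than one is the condition that calls for deleting all but one of the master pods, as described in the fix above.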
RabbitMQ gets stuck in a loop after several installation attempts
- Error
- After an initial installation or upgrade failure and repeated attempts to retry, the common services RabbitMQ operator pod can get into a CrashLoopBackOff state. For example, the log might include the following types of messages:
"error":"failed to upgrade release: post-upgrade hooks failed: warning: Hook post-upgrade ibm-rabbitmq/templates/rabbitmq-backup-labeling-job.yaml failed: jobs.batch "{%name}-ibm-rabbitmq-backup-label" already exists"
- Cause
- Resources for the RabbitMQ operator component must be fully removed before a new installation or upgrade is started. If too many attempts occur in succession, remaining resources can cause new installations to fail.
- Solution
-
- Delete the RabbitMQ backup label job from the previous installation or upgrade attempt. Look for the name of the job in the logs. The name ends in -ibm-rabbitmq-backup-label.
oc delete job {%name}-ibm-rabbitmq-backup-label -n ${PROJECT_CPD_INSTANCE}
- Check that the pod returns a Ready state.
oc get pods -n ${PROJECT_CPFS_OPS} | grep ibm-rabbitmq
MinIO gets stuck in a loop after several installation attempts
- Error
- The message Cannot find volume "export" to mount into container "ibm-minio" is displayed during an installation or upgrade of Discovery. When you check the status of the MinIO pods by using the command oc get pods -l release=wd-minio -o wide, and then check the MinIO operator logs by using the commands oc get pods -A | grep ibm-minio-operator and oc logs -n <namespace> ibm-minio-operator-XXXXX, you see an error that is similar to the following message in the logs:
ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-bucket" already exists) and failed rollback: failed to replace object"
- Cause
- A job that creates a storage bucket for MinIO, and that is normally deleted after it completes, is not being deleted properly.
- Solution
- Complete the following steps to check whether an incomplete create-bucket job for MinIO exists. If so, delete the incomplete job so that the job can be re-created and can then run successfully.
- Check for the MinIO job by using the following command:
oc get jobs | grep 'wd-minio-discovery-create-bucket'
- If an existing job is listed in the response, delete the job by using the following command:
oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-bucket')
- Verify that all of the MinIO pods start successfully by using the following command:
oc get pods -l release=wd-minio -o wide
Attempted upgrade from early 4.0.x versions without quiescing
- Error
- When you check the status of the upgrade, errors are shown and only 8 or so of the 24 components are ready.
- Cause
- If you upgraded Watson Discovery from version numbers 4.0.2 through 4.0.5 without first quiescing the service, you can run into issues with the upgrade process.
- Solution
- Complete the following steps to redo the upgrade:
- Revert the version of the service to the old version by using the following command:
oc patch wd wd --type='merge' --patch '{"spec":{"version": "<old_version>"}}'
- Apply a temporary patch to modify the application configuration.
- Download the patch file named app-config-override-patch.yaml from the Watson Developer Cloud repository on GitHub.
- Use the following command to apply the patch:
oc apply -f app-config-override-patch.yaml
- Upgrade the service by completing the Upgrading Watson Discovery from Version 4.0.x to Version 4.6 procedure.
Attention: Be sure to complete the step to quiesce the service, and then check the status of the service before you run the upgrade command. Wait until the QUIESCE status shows QUIESCED.
- After the upgrade, be sure to complete the step to stop the quiesce of the service.
- Run the following commands to remove the resources that were created by the temporary patch:
oc patch temporarypatch wd-app-config-override-patch \
  --type json --patch '[{ "op": "remove", "path": "/metadata/finalizers" }]'
oc delete -f app-config-override-patch.yaml
oc get crd | grep watsondiscovery | cut -d' ' -f1 | xargs -I{} oc annotate \
  --overwrite {} wd oppy.ibm.com/temporary-patches-
Unable to upgrade from 4.0.x to 4.6 successfully
- Error
-
An upgrade from a 4.0.x installation to 4.6 does not complete. The READY column shows False, the READYREASON column shows Errored, and the state does not resolve.
- Cause
-
Discovery defaults to creating a Development deployment type. However, you can override that default configuration by specifying a deployment type with the spec.shared.deploymentType setting.
In 4.0.x releases, the spec.shared.deploymentType field with a Starter value (which is equivalent to Development) was applied if you did not change it to Production. In a 4.6 installation, when you use cpd-cli manage, Discovery sets the spec.shared.deploymentType field to Production to create a Production-ready installation by default.
Deployment types cannot be changed after an initial deployment. They cannot be changed during an upgrade either. If you had a Starter or Development deployment previously, you might have inadvertently created a Production deployment during the upgrade. This configuration mismatch will not work.
If you aren't sure which deployment type was used for your 4.6 upgrade, you can check by completing the following steps:
- Run the following command to check the current deployment type:
oc get WatsonDiscovery wd -ojsonpath='{.spec.shared.deploymentType}'
If Production is returned, then you need to apply the workaround. If no value is returned, then you might not have specified a value and a Development deployment might be applied (because that is the internal default configuration).
- Run the following command to check the number of persistent volume claims (PVCs) that were created by the EDB PostgreSQL instance for your installation:
oc get pvc -lk8s.enterprisedb.io/cluster=wd-discovery-cn-postgres
If there is more than one, and the AGE of the most recent one aligns with the time of the upgrade, it means that you used a Production deployment type during the upgrade by mistake.
For example, the following response means the upgrade was configured to create a production installation because there are two PostgreSQL PVCs:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
wd-discovery-cn-postgres-1 Bound pvc-169dd8cf-2c02-452d-8f2e-85ecf3ce31aa 30Gi RWO ocs-storagecluster-ceph-rbd 7d1h
wd-discovery-cn-postgres-2 Bound pvc-e3056a68-2603-436a-bf01-30057c34ad1a 30Gi RWO ocs-storagecluster-ceph-rbd 41h
- Solution
- To resolve the problem and continue the upgrade, complete the following steps:
- Do one of the following things:
- If Production was returned in the previous procedure, modify the spec.shared.deploymentType field in the custom resource to match the value that was used during the original installation.
oc patch WatsonDiscovery wd --type=merge \
  --patch='{"spec":{"shared":{"deploymentType": "Starter"}}}'
Confirm that the Starter deployment type is returned now by using the following command:
oc get WatsonDiscovery wd -ojsonpath='{.spec.shared.deploymentType}'
The Development and Starter types are functionally the same, and both values are accepted by the service.
- If an empty value was returned in the previous procedure, then the field was not specified in the initial installation. In this case, you must remove the field in the current custom resource. When you do so, the internal default setting of Development will be used, which is what you want in this case. To remove the field, enter the following command:
oc patch WatsonDiscovery wd --type=json \
  --patch='[{"op": "remove", "path": "/spec/shared/deploymentType"}]'
- After the patch is applied, verify that EDB PostgreSQL is running with one instance only by using the following command:
oc get Cluster wd-discovery-cn-postgres
- If more than one instance is reported, set PostgreSQL in maintenance mode by using the following command:
oc patch WatsonDiscovery wd --type=merge \
  --patch='{"spec":{"postgres":{"quiesce":{"enabled": true}}}}'
PostgreSQL is in maintenance mode when the following command returns true. You might need to wait a few minutes.
oc get Cluster wd-discovery-cn-postgres \
  -o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}{"\n"}'
- Remove the additional pods and persistent volume claims (PVCs) that are associated with the instance. You got a list of these PVCs in an earlier step.
oc delete pod/wd-discovery-cn-postgres-2 pvc/wd-discovery-cn-postgres-2
- After the PVCs are removed, return PostgreSQL to normal operation by using the following command:
oc patch WatsonDiscovery wd --type=merge \
  --patch='{"spec":{"postgres":{"quiesce":{"enabled": false}}}}'
PostgreSQL is out of maintenance mode when the following command returns false.
oc get Cluster wd-discovery-cn-postgres \
  -o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}{"\n"}'
- Confirm that the state of the cluster is now healthy.
oc get Cluster wd-discovery-cn-postgres
The upgrade resumes.
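Both patch payloads in this procedure are easy to get wrong: the deploymentType value must be nested under spec.shared, not attached to it. A local sanity check before sending either patch to the cluster, assuming python3 is available:

```shell
# Validate the merge patch: deploymentType must sit under spec.shared.
merge_patch='{"spec":{"shared":{"deploymentType": "Starter"}}}'
echo "$merge_patch" | python3 -c 'import json,sys; d=json.load(sys.stdin); print(d["spec"]["shared"]["deploymentType"])'

# Validate the JSON patch used to remove the field entirely.
json_patch='[{"op": "remove", "path": "/spec/shared/deploymentType"}]'
echo "$json_patch" | python3 -c 'import json,sys; p=json.load(sys.stdin); print(p[0]["op"], p[0]["path"])'
```

If either python3 invocation raises a KeyError or a JSON decode error, the payload is malformed and oc patch would either fail or silently do the wrong thing.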
Cannot update operators with a dependency on etcd, MinIO, or RabbitMQ
Applies to: Upgrades from Version 4.0.x or 4.5.x to Version 4.6
When you run the cpd-cli manage apply-olm command, the operator for one or more of the following services might get stuck in the Installing phase:
Service | MinIO | RabbitMQ | etcd |
---|---|---|---|
IBM® Match 360 | ✓ | | |
OpenPages® | ✓ | | |
Watson Assistant | ✓ | ✓ | ✓ |
Watson Discovery | ✓ | ✓ | ✓ |
Watson Knowledge Studio | ✓ | ✓ | |
Watson Speech services | ✓ | ✓ | |
This issue can occur when the upgrade of the etcd operator, MinIO operator, or RabbitMQ operator fails. These dependencies use Helm-based operators. When the upgrade of a Helm-based operator fails, the failed version is not automatically deleted. If there is insufficient memory to retry the upgrade, the operators encounter an out-of-memory error and the upgrade fails.
- Diagnosing the problem
-
To determine which operator is causing the problem:
- If you are upgrading a service that has a dependency on RabbitMQ, check the status of the RabbitMQ operator:
- Check the last state of the operator:
oc get pods -n $PROJECT_CPD_OPS \
  -lapp.kubernetes.io/instance=ibm-rabbitmq-operator \
  -ojsonpath='{.items[*].status.containerStatuses[*].lastState.terminated}'
- If the state includes "exitCode":137,..."reason":"OOMKilled", then get the name of the operator:
oc get csv -n "${PROJECT_CPD_OPS}" \
  -loperators.coreos.com/ibm-rabbitmq-operator.${PROJECT_CPD_OPS}
You will need the name to resolve the problem. The name has the following format: ibm-rabbitmq-operator.vX.X.X.
- If you are upgrading a service that has a dependency on MinIO, check the status of the MinIO operator:
- Check the last state of the operator:
oc get pods -n $PROJECT_CPD_OPS \
  -lapp.kubernetes.io/instance=ibm-minio-operator \
  -ojsonpath='{.items[*].status.containerStatuses[*].lastState.terminated}'
- If the state includes "exitCode":137,..."reason":"OOMKilled", then get the name of the operator:
oc get csv -n "${PROJECT_CPD_OPS}" \
  -loperators.coreos.com/ibm-minio-operator.${PROJECT_CPD_OPS}
You will need the name to resolve the problem. The name has the following format: ibm-minio-operator.vX.X.X.
- If you are upgrading a service that has a dependency on etcd, check the status of the etcd operator:
- Check the last state of the operator:
oc get pods -n $PROJECT_CPD_OPS \
  -lapp.kubernetes.io/instance=ibm-etcd-operator \
  -ojsonpath='{.items[*].status.containerStatuses[*].lastState.terminated}'
- If the state includes "exitCode":137,..."reason":"OOMKilled", then get the name of the operator:
oc get csv -n "${PROJECT_CPD_OPS}" \
  -loperators.coreos.com/ibm-etcd-operator.${PROJECT_CPD_OPS}
You will need the name to resolve the problem. The name has the following format: ibm-etcd-operator.vX.X.X.
- Resolving the problem
-
If one or more pods are in the CrashLoopBackOff state, complete the following steps to resolve the problem:
- Check the current limits and requests for the operator with pods that are in a poor state. If all of the operators were stuck, repeat this process for each operator.
- Set the OP_NAME environment variable to the name of the operator:
export OP_NAME=<operator-name>
- Check the current limits for the operator:
oc get csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
  -ojsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
- Check the current requests for the operator:
oc get csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
  -ojsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.requests.memory}'
- Choose the appropriate action based on the values returned by the preceding commands:
- If either the limits or requests are below 1Gi, continue to the next step.
- If both values are above 1Gi, then the cause of the problem was misdiagnosed. This solution will not resolve the issues you are seeing.
- Increase the memory limits and requests for the affected operator. If all of the operators are stuck, repeat this process for each operator.
- Create a JSON file named patch.json with the following content:
[
  { "op": "replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory", "value": "1Gi" },
  { "op": "replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "1Gi" }
]
- Ensure that the OP_NAME environment variable is set to the correct operator name:
echo ${OP_NAME}
- Patch the operator:
oc patch csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
  --type=json --patch="$(cat patch.json)"
- Confirm that the patch was successfully applied:
oc get csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
  -ojsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
The command should return 1Gi.
Important: The patch is temporary. The memory settings apply only to the current deployment. The next time you update the operator, the settings are replaced by the default settings.
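Because oc patch --type=json rejects malformed payloads with an unhelpful error, it can save a retry to verify patch.json locally first. A sketch, assuming python3 is available:

```shell
# Write the same patch.json as in the step above.
cat > /tmp/patch.json <<'EOF'
[
  { "op": "replace",
    "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory",
    "value": "1Gi" },
  { "op": "replace",
    "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory",
    "value": "1Gi" }
]
EOF

# Confirm that the file parses and that both operations set the memory to 1Gi.
python3 - <<'EOF'
import json
ops = json.load(open("/tmp/patch.json"))
assert len(ops) == 2
assert all(o["op"] == "replace" and o["value"] == "1Gi" for o in ops)
print("patch.json OK")
EOF
```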
Unable to modify the resources of the Postgres pods associated with Watson Discovery
This issue was fixed in the 4.6.5 release.
- Error
-
In Watson Discovery version 4.6, the Postgres pod's resource requests and limits can no longer be edited. Previously, the Watson Discovery operator honored the spec.postgres.database.resources setting.
- Cause
- Because the WatsonDiscovery CustomResourceDefinition does not have .spec.postgres.resources defined, modifications to the WatsonDiscovery CR are automatically reverted. This prevents changes from rolling out to the pods.
- Solution
- Patch the WatsonDiscovery CustomResourceDefinition to include the new field before you submit a patch to modify the Postgres pod resources. To verify whether the patch is already applied, run the following command:
oc get CustomResourceDefinition watsondiscoveries.discovery.watson.ibm.com \
  --output jsonpath='{.spec.versions[].schema.openAPIV3Schema.properties.spec.properties.postgres.properties.resources}'
If the patch is already applied, the command returns {"x-kubernetes-preserve-unknown-fields":true}. If nothing is returned, run the following command to patch the CustomResourceDefinition:
oc patch CustomResourceDefinition watsondiscoveries.discovery.watson.ibm.com \
  --type json \
  --patch '[{"op":"add","path":"/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/postgres/properties/resources","value":{"x-kubernetes-preserve-unknown-fields":true}}]'
After the patch is applied, rerun the previous oc get command to verify that {"x-kubernetes-preserve-unknown-fields":true} is returned. You can then resume modifying the Postgres pod resource configuration.
Watson Discovery installation receives a wd-discovery-haywire error due to an NSX plugin being installed as a CNI plugin on OpenShift
- Error
- Installation of the Watson Discovery service is pending with the following error on wd-discovery-haywire:
wd-discovery-haywire-56bc476b76-plv84.log
-----
Name: wd-discovery-haywire-56bc476b76-plv84
Container: wd-discovery-haywire
Namespace: cpd4-main-qi1001
Logs:
{"@timestamp":"2023-02-28T05:02:24.692Z","message":"Listening on 50,051","logger_name":"com.ibm.watson.wire.notices.Server","thread_name":"main","level":"INFO"}
{"@timestamp":"2023-02-28T05:02:57.307Z","message":"*** shutting down gRPC server since JVM is shutting down","logger_name":"com.ibm.watson.wire.notices.Server$1","thread_name":"Thread-6","level":"INFO"}
{"@timestamp":"2023-02-28T05:02:57.312Z","message":"*** server shut down","logger_name":"com.ibm.watson.wire.notices.Server$1","thread_name":"Thread-6","level":"INFO"}
[INFO tini (1)] Spawned child process 'java' with pid '7'
[INFO tini (1)] Main child exited normally (with status '143')
-----
- Cause
- Installing the NSX plugin as an OpenShift CNI plug-in can prevent the ibm-nginx pod from accessing itself through its Service IP/DNS name. As a result, the Watson Discovery gateway cannot receive incoming requests.
- Apply a temporary patch to allow nginx in the gateway pod to connect to the other container without going through the Kubernetes service:
- Download the temporary patch wd-gateway-service-patch.zip.
- Extract wd-gateway-service-patch.yml from the zip file.
- Apply the temporary patch:
oc apply -f wd-gateway-service-patch.yml
Wait for the wd-discovery-gateway pod to restart.
- Create a Watson Discovery instance in Cloud Pak for Data.
If you would like to remove the temporary patch, enter the following command:
oc delete temporarypatch.oppy.ibm.com wd-gateway-service-patch
If the command does not complete, open another terminal and enter the following commands:
oc patch temporarypatch.oppy.ibm.com wd-gateway-service-patch --type json --patch '[{ "op": "remove", "path": "/metadata/finalizers" }]'
oc get crd | grep watsondiscovery | cut -d' ' -f 1 | xargs -I{} -t oc annotate {} wd --overwrite oppy.ibm.com/temporary-patches-
Watson Discovery MinIO pods not starting because quota is applied to the namespace
Applies to: 4.5.3 and later
- Problem
- The job wd-minio-discovery-create-pvc fails to complete when ResourceQuotas are applied to the namespace. When the job is described with oc describe job wd-minio-discovery-create-pvc, there is a FailedCreate event that mentions failed quota. Example:
Warning FailedCreate 31m job-controller Error creating: pods "wd-minio-discovery-create-pvc-6shj8" is forbidden: failed quota: cpd-quota: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
- Cause
- The MinIO Job cannot start if a ResourceQuota is applied to the namespace but a LimitRange is not set, because the Job pod does not have resources.requests or resources.limits configured.
- Solution
- Apply a LimitRange with defaults for limits and requests. Modify the namespace in the following YAML to the namespace where Cloud Pak for Data is installed:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-limits
  namespace: zen #Change it to the namespace where CPD is installed
spec:
  limits:
  - default:
      cpu: 300m
      memory: 200Mi
    defaultRequest:
      cpu: 200m
      memory: 200Mi
    type: Container
ETCD error when upgrading Watson Discovery from 4.5 to 4.6
Applies to: 4.6.0 and later
- Problem
- After you apply the Watson Discovery CR (cpd-cli manage apply-cr .. --version=4.6.x ..), the etcdcluster resource wd-discovery-etcd gets into a failed state due to invalid labels. You can verify this by checking the etcdcluster conditions:
oc -n ${PROJECT_CPD_INSTANCE} get etcdcluster wd-discovery-etcd -o jsonpath="{.status.conditions}"
---
[{"lastTransitionTime":"2023-03-30T12:45:58Z","message":"Failedtopatchobject:b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"StatefulSet.apps\\\\\"wd-discovery-etcd\\\\\"isinvalid:spec:Forbidden:updatestostatefulsetspecforfieldsotherthan\\'replicas\\',\\'template\\',\\'updateStrategy\\',\\'persistentVolumeClaimRetentionPolicy\\'and\\'minReadySeconds\\'areforbidden\",\"reason\":\"Invalid\",\"details\":{\"name\":\"wd-discovery-etcd\",\"group\":\"apps\",\"kind\":\"StatefulSet\",\"causes\":[{\"reason\":\"FieldValueForbidden\",\"message\":\"Forbidden:updatestostatefulsetspecforfieldsotherthan\\'replicas\\',\\'template\\',\\'updateStrategy\\',\\'persistentVolumeClaimRetentionPolicy\\'and\\'minReadySeconds\\'areforbidden\",\"field\":\"spec\"}]},\"code\":422}\\n'","reason":"Failed","status":"False","type":"Failure"}]
- Cause
- The ETCD operator does not recreate the statefulset on an immutable field change.
- Solution
- Manually delete the etcd statefulset to allow the operator to recreate it:
- Delete the etcd statefulset:
oc delete sts wd-discovery-etcd
statefulset.apps "wd-discovery-etcd" deleted
- Wait for the statefulset to be re-created:
oc get sts wd-discovery-etcd
NAME READY AGE
wd-discovery-etcd 0/3 24s
---
oc get etcdcluster wd-discovery-etcd -o jsonpath="{.status.conditions}"
[{"ansibleResult":{"changed":3,"completion":"2023-05-16T18:57:17.965897","failures":0,"ok":39,"skipped":36},"lastTransitionTime":"2023-05-16T18:56:24Z","message":"Awaiting next reconciliation","reason":"Successful","status":"True","type":"Running"},{"lastTransitionTime":"2023-05-16T18:57:18Z","message":"Last reconciliation succeeded","reason":"Successful","status":"True","type":"Successful"},{"lastTransitionTime":"2023-05-16T18:56:24Z","message":"","reason":"","status":"False","type":"Failure"}]
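When reading the conditions arrays returned by the jsonpath queries above, the health signal is the Failure condition's status. A local sketch with an abbreviated, made-up payload, assuming python3 is available:

```shell
# Abbreviated stand-in for the .status.conditions output shown above.
conditions='[{"type":"Running","status":"True"},{"type":"Successful","status":"True"},{"type":"Failure","status":"False"}]'

# The etcdcluster is healthy when the Failure condition reports "False".
state=$(echo "$conditions" | python3 -c '
import json, sys
conds = {c["type"]: c["status"] for c in json.load(sys.stdin)}
print("healthy" if conds.get("Failure") == "False" else "failed")')
echo "$state"
```

In the failed state shown in the Problem section, the Failure condition's status is "True", so this check would print "failed" until the statefulset is re-created.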
Retrieving Watson Discovery ElasticSearch PVCs during uninstallation
Applies to: 4.0.x and later
- Problem
- During uninstallation, the following command fails to retrieve the Watson Discovery ElasticSearch PVCs:
oc get pvc -l 'app.kubernetes.io/name in (wd,discovery)'
- Cause
- The command fails if the labels changed while other ElasticSearch PVCs exist. The ElasticSearch operator only updates the labels of the ElasticSearch PVCs if other ElasticSearch PVCs exist with a different set of labels compared to the Watson Discovery ones. Those PVCs also include the ibm-es-data label (set to either True or False).
). - Solution
- You can retrieve the Watson Discovery ElasticSearch PVCs by entering the following command:
oc get pvc | grep wd-ibm-elasticsearch
Delete any PVCs that are listed.