Limitations and known issues in Watson Discovery

The following limitations and known issues apply to the Watson Discovery service.

Limitations

The following limitations apply to the Watson Discovery service:
  • You cannot use the Cloud Pak for Data OpenShift APIs for Data Protection (OADP) backup and restore utility to back up and restore the Watson Discovery service. Instead, use the backup and restore process that is described in the product documentation on the IBM Cloud Docs site.
  • The service supports single-zone deployments; it does not support multi-zone deployments.
  • Watson Discovery cannot always reconcile temporary software patches. If you are asked by IBM Support to apply a patch, you must complete an additional step to make sure that the patch gets applied properly. For more information, see Applying a temporary patch.
  • You cannot upgrade the Watson Discovery service by using the service-instance upgrade command from the Cloud Pak for Data command-line interface.

Unable to apply a user-trained SDU model to a collection with documents from external data sources

Applies to: 4.6.0 only

Problem

When you create a collection that crawls an external data source, and then choose to create a user-trained Smart Document Understanding (SDU) model from the Identifying fields page, the SDU tool is not displayed. Instead, a message is displayed that says, “Come back later”.

Resolving the problem
For a 4.6.0 deployment only, you can apply a patch that adds an updated version of the wd-ingestion operator to your deployment. To do so, run the following command:
oc patch wd/wd --type=merge \
--patch='{"spec":{"ingestion":{"image":{"digest":"sha256:eef24fa7d8a43a23adb2db64121d397f985c6994629f0c0a853643f04cf0420a","name":"wd-ingestion","tag":"14.6.0-11038"}}}}'

Mirroring Watson service images fails with an Insufficient Scope error

Applies to: 4.6.0 - 4.6.2

Fixed in: 4.6.3

Problem
When you run the cpd-cli manage mirror-images command, the command fails with an Insufficient Scope error. This problem occurs for the following services:
  • Watson Assistant
  • Watson Discovery
  • Watson Knowledge Studio
  • Watson Speech services

This problem occurs because the command is trying to mirror the images for EDB Postgres Enterprise but you do not have a license for EDB Postgres Enterprise.

Resolving the problem
To mirror the service images to the private container registry:
  1. Run the cpd-cli manage list-images command to download the EDB Postgres CASE package:
    Mirroring images directly to the private container registry
    cpd-cli manage list-images \
    --components=edb_cp4d \
    --target_registry=${PRIVATE_REGISTRY_LOCATION}

    Mirroring images using an intermediary container registry
    cpd-cli manage list-images \
    --components=edb_cp4d \
    --target_registry=127.0.0.1:12443

  2. Remove the EDB Postgres Enterprise (edb-postgres-advanced) images from the image list so that only the EDB Postgres images are mirrored:
    The cpd-cli uses the default location for the work directory.
    sed -i -e '/edb-postgres-advanced/d' \
    ./cpd-cli-workspace/olm-utils-workspace/work/offline/$VERSION/{component_name}/ibm-cloud-native-postgresql-*-images.csv
    
    Change component_name to the appropriate component name from the following options:
    • watson_assistant
    • watson_discovery
    • watson_ks
    • watson_speech
    To specify more than one component, separate the component names with commas. For example, the following command removes the Enterprise images for both Watson Assistant and Watson Discovery:
    sed -i -e '/edb-postgres-advanced/d' \
    ./cpd-cli-workspace/olm-utils-workspace/work/offline/$VERSION/{watson_assistant,watson_discovery}/ibm-cloud-native-postgresql-*-images.csv

    The cpd-cli uses the CPD_CLI_MANAGE_WORKSPACE environment variable to determine the location of the work directory.
    sed -i -e '/edb-postgres-advanced/d' \
    ${CPD_CLI_MANAGE_WORKSPACE}/work/offline/$VERSION/{component_name}/ibm-cloud-native-postgresql-*-images.csv
    Change component_name to the appropriate component name from the following options:
    • watson_assistant
    • watson_discovery
    • watson_ks
    • watson_speech
    To specify more than one component, separate the component names with commas.
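
After you update the CSV files, continue with the standard image mirroring procedure for your scenario. The following command is a rough sketch only; the exact options for your registry setup are described in the mirroring documentation, and the variables are the same ones that are used in the list-images examples above:
cpd-cli manage mirror-images \
--components=${COMPONENTS} \
--target_registry=${PRIVATE_REGISTRY_LOCATION} \
--release=${VERSION}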

Error installing Watson Discovery when pulling images from a Version 4.6.1 private container registry

Error
When you install Watson Discovery by pulling images from a private container registry on Cloud Pak for Data Version 4.6.1, the installation does not complete successfully. When you check the pods with a command such as oc -n ${PROJECT_CPD_INSTANCE} get pod | grep wd-discovery-etcd, the etcd pod shows an ImagePullBackOff error.
Cause
Watson Discovery did not release a 4.6.1 version of the software. Therefore, when you install Watson Discovery on Cloud Pak for Data Version 4.6.1, the 4.6.0 version of the Watson Discovery software is installed. The service defines the etcd image to use in its custom resource (because the etcd operator doesn't always provide a default image). With the Cloud Pak for Data 4.6.1 release, a newer version of the etcd image is specified. As a result, the older etcd image that Discovery specifies is not mirrored to the private registry. This missing etcd image results in the error.
Solution
Mirror the etcd image to the private registry by completing the following steps:
  1. Set the following variables:
    export VERSION=4.6.0
    export COMPONENTS=opencontent_etcd
  2. Mirror etcd images to the registry by following the steps in the Mirroring images to a private container registry procedure.
  3. Delete the failed wd-discovery-etcd pods. (See the example commands after this procedure.)

    One or more new etcd pods, depending on your deployment type, are started.

  4. Confirm that the etcd pods are running successfully.
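
For reference, steps 3 and 4 might look like the following commands; wd-discovery-etcd-0 is an example pod name, and your deployment might have more than one failed pod:
oc -n ${PROJECT_CPD_INSTANCE} get pod | grep wd-discovery-etcd
oc -n ${PROJECT_CPD_INSTANCE} delete pod wd-discovery-etcd-0
oc -n ${PROJECT_CPD_INSTANCE} get pod | grep wd-discovery-etcd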

Inaccurate status message from command line after upgrade

Problem
If you run the cpd-cli service-instance upgrade command from the Cloud Pak for Data command-line interface, and then use the service-instance list command to check the status of each service, the provision status for the service is listed as UPGRADE_FAILED.
Cause of the problem
When you upgrade the service, only the cpd-cli manage apply-cr command is supported; you cannot use the cpd-cli service-instance upgrade command. After you upgrade the service with the apply-cr method, the service-instance command does not recognize the change in version and status. However, the correct version is displayed in the Cloud Pak for Data web client.
Resolving the problem
No action is required. As long as you use the cpd-cli manage apply-cr method to upgrade the service as documented, the upgrade is successful and you can ignore the version and status information that is generated by the cpd-cli service-instance list command.
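
For example, after an apply-cr upgrade, the following command might still report UPGRADE_FAILED for the Watson Discovery instance, and you can ignore that status. The --profile option refers to the cpd-cli profile that is configured for your deployment and is shown here for illustration:
cpd-cli service-instance list --profile=${CPD_PROFILE_NAME}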

UpgradeError is shown after resizing PVC

Error
After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
Cause
You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size directly on the persistent volume claim after it is created.
Solution
To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.

Errored state is shown after upgrade

This issue applies only when you upgrade to versions 4.6.0 and 4.6.2.

Error
After you run the cpd-cli manage apply-olm --upgrade=true command to upgrade the service to version 4.6, the ready state shows as Failed with the reason Errored.
Cause
Changes that are specific to the operator between minor versions cause errors during future reconciliation loops. The instance is operational but the operator is unable to complete successfully.
Solution
Complete the upgrade by using the command that updates the operator and operand at the same time, which is the cpd-cli manage apply-cr command.
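
For reference, a representative apply-cr invocation looks like the following; the release number, namespace, and any storage class options depend on your environment, so use the values from your upgrade plan:
cpd-cli manage apply-cr \
--components=watson_discovery \
--release=${VERSION} \
--cpd_instance_ns=${PROJECT_CPD_INSTANCE} \
--license_acceptance=true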

Disruption of service after upgrade or restart

Error
After an upgrade or restart, one or more pods in the cluster are in an Init state, or are intermittently in a CrashLoopBackOff or Running state.
Cause
The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
Solution
To determine if confusion about the quorum leader pod is the cause of the issue, complete the following steps:
  1. Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
  2. Check each of the Elasticsearch pods with the master role to see which pod it identifies as the quorum leader.
    oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'  | while read i; do echo $i; oc exec $i \
    -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
    
    Each pod must list the same pod as the leader.
    For example, in the following result, two different leaders are identified. Pods 1 and 2 identify pod 2 as the leader. However, pod 0 identifies itself as the leader.
    wd-ibm-elasticsearch-es-server-master-0
    id                     host      ip        node
    7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0
    
    wd-ibm-elasticsearch-es-server-master-1
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
    
    wd-ibm-elasticsearch-es-server-master-2
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
If you find that more than one pod is identified as the leader, complete the following steps to fix the problem:
  1. Delete all but one of the master Elasticsearch pods (see the example command after these steps), and then wait until new pods are started and become available.
  2. Repeat the check described earlier to find out whether all Elasticsearch pods with the master role identify the same pod as the leader.
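
Continuing the earlier example output, where pods 1 and 2 identify pod 2 as the leader, you might keep pod 2 and delete the other two master pods:
oc delete pod wd-ibm-elasticsearch-es-server-master-0 wd-ibm-elasticsearch-es-server-master-1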

RabbitMQ gets stuck in a loop after several installation attempts

Error
After an initial installation or upgrade failure and repeated attempts to retry, the common services RabbitMQ operator pod can get into a CrashLoopBackOff state. For example, the log might include the following types of messages:
"error":"failed to upgrade release: post-upgrade hooks failed: warning: 
Hook post-upgrade ibm-rabbitmq/templates/rabbitmq-backup-labeling-job.yaml 
failed: jobs.batch "{%name}-ibm-rabbitmq-backup-label" already exists"
Cause
Resources for the RabbitMQ operator component must be fully removed before a new installation or upgrade is started. If too many attempts occur in succession, remaining resources can cause new installations to fail.
Solution
  1. Delete the RabbitMQ backup label job from the previous installation or upgrade attempt. Look for the name of the job in the logs. The name ends in -ibm-rabbitmq-backup-label.
    oc delete job {%name}-ibm-rabbitmq-backup-label -n ${PROJECT_CPD_INSTANCE}
  2. Check that the RabbitMQ operator pod returns to a Ready state:
    oc get pods -n ${PROJECT_CPFS_OPS} | grep ibm-rabbitmq

MinIO gets stuck in a loop after several installation attempts

Error
The message Cannot find volume "export" to mount into container "ibm-minio" is displayed during an installation or upgrade of Discovery. When you check the status of the MinIO pods by using the command oc get pods -l release=wd-minio -o wide, and then check the MinIO operator logs by using the commands oc get pods -A | grep ibm-minio-operator and oc logs -n <namespace> ibm-minio-operator-XXXXX, you see an error that is similar to the following message in the logs:
ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-bucket" 
already exists) and failed rollback: failed to replace object"
Cause
A job that creates a storage bucket for MinIO, and that is normally deleted after it completes, is not deleted properly.
Solution
Complete the following steps to check whether an incomplete create-bucket job for MinIO exists. If so, delete the incomplete job so that the job can be recreated and can then run successfully.
  1. Check for the MinIO job by using the following command:
    oc get jobs | grep 'wd-minio-discovery-create-bucket'
  2. If an existing job is listed in the response, delete the job by using the following command:
    oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-bucket')
  3. Verify that all of the MinIO pods start successfully by using the following command:
    oc get pods -l release=wd-minio -o wide

Attempted upgrade from early 4.0.x versions without quiescing

Error
When you check the status of the upgrade, errors are shown and only 8 or so of the 24 components are ready.
Cause
If you upgraded Watson Discovery from version numbers 4.0.2 through 4.0.5 without first quiescing the service, you can run into issues with the upgrade process.
Solution
Complete the following steps to redo the upgrade:
  1. Revert the version of the service to the old version by using the following command:
    oc patch wd wd --type='merge' --patch '{"spec":{"version": "<old_version>"}}'
  2. Apply a temporary patch to modify the application configuration. (See the example after this procedure.)
  3. Upgrade the service by completing the Upgrading Watson Discovery from Version 4.0.x to Version 4.6 procedure.
    Attention: Be sure to complete the step to quiesce the service, and then check the status of the service before you run the upgrade command. Wait until the QUIESCE status shows QUIESCED.
  4. After the upgrade, be sure to complete the step to stop the quiesce of the service.
  5. Run the following commands to remove the files that were created by the temporary patch.
    oc patch temporarypatch wd-app-config-override-patch \
    --type json --patch '[{ "op": "remove", "path": "/metadata/finalizers" }]'
    oc delete -f app-config-override-patch.yaml
    oc get crd | grep watsondiscovery | cut -d' ' -f1 | xargs -I{} oc annotate \
    --overwrite {} wd oppy.ibm.com/temporary-patches-
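
For step 2, assuming that the temporary patch from IBM Support is provided as the app-config-override-patch.yaml file that step 5 later deletes, applying it typically looks like the following command:
oc apply -f app-config-override-patch.yaml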

Unable to upgrade from 4.0.x to 4.6 successfully

Error

An upgrade from a 4.0.x installation to 4.6 does not complete. The READY column shows False and READYREASON shows Errored and does not resolve.

Cause

Discovery defaults to creating a Development deployment type. However, you can override that default configuration by specifying a deployment type with the spec.shared.deploymentType setting.

In 4.0.x releases, the spec.shared.deploymentType field with a Starter value (which is equivalent to Development) was applied if you did not change it to Production. In a 4.6 installation, when using cpd-cli manage, Discovery sets the spec.shared.deploymentType field to Production to create a Production-ready installation by default.

Deployment types cannot be changed after the initial deployment, nor can they be changed during an upgrade. If you previously had a Starter or Development deployment, you might have inadvertently requested a Production deployment during the upgrade, and this configuration mismatch prevents the upgrade from completing.

If you aren't sure which deployment type was used for your 4.6 upgrade, you can check by completing the following steps:
  1. Run the following command to check the current deployment type:
    oc get WatsonDiscovery wd -ojsonpath='{.spec.shared.deploymentType}'
    If Production is returned, then you need to apply the workaround. If no value is returned, then you might not have specified a value and a Development deployment might be applied (because that is the internal default configuration).
  2. Run the following command to check the number of persistent volume claims (PVCs) that were created by the EDB PostgreSQL instance for your installation:
    oc get pvc -lk8s.enterprisedb.io/cluster=wd-discovery-cn-postgres
    If there is more than one, and the AGE of the most recent one aligns with the time of the upgrade, it means that you used a Production deployment type during the upgrade by mistake.
    For example, the following response means that the upgrade was configured to create a production installation because there are two PostgreSQL PVCs:
    NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
    wd-discovery-cn-postgres-1  Bound    pvc-169dd8cf-2c02-452d-8f2e-85ecf3ce31aa   30Gi       RWO            ocs-storagecluster-ceph-rbd   7d1h
    wd-discovery-cn-postgres-2  Bound    pvc-e3056a68-2603-436a-bf01-30057c34ad1a   30Gi       RWO            ocs-storagecluster-ceph-rbd   41h
Solution
To resolve the problem and continue the upgrade, complete the following steps:
  1. Do one of the following things:
    • If Production was returned in the previous procedure, modify the spec.shared.deploymentType field in the custom resource to match the value that was used during the original installation.
      oc patch WatsonDiscovery wd --type=merge \
      --patch='{"spec":{"shared":"deploymentType": "Starter"}}'
      Confirm that the Starter deployment type is returned now by using the following command:
      oc get WatsonDiscovery wd -ojsonpath='{.spec.shared.deploymentType}'
      The Development and Starter types are functionally the same, and both values are accepted by the service.
    • If an empty value was returned in the previous procedure, then the field was not specified in the initial installation. In this case, you must remove the field in the current custom resource. When you do so, the internal default setting of Development will be used, which is what you want in this case.
      To remove the field, enter the following command:
      oc patch WatsonDiscovery wd --type=json \
      --patch='[{"op": "remove", "path": "/spec/shared/deploymentType"}]'
  2. After the patch is applied, verify that EDB PostgreSQL is running with one instance only by using the following command:
    oc get Cluster wd-discovery-cn-postgres
  3. If more than one instance is reported, set PostgreSQL in maintenance mode by using the following command:
    oc patch WatsonDiscovery wd --type=merge \
    --patch='{"spec":{"postgres":{"quiesce":{"enabled": true}}}}'
    PostgreSQL is in maintenance mode when the following command returns true. You might need to wait a few minutes.
    oc get Cluster wd-discovery-cn-postgres \
    -o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}{"\n"}'
  4. Remove the additional pods and persistent volume claims (PVCs) that are associated with the instance. You got a list of these PVCs in an earlier step.
    oc delete pod/wd-discovery-cn-postgres-2 pvc/wd-discovery-cn-postgres-2 
  5. After the PVCs are removed, return PostgreSQL to normal operation by using the following command:
    oc patch WatsonDiscovery wd --type=merge \
    --patch='{"spec":{"postgres":{"quiesce":{"enabled": false}}}}'
    PostgreSQL is out of maintenance mode when the following command returns false.
    oc get Cluster wd-discovery-cn-postgres \
    -o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}{"\n"}'
  6. Confirm that the state of the cluster is now healthy.
    oc get Cluster wd-discovery-cn-postgres

The upgrade resumes.

Cannot update operators with a dependency on etcd, MinIO, or RabbitMQ

Applies to: Upgrades from Version 4.0.x or 4.5.x to Version 4.6

When you run the cpd-cli manage apply-olm command, the operator for one or more of the following services might get stuck in the Installing phase:

  • IBM® Match 360
  • OpenPages®
  • Watson Assistant
  • Watson Discovery
  • Watson Knowledge Studio
  • Watson Speech services

Each of these services depends on one or more of the MinIO, RabbitMQ, and etcd components.

This issue can occur when the upgrade of the etcd operator, MinIO operator, or RabbitMQ operator fails. These dependencies use Helm-based operators. When the upgrade of a Helm-based operator fails, the failed version is not automatically deleted. If there is insufficient memory to retry the upgrade, the operators encounter an out-of-memory error and the upgrade fails.

Diagnosing the problem
To determine which operator is causing the problem:
  1. If you are upgrading a service that has a dependency on RabbitMQ, check the status of the RabbitMQ operator:
    1. Check the last state of the operator:
      oc get pods -n $PROJECT_CPD_OPS \
        -lapp.kubernetes.io/instance=ibm-rabbitmq-operator \
        -ojsonpath='{.items[*].status.containerStatuses[*].lastState.terminated}'
    2. If the state includes "exitCode":137,..."reason":"OOMKilled", then get the name of the operator:
      oc get csv -n "${PROJECT_CPD_OPS}" \
      -loperators.coreos.com/ibm-rabbitmq-operator.${PROJECT_CPD_OPS}

      You will need the name to resolve the problem. The name has the following format: ibm-rabbitmq-operator.vX.X.X.

  2. If you are upgrading a service that has a dependency on MinIO, check the status of the MinIO operator:
    1. Check the last state of the operator:
      oc get pods -n $PROJECT_CPD_OPS \
        -lapp.kubernetes.io/instance=ibm-minio-operator \
        -ojsonpath='{.items[*].status.containerStatuses[*].lastState.terminated}'
    2. If the state includes "exitCode":137,..."reason":"OOMKilled", then get the name of the operator:
      oc get csv -n "${PROJECT_CPD_OPS}" \
      -loperators.coreos.com/ibm-minio-operator.${PROJECT_CPD_OPS}

      You will need the name to resolve the problem. The name has the following format: ibm-minio-operator.vX.X.X.

  3. If you are upgrading a service that has a dependency on etcd, check the status of the etcd operator:
    1. Check the last state of the operator:
      oc get pods -n $PROJECT_CPD_OPS \
        -lapp.kubernetes.io/instance=ibm-etcd-operator \
        -ojsonpath='{.items[*].status.containerStatuses[*].lastState.terminated}'
    2. If the state includes "exitCode":137,..."reason":"OOMKilled", then get the name of the operator:
      oc get csv -n "${PROJECT_CPD_OPS}" \
      -loperators.coreos.com/ibm-etcd-operator.${PROJECT_CPD_OPS}

      You will need the name to resolve the problem. The name has the following format: ibm-etcd-operator.vX.X.X.

Resolving the problem

If one or more pods are in the CrashLoopBackOff state, complete the following steps to resolve the problem:

  1. Check the current limits and requests for the operator with pods that are in a poor state.

    If all of the operators were stuck, repeat this process for each operator.

    1. Set the OP_NAME environment variable to the name of the operator:
      export OP_NAME=<operator-name>
    2. Check the current limits for the operator:
      oc get csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
      -ojsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
    3. Check the current requests for the operator:
      oc get csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
      -ojsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.requests.memory}'
    4. Choose the appropriate action based on the values returned by the preceding commands:
      • If either the limits or requests are below 1Gi, continue to the next step.
      • If both values are above 1Gi, then the cause of the problem was misdiagnosed. This solution will not resolve the issues you are seeing.
  2. Increase the memory limits and requests for the affected operator.

    If all of the operators are stuck, repeat this process for each operator.

    1. Create a JSON file named patch.json with the following content:
      [
        {
          "op": "replace",
          "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory",
          "value": "1Gi"
        },
        {
          "op": "replace",
          "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory",
          "value": "1Gi"
        }
      ]
    2. Ensure that the OP_NAME environment variable is set to the correct operator name:
      echo ${OP_NAME}
    3. Patch the operator:
      oc patch csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
      --type=json --patch="$(cat patch.json)"
    4. Confirm that the patch was successfully applied:
      oc get csv -n ${PROJECT_CPD_OPS} ${OP_NAME} \
      -ojsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'

      The command should return 1Gi.

    Important: The patch is temporary. The memory settings apply only to the current deployment. The next time you update the operator, the settings are replaced by the default settings.

Unable to modify the resources of the Postgres pods associated with Watson Discovery

This issue was fixed in the 4.6.5 release.

Error

In Watson Discovery version 4.6, the resource requests and limits of the Postgres pods can no longer be edited. Previously, the Watson Discovery operator read the spec.postgres.database.resources setting.

Cause
Because the WatsonDiscovery CustomResourceDefinition does not have .spec.postgres.resources defined, modifications to the WatsonDiscovery CR are automatically reverted. This prevents changes from rolling out to the pods.
Solution
Patch the WatsonDiscovery CustomResourceDefinition to include the new field prior to submitting a patch to modify the Postgres pod resources. To verify whether the patch is already applied, run the following command:
oc get CustomResourceDefinition watsondiscoveries.discovery.watson.ibm.com \
  --output jsonpath='{.spec.versions[].schema.openAPIV3Schema.properties.spec.properties.postgres.properties.resources}'

If the patch is already applied, the command returns {"x-kubernetes-preserve-unknown-fields":true}. If nothing is returned, run the following command to patch the CustomResourceDefinition:

oc patch CustomResourceDefinition watsondiscoveries.discovery.watson.ibm.com  \
  --type json \
  --patch '[{"op":"add","path":"/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/postgres/properties/resources","value":{"x-kubernetes-preserve-unknown-fields":true}}]'

After the patch is applied, rerun the previous oc get command to verify that {"x-kubernetes-preserve-unknown-fields":true} is returned. You can then resume modifying the Postgres pod resource configuration.
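
As a hypothetical example only, the following patch sets memory requests and limits under spec.postgres.resources, which is the path that the CRD patch above preserves; confirm the exact field path that your operator version honors before you apply it:
oc patch WatsonDiscovery wd --type=merge \
  --patch='{"spec":{"postgres":{"resources":{"requests":{"memory":"2Gi"},"limits":{"memory":"2Gi"}}}}}'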

Watson Discovery installation receives a wd-discovery-haywire error due to an NSX plugin being installed as a CNI plugin on OpenShift

Error
Installation of the Watson Discovery service is pending with the following error on wd-discovery-haywire:
wd-discovery-haywire-56bc476b76-plv84.log
-----
Name: wd-discovery-haywire-56bc476b76-plv84
Container: wd-discovery-haywire
Namespace: cpd4-main-qi1001
Logs: 
		{"@timestamp":"2023-02-28T05:02:24.692Z","message":"Listening on 50,051","logger_name":"com.ibm.watson.wire.notices.Server","thread_name":"main","level":"INFO"}
{"@timestamp":"2023-02-28T05:02:57.307Z","message":"*** shutting down gRPC server since JVM is shutting down","logger_name":"com.ibm.watson.wire.notices.Server$1","thread_name":"Thread-6","level":"INFO"}
{"@timestamp":"2023-02-28T05:02:57.312Z","message":"*** server shut down","logger_name":"com.ibm.watson.wire.notices.Server$1","thread_name":"Thread-6","level":"INFO"}
[INFO  tini (1)] Spawned child process 'java' with pid '7'
[INFO  tini (1)] Main child exited normally (with status '143')
-----
Cause
Installing the NSX plugin as an OpenShift CNI plug-in can prevent the ibm-nginx pod from accessing itself through its service IP or DNS name. As a result, the Watson Discovery gateway cannot receive incoming requests.
Solution
Apply a temporary patch that allows nginx in the gateway pod to connect to the other container without going through the Kubernetes service:
  1. Download the temporary patch wd-gateway-service-patch.zip.
  2. Extract wd-gateway-service-patch.yml from the zip file.
  3. Apply the temporary patch:
    oc apply -f wd-gateway-service-patch.yml
    Wait for the wd-discovery-gateway pod to restart.
  4. Create a Watson Discovery instance in Cloud Pak for Data.

If you want to remove the temporary patch, enter the following command:

oc delete temporarypatch.oppy.ibm.com wd-gateway-service-patch
If the command does not return, open another terminal and enter the following commands:
oc patch temporarypatch.oppy.ibm.com wd-gateway-service-patch --type json \
  --patch '[{ "op": "remove", "path": "/metadata/finalizers" }]'
oc get crd | grep watsondiscovery | cut -d' ' -f 1 | xargs -I{} -t oc annotate {} wd \
  --overwrite oppy.ibm.com/temporary-patches-

Watson Discovery MinIO pods not starting because quota is applied to the namespace

Applies to: 4.5.3 and later

Problem
The wd-minio-discovery-create-pvc job fails to complete when a ResourceQuota is applied to the namespace. When you describe the job with oc describe job wd-minio-discovery-create-pvc, a FailedCreate event that mentions failed quota is shown. For example:
Warning  FailedCreate  31m job-controller  Error creating: pods "wd-minio-discovery-create-pvc-6shj8" is forbidden: failed quota: cpd-quota: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
Cause
The MinIO job cannot start when a ResourceQuota is applied to the namespace but no LimitRange is set, because the job pod does not have resources.requests or resources.limits configured.
Solution
Apply a LimitRange with defaults for limits and requests. Change the namespace in the following YAML to the namespace where Cloud Pak for Data is installed:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-limits
  namespace: zen   # Change this to the namespace where Cloud Pak for Data is installed
spec:
  limits:
  - default:
      cpu: 300m
      memory: 200Mi
    defaultRequest:
      cpu: 200m
      memory: 200Mi
    type: Container
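
For example, save the YAML to a file (the name limitrange.yaml is used here for illustration), apply it, and then confirm that the LimitRange exists in the namespace:
oc apply -f limitrange.yaml
oc get limitrange -n <namespace>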

ETCD error when upgrading Watson Discovery from 4.5 to 4.6

Applies to: 4.6.0 and later

Problem
After you apply the Watson Discovery CR (cpd-cli manage apply-cr .. --version=4.6.x ..), the etcdcluster resource wd-discovery-etcd enters a failed state because of invalid labels. You can verify this by checking the etcdcluster conditions:
oc -n ${PROJECT_CPD_INSTANCE} get etcdcluster wd-discovery-etcd -o jsonpath="{.status.conditions}" 
---
[{"lastTransitionTime":"2023-03-30T12:45:58Z","message":"Failedtopatchobject:b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"StatefulSet.apps\\\\\"wd-discovery-etcd\\\\\"isinvalid:spec:Forbidden:updatestostatefulsetspecforfieldsotherthan\\'replicas\\',\\'template\\',\\'updateStrategy\\',\\'persistentVolumeClaimRetentionPolicy\\'and\\'minReadySeconds\\'areforbidden\",\"reason\":\"Invalid\",\"details\":{\"name\":\"wd-discovery-etcd\",\"group\":\"apps\",\"kind\":\"StatefulSet\",\"causes\":[{\"reason\":\"FieldValueForbidden\",\"message\":\"Forbidden:updatestostatefulsetspecforfieldsotherthan\\'replicas\\',\\'template\\',\\'updateStrategy\\',\\'persistentVolumeClaimRetentionPolicy\\'and\\'minReadySeconds\\'areforbidden\",\"field\":\"spec\"}]},\"code\":422}\\n'","reason":"Failed","status":"False","type":"Failure"}]
Cause
The ETCD operator does not recreate the statefulset on an immutable field change.
Solution
Manually delete the etcd statefulset to allow the operator to recreate it:
  1. Delete the etcd statefulset:
    oc delete sts wd-discovery-etcd
    statefulset.apps "wd-discovery-etcd" deleted
  2. Wait for the StatefulSet to be re-created, and then check that the etcdcluster reconciles successfully:
    oc get sts wd-discovery-etcd
    NAME                READY   AGE
    wd-discovery-etcd   0/3     24s
    ---
    oc get etcdcluster wd-discovery-etcd -o jsonpath="{.status.conditions}"
    [{"ansibleResult":{"changed":3,"completion":"2023-05-16T18:57:17.965897","failures":0,"ok":39,"skipped":36},"lastTransitionTime":"2023-05-16T18:56:24Z","message":"Awaiting next reconciliation","reason":"Successful","status":"True","type":"Running"},{"lastTransitionTime":"2023-05-16T18:57:18Z","message":"Last reconciliation succeeded","reason":"Successful","status":"True","type":"Successful"},{"lastTransitionTime":"2023-05-16T18:56:24Z","message":"","reason":"","status":"False","type":"Failure"}]

Retrieving Watson Discovery ElasticSearch PVCs during uninstallation

Applies to: 4.0.x and later

Problem
During uninstallation, the following command fails to retrieve the Watson Discovery ElasticSearch PVCs:
oc get pvc -l 'app.kubernetes.io/name in (wd,discovery)'
Cause
The command fails because the labels on the Watson Discovery ElasticSearch PVCs can change. The ElasticSearch operator updates the labels on the Watson Discovery ElasticSearch PVCs when other ElasticSearch PVCs with a different set of labels exist in the cluster. Those PVCs also include the ibm-es-data label (set to either True or False).
Solution
You can retrieve the Watson Discovery ElasticSearch PVCs by entering the command:
oc get pvc | grep wd-ibm-elasticsearch

Delete any PVCs that are listed.
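
For example, following the same pattern that is used to clean up the MinIO job earlier in this topic, a command similar to the following deletes the listed PVCs:
oc delete pvc $(oc get pvc -oname | grep wd-ibm-elasticsearch)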