Known issues and limitations for Watson Discovery

The following known issues and limitations apply to the Watson Discovery service.

After Watson Discovery installation, one or more ranker pods are in CrashLoopBackOff

Applies to: 5.1.0 and later

Error
The ranker master pods, the ranker serve pods, or both, are in CrashLoopBackOff.
oc get pod | grep ranker
wd-discovery-ranker-master-6dbfb6bcc5-qv6nj                       0/1     CrashLoopBackOff   525 (15s ago)     2d18h
wd-discovery-ranker-master-6dbfb6bcc5-t2xzw                       0/1     Running            515 (3m29s ago)   2d18h
wd-discovery-ranker-monitor-agent-85bcc9bc9-b8kc9                 1/1     Running            0                 4d9h
wd-discovery-ranker-monitor-agent-85bcc9bc9-g4gzq                 1/1     Running            0                 4d9h
wd-discovery-ranker-rest-85bc97fd7d-8qr2w                         1/1     Running            0                 2d18h
wd-discovery-ranker-rest-85bc97fd7d-fzq7h                         1/1     Running            0                 2d18h
wd-discovery-serve-ranker-78bf7696bf-f596r                        2/2     Running            1 (2d18h ago)     2d18h
wd-discovery-serve-ranker-78bf7696bf-mqx8t                        1/2     CrashLoopBackOff   582 (100s ago)    2d18h
Describing the pod shows probe timeouts:
oc describe pod <wd-discovery-ranker-master-or-serve-pod> -n ${PROJECT_CPD_INST_OPERANDS}
Events:
  Type     Reason     Age                       From     Message
  ----     ------     ----                      ----     -------
  Warning  Unhealthy  79m (x2279 over 6d12h)    kubelet  Readiness probe failed: command timed out
  Warning  Unhealthy  75m (x915 over 6d11h)     kubelet  (combined from similar events): Liveness probe failed: command timed out
  Warning  BackOff    60m (x5356 over 6d12h)    kubelet  Back-off restarting failed container wd-discovery-ranker-master in pod wd-discovery-ranker-master-5c575b659b-5wfqb_cpd-instance-upgrade(c5f5684d-fe4e-4585-aeb6-e6b1e9bf3583)
  Warning  Unhealthy  15m (x1439 over 6d12h)    kubelet  Startup probe failed: command timed out
  Warning  Unhealthy  5m28s (x2451 over 6d12h)  kubelet  Liveness probe failed: command timed out
Cause

The ranker master pods, the ranker serve pods, or both, did not come up properly after installation.

Solution
  1. Check if the ranker master pod is in CrashLoopBackOff:
    oc describe pod <wd-discovery-ranker-master-pod> -n ${PROJECT_CPD_INST_OPERANDS}
  2. Apply the following patch command to increase the resources on the ranker master pod.

    oc patch wd wd --type=merge --patch='{"spec": {"wire": {"rankerMaster": {"resources":{"requests":{"cpu":"1","memory":"3000Mi"},"limits":{"cpu":"1","memory":"3000Mi"}}}}}}'
  3. Check if the ranker serve pod is in CrashLoopBackOff:
    oc describe pod <wd-discovery-ranker-serve-pod> -n ${PROJECT_CPD_INST_OPERANDS}
  4. Apply the following patch command to increase the resources on the ranker serve pod.
    oc patch wd wd --type=merge --patch='{"spec": {"mm": {"mmSideCar": {"resources":{"requests":{"cpu":"2","memory":"2000Mi"},"limits":{"cpu":"2","memory":"2000Mi"}}}}}}'
  5. Perform the following steps to increase the probe timeouts on the ranker serve pod. A verification sketch follows these steps.
    1. Save the following patch as serve-ranker-patch.yaml.
      apiVersion: oppy.ibm.com/v1
      kind: TemporaryPatch
      metadata:
        name: serve-ranker-patch
      spec:
        apiVersion: discovery.watson.ibm.com/v1
        kind: WatsonDiscovery
        name: wd
        patchType: patchStrategicMerge
        patch:
          wire:
            cr:
              spec:
                wire:
                  serveRanker:
                    mmRuntime:
                      livenessProbe:
                        initialDelaySeconds: 180
                        periodSeconds: 60
                        timeoutSeconds: 45
                      readinessProbe:
                        initialDelaySeconds: 180
                        periodSeconds: 60
                        timeoutSeconds: 45
                      startupProbe:
                        initialDelaySeconds: 120
                        periodSeconds: 60
                        timeoutSeconds: 45
      
    2. Apply the patch using the following command:
      oc apply -f serve-ranker-patch.yaml
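After you apply the patches, the operator rolls out new ranker pods, which can take several minutes. The following is a minimal verification sketch that reads only standard pod fields from the recreated pods; the containers[0] index assumes the ranker master container is the first container in the pod, so adjust it if needed:
  # Confirm that the ranker pods are running again
  oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep ranker

  # Confirm the increased resources on the ranker master container
  oc get pod <wd-discovery-ranker-master-pod> -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.spec.containers[0].resources}'

  # Print the liveness probe timeout for each container in the new serve ranker pod
  oc get pod <wd-discovery-ranker-serve-pod> -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.livenessProbe.timeoutSeconds}{"\n"}{end}'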
      

Watson Discovery is not accessible after upgrading

Applies to: 5.1.0 and later

Error

After upgrading to 5.1.0 or later, the Watson Discovery service is healthy, but not accessible from IBM Cloud Pak for Data. This error is likely to occur if the service was originally installed as Version 4.0.x.

Cause

This issue is caused by a version parameter in the spec.

Solution
  1. Check the zen extensions to ensure the following are present:
    oc get zenextensions/discovery-gw-addon zenextensions/wd-discovery-watson-gateway-gw-instance
    If these zen extensions are missing, there might be an override in the Watson Discovery CR.
  2. Verify the version of Watson Gateway currently in use.

    oc get watsongateways
    NAME                          VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   AGE
wd-discovery-watson-gateway   main      True    Stable        False      Stable           117d
  3. If the result shows the version as main, perform the following steps:
    1. Edit the Watson Discovery CR.
      oc edit wd wd
    2. Locate the relevant version line in the spec.
      Gateway:
        size: small
        version: main
    3. Remove version: main, then save and exit.
    4. Monitor the gateway object for changes. This can take several minutes. After it shows as using the default version, attempt to access Watson Discovery through IBM Cloud Pak for Data.
      oc get watsongateways
      The response should appear similar to:
      NAME                          VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   AGE
      wd-discovery-watson-gateway   default   True    Stable        False      Stable           19d
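      While you wait, you can stream updates to the gateway object instead of polling, and confirm that the zen extensions from step 1 are present again. A small sketch; interrupt the watch with Ctrl+C once the version changes:
      oc get watsongateways -w
      oc get zenextensions/discovery-gw-addon zenextensions/wd-discovery-watson-gateway-gw-instance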

Upgrading from Version 4.8.6 or earlier to Version 5.1.0 does not complete

Applies to: 5.1.0

Fixed in: 5.1.1

Error

When upgrading Watson Discovery from Version 4.8.6 or earlier to Version 5.1.0, the upgrade does not complete and the status remains in the InProgress state.

Cause

This error is caused by a parameter setting in PostgreSQL.

Solution
  1. Enable superuser access on the source cluster:
    oc patch cluster.postgresql wd-discovery-cn-postgres --type merge --patch '{"spec": {"enableSuperuserAccess": true}}'
  2. Delete the new PostgreSQL CR so that the Watson Discovery operator can recreate it:
    oc delete cluster.postgresql wd-discovery-cn-postgres16
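After the delete, you can confirm that the Watson Discovery operator recreates the PostgreSQL cluster and that the upgrade resumes. A minimal sketch; interrupt the watch with Ctrl+C once the new cluster appears:
  # Watch for the recreated PostgreSQL cluster
  oc get cluster.postgresql -w

  # Recheck the Watson Discovery upgrade status
  oc get wd wd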

Failed to restore elasticsearch data in Watson Discovery

Applies to: 4.8.x and later

Error
After running an OADP restore, Elasticsearch data sometimes is not restored in Watson Discovery. If this problem occurs, you can see an error message in the CPD-CLI*.log file under the cpd-cli-workspace/logs directory, for example:
"[cloudpak:cloudpak_snapshot_2024-09-01-15-07-58/COvZbNZfTgGYBZ7OfSfOfA] cannot restore index [.ltrstore] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
Cause

This error occurs when the .ltrstore index is created by deployment/wd-discovery-training-crud before the backup data is restored.

Solution
  1. Switch to the ${PROJECT_CPD_INST_OPERANDS} project:
    oc project ${PROJECT_CPD_INST_OPERANDS}
  2. Get an Elasticsearch pod name:
    pod=$(oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}')
  3. Note the number of replicas of deployment/wd-discovery-training-crud:
    oc get deployment wd-discovery-training-crud -o jsonpath='{.spec.replicas}'
  4. Scale down deployment/wd-discovery-training-crud:
    oc patch wd wd --type merge --patch '{"spec": {"wire": {"trainingCrud": {"trainingCrudReplicas": 0}}}}'
  5. Delete the .ltrstore index:
    oc exec $pod -c elasticsearch -- bash -c 'curl -XDELETE -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/.ltrstore"'
  6. Get the name of the snapshot that contains the Watson Discovery data:
    oc exec $pod -c elasticsearch -- bash -c 'curl -XGET -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cat/snapshots/cloudpak?h=id&s=end_epoch"'
    The command output indicates the latest snapshot name, for example:
    cloudpak_snapshot_2024-09-01-15-07-58
  7. Restore using the snapshot (replace <snapshot-name> with your snapshot name):
    oc exec $pod -c elasticsearch -- bash -c 'curl -XPOST -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_snapshot/cloudpak/<snapshot-name>/_restore"'
  8. Scale deployment/wd-discovery-training-crud back to its original replica count (see the sketch after these steps):
    oc patch wd wd --type merge --patch '{"spec": {"wire": {"trainingCrud": {"trainingCrudReplicas": <number-of-original-replicas>}}}}'
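Steps 3 and 8 can also be scripted so that the original replica count is captured once and reused. A minimal sketch, assuming the same shell session as the steps above:
  # Record the original replica count before scaling down (step 3)
  replicas=$(oc get deployment wd-discovery-training-crud -o jsonpath='{.spec.replicas}')

  # ... perform steps 4 through 7 ...

  # Restore the original replica count (step 8); double quotes let the variable expand
  oc patch wd wd --type merge --patch "{\"spec\": {\"wire\": {\"trainingCrud\": {\"trainingCrudReplicas\": ${replicas}}}}}"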

Custom resources are not accessible from the Teach domain concepts section after upgrading

Applies to: Upgrading from 4.7.1 and 4.7.2 to any later version

Error

In rare cases, a resource clean-up job might invalidate resources in certain projects when upgrading Watson Discovery. Invalidated resources lead to issues such as dictionaries and entity extractors not being accessible from the Teach domain concepts section of the Improvement tools panel on the Improve and customize page.

Cause

An issue with the resource clean-up job in 4.7.1 and 4.7.2 invalidates the project resources, resulting in this issue.

Solution
Scale down the wd-cnm-api pod before upgrading Watson Discovery from 4.7.1 or 4.7.2.
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 0}}}}'
After completing the upgrade process, either scale up the pod to its default value or scale the pod to a specific number of replicas instead of the default value.
To scale up the pod to its default value, run the following commands:
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 1}}}}'
oc -n ${namespace} patch wd wd --type=json --patch '[{"op":"remove","path":"/spec/cnm"}]'
To scale the pod to a specific number of replicas, run the following commands:
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 1}}}}'
oc -n ${namespace} patch wd wd --type=merge --patch "{\"spec\": {\"cnm\": {\"apiServer\": {\"replicas\": ${num_of_replicas}}}}}"
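To confirm the replica count before and after the upgrade, you can check the cnm API deployment. A quick sketch; the grep pattern is illustrative because the exact deployment name can vary by release:
oc -n ${namespace} get deployment | grep cnm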

During shutdown, the DATASTOREQUIESCE field does not update

Applies to: 5.1.0 and later

Error

After successfully executing the cpd-cli manage shutdown command, the DATASTOREQUIESCE state in the Watson Discovery resource is stuck in QUIESCING:

# oc get WatsonDiscovery wd -n "${PROJECT_CPD_INST_OPERANDS}"
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE    DATASTOREQUIESCE   AGE
wd     4.7.3     True    Stable        False      Stable           24/24      24/24      QUIESCED   QUIESCING          16h
Cause

Due to the way quiescing Postgres works, the Postgres pods are still running in the background. As a result, the metadata in the Watson Discovery resource is not updated.

Solution
There is no fix for this. However, the state being stuck in QUIESCING does not affect the Watson Discovery operator.
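If you want to confirm that this is the benign case, you can check that the Postgres pods are indeed still running after the shutdown. A quick check; the grep pattern is illustrative:
oc get pod -n "${PROJECT_CPD_INST_OPERANDS}" | grep postgres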

Upgrade fails due to existing Elasticsearch 6.x indices

Applies to: 5.1.0 and later

Error
If the existing Elasticsearch cluster has indices that were created with Elasticsearch 6.x, upgrading Watson Discovery to Version 5.0.0 or later fails.
> oc get wd wd
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
wd     4.8.0     False   InProgress    True       VerifyWait       2/24       1/24       NOT_QUIESCED   NOT_QUIESCED       63m
Cause
Watson Discovery checks for the existence of deprecated index versions in the Elasticsearch cluster when upgrading to Version 5.0.0 or later.
Solution
To determine whether existing Elasticsearch 6.x indices are the cause of the upgrade failure, check the log of the wd-discovery-es-detect-index pod by using the following command:
> oc logs -l app=es-detect-index --tail=-1
If an Elasticsearch 6.x index is found, the following content is displayed in the log:
> oc logs -l app=es-detect-index --tail=-1
Checking connection to Elastic endpoint
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   569  100   569    0     0  28450      0 --:--:-- --:--:-- --:--:-- 28450
{
  "name" : "wd-ibm-elasticsearch-es-server-client-0",
  "cluster_name" : "es-cluster",
  "cluster_uuid" : "XHm71iR_REu0VzbM16BRgg",
  "version" : {
    "number" : "7.10.2-SNAPSHOT",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2023-10-22T21:59:42.077083382Z",
    "build_snapshot" : true,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
Retrieve list of indexes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   357  100   357    0     0   2811      0 --:--:-- --:--:-- --:--:--  2811
Checking for ElasticSearch 6 index
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2582  100  2582    0     0  95629      0 --:--:-- --:--:-- --:--:-- 95629
ElasticSearch 6 index found. Failing job

To upgrade, you must reindex all Elasticsearch 6.x indices to Elasticsearch 7.x indices by running a script.

To reindex from Elasticsearch 6.x to Elasticsearch 7.x, complete the following steps:
  1. Go to the watson-developer-cloud/doc-tutorial-downloads GitHub repository and download the reindex_es6_indices.sh script.
  2. Make the script an executable file.
    > chmod +x ./reindex_es6_indices.sh
  3. Copy the script from your local directory to the wd-ibm-elasticsearch-es-server-data-0 pod of the cluster.
    > oc cp -c elasticsearch ./reindex_es6_indices.sh wd-ibm-elasticsearch-es-server-data-0:/tmp/ 
  4. Use the exec command for the wd-ibm-elasticsearch-es-server-data-0 pod and run the script to reindex.
    > oc exec -c elasticsearch  wd-ibm-elasticsearch-es-server-data-0 -- bash -c "/tmp/reindex_es6_indices.sh"
    After reindexing is successful, the following content is displayed in the log:
    > oc exec -c elasticsearch  wd-ibm-elasticsearch-es-server-data-0 -- bash -c "/tmp/reindex_es6_indices.sh"
    Checking status of ElasticSearch
    Getting index list
    Total number of indices: 245
    [1 / 245] ElasticSearch 6 index found: 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    ----------------------------
    Updating index - 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations ...
    Generating new settings
    Removing unnecessary settings
    Getting mappings
    Remove existing index : 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    Removing index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    {"acknowledged":true}
    Creating new index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    {"acknowledged":true,"shards_acknowledged":true,"index":"6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new"}
    Executing reindex index to 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    Reindex task ID: MF8B0SsSSXWZwPYnS4wxCQ:225874
    Reindexed: 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    Removing index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    {"acknowledged":true}
    Setting index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new to read-only
    {"acknowledged":true}
    Renaming index from 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new to 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    {"acknowledged":true,"shards_acknowledged":true,"index":"6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations"}
    Unsetting index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new to read-only
    {"acknowledged":true}
    Unsetting index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations to read-only
    {"acknowledged":true}
    Removing index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    {"acknowledged":true}
    ----------------------------
    [2 / 245] ElasticSearch 6 index found: ecadd1ee-d025-845b-0000-017b2c281668_notice
    ...
    Completed!
    After the Elasticsearch 6.x indices are reindexed to Elasticsearch 7.x indices, the upgrade should continue and finish successfully.
    > oc get wd
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     4.8.0     True    Stable        False      Stable           24/24      24/24      NOT_QUIESCED   NOT_QUIESCED       82m
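Before you rerun the upgrade, you can optionally spot-check that no Elasticsearch 6.x indices remain. The following sketch lists the version that each index was created with (values that start with 6 indicate Elasticsearch 6.x); it assumes the same $pod variable and ELASTIC_* environment variables that are used in the restore procedure earlier on this page:
  oc exec $pod -c elasticsearch -- bash -c 'curl -XGET -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_all/_settings?filter_path=*.settings.index.version.created"'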
    
Contact IBM Support if the Elasticsearch cluster check fails or if reindexing to Elasticsearch 7.x fails, as in the following cases:
  • When checking the logs of the wd-discovery-es-detect-index pod, if indices other than Elasticsearch 6.x or Elasticsearch 7.x are found, the following content is displayed in the log:
    > oc logs -l app=es-detect-index --tail=-1
    Checking connection to Elastic endpoint
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   569  100   569    0     0  28450      0 --:--:-- --:--:-- --:--:-- 28450
    {
      "name" : "wd-ibm-elasticsearch-es-server-client-0",
      "cluster_name" : "es-cluster",
      "cluster_uuid" : "XHm71iR_REu0VzbM16BRgg",
      "version" : {
        "number" : "7.10.2-SNAPSHOT",
        "build_flavor" : "oss",
        "build_type" : "tar",
        "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
        "build_date" : "2023-10-22T21:59:42.077083382Z",
        "build_snapshot" : true,
        "lucene_version" : "8.7.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
    Retrieve list of indexes
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   357  100   357    0     0   2811      0 --:--:-- --:--:-- --:--:--  2811
    Checking for ElasticSearch 6 index
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  2582  100  2582    0     0  95629      0 --:--:-- --:--:-- --:--:-- 95629
    Unidentified index found. Please verify
  • When checking the logs of the wd-discovery-es-detect-index pod, if a connection to the Elasticsearch cluster is not established, the following content is displayed in the log:
    > oc logs -l app=es-detect-index --tail=-1
    Checking connection to Elastic endpoint
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to wd-ibm-elasticsearch-srv.zen port 443: Connection refused
    Unable to connect. Please check Elastic
  • When reindexing starts, but is unsuccessful, the following content is displayed in the log:
    > oc exec -c elasticsearch  wd-ibm-elasticsearch-es-server-data-0 -- bash -c "/tmp/reindex_es6_indices.sh"
    Checking status of ElasticSearch
    Getting index list
    Total number of indices: 247
    [1 / 247] ElasticSearch 6 index found: 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    ...
    [49 / 247] ElasticSearch 6 index found: ecadd1ee-d025-845b-0000-017a3722ef7f
    ----------------------------
    Updating index - ecadd1ee-d025-845b-0000-017a3722ef7f ...
    Generating new settings
    Removing unnecessary settings
    Getting mappings
    Remove existing index : ecadd1ee-d025-845b-0000-017a3722ef7f_new
    Removing index ecadd1ee-d025-845b-0000-017a3722ef7f_new
    {"acknowledged":true}
    Creating new index ecadd1ee-d025-845b-0000-017a3722ef7f_new
    {"acknowledged":true,"shards_acknowledged":true,"index":"ecadd1ee-d025-845b-0000-017a3722ef7f_new"}
    Executing reindex index to ecadd1ee-d025-845b-0000-017a3722ef7f_new
    Reindex task ID: MF8B0SsSSXWZwPYnS4wxCQ:182680
    In Progress: reindex from [ecadd1ee-d025-845b-0000-017a3722ef7f] to [ecadd1ee-d025-845b-0000-017a3722ef7f_new][_doc]
    In Progress: reindex from [ecadd1ee-d025-845b-0000-017a3722ef7f] to [ecadd1ee-d025-845b-0000-017a3722ef7f_new][_doc]
    Failed to reindex: ecadd1ee-d025-845b-0000-017a3722ef7f_new
    {
      "took": 299943,
      "timed_out": false,
      "total": 110237,
      "updated": 0,
      "created": 48998,
      "deleted": 0,
      "batches": 49,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled": "0s",
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until": "0s",
      "throttled_until_millis": 0,
      "failures": [
        {
          "index": "ecadd1ee-d025-845b-0000-017a3722ef7f_new",
          "type": "_doc",
          "id": "bc670579c33c9d2644dceef7ac94c249b96c568a9e79b0d1e6bbe2349ae371f9",
          "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
              "type": "stream_constraints_exception",
              "reason": "String length (5046272) exceeds the maximum length (5000000)"
            }
          },
          "status": 400
        },
        {
          "index": "ecadd1ee-d025-845b-0000-017a3722ef7f_new",
          "type": "_doc",
          "id": "8f4a27f149a93fead6852695290cc079635ea8a1d190616adcb8bfdafba09450",
          "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
              "type": "stream_constraints_exception",
              "reason": "String length (5046272) exceeds the maximum length (5000000)"
            }
          },
          "status": 400
        }
      ]
    }
    Error: Please contact support. Do not run this scripts again.
    command terminated with exit code 1

UpgradeError is shown after resizing PVC

Applies to: 5.1.0 and later

Error
After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
Cause
You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size directly on the persistent volume claim after it is created.
Solution
To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.
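For reference, the resize is applied to the PersistentVolumeClaim object itself. The following is a generic sketch with placeholder values, assuming that your storage class supports volume expansion; see the linked topic for the Discovery-specific procedure and supported data stores:
oc patch pvc <pvc-name> -n ${PROJECT_CPD_INST_OPERANDS} --type merge --patch '{"spec": {"resources": {"requests": {"storage": "<new-size>"}}}}'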

Disruption of service after upgrading, restarting, or scaling by updating scaleConfig

Applies to: 5.1.0 and later

Error
After upgrading, restarting, or scaling Watson Discovery by updating the scaleConfig parameter, the Elasticsearch component might become non-functional, resulting in disruption of service and data loss.
Cause
The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
Solution
To determine if confusion about the quorum leader pod is the cause of the issue, complete the following steps:
  1. Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
  2. Check each Elasticsearch pod with the master role to see which pod it identifies as the quorum leader.
    oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'  | while read i; do echo $i; oc exec $i \
    -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
    
    Each pod must list the same pod as the leader.
    For example, in the following result, two different leaders are identified. Pods 1 and 2 identify pod 2 as the leader. However, pod 0 identifies itself as the leader.
    wd-ibm-elasticsearch-es-server-master-0
    id                     host      ip        node
    7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0
    
    wd-ibm-elasticsearch-es-server-master-1
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
    
    wd-ibm-elasticsearch-es-server-master-2
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2

If you find that more than one pod is identified as the leader, contact IBM Support.
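As a compact variant of the check in step 2, the following sketch counts the distinct leader IDs that the master pods report; any result greater than 1 indicates the condition described above. It reuses the same label selector and the _cat/master API, with the h=id parameter limiting the output to the node ID column:
oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
-o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read i; do \
oc exec $i -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?h=id"'; done | sort -u | wc -l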

Limitations

The following limitations apply to the Watson Discovery service:
  • Formulas that are embedded as images, especially those containing division bars (horizontal fractions) or other complex notations, are not reliably recognized or extracted by Watson Discovery. As a result, these formulas might be omitted, misinterpreted, or rendered incorrectly in the extracted output. This limitation stems from how the SDU pipeline handles embedded images, and currently affects all versions of Watson Discovery that use SDU.
  • The service supports single-zone deployments; it does not support multi-zone deployments.
  • You cannot upgrade the Watson Discovery service by using the service-instance upgrade command from the Cloud Pak for Data command-line interface.
  • You cannot use the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility to do an offline backup and restore of the Watson Discovery service. Online backup and restore with OADP is available.