Limitations and known issues in Watson Discovery

The following limitations and known issues apply to the Watson Discovery service.

Refresh 9 of Cloud Pak for Data Version 4.0

Operand version: 4.0.9

Discovery generates a partial failure status message for the Cloud Pak for Data OADP backup and restore utility.

Error: When you check the status of the OADP backup utility after using it to back up a cluster where Discovery is installed, a Phase: PartiallyFailed message is displayed. One or more Discovery components are included in the Failed list.
Cause: Discovery cannot be backed up and restored by using the OADP backup and restore utility. When the Discovery service is present and an administrator backs up an entire Cloud Pak for Data instance, a status message that indicates a partial failure is displayed. This status is displayed because the persistent volume claims (PVCs) for Discovery are not backed up. However, the message does not affect the backup of the rest of the services.
Solution: No action is required to resolve the status message. You can remove the persistent volume claims that are associated with the Discovery service separately. After using the scripts to back up your Discovery service data, you can follow the step that is documented in the uninstall instructions for the Discovery service to delete the PVCs. For more information about how to remove the PVC associated with Discovery, see Uninstalling the Discovery service.
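If you want to confirm which backup reported the partial failure, you can read the phase from the Velero `Backup` resource that OADP creates. The following is a minimal sketch rather than part of the documented procedure. It assumes that you supply the backup name and the namespace where OADP is installed; the `oadp-operator` default is only a placeholder, and `backup_phase` and `is_partially_failed` are hypothetical helper names.

```shell
# Hypothetical helpers (sketch only). Assumption: OADP stores each backup as a
# Velero Backup resource (backups.velero.io) in the namespace passed as $2.
backup_phase() {
  local backup_name="$1"
  local oadp_ns="${2:-oadp-operator}"   # placeholder namespace; replace with yours
  oc get backups.velero.io "$backup_name" -n "$oadp_ns" \
    -o jsonpath='{.status.phase}{"\n"}'
}

# Returns 0 when the backup ended in the PartiallyFailed phase described above.
is_partially_failed() {
  [ "$(backup_phase "$@")" = "PartiallyFailed" ]
}
```

For example, `is_partially_failed <backup-name> <oadp-namespace>` returns success for a backup that hit this known issue.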

This issue exists in refresh versions 8 through 9.

Machine configuration pool is stuck because it cannot evict a pod

Error: During an action that requires nodes to be drained, such as an upgrade, the machine configuration pool reports that scheduling is disabled for one of the worker nodes. If you debug further, you learn that the node is unschedulable because the system failed to drain it.
Cause: Watson Discovery uses a single instance of EDB PostgreSQL on Starter installations. If you have a Starter deployment type and did not define a maintenance window, this error can occur because the primary PostgreSQL pod is protected by a PodDisruptionBudget configuration setting. As a result, the pod cannot be evicted automatically during upgrade.
Solution: Set the PostgreSQL pod to maintenance mode before you upgrade the service.
Note: The service is unavailable while the pod is in maintenance mode.
Complete the following steps:
1. Make sure that the PostgreSQL pods on the PostgreSQL cluster are operational and healthy. To check, enter the following command:
```
oc cnp status wd-discovery-cn-postgres
```
  Also check that the PostgreSQL pods are running and do not keep restarting when you enter the following command:
```
oc get pods | grep wd-discovery-cn-postgres
```
2. Set the PostgreSQL service in maintenance mode with the following command:
```
oc patch WatsonDiscovery wd --type=merge \
--patch='{"spec":{"postgres":{"quiesce":{"enabled": true}}}}'
```
3. Verify that the PostgreSQL service is in maintenance mode. The following command must return true.
```
oc get cluster wd-discovery-cn-postgres \
-o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}{"\n"}'
```
4. Perform the action that requires the nodes to be drained, one at a time, such as an upgrade.
  If, while draining, a node cannot evict a PostgreSQL pod that is running on it, delete the respective PostgreSQL pod by using the following command:
```
oc delete pod wd-discovery-cn-postgres-<n>
```
  Check that the deleted PostgreSQL pod is re-created on another worker node that is up and running. If the new PostgreSQL pod is created on a node that is yet to be drained, you might need to delete the pod again when that node is drained.
5. Revert the change to the PostgreSQL service to take it out of maintenance mode.
```
oc patch WatsonDiscovery wd --type=merge \
--patch='{"spec":{"postgres":{"quiesce":{"enabled": false}}}}'
```
6. Verify that the PostgreSQL service is out of maintenance mode. The following command must return false.
```
oc get cluster wd-discovery-cn-postgres \
-o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}{"\n"}'
```
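Steps 3, 4, and 6 involve checking a value repeatedly and finding the PostgreSQL pod on the node that is being drained. The following shell functions are a minimal sketch of those checks, not an IBM-supplied script. The function names and the `WD_POLL_ATTEMPTS` and `WD_POLL_INTERVAL` variables are hypothetical; the resource names come from the commands above, and matching pod names on the `wd-discovery-cn-postgres` prefix is an assumption.

```shell
# Sketch: poll the EDB PostgreSQL cluster until
# .spec.nodeMaintenanceWindow.inProgress equals the expected value
# ("true" for step 3, "false" for step 6).
# WD_POLL_ATTEMPTS and WD_POLL_INTERVAL are hypothetical tuning variables.
wait_for_maintenance_state() {
  local expected="$1"
  local attempts="${WD_POLL_ATTEMPTS:-30}"
  local interval="${WD_POLL_INTERVAL:-10}"
  local current=""
  local i=1
  while [ "$i" -le "$attempts" ]; do
    current=$(oc get cluster wd-discovery-cn-postgres \
      -o jsonpath='{.spec.nodeMaintenanceWindow.inProgress}')
    if [ "$current" = "$expected" ]; then
      echo "inProgress=$current"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "inProgress is '$current', expected '$expected'" >&2
  return 1
}

# Sketch: list PostgreSQL pods scheduled on one node (step 4), so that you know
# which pod to delete if the drain cannot evict it. Output looks like
# pod/wd-discovery-cn-postgres-1, which oc delete accepts as-is.
postgres_pods_on_node() {
  local node="$1"
  oc get pods --field-selector "spec.nodeName=${node}" -o name \
    | grep 'wd-discovery-cn-postgres'
}
```

For example, run `wait_for_maintenance_state true` after step 2 and `wait_for_maintenance_state false` after step 5. When a drain is stuck, `postgres_pods_on_node <node-name>` prints the pod name to pass to `oc delete`.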

This issue exists in all refresh versions.

Error when quiescing the data stores

Error: When quiescing the data stores, the following error is displayed:

The task includes an option with an undefined variable. 
The error was: `ScheduleName` is undefined.

Cause: The operator cannot change or set the value of the schedulerName field because, although the field exists in the etcdcluster resource and in the Statefulset/wd-discovery-etcd pods, it is not explicitly defined.
Solution: Apply a patch that defines the schedulerName field so that the operator can set or change the value of the field successfully.
Important: When you apply this patch, any running etcd pods are restarted.
Run the following command to apply the patch:
```
oc patch etcdclusters wd-discovery-etcd --type merge \
-p '{"spec":{"schedulerName":"default-scheduler"}}'
```
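After the patch is applied and the etcd pods restart, you can confirm that the field is now defined before you quiesce the data stores again. The following is a minimal sketch; `check_etcd_scheduler` is a hypothetical helper name, and it only reads back the value that the patch above sets.

```shell
# Sketch: confirm that the etcdcluster resource now defines schedulerName
# with the value set by the patch ("default-scheduler").
check_etcd_scheduler() {
  local name
  name=$(oc get etcdclusters wd-discovery-etcd -o jsonpath='{.spec.schedulerName}')
  if [ "$name" != "default-scheduler" ]; then
    echo "schedulerName is '${name:-unset}'" >&2
    return 1
  fi
  echo "schedulerName=$name"
}
```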

This issue exists in all refresh versions.

Refresh 8 of Cloud Pak for Data Version 4.0

Operand version: 4.0.8

The wd-discovery-multi-tenant-migration job fails if anyone besides a system administrator performs the migration.

Error: When you upgrade to version 4.0.8 with a user ID other than admin, the migration job fails.
Cause: The migration script assumes that the script is run by a user with the admin user ID.
Solution: Apply a patch that allows the migration to be successful. Complete the following steps:
1. From the Cloud Pak for Data web client, get the user ID of the owner of the instance that you want to upgrade.
2. Download the wd-migration-uid-patch.zip patch file from the Watson Developer Cloud GitHub repository.
3. Extract the wd-migration-uid-patch.yaml file from the archive file, and then open it in a text editor.
4. Replace the <user_id> variable with the user ID of the owner of the instance that you want to upgrade.
5. Run the following command in a terminal that is logged in to the cluster:
```
oc create -f wd-migration-uid-patch.yaml
```
6. Delete the previous migration job by using the following command:
```
oc delete job wd-discovery-multi-tenant-migration
```
After the job is deleted, the migration job restarts and the migration resumes.
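Steps 3 through 6 can also be scripted. The following is a minimal sketch, not an IBM-supplied script: `render_migration_patch` and `apply_migration_patch` are hypothetical names, and it assumes that the ZIP file is in the current directory with `wd-migration-uid-patch.yaml` at its root and that `<user_id>` is the only placeholder in the file. It writes the result to a new file instead of editing the original.

```shell
# Sketch: print wd-migration-uid-patch.yaml with every <user_id> placeholder
# replaced by the instance owner's user ID. Characters that are special in a
# sed replacement (\, |, &) are escaped first.
render_migration_patch() {
  local owner_uid="$1"
  local src="${2:-wd-migration-uid-patch.yaml}"
  local escaped
  if [ -z "$owner_uid" ]; then
    echo "usage: render_migration_patch <user_id> [file]" >&2
    return 1
  fi
  escaped=$(printf '%s' "$owner_uid" | sed 's/[\\|&]/\\&/g')
  sed "s|<user_id>|${escaped}|g" "$src"
}

# Sketch: steps 3, 5, and 6 in order; stops at the first failing command.
# Assumes unzip is installed and you are logged in to the cluster.
apply_migration_patch() {
  local owner_uid="$1"
  unzip -o wd-migration-uid-patch.zip wd-migration-uid-patch.yaml || return 1
  render_migration_patch "$owner_uid" > wd-migration-uid-patch.rendered.yaml || return 1
  oc create -f wd-migration-uid-patch.rendered.yaml || return 1
  oc delete job wd-discovery-multi-tenant-migration
}
```

For example, `apply_migration_patch <user_id>` uses the user ID that you looked up in step 1.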

This issue exists in refresh versions 6 through 8.

Discovery generates a partial failure status message for the Cloud Pak for Data OADP backup and restore utility.

Error: When you check the status of the OADP backup utility after using it to back up a cluster where Discovery is installed, a Phase: PartiallyFailed message is displayed. One or more Discovery components are included in the Failed list.
Cause: Discovery cannot be backed up and restored by using the OADP backup and restore utility. When the Discovery service is present and an administrator backs up an entire Cloud Pak for Data instance, a status message that indicates a partial failure is displayed. This status is displayed because the persistent volume claims (PVCs) for Discovery are not backed up. However, the message does not affect the backup of the rest of the services.
Solution: No action is required to resolve the status message. You can remove the persistent volume claims that are associated with the Discovery service separately. After using the scripts to back up your Discovery service data, you can follow the step that is documented in the uninstall instructions for the Discovery service to delete the PVCs. For more information about how to remove the PVC associated with Discovery, see Uninstalling the Discovery service.

This issue exists in refresh versions 8 through 9.

Refresh 7 of Cloud Pak for Data Version 4.0

Operand version: 4.0.7

The Deployed status of resources fluctuates after the 4.0.7 upgrade is completed.

Error: When you check the status by submitting the oc get WatsonDiscovery command, the ready status of the resources toggles between showing 23/23 and 20/23 components as being ready for use.
Cause: The readiness state of the resources is not reported consistently after a migration.

Solution: To manually refresh the status information, run the following commands in a terminal that is logged in to the cluster:

```
# Creates a proxy server between localhost and the Kubernetes API server and runs in the background
oc proxy &
# Clear the status from the WatsonDiscovery Operand (<namespace> must be set to the namespace where Discovery is installed)
curl -ksS -X PATCH -H "Accept: application/json, */*" \
-H "Content-Type: application/merge-patch+json" \
http://127.0.0.1:8001/apis/discovery.watson.ibm.com/v1/namespaces/<namespace>/watsondiscoveries/wd/status \
--data '{"status": null}'
```
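The commands above leave `oc proxy` running in the background. The following is a minimal sketch that performs the same status reset and then stops the proxy. `reset_wd_status` and `WD_PROXY_WAIT` are hypothetical names, port 8001 is the `oc proxy` default that the URL above already relies on, and the two-second wait for the proxy to start is an assumption.

```shell
# Sketch: start oc proxy, clear the WatsonDiscovery status through it, then
# stop the proxy. $1 is the namespace where Discovery is installed.
reset_wd_status() {
  local ns="$1"
  local proxy_pid rc
  if [ -z "$ns" ]; then
    echo "usage: reset_wd_status <namespace>" >&2
    return 1
  fi
  oc proxy --port=8001 &
  proxy_pid=$!
  sleep "${WD_PROXY_WAIT:-2}"   # assumed time for the proxy to start listening
  curl -ksS -X PATCH -H "Accept: application/json, */*" \
    -H "Content-Type: application/merge-patch+json" \
    "http://127.0.0.1:8001/apis/discovery.watson.ibm.com/v1/namespaces/${ns}/watsondiscoveries/wd/status" \
    --data '{"status": null}'
  rc=$?
  kill "$proxy_pid" 2>/dev/null   # stop the background proxy
  return "$rc"
}
```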

Discovery generates an error in the Cloud Pak for Data OADP backup and restore utility.

Error: The utility does not complete successfully and the following message is written to the log: preBackupViaConfigHookRule on backupconfig/watson-discovery in namespace cpd (status=error).
Cause: Discovery cannot be backed up and restored by using the OADP backup and restore utility. When the Discovery service is present and an administrator attempts to back up an entire Cloud Pak for Data instance, Discovery prevents the utility from completing successfully.
Solution: Apply a patch that stops Discovery from preventing the utility from completing successfully.
To apply the patch, complete the following steps:
1. Download the wd-aux-br-patch.zip file from the Watson Developer Cloud GitHub repository.
2. Extract the wd-aux-br-patch.yaml file from the ZIP file.
3. Run the following command in a terminal that is logged in to the cluster:
```
oc create -f wd-aux-br-patch.yaml
```

This issue exists in refresh versions 2 through 7.

The wd-discovery-multi-tenant-migration job fails if anyone besides a system administrator performs the migration.

Error: When you upgrade to version 4.0.8 with a user ID other than admin, the migration job fails.
Cause: The migration script assumes that the script is run by a user with the admin user ID.
Solution: Apply a patch that allows the migration to be successful. Complete the following steps:
1. From the Cloud Pak for Data web client, get the user ID of the owner of the instance that you want to upgrade.
2. Download the wd-migration-uid-patch.zip patch file from the Watson Developer Cloud GitHub repository.
3. Extract the wd-migration-uid-patch.yaml file from the archive file, and then open it in a text editor.
4. Replace the <user_id> variable with the user ID of the owner of the instance that you want to upgrade.
5. Run the following command in a terminal that is logged in to the cluster:
```
oc create -f wd-migration-uid-patch.yaml
```
6. Delete the previous migration job by using the following command:
```
oc delete job wd-discovery-multi-tenant-migration
```
After the job is deleted, the migration job restarts and the migration resumes.

Refresh 6 of Cloud Pak for Data Version 4.0

Operand version: 4.0.6

Upgrade to 4.0.6 fails if no Discovery instance is provisioned in the existing cluster before you begin the upgrade process.

Error: The 4.0.6 upgrade process assumes that a Watson Discovery instance is provisioned in the existing cluster. For example, if you are upgrading from 4.0.5 to 4.0.6, you must have an instance provisioned in the 4.0.5 cluster before you begin the migration.
Cause: The current code returns an error when no instance exists because it cannot find a document index to migrate.
Solution: Verify that an instance of Watson Discovery has been provisioned in the existing Cloud Pak for Data cluster before you start the upgrade to 4.0.6. If you tried to upgrade to 4.0.6, but no instances were provisioned and the migration failed, remove the existing installation and install 4.0.6 from scratch.

The Deployed status of resources fluctuates after the 4.0.7 upgrade is completed.

Error: When you check the status by submitting the oc get WatsonDiscovery command, the ready status of the resources toggles between showing 23/23 and 20/23 components as being ready for use.
Cause: The readiness state of the resources is not reported consistently after a migration.

Solution: Typically, the instance is ready for use despite the ready state instability. The ready state settles after approximately 5 hours. You can wait for the readiness state to consistently show 23/23 or you can manually refresh the status information by running the following commands in a terminal that is logged in to the cluster:

```
# Creates a proxy server between localhost and the Kubernetes API server and runs in the background
oc proxy &
# Clear the status from the WatsonDiscovery Operand (<namespace> must be set to the namespace where Discovery is installed)
curl -ksS -X PATCH -H "Accept: application/json, */*" \
-H "Content-Type: application/merge-patch+json" \
http://127.0.0.1:8001/apis/discovery.watson.ibm.com/v1/namespaces/<namespace>/watsondiscoveries/wd/status \
--data '{"status": null}'
```
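If you prefer to wait instead of resetting the status, the following minimal sketch polls the resource until the ready count shows 23/23 on several consecutive checks, because a single 23/23 reading can still flip back. Matching the literal string `23/23` in the default `oc get WatsonDiscovery` output is an assumption based on the symptom described above, and `wait_for_wd_ready` and its variables are hypothetical. The defaults (60 checks, 5 minutes apart) roughly cover the 5-hour settling period.

```shell
# Sketch: succeed once "23/23" appears on WD_REQUIRED_STREAK consecutive checks.
# WD_POLL_ATTEMPTS, WD_POLL_INTERVAL, and WD_REQUIRED_STREAK are hypothetical.
wait_for_wd_ready() {
  local attempts="${WD_POLL_ATTEMPTS:-60}"
  local interval="${WD_POLL_INTERVAL:-300}"
  local needed="${WD_REQUIRED_STREAK:-3}"
  local i=1
  local streak=0
  while [ "$i" -le "$attempts" ]; do
    if oc get WatsonDiscovery wd --no-headers | grep -q '23/23'; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$needed" ]; then
        echo "ready: 23/23 on $streak consecutive checks"
        return 0
      fi
    else
      streak=0   # the count flipped back, so start over
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "not consistently ready after $attempts checks" >&2
  return 1
}
```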

The wd-discovery-multi-tenant-migration job fails if anyone besides a system administrator performs the migration.

Error: When you upgrade to version 4.0.8 with a user ID other than admin, the migration job fails.
Cause: The migration script assumes that the script is run by a user with the admin user ID.
Solution: Apply a patch that allows the migration to be successful. Complete the following steps:
1. From the Cloud Pak for Data web client, get the user ID of the owner of the instance that you want to upgrade.
2. Download the wd-migration-uid-patch.zip patch file from the Watson Developer Cloud GitHub repository.
3. Extract the wd-migration-uid-patch.yaml file from the archive file, and then open it in a text editor.
4. Replace the <user_id> variable with the user ID of the owner of the instance that you want to upgrade.
5. Run the following command in a terminal that is logged in to the cluster:
```
oc create -f wd-migration-uid-patch.yaml
```
6. Delete the previous migration job by using the following command:
```
oc delete job wd-discovery-multi-tenant-migration
```
After the job is deleted, the migration job restarts and the migration resumes.

For more information about earlier releases, see Known issues in the product documentation.