Known issues and limitations for IBM Cloud Pak for Data
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.
The following issues apply to IBM Cloud Pak for Data.
Each issue includes information about the releases that it applies to. If the issue was fixed in a refresh, that information is also included.
- Customer-reported issues
- General issues
- Installation and upgrade issues
- Security issues
- Backup and restore issues (all methods)
- Online backup and restore with IBM Storage Fusion issues
- Online backup and restore with the OADP backup and restore utility issues
- Offline backup and restore with the OADP backup and restore utility issues
- Cloud Pak for Data API issues
- Service issues
Customer-reported issues
Issues that are found after the release are posted on the IBM Support site.
General issues
- Services with a dependency on Db2 as a service crash due to an npm EACCES error
- Intermittent login issues when Cloud Pak for Data is integrated with the Identity Management Service
- The create-rsi-patch command fails
- Common core services is not aligned on the New diagnostics job page
- Critical alerts might appear on the home page after installation
- Elasticsearch pods shut down when they reach their ephemeral storage
- Storage volume pods cannot start when the persistent volume has a lot of files
Services with a dependency on Db2 as a service crash due to an npm EACCES error
If you have one of the following services installed on Cloud Pak for Data, you might see deployments returning the CrashLoopBackOff status:
- Db2®
- Db2 Warehouse
- OpenPages®
- Watson Knowledge Catalog
- Symptoms
-
- The zen-databases deployment returns the CrashLoopBackOff status when you try to complete one of the following procedures:
  - Install the service
  - Upgrade the service
  - Restore from an offline backup
  - Restart a zen-databases deployment
  To check the status of the zen-databases deployments, run the following command:
  oc get pods -n=${PROJECT_CPD_INST_OPERATORS} -l component=zen-databases
- For any pods that are in the CrashLoopBackOff state, check the pod logs:
  oc logs <pod-name> -n=${PROJECT_CPD_INST_OPERATORS} | grep "npm ERR!"
  Look for the following error:
npm ERR! code EACCES
npm ERR! syscall mkdir
npm ERR! path /.npm
npm ERR! errno -13
npm ERR!
npm ERR! Your cache folder contains root-owned files, due to a bug in
npm ERR! previous versions of npm which has since been addressed.
npm ERR!
npm ERR! To permanently fix this problem, please run:
npm ERR!   sudo chown -R 1000700000:0 "/.npm"
- Resolving the problem
- To resolve the problem, run the following command:
  oc patch deployment zen-databases \
    -n=${PROJECT_CPD_INST_OPERATORS} \
    --type='json' \
    --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/env/-", "value": {"name": "npm_config_cache", "value": "/tmp"}}]'
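To confirm that the environment variable was added, you can inspect the deployment afterwards. This is an informal check, not part of the documented fix:
  oc get deployment zen-databases -n=${PROJECT_CPD_INST_OPERATORS} \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="npm_config_cache")].value}'
The command should return /tmp, and the zen-databases pods should leave the CrashLoopBackOff state after they restart.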
Intermittent login issues when Cloud Pak for Data is integrated with the Identity Management Service
Applies to: 4.7.0 and later
Users might intermittently see one of the following errors when they try to log in:
- Error 504 - Gateway Timeout
- Internal Server Error
Some users might be directed to the Identity providers page rather than the Cloud Pak for Data home page.
- Diagnosing the problem
- If users experience one or more of the issues described in the preceding text, check the platform-identity-provider pods to determine whether the pods have been restarted multiple times:
  oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep platform-identity-provider
  If the output indicates multiple restarts, proceed to Resolving the problem.
- Resolving the problem
-
- Restart the icp-mongodb-0 pod:
  oc delete pod icp-mongodb-0 -n ${PROJECT_CPD_INST_OPERANDS}
- Restart the icp-mongodb-1 pod:
  oc delete pod icp-mongodb-1 -n ${PROJECT_CPD_INST_OPERANDS}
- Restart the icp-mongodb-2 pod:
  oc delete pod icp-mongodb-2 -n ${PROJECT_CPD_INST_OPERANDS}
- Restart the platform-auth-service pod:
  oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep platform-auth-service | awk '{print $1}' | xargs oc delete pod -n ${PROJECT_CPD_INST_OPERANDS}
- Restart the platform-identity-management pod:
  oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep platform-identity-management | awk '{print $1}' | xargs oc delete pod -n ${PROJECT_CPD_INST_OPERANDS}
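If you prefer to script the sequence, the following sketch performs the same restarts in order. It assumes the default pod names and prefixes shown in the steps above:
  # Restart the MongoDB pods one at a time
  for i in 0 1 2; do
    oc delete pod icp-mongodb-$i -n ${PROJECT_CPD_INST_OPERANDS}
  done
  # Restart the authentication and identity-management pods by name prefix
  for prefix in platform-auth-service platform-identity-management; do
    oc get pods -n ${PROJECT_CPD_INST_OPERANDS} --no-headers | grep "$prefix" | awk '{print $1}' \
      | xargs -r oc delete pod -n ${PROJECT_CPD_INST_OPERANDS}
  done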
The create-rsi-patch command fails
Applies to: 4.7.3 and later
The cpd-cli manage create-rsi-patch command fails when you try to create or update a resource specification injection (RSI) patch.
- Diagnosing the problem
-
- Confirm that the command fails during the following task:
TASK [utils : Wait until zen extension Cr is in Completed state]
- Review the errors in the ibm-zen-operator pod log:
  - Set the ZEN_OPERATOR_POD environment variable to the name of the zen-operator pod in the operators project for the instance:
    export ZEN_OPERATOR_POD=$(oc get pods -n=${PROJECT_CPD_INST_OPERATORS} | grep zen-operator | awk '{print $1}')
project:
oc project ${PROJECT_CPD_INST_OPERATORS} - Open a remote shell on the
pod:
oc rsh ${ZEN_OPERATOR_POD} - In the remote shell, set the following environment variables:
- Set the PROJECT_CPD_INST_OPERANDS environment variable to the operands project for the instance:
  export PROJECT_CPD_INST_OPERANDS=<project-name>
- Set the RSI_EXTENSION environment variable to the name of the ZenExtension that was created to manage the RSI patch:
  export RSI_EXTENSION=rsi-<patch-name>
  Replace <patch-name> with the name of the patch that you attempted to create or update.
- Run the following command to look for the message We were unable to read either as JSON nor YAML in the latest log file:
  cat /tmp/ansible-operator/runner/zen.cpd.ibm.com/v1/ZenExtension/${PROJECT_CPD_INST_OPERANDS}/${RSI_EXTENSION}/artifacts/latest/stdout | grep "We were unable to read either as JSON nor YAML"
  If the command returns a non-empty response, continue to Resolving the problem.
- Resolving the problem
- If you need to use the RSI feature, contact IBM Support.
Common core services is not aligned on the New diagnostics job page
Applies to: 4.7.0 and later
- Diagnosing the problem
When you create a new diagnostics job, the Common Core Services option is not aligned with the other services on the page.

The Common Core Services row functions normally. This issue does not prevent you from creating a diagnostics job. After you create the job, you must refresh the Gather diagnostics page to see the job.
Critical alerts might appear on the home page after installation
Applies to: 4.7.0
- Diagnosing the problem
-
After installation, you might see critical alerts on the home page. However, the events that generated the alerts have cleared. The alerts will continue to display on the Alerts card for up to 3 days unless you delete the pods that generated the alerts.
The alerts are visible to users with one of the following permissions:
- Administer platform
- Manage platform health
- View platform health
- Log in to the web client as a user with the appropriate permissions to view alerts.
- On the home page, click View all on the Alerts card.
- On the Alerts and events page, confirm that the alerts were generated by
one of the following services:
- Common core services
- Watson Knowledge Catalog
This issue can occur because wkc-db2u-init pods or jdbc-driver-sync-job pods are in an Error state.
- Resolving the problem
- An instance administrator or cluster administrator must resolve the problem.
-
Log in to Red Hat® OpenShift® Container Platform as a user with sufficient permissions to complete the task.
  oc login ${OCP_URL}
- Check the status of the wkc-db2u-init pods and the jdbc-driver-sync-job pods:
  oc get pods --sort-by=.status.startTime -n ${PROJECT_CPD_INST_OPERANDS} | grep -E 'wkc-db2u-init|jdbc-driver'
- Delete any pods that are in the Error state. Replace <pod-name> with the name of the pod in the error state:
  oc delete pod <pod-name> -n ${PROJECT_CPD_INST_OPERANDS}
-
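If several pods are in the Error state, the deletions can be combined into one pass. The following sketch reuses the pod name patterns from the preceding step; it is an informal shortcut, not part of the documented procedure:
  oc get pods -n ${PROJECT_CPD_INST_OPERANDS} --no-headers \
    | grep -E 'wkc-db2u-init|jdbc-driver' | grep Error | awk '{print $1}' \
    | xargs -r oc delete pod -n ${PROJECT_CPD_INST_OPERANDS}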
Elasticsearch pods shut down when they reach their ephemeral storage
Applies to: 4.7.2 and 4.7.3
Fixed in: 4.7.4
- Diagnosing the problem
-
When the ibm-elasticsearch-operator-ibm-es-controller-manager pod reaches its ephemeral storage limit of 1Gi, the pod fails with a ContainerStatusUnknown error. Each time the pod fails, a new pod is created. This problem occurs because of debug logs that are created.
- Get the status of the ibm-elasticsearch-operator-ibm-es-controller-manager pods:
  oc get pods -n=${PROJECT_CPD_INST_OPERATORS} | grep ibm-elasticsearch-operator-ibm-es-controller-manager
- If the command returns one or more pods in the ContainerStatusUnknown state, check the ephemeral storage setting:
  oc get csv ibm-elasticsearch-operator.v1.1.1654 \
    -n=${PROJECT_CPD_INST_OPERATORS} \
    -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.ephemeral-storage}'
  If the command returns a value of 1Gi, continue to Resolving the problem.
- Get the status of the
- Resolving the problem
-
- Delete the ibm-elasticsearch-operator-ibm-es-controller-manager pods that are in the ContainerStatusUnknown state:
  oc get pods -n=${PROJECT_CPD_INST_OPERATORS} | grep ibm-elasticsearch-operator-ibm-es-controller-manager | grep ContainerStatusUnknown | awk '{print $1}' | xargs oc delete pod -n=${PROJECT_CPD_INST_OPERATORS}
- Update the ephemeral storage setting:
  oc patch csv ibm-elasticsearch-operator.v1.1.1654 \
    -n=${PROJECT_CPD_INST_OPERATORS} \
    --type=json \
    --patch='[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/ephemeral-storage", "value": "2Gi"}]'
- Confirm that the patch was successfully applied:
  oc get csv ibm-elasticsearch-operator.v1.1.1654 \
    -n=${PROJECT_CPD_INST_OPERATORS} \
    -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.ephemeral-storage}'
  The command should return a value of 2Gi.
- Wait several minutes, then confirm that no pods are in the ContainerStatusUnknown state:
  oc get pods -n=${PROJECT_CPD_INST_OPERATORS} | grep ibm-elasticsearch-operator-ibm-es-controller-manager
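If you want to wait programmatically rather than checking by hand, a simple polling loop works. This is an informal sketch that reuses the same grep patterns as the step above:
  while oc get pods -n=${PROJECT_CPD_INST_OPERATORS} \
      | grep ibm-elasticsearch-operator-ibm-es-controller-manager \
      | grep -q ContainerStatusUnknown; do
    echo "Pods still in ContainerStatusUnknown; waiting..."
    sleep 30
  done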
Storage volume pods cannot start when the persistent volume has a lot of files
Applies to: 4.7.3 and 4.7.4
If you have a storage volume that points to a persistent volume that contains a large number of files, the storage volume pods cannot start. This issue occurs on Red Hat OpenShift Data Foundation storage.
- Diagnosing the problem
- If you cannot access the storage volume:
- From the web client, ensure that the file server is not Stopped.
- From the navigation menu, select .
- Click the storage volume name and check the File server status.
- If the file server is Stopped, restart it.
- If the file server is Running, proceed to the next step.
- Set the
VOLUMES_PROJECTenvironment variable to the name of the project where the storage volume exists:- The volume is in the operands project
-
export VOLUMES_PROJECT=${PROJECT_CPD_INST_OPERANDS} - The volume is in a tethered project
-
export VOLUMES_PROJECT=${PROJECT_CPD_INSTANCE_TETHERED}
- Get the name of the
pod:
oc get pods -n ${VOLUMES_PROJECT} | grep volumes-volume - Set the
VOLUMES_PODenvironment variable to the name of the pod:export VOLUMES_POD=<pod-name> - Get the pod
logs:
oc logs ${VOLUMES_POD} -n ${VOLUMES_PROJECT}
Look for an error with the following format:
Error: kubelet may be retrying requests that are timing out in CRI-O due to system load. Currently at stage container volume configuration: context deadline exceeded: error reserving ctr name k8s_volumes-... for id <ID>: name is reserved
If the logs include the preceding error, proceed to Resolving the problem.
- From the web client, ensure that file server is not Stopped.
- Resolving the problem
- To enable the storage volume pod to start:
- Get the name of the deployment associated with the
pod:
oc get deployments -n ${VOLUMES_PROJECT} | grep volumes-volume
VOLUMES_DEPLOYMENTenvironment variable to the name of the deployment:export VOLUMES_DEPLOYMENT=<deployment-name> - Patch the deployment to add the
io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel: truelabel:oc patch deploy ${VOLUMES_DEPLOYMENT}\ --namespace=${VOLUMES_PROJECT} \ --type='json' \ --patch='[{"op": "add", "path": "/spec/template/metadata/annotations/io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel", "value": "true" }]'
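To confirm that the annotation was applied and that the pod starts, you can run the following informal checks (not part of the documented procedure):
  oc get deploy ${VOLUMES_DEPLOYMENT} -n ${VOLUMES_PROJECT} \
    -o jsonpath='{.spec.template.metadata.annotations.io\.kubernetes\.cri-o\.TrySkipVolumeSELinuxLabel}'
  oc get pods -n ${VOLUMES_PROJECT} | grep volumes-volume
The first command should return true, and the storage volume pod should reach the Running state shortly afterwards.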
Installation and upgrade issues
- General installation and upgrade issues
- The apply-cluster-components command fails when another IBM Cloud Pak is installed on the cluster
- Installs and upgrades fail when you use a proxy server
- The apply-cr command fails when installing the zen component
- The apply-cr command fails when installing services with a dependency on Db2U
- The apply-cr command fails if the --components list includes the scheduler component
- Upgrades from 4.5
- Cannot update operators with a dependency on etcd or RabbitMQ
- Upgrades from 4.7
- The setup-instance-topology command fails
- General upgrade issues
- Service upgrades fail when the ibm-zen-operator pod runs out of memory
- The apply-cr command fails when services with a dependency on the common core services get stuck in the InProgress state during upgrade
- After you upgrade a Red Hat OpenShift Container Platform cluster, the FoundationDB resource can become unavailable
- Inaccurate status message from the command line after you upgrade
- Secrets are not visible in connections after upgrade
The apply-cluster-components command fails when another IBM Cloud Pak is installed on the cluster
Applies to: 4.7.0, 4.7.1, 4.7.2, and 4.7.3
Fixed in: 4.7.4
If you try to install IBM Cloud Pak for Data Version 4.7 and another IBM Cloud Pak is already installed on the cluster, the apply-cluster-components command fails with the following message:
[✘] [ERROR] The max version of ibm-common-services-operator installed is X.XX.X.
Version 4.0.0 is the minimum required version.
Re-run the command with the '--migrate_from_cs_ns' option.
This problem occurs when the existing IBM Cloud Pak is running IBM Cloud Pak foundational services Version 2.x.
- Resolving the problem
- To resolve the problem, re-run the cpd-cli manage apply-cluster-components command without the --migrate_from_cs_ns option.
Installs and upgrades fail when you use a proxy server
Applies to: 4.7.0 and 4.7.1
Fixed in: 4.7.2
If you use a cluster-wide proxy for Red Hat OpenShift Container Platform, the cpd-cli manage apply-cr command fails during the zen installation or upgrade.
- Diagnosing the problem
-
- The cpd-cli manage apply-cr command times out while installing or upgrading the zen component.
- Examine the ZenService custom resource:
  oc describe ZenService \
    --namespace=${PROJECT_CPD_INST_OPERANDS}
  In the Status section, confirm that the following information is true:
  - The Progress is 66%.
  - The message specifies an unknown playbook failure.
- Examine the zen-watchdog-frontdoor-extension ZenExtension:
  oc describe ZenExtensions zen-watchdog-frontdoor-extension \
    --namespace=${PROJECT_CPD_INST_OPERANDS}
  In the Status section, confirm that one of the following messages is displayed:
  - 403 error:
    Status code was -1 and not [200, 404]: Request failed: <urlopen error Tunnel connection failed: 403 Forbidden>
  - 502 error:
    Status code was -1 and not [200, 404]: Request failed: <urlopen error Tunnel connection failed: 502 Bad Gateway>
  - 503 error:
    Status code was -1 and not [200, 404]: Request failed: <urlopen error Tunnel connection failed: 503 Service Unavailable>
  - 504 error:
    Status code was -1 and not [200, 404]: Request failed: <urlopen error Tunnel connection failed: 504 Gateway Timeout>
- Confirm that the issue is caused by your proxy settings:
- Set the ZEN_OPERATOR_POD environment variable to the name of the zen-operator pod in the operators project for the instance:
  export ZEN_OPERATOR_POD=$(oc get pods -n=${PROJECT_CPD_INST_OPERATORS} | grep zen-operator | awk '{print $1}')
- Change to the operators project:
  oc project ${PROJECT_CPD_INST_OPERATORS}
- Open a remote shell on the pod:
  oc rsh ${ZEN_OPERATOR_POD}
- In the remote shell, set PROJECT_CPD_INST_OPERANDS to the operands project for the instance:
  export PROJECT_CPD_INST_OPERANDS=<project-name>
- Run the following command to determine whether you can access the zen-core-api service:
  curl -vks "https://zen-core-api-svc.${PROJECT_CPD_INST_OPERANDS}:4444/v2/config"
  If there is an issue with the proxy configuration, the command returns one of the following error codes: 403, 502, 503, or 504.
- Resolving the problem
-
- Update the ZEN_CORE_API_URL property in the product-configmap ConfigMap:
  oc patch cm product-configmap \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --type=merge \
    --patch="{\"data\": {\"ZEN_CORE_API_URL\": \"https://zen-core-api-svc.${PROJECT_CPD_INST_OPERANDS}.svc:4444\"}}"
- Confirm that the patch was applied:
  oc get cm product-configmap \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    -o yaml
  The value of the ZEN_CORE_API_URL property should be https://zen-core-api-svc.<project-name>.svc:4444, where <project-name> is the value of the PROJECT_CPD_INST_OPERANDS environment variable.
- Patch the ZenService custom resource to trigger a reconcile loop:
  oc patch ZenService lite-cr \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --type=merge \
    --patch='{"spec": {"patchProductConfigmap": "true"}}'
- Wait for the status of the zen component to be Completed. To check the status of the zen component, run:
  cpd-cli manage get-cr-status \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
    --components=zen
The apply-cr command fails when installing the zen component
Applies to: 4.7.0, 4.7.1, and 4.7.2
Fixed in: 4.7.3
When you install IBM Cloud Pak for Data Version 4.7, the apply-cr command fails when installing the zen component. This issue occurs when the PostgreSQL API server is installed on the cluster.
- Diagnosing the problem
-
- The apply-cr command fails when installing the zen component.
- Review the ibm-zen-operator pod log:
  - Get the name of the ibm-zen-operator pod:
    oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep ibm-zen-operator
  - Check the pod log for a RecursionError:
    oc logs <ibm-zen-operator-pod-name> -n ${PROJECT_CPD_INST_OPERATORS} | grep "RecursionError: maximum recursion depth exceeded in comparison"
    If the command returns a non-empty response, continue to the next step.
- Examine the ZenService custom resource:
  oc describe ZenService -n ${PROJECT_CPD_INST_OPERANDS}
  In the Status section, confirm that the following information is true:
  - The Progress is 3% or 66%.
  - The message specifies an unknown playbook failure.
- Resolving the problem
- Contact IBM Software support to patch the following operators:
- ibm-zen-operator
- cpd-platform-operator
The apply-cr command fails when installing services with a dependency on Db2U
Applies to: 4.7.0, 4.7.1, 4.7.2, and 4.7.3
Fixed in: 4.7.4
- Diagnosing the problem
- You can specify the privileges that Db2U runs with. If you configured Db2U to run with limited privileges, the apply-cr command fails if:
  - You set DB2U_RUN_WITH_LIMITED_PRIVS: "true" in the db2u-product-cm ConfigMap.
  - The kernel parameter settings were not modified to allow Db2U to run with limited privileges.
  This issue can manifest in several ways.
  - The wkc-db2u-init job fails
    If you are installing Watson Knowledge Catalog, the apply-cr command fails with the message "WKC DB2U post install job failed ('wkc-db2u-init' job)". When you get the status of the wkc-db2u-init pods, they are in the Error state.
    oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-db2u-init
  - The Db2uCluster resource never becomes ready
    For other services, you might notice that the Db2uCluster resource never becomes Ready.
    oc get Db2uCluster -n ${PROJECT_CPD_INST_OPERANDS}
  - You cannot provision service instances
    For services such as Db2 and Db2 Warehouse, the apply-cr command completes successfully, but the service instances never finish provisioning and the *-db2u-0 pods are stuck in Pending or SysctlForbidden.
    oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep db2u-0
- Resolving the problem
- This problem occurs when you set DB2U_RUN_WITH_LIMITED_PRIVS: "true" in the db2u-product-cm ConfigMap but the kernel parameter settings were not modified to allow Db2U to run with limited privileges. Review Changing kernel parameter settings to confirm that you can change the kernel parameter settings.
  - If you can change the kernel parameter settings, ensure that the worker nodes are restarted after you change the settings. In some cases, when you run the cpd-cli manage apply-db2-kubelet command, the worker nodes are not restarted.
  - If you cannot or do not change the kernel parameter settings, update the db2u-product-cm ConfigMap to set DB2U_RUN_WITH_LIMITED_PRIVS: "false". For more information, see Specifying the privileges that Db2U runs with.
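Before deciding which path to take, it can help to confirm which mode is currently configured. The following is an informal check; it assumes the db2u-product-cm ConfigMap is in the operands project for the instance:
  oc get cm db2u-product-cm -n ${PROJECT_CPD_INST_OPERANDS} \
    -o jsonpath='{.data.DB2U_RUN_WITH_LIMITED_PRIVS}'
The command returns "true" if Db2U is configured to run with limited privileges.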
The apply-cr command fails if the --components list includes the scheduler component
Applies to: 4.7.0 and 4.7.1
Fixed in: 4.7.2
- Diagnosing the problem
-
When you run the cpd-cli manage apply-cr command and the --components list includes the scheduler component, the command returns the following error:
[ERROR] You cannot use the apply-cr command to install or upgrade the scheduling service (component=scheduler). To install or upgrade the scheduling service, run the apply-scheduler command.
Remove the scheduler component from the --components list. If you are using an environment variables script to populate --components, temporarily remove scheduler from the COMPONENTS environment variable so that you can run the apply-cr command. Re-add the component after the apply-cr command runs successfully.
- Resolving the problem
-
- If you are installing Cloud Pak for Data Version 4.7.2, you have a version of the olm-utils-v2 image with the fix.
- If you are installing Cloud Pak for Data Version 4.7.0 or 4.7.1, run the following command when the client workstation is connected to the internet:
  cpd-cli manage restart-container
  The command loads the latest version of the olm-utils-v2 image on the client workstation. If you pull images from a private container registry, you can use the following commands to move the latest version of the image to the private container registry:
  - cpd-cli manage save-image (required only if the client workstation cannot connect to the internet and the private container registry at the same time)
  - cpd-cli manage copy-image
Cannot update operators with a dependency on etcd or RabbitMQ
Applies to: Upgrades from Version 4.5.x
Fixed in: 4.7.2 (etcd issue)
- Diagnosing the problem
- When you run the cpd-cli manage apply-olm command, the operator for one or more services with a dependency on etcd or RabbitMQ might get stuck in the Installing phase.
- Resolving the problem
-
If one or more pods are in the CrashLoopBackOff state, complete the following steps to resolve the problem.
- Check the current limits and requests for the operator with pods that are in a poor state. If all operators were stuck, repeat this process for each operator.
  - Set the OP_NAME environment variable to the name of the operator.
    export OP_NAME=<operator-name>
  - Check the current limits for the operator.
    oc get csv -n ${PROJECT_CPD_INST_OPERATORS} ${OP_NAME} \
      -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
  - Check the current requests for the operator.
    oc get csv -n ${PROJECT_CPD_INST_OPERATORS} ${OP_NAME} \
      -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.requests.memory}'
  - Choose the appropriate action based on the values returned by the preceding commands.
    - If either the limits or requests are less than 1Gi, continue to the next step.
    - If both values are 1Gi or greater, then the cause of the problem was misdiagnosed. This solution will not resolve the issues that you are seeing.
- Increase the memory limits and requests for the affected operator. If all operators are stuck, repeat this process for each operator.
  - Create a JSON file named patch.json with the following content.
    [
      {
        "op": "replace",
        "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory",
        "value": "1Gi"
      },
      {
        "op": "replace",
        "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory",
        "value": "1Gi"
      }
    ]
  - Ensure that the OP_NAME environment variable is set to the correct operator name.
    echo ${OP_NAME}
  - Patch the operator.
    oc patch csv -n ${PROJECT_CPD_INST_OPERATORS} ${OP_NAME} \
      --type=json --patch="$(cat patch.json)"
  - Confirm that the patch was successfully applied.
    oc get csv -n ${PROJECT_CPD_INST_OPERATORS} ${OP_NAME} \
      -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
    The command should return 1Gi.
    Important: The patch is temporary. The memory settings apply only to the current deployment. The next time that you update the operator, the settings are replaced by the default settings.
The setup-instance-topology command fails
Applies to: Upgrades from 4.7.0 and 4.7.1
Fixed in: 4.7.2
When you run the cpd-cli manage setup-instance-topology command, the command fails. The error occurs because the ibm-common-service-operator.v4.0.1 operator does not come up.
- Diagnosing the problem
- To determine whether the failure was caused by the ibm-common-service-operator operator, run the following command.
  oc get csv --namespace=${PROJECT_CPD_INST_OPERATORS} | grep ibm-common-service-operator
  - If the phase is Failed, proceed to Resolving the problem.
  - If the phase is Succeeded, use the information returned by the cpd-cli to identify the root cause of the problem.
- Resolving the problem
- To resolve the problem, follow the guidance in Operator installation or upgrade fails with exceeded progress deadline error in the IBM Cloud Pak foundational services documentation.
Service upgrades fail when the ibm-zen-operator pod runs out of memory
Applies to:
- Upgrades from Version 4.5 to Version 4.7
- Upgrades from Version 4.6 to Version 4.7
When you run the cpd-cli manage apply-cr command to upgrade a service, the command fails. In some situations, this error occurs because the ibm-zen-operator pod runs out of memory.
- Diagnosing the problem
- To determine whether the service upgrade failed because the ibm-zen-operator pod ran out of memory:
  - Set the OP_NAME environment variable to the name of the ibm-zen-operator pod:
    export OP_NAME=$(oc get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep ibm-zen | awk '{print $1}')
  - Get the status of the ibm-zen-operator pod:
    oc get pods ${OP_NAME} -n ${PROJECT_CPD_INST_OPERATORS}
    If the pod ran out of memory, the pod will either be in CrashLoopBackOff or in Running with a large number of restarts.
    - If either of the preceding statements is true, continue to the next step.
    - If neither of the statements is true, review the other known issues.
  - Check the last state of the pod:
    - If the pod is in the CrashLoopBackOff state, run the following command:
      oc get pods ${OP_NAME} -n ${PROJECT_CPD_INST_OPERATORS} -o yaml
    - If the pod is in the Running state but has been restarted multiple times, run the following command:
      oc describe pods ${OP_NAME} -n ${PROJECT_CPD_INST_OPERATORS}
    Review the containerStatuses section of the output. Confirm that the last state includes "exitCode":137,..."reason":"OOMKilled".
- Resolving the problem
- To resolve the problem, patch the ibm-zen-operator pod to increase the memory limit to 2Gi:
  - Patch the pod:
    oc patch pod ${OP_NAME} \
      --namespace=${PROJECT_CPD_INST_OPERATORS} \
      --type=merge \
      --patch='{"spec":{"containers":[{"name":"ibm-zen-operator", "resources":{"limits":{"memory":"2Gi"}}}]}}'
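To confirm that the new limit is in place, you can read it back from the pod spec. This is an informal check, not part of the documented fix:
  oc get pod ${OP_NAME} -n ${PROJECT_CPD_INST_OPERATORS} \
    -o jsonpath='{.spec.containers[0].resources.limits.memory}'
The command should return 2Gi.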
The apply-cr command fails when services with a dependency on the common core services get stuck in the InProgress state during upgrade
Applies to:
- Upgrades from Version 4.5 to Version 4.7
- Upgrades from Version 4.6 to Version 4.7
When you upgrade to IBM Cloud Pak for Data Version 4.7, services with a dependency on the common core services get stuck in the InProgress state. The apply-cr command fails waiting for the components to be Completed. This problem occurs when the existing Elasticsearch pods are not successfully terminated.
For a list of services with a dependency on the common core services, see Software requirements.
- Diagnosing the problem
-
- The apply-cr command fails with the following message:
  [ERROR] Playbook failed while running 'wait_for_cr.yml' of '<cr-name>' for component '<component-ID>' in namespace '<project-name>'.
- Review the custom resource for the component to confirm that the issue is caused by common core services. Run the following command to see the contents of the custom resource:
  oc get <service-kind> <service-cr> -n ${PROJECT_CPD_INST_OPERANDS} -o yaml
  Use the following table to find the appropriate values for <service-kind> and <service-cr>. If the apply-cr command returns a different name for the custom resource, use the name returned by the apply-cr command.

  | Component | Service kind | Default custom resource (CR) name |
  |---|---|---|
  | cognos_analytics | CAService | ca-addon-cr |
  | datastage_ent | DataStage | datastage |
  | datastage_ent_plus | DataStage | datastage |
  | dv | DvService | dv-service |
  | match360 | MasterDataManagement | mdm-cr |
  | replication | ReplicationService | replicationservice-cr |
  | wkc | WKC | wkc-cr |
  | wml | WmlBase | wml-cr |
  | ws | WS | ws-cr |
  | ws_pipelines | WSPipelines | wspipelines-cr |

- Review the output for the following message:
  message: |-
    Dependency CCS failed to install
- Review the common core services custom resource:
  oc get ccs ccs-cr -n ${PROJECT_CPD_INST_OPERANDS} -o yaml
  Look for the following message:
  message: |-
    unknown playbook failure
    The playbook has failed at task - 'check if original es sts is ready'
- Check the status of the Elasticsearch StatefulSet that is created by common core services:
  oc get sts -n ${PROJECT_CPD_INST_OPERANDS} --selector=app=elasticsearch-master
  If some of the pods in the StatefulSet are not ready, proceed to Resolving the problem.
- Resolving the problem
- To resolve the issue, forcefully delete the pods that are associated with the StatefulSet:
  oc delete pods -n ${PROJECT_CPD_INST_OPERANDS} -l app=elasticsearch-master --force
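After the forced deletion, it can help to confirm that the replacement pods come up cleanly before you retry the apply-cr command. The following informal check reuses the selector from the diagnosis steps:
  oc get sts -n ${PROJECT_CPD_INST_OPERANDS} --selector=app=elasticsearch-master
  oc get pods -n ${PROJECT_CPD_INST_OPERANDS} -l app=elasticsearch-master
All pods in the StatefulSet should report a Ready status.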
After you upgrade a Red Hat OpenShift Container Platform cluster, the FoundationDB resource can become unavailable
Applies to: 4.7.0 and later
- Diagnosing the problem
- After you upgrade your cluster to a new version of Red Hat OpenShift Container Platform, the IBM FoundationDB pods can become unavailable. When this issue occurs, services that rely on FoundationDB, such as Watson Knowledge Catalog and IBM Match 360, cannot function correctly. This issue affects deployments of the following services.
- IBM Watson® Knowledge Catalog
- IBM Match 360 with Watson
- Resolving the problem
- To resolve this issue, restart the FoundationDB
pods.
Required role: To complete this task, you must be a cluster administrator.
- Restart the FoundationDB cluster pods.
  oc get fdbcluster
  oc get po | grep ${CLUSTER_NAME} | grep -v backup | awk '{print $1}' | xargs oc delete po
  Replace ${CLUSTER_NAME} in the command with the name of your fdbcluster instance.
- Restart the FoundationDB operator pods.
  oc get po | grep fdb-controller | awk '{print $1}' | xargs oc delete po
- Check the FoundationDB
status.
oc get fdbcluster -o yaml | grep fdbStatusThe returned status must be
Complete. - Check to ensure that the database is
available.
oc rsh sample-cluster-log-1 /bin/fdbcliIf the database is still not available, complete the following steps.
- Log in to the
ibm-fdb-controllerpod. - Run the
fix-coordinatorscript.kubectl fdb fix-coordinator-ips -c ${CLUSTER_NAME} -n ${PROJECT_CPD_INST_OPERATORS}Replace ${CLUSTER_NAME} in the command with the name of your
fdbclusterinstance.Note: For more information about thefix-coordinatorscript, see the workaround steps from the resolved IBM Match 360 known issue item The FoundationDB cluster can become unavailable.
- Log in to the
- Check the FoundationDB
status.
- Restart the FoundationDB cluster
pods.
Inaccurate status message from the command line after you upgrade
Applies to the following services:
- Watson Assistant
- Watson Discovery
- Watson Knowledge Studio
- Watson Speech services
- Diagnosing the problem
- If you run the cpd-cli service-instance upgrade command from the Cloud Pak for Data command-line interface, and then use the service-instance list command to check the status of each service, the provision status for the service is listed as UPGRADE_FAILED.
- Cause of the problem
- When you upgrade the service, only the cpd-cli manage apply-cr command is supported. You cannot use the cpd-cli service-instance upgrade command to upgrade the service. After you upgrade the service with the apply-cr method, the change in version and status is not recognized by the service-instance command. However, the correct version is displayed in the Cloud Pak for Data web client.
- Resolving the problem
- No action is required. If you use the cpd-cli manage apply-cr method to upgrade the service as documented, the upgrade is successful and you can ignore the version and status information that is generated by the cpd-cli service-instance list command.
Secrets are not visible in connections after upgrade
Applies to: Version 4.7.0 and later
If you use secrets when you create connections, the secrets are not visible in the connection details after you upgrade Cloud Pak for Data. This issue occurs when your vault uses a private CA signed certificate.
- Resolving the problem
- To see the secrets in the user interface:
- Change to the project where Cloud Pak for Data is
installed:
  oc project ${PROJECT_CPD_INST_OPERANDS}
- Set the following environment variables:
  oc set env deployment/zen-core-api VAULT_BRIDGE_TLS_RENEGOTIATE=true
  oc set env deployment/zen-core-api VAULT_BRIDGE_TOLERATE_SELF_SIGNED=true
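To confirm that both variables are set on the deployment, you can list its environment. This is an informal check, not part of the documented procedure:
  oc set env deployment/zen-core-api --list -n ${PROJECT_CPD_INST_OPERANDS} | grep VAULT_BRIDGE
The output should show VAULT_BRIDGE_TLS_RENEGOTIATE=true and VAULT_BRIDGE_TOLERATE_SELF_SIGNED=true.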
Security issues
Security scans return an Inadequate Account Lockout Mechanism message
Applies to: 4.7.0 and later
- Diagnosing the problem
-
If you run a security scan against Cloud Pak for Data, the scan returns the following message.
Inadequate Account Lockout Mechanism
- Resolving the problem
-
This is by design. It is strongly recommended that you use an enterprise-grade password management solution, such as SAML SSO or an LDAP provider, for password management.
Backup and restore issues (all methods)
Online backup of Watson Discovery fails at checkpoint stage
Applies to: 4.7.3 and later
- Diagnosing the problem
- When you try to create an online backup, the backup process fails at the checkpoint hook stage.
For example, if you are creating the backup with IBM Storage Fusion, the backup process fails at the
Hook: br-service-hooks-checkpoint stage in the backup sequence. In the log
file, you see an error message similar to the following
example:
download failed: s3://common-zen-wd/mt/__built-in-tenant__/fileResource/701db916-fc83-57ab-0000-000000000010.zip to tmp/s3-backup/common-zen-wd/mt/__built-in-tenant__/fileResource/701db916-fc83-57ab-0000-000000000010.zip Connection was closed before we received a valid response from endpoint URL: "https://s3.openshift-storage.svc:443/common-zen-wd/mt/__built-in-tenant__/fileResource/701db916-fc83-57ab-0000-000000000010.zip". - Cause of the problem
- Large resource files can become corrupted while they are downloaded to the backup. As a result, the wd-discovery-aux-ch-s3-backup job does not complete successfully.
- Workaround
- Delete the file that is shown in the error message and recreate it.
- Exec into the wd-discovery-support
pod:
oc exec -it deploy/wd-discovery-support -- bash - Do the following steps within the pod.
- Delete the
file:
mc rm s3://common-zen-wd/mt/__built-in tenant__/fileResource/<file_name> - Confirm that the file is not listed when you run the following
command:
mc ls s3://common-zen-wd/mt/__built-in-tenant__/fileResource/ - Exit from the pod.
- Delete the
file:
- Delete the wd-discovery-orchestrator-setup
job:
oc delete job/wd-discovery-orchestrator-setup - Wait for the wd-discovery-orchestrator-setup job to run again and complete.
Confirm that the file was successfully recreated:
- Exec into the wd-discovery-support
pod:
oc exec -it deploy/wd-discovery-support -- bash - Do the following steps within the pod.
- Copy the file to the tmp
directory:
aws-wd s3 cp s3://common-zen-wd/mt/__built-in-tenant__/fileResource/<file_name> /tmp - Confirm that the file is
copied:
ls /tmp/<file_name> - Exit from the pod.
- Copy the file to the tmp
directory:
You can now retake the backup, and the wd-discovery-aux-ch-s3-backup job will complete successfully.
Restore job fails at Db2 workload step
Applies to: 4.7.3 and later
- Diagnosing the problem
-
When restoring a Cloud Pak for Data deployment that includes Db2 or Db2 Warehouse, the restore job fails.
When backup and restore is done with IBM Storage Fusion, the restore job fails at the Hook: br-service-hooks/post-workload step.
In the cpdbr-oadp.log file, the following entry appears:
WV is not running yet...sleep 30
- Workaround
- Before you take an online backup, for each db2u deployment, run the following commands:
  clusters=$(kubectl get db2ucluster -o jsonpath='{.items[*].metadata.name}')
  for cluster in $clusters; do
    pod_name="c-${cluster}-db2u-0"
    echo "Running commands on ${pod_name}"
    # Check if wvcli exists in the pod
    if kubectl exec $pod_name -- bash -c "which wvcli 2> /dev/null"; then
      kubectl exec $pod_name -- bash -c "wvcli system mln-buffer --disable persist && sv stop wolverine && sv start wolverine"
    else
      echo "wvcli not found in $pod_name, skipping to the next pod."
    fi
  done
Online backup and restore with IBM Storage Fusion issues
Unable to run checkpoint backup post-hooks command after a failed online backup
Applies to: 4.7.0 and later
- Diagnosing the problem
- When an online backup fails, Cloud Pak for Data must
be returned to a good state before you can retry a backup. You return Cloud Pak for Data to a good state by running the checkpoint
backup post-hooks command:
cpd-cli oadp backup posthooks \
  --include-namespaces ${PROJECT_CPD_INST_OPERANDS} \
  --hook-kind=checkpoint \
  --log-level=debug \
  --verbose
Running the command results in an error like the following example:
Error: error running post-backup hooks: cannot get checkpoint id in namespace PROJECT_CPD_INST_OPERANDS, namespaces: [PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS], checkpointId: , err: info is nil
[ERROR] <timestamp> RunPluginCommand:Execution error: exit status 1
- Do the following steps.
- Check whether the configmap cpdbr-ckpt-cm is
empty:
oc get cm cpdbr-ckpt-cm -o yaml
- Copy the following script:
create-checkpoint-id.sh script:
#!/bin/bash

function createUninitializedCmContent() {
  local hashedNamespaces=""
  local namespacesArr=()
  local quotedNamespacesArr=()
  local namespacesStr=""
  local namespacesJsonArr=""
  local uninitializedCheckpointLine=""
  local cpdOperatorLine=""
  local zenLine=""
  local currentTimeEpoch=$(date +%s)
  # if shasum command does not exists, resort to sha1sum command
  local hashCmd=$(command -v shasum > /dev/null && echo "shasum -a 1" || echo "sha1sum")

  # operator namespace should be appended first before the cpd namespace
  # to fully reproduce the logic in go
  if [ -n ${CPD_OPERATOR_NAMESPACE} ]; then
    namespacesArr+=(${CPD_OPERATOR_NAMESPACE})
  fi
  namespacesArr+=(${NAMESPACE})

  # do string join from array of namespaces to create hash
  IFS="_"
  namespacesStr="${namespacesArr[*]}"
  unset IFS

  # create json array string like ["ibm-common-services", "zen"]
  for namespace in "${namespacesArr[@]}"
  do
    quotedNamespacesArr+=("\"${namespace}\"")
  done
  IFS=","
  namespacesJsonArr="[${quotedNamespacesArr[*]}]"
  unset IFS

  hashedNamespaces="hk_$( echo -n $namespacesStr | eval $hashCmd | awk '{print $1}' )"
  uninitializedCheckpointId="dummy-$(uuidgen | tr A-Z a-z)"
  uninitializedCheckpointLine=$(cat << EOV
${hashedNamespaces}: '{"infos": [{"uid": "${uninitializedCheckpointId}","namespaces": ${namespacesJsonArr}, "createdAt": ${currentTimeEpoch}, "startTime":0, "completionTime": 0, "status": "", "hookInfos": null}]}'
EOV
)
  zenLine="ns_${NAMESPACE}: ${hashedNamespaces}"
  if [ ! -z ${CPD_OPERATOR_NAMESPACE} ]; then
    cpdOperatorLine="ns_${CPD_OPERATOR_NAMESPACE}: ${hashedNamespaces}"
  fi
  CONFIGMAP_DATA=$(cat << EOF
data:
  ${uninitializedCheckpointLine}
  ${zenLine}
  ${cpdOperatorLine}
EOF
)
}

function patchCheckpointData() {
  echo "writing to yaml file ${OUTPUT_FILE_PATH}..."
  cat > "${OUTPUT_FILE_PATH}" << EOF
${CONFIGMAP_DATA}
EOF
  echo "patching cpdbr-ckpt-cm with yaml file..."
  oc patch cm cpdbr-ckpt-cm --patch-file "${OUTPUT_FILE_PATH}"
  echo "all done!"
}

function help() {
  echo ""
  echo "create-checkpoint-id.sh - Tenant Backup and Restore"
  echo " SYNTAX:"
  echo " ./create-checkpoint-id.sh --namespace 'namespace' [--tenant-operator-namespace 'CPD Operators' | --out 'out yaml file' | --dry-run]"
  echo ""
  echo " COMMANDS:"
  echo " help : Display help usage"
  echo ""
  echo " PARAMETERS:"
  echo " --tenant-operator-namespace : CPD Operator namespace. Used with label-tenant command."
  echo " --namepace : zen namespace."
  echo " --out : output yaml file path. Default yaml file is 'cpdbr-ckpt-cm-uninitialized-patch.yaml'"
  echo " --dry-run : if flag is set, then do not directly patch the configmap"
  echo ""
  echo " NOTE: User must be logged into the Openshift cluster from the oc command line."
  echo ""
}

# main login
if [ $# -eq 0 ]; then
  echo "No parameters provided"
  help
  exit 1
fi

while (( $# )); do
  case "$1" in
    -n| --namespace)
      if [ -n "$2" ] && [ ${2:0:1} != "-" ]; then
        NAMESPACE=$2
        shift 2
      else
        echo "Invalid --namespace): ${2}"
        help
        exit 1
      fi
      ;;
    --tenant-operator-namespace)
      if [ -n "$2" ] && [ ${2:0:1} != "-" ]; then
        CPD_OPERATOR_NAMESPACE=$2
        shift 2
      else
        echo "Invalid --tenant-operator-namespace): ${2}"
        help
        exit 1
      fi
      ;;
    --out)
      if [ -n "$2" ] && [ ${2:0:1} != "-" ]; then
        OUTPUT_FILE_PATH=$2
        shift 2
      else
        echo "Invalid --out): ${2}"
        help
        exit 1
      fi
      ;;
    --dry-run)
      if [ -n "$1" ]; then
        DRY_RUN=1
        shift 1
      fi
      ;;
    help|-h|--h|-help|--help)
      # help
      help
      exit 0
      ;;
    -*|--*=)
      # unsupported flags
      echo "Invalid parameter $1" >&2
      help
      exit 1
      ;;
    *)
      # preserve positional arguments
      PARAMS="$PARAMS $1"
      shift
      ;;
  esac
done

if [ -z "$NAMESPACE" ]; then
  echo "--namespace has to be defined"
  help
  exit 1
fi

if [ -z "${OUTPUT_FILE_PATH}" ]; then
  OUTPUT_FILE_PATH="cpdbr-ckpt-cm-uninitialized-patch.yaml"
  echo "will defaulted to save checkpoint data at ${OUTPUT_FILE_PATH}"
fi

createUninitializedCmContent

echo "will patch configmap data with this content..."
echo "----------------------------------------------"
echo "${CONFIGMAP_DATA}"
echo "----------------------------------------------"

if [ -z "${DRY_RUN}" ]; then
  patchCheckpointData
fi
- Test the script by running the following
command:
./create-checkpoint-id.sh --namespace ${PROJECT_CPD_INST_OPERANDS} --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} --dry-run
- Run the script by removing the --dry-run option:
  ./create-checkpoint-id.sh --namespace ${PROJECT_CPD_INST_OPERANDS} --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS}
- Re-run the checkpoint backup post-hooks command.
Unable to back up Cloud Pak for Data operators when OpenPages is installed
Applies to: 4.7.0, 4.7.1, and 4.7.2
Fixed in: 4.7.3
- Diagnosing the problem
- In IBM Storage Fusion, the backup status of the Cloud Pak for Data operators is Failed snapshot.
  In the log, the following entries appear:
  time=<timestamp> level=info msg=cmd stdout:
  time=<timestamp> level=info msg=cmd stderr: ksh: /mnt/backup/db2-online-backup.log: cannot create [Permission denied]
- Modify the openpages-<instance_name>-aux-ckpt-cm configmap.
- Edit the configmap:
  oc edit configmap -n ${PROJECT_CPD_INST_OPERANDS} openpages-<instance_name>-aux-ckpt-cm
- Locate the following line:
  "ksh -lc '/mnt/backup/online/db2-online-backup.sh max_no_of_days_between_full_backup=7 > /mnt/backup/db2-online-backup.log'"
- Replace the line with:
  "ksh -lc 'mkdir -p /mnt/backup/online/logs && /mnt/backup/online/db2-online-backup.sh max_no_of_days_between_full_backup=7 > /mnt/backup/online/logs/db2-online-backup.log'"
Db2 Data Management Console is not successfully restored
Applies to: 4.7.1
Fixed in: 4.7.2
- Diagnosing the problem
- When Cloud Pak for Data users open the My
instance page, the Db2 Data Management Console
instance is in a Pending state.
This problem occurs only when Cloud Pak for Data was upgraded from version 4.5.x or 4.6.x.
- Workaround
- Ask the Db2 Data Management Console administrator to delete the instance and reprovision it.
Restoring Cognos Dashboards remains in progress
Applies to: 4.7.0 and 4.7.1
Fixed in: 4.7.2
- Diagnosing the problem
- When you use IBM Storage Fusion Version 2.6 to back up a Cloud Pak for Data deployment that includes Cognos® Dashboards, and restore the deployment to a different cluster, the Cognos Dashboards restore operation remains in an InProgress state.
- Workaround
- To ensure that the Cognos Dashboards restore operation completes as expected, run the following commands before you create the backup.
-
Run the
cpd-cli manage login-to-ocpcommand to log in to the cluster as a user with sufficient permissions to complete this task. For example:cpd-cli manage login-to-ocp \ --username=${OCP_USERNAME} \ --password=${OCP_PASSWORD} \ --server=${OCP_URL}Tip: Thelogin-to-ocpcommand takes the same input as theoc logincommand. Runoc login --helpfor details. - Run the following
commands:
oc -n ${PROJECT_CPD_INST_OPERANDS} label service c-dashboard-redis-p icpdsupport/ignore-on-nd-backup=true oc -n ${PROJECT_CPD_INST_OPERANDS} label secret dashboard-redis-cert icpdsupport/ignore-on-nd-backup=true
For more information, see Prepare Cognos Dashboards.
-
Online backup and restore with the OADP backup and restore utility issues
- The ZenService custom resource is in a Failed state after restoring an online backup
- IBM Match 360 encounters errors during pre-backup and post-restore operations
- Unable to create online backup when Watson Knowledge Catalog and IBM Match 360 are installed in the same operand project
- After restoring IBM Match 360 from an online backup, the associated Redis pods can enter a CrashLoopBackOff state
- Watson Knowledge Catalog profiling operations fail after you restore an online backup
- Following an upgrade, some Db2 Data Management Console pods are not running after you restore an online backup
The ZenService custom resource is in a Failed state after restoring an online backup
Applies to: 4.7.2 and later
- Diagnosing the problem
- This problem occurs when Cloud Pak for Data is
integrated with the Identity Management Service.
Get the Identity Management Service operator logs.
- Get the pod name for ibm-iam-operator:
  oc get pod -A | grep -e ibm-iam-operator
output:
oc logs ibm-iam-operator-<pod_name> -n ${PROJECT_CPD_INST_OPERATORS}
The log contains an entry like in the following example:
{"level":"info","ts":1693307569.5649076,"logger":"leader","msg":"Leader lock configmap must have exactly one owner reference.","ConfigMap":{"namespace":"cpd-operators","name":"ibm-iam-operator-lock"}} - Get the pod name for
- Workaround
- Delete the ibm-iam-operator-lock
configmap.
oc delete cm ibm-iam-operator-lock -n ${PROJECT_CPD_INST_OPERATORS}
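To confirm the deletion, you can check for the configmap again. This is an informal check; the operator should recreate the lock configmap when it next acquires leadership:
  oc get cm ibm-iam-operator-lock -n ${PROJECT_CPD_INST_OPERATORS}
Immediately after the deletion, the command should return a NotFound error.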
IBM Match 360 encounters errors during pre-backup and post-restore operations
Applies to: 4.7.4
After a successful IBM Match 360 deployment, the CR status (mdmStatus) enters a Completed state and the operator stops its reconciliation process. During backup and restore operations, this can lead to errors in the pre-backup and post-restore steps.
- Resolving the problem
-
After installing the IBM Cloud Pak for Data platform with the IBM Match 360 service, run the provided script to resolve this issue by ensuring that the operator checks for the required backup and restore annotations. This enables reconciliation to complete after successful installations of IBM Match 360 so that the pre-backup and post-restore steps can complete.
Required role: To complete this task, you must be a cluster administrator.
To resolve this issue, complete the following steps:
- Ensure that the IBM Match 360 CR is in a Completed state:
  oc get MasterDataManagement mdm-cr -o jsonpath='{.status.mdmStatus} {"\n"}' -n ${PROJECT_CPD_INST_OPERATORS}
  The returned status should be Completed.
mdm-br-fix.sh.
mdm-br-fix.sh script:
set -e
if [ -z "$1" ]
then
  echo "ERROR: Please provide value for operator namespace"
  exit
fi
if [ -z "$2" ]
then
  echo "ERROR: Please provide value for operand namespace"
  exit
fi

cat << EOF >> verify_completed_state.yml
#
# _____{COPYRIGHT-BEGIN}_____
# IBM Confidential
# OCO Source Materials
#
# 5725-E59
#
# (C) Copyright IBM Corp. 2021-2023 All Rights Reserved.
#
# The source code for this program is not published or otherwise
# divested of its trade secrets, irrespective of what has been
# deposited with the U.S. Copyright Office.
# _____{COPYRIGHT-END}_____
#
# Switch status from Completed to InProgress if:
# 1. The instance identifier cannot be found
# 2. The targeted operand version does not match the currently installed operand (upgrade scenario)
# 3. Any dependency or service is in an unavailable state
- name: Get mdm CR
  k8s_info:
    api_version: mdm.cpd.ibm.com/v1
    kind: MasterDataManagement
    name: "{{ ansible_operator_meta.name }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
  register: mdm_cr
- name: The current operand version
  debug:
    var: mdm_current_version
  vars:
    mdm_resource: "{{ (mdm_cr.resources | default([]))[0] | default({}) }}"
    mdm_current_version: "{{ (((mdm_resource.status | default({})).versions | default([]))[1] | default({})).version | default(None) }}"
- name: The targeted operand version
  debug:
    var: mdm_target_version
  vars:
    mdm_resource: "{{ (mdm_cr.resources | default([]))[0] | default({}) }}"
    mdm_target_version: "{{ (mdm_resource.spec | default({})).version | default(None) }}"
- name: Set var if Backup annotations are present
  set_fact:
    bkp_annotation: "{% if (mdm_cr.resources[0].metadata.annotations['mdm.cpd.ibm.com/backup-trigger'] is defined) %} True {% else %} False {% endif %}"
- name: Set var if Restore annotations are present
  set_fact:
    br_annotation: "{% if (mdm_cr.resources[0].metadata.annotations['mdm.cpd.ibm.com/restore-trigger'] is defined or mdm_cr.resources[0].metadata.annotations['mdm.cpd.ibm.com/restore-trigger-offline'] is defined) %} True {% else %} False {% endif %}"
- name: Check if Operator is in reconciliation state
  set_fact:
    reconcile_state: "{% if (mdm_cr.resources | length>0 and (mdm_cr.resources[0].status.conditions[2].message == 'Running reconciliation' and mdm_cr.resources[0].status.conditions[2].status == 'True')) %} True {% else %} False {% endif %}"
  when: (br_annotation is defined and br_annotation == " False ") or (bkp_annotation is defined and bkp_annotation == " False ")
- name: Set CR status to InProgress if the instance identifier cannot be found or the targeted operand version does not match the installed version or operator is in reconcile state
  operator_sdk.util.k8s_status:
    api_version: "mdm.cpd.ibm.com/v1"
    kind: "MasterDataManagement"
    name: "{{ ansible_operator_meta.name }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
    status:
      mdmStatus: "InProgress"
  vars:
    mdm_resource: "{{ (mdm_cr.resources | default([]))[0] | default({}) }}"
    mdm_instance_id: "{{ (mdm_resource.status | default({})).instance_id | default(None) }}"
    mdm_target_version: "{{ (mdm_resource.spec | default({})).version | default(None) }}"
    mdm_current_version: "{{ (((mdm_resource.status | default({})).versions | default([]))[1] | default({})).version | default(None) }}"
  when: (mdm_instance_id is not defined and instance_identifier is not defined) or (mdm_target_version is defined and mdm_current_version is defined and mdm_target_version != mdm_current_version) or (reconcile_state is defined and reconcile_state == " True ")
- block:
    - name: Initialize all_services_available to true
      set_fact:
        all_services_available: true
    - name: Check availability of services post-install
      include_tasks: check_services.yml
    - name: Set CR status to InProgress if not all services are available
      operator_sdk.util.k8s_status:
        api_version: "mdm.cpd.ibm.com/v1"
        kind: "MasterDataManagement"
        name: "{{ ansible_operator_meta.name }}"
        namespace: "{{ ansible_operator_meta.namespace }}"
        status:
          mdmStatus: "InProgress"
      when: not all_services_available
  when:
    - instance_identifier is defined
EOF

cat << EOF >> skip_reconcile.yml
#
# _____{COPYRIGHT-BEGIN}_____
# IBM Confidential
# OCO Source Materials
#
# 5725-E59
#
# (C) Copyright IBM Corp. 2021-2023 All Rights Reserved.
#
# The source code for this program is not published or otherwise
# divested of its trade secrets, irrespective of what has been
# deposited with the U.S. Copyright Office.
# _____{COPYRIGHT-END}_____
#
- name: "Check saved and actual CR spec section data to determine if reconcile can be skipped"
  block:
    - name: Get mdm CR
      k8s_info:
        api_version: mdm.cpd.ibm.com/v1
        kind: MasterDataManagement
        name: "{{ ansible_operator_meta.name }}"
        namespace: "{{ ansible_operator_meta.namespace }}"
      register: mdm_cr
    - name: Get mdm-cr-cm
      k8s_info:
        api_version: v1
        kind: ConfigMap
        name: "mdm-{{ ansible_operator_meta.name }}-cm"
        namespace: "{{ ansible_operator_meta.namespace }}"
      register: mdm_cm
    - name: Compare data only when mdm-cr-cm and mdm-cr is present
      block:
        - name: Save current CR spec
          set_fact:
            current_spec: "{{ mdm_cr.resources[0].spec }}"
        - name: Get mdm-cr-cm configmap data
          set_fact:
            cm_data: "{{ mdm_cm.resources[0].data['instance.json'] }}"
        - name: Retrive Saved spec metadata from mdm-cr-cm
          set_fact:
            saved_spec: "{{ cm_data.create_arguments.metadata }}"
          when: cm_data is defined and cm_data | length>0
        - name: Set var if BR annotations are present
          set_fact:
            br_annotation: "{% if (mdm_cr.resources[0].metadata.annotations['mdm.cpd.ibm.com/restore-trigger'] is defined or mdm_cr.resources[0].metadata.annotations['mdm.cpd.ibm.com/restore-trigger-offline'] is defined) %} True {% else %} False {% endif %}"
        - name: Set var if Backup annotations are present
          set_fact:
            bkp_annotation: "{% if (mdm_cr.resources[0].metadata.annotations['mdm.cpd.ibm.com/backup-trigger'] is defined) %} True {% else %} False {% endif %}"
        - debug:
            msg: br_annotation {{ br_annotation }} bkp_annotation {{ bkp_annotation }}
      when: (mdm_cr.resources is defined and mdm_cr.resources | length>0) and (mdm_cm.resources is defined and mdm_cm.resources | length>0)
    - name: End Reconciliation if spec is unchanged and mdmStatus is Completed
      block:
        - debug:
            msg: "Previously saved and actual data from the MDM CR spec section are the same and CR status is Completed. Skipping reconcile by ending play"
        - meta: end_play
      when: (mdm_cr.resources | length>0) and (saved_spec is defined and current_spec is defined) and (saved_spec == current_spec) and (mdm_cr.resources[0].status is defined and (mdm_cr.resources[0].status.mdmStatus is defined and mdm_cr.resources[0].status.mdmStatus == "Completed")) and ((br_annotation is defined and br_annotation == " False ") or (bkp_annotation is defined and bkp_annotation == " False "))
EOF

MDM_OPERATOR_POD=`oc get pods -n $1 | grep ibm-mdm-operator-controller | awk '{print $1}'`
MDM_CR_NAME=`oc get mdm -n $2 | awk 'FNR ==2 {print $1}'`
if [ -z "$MDM_OPERATOR_POD" ]
then
  echo "ERROR: MDM operator pod is not present"
  exit
fi
if [ -z "$MDM_CR_NAME" ]
then
  echo "ERROR: MDM operand CR is not present"
  exit
fi
oc cp verify_completed_state.yml $MDM_OPERATOR_POD:/opt/ansible/roles/3.2.35/mdm_cp4d/tasks/verify_completed_state.yml -n $1
oc cp skip_reconcile.yml $MDM_OPERATOR_POD:/opt/ansible/roles/3.2.35/mdm_cp4d/tasks/skip_reconcile.yml -n $1
oc patch mdm $MDM_CR_NAME --type=merge -p '{"spec":{"onboard":{"timeout_seconds":"840"}}}' -n $2
rc=$?
if [ $rc != 0 ]; then
  echo "Error patching mdm-operator pod"
  exit $exitrc
else
  echo "Patch successful"
fi
rm -Rf verify_completed_state.yml
rm -Rf skip_reconcile.yml
- Give the script the permissions that it needs to
run:
chmod 777 mdm-br-fix.sh - Run the script using the following command, which includes two additional parameters: the
Operator namespace and the Operand namespace.
  ./mdm-br-fix.sh ${PROJECT_CPD_INST_OPERATORS} ${PROJECT_CPD_INST_OPERANDS}
  When the script is successful, the resulting message is similar to the following example:
  masterdatamanagement.mdm.cpd.ibm.com/mdm-cr patched
  Patch successful
- Proceed with the backup and restore operations. For more information, see Backing up and restoring Cloud Pak for Data.
- Ensure that the IBM Match 360 CR is in a Completed state.
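For example, you can inspect the CR status with a query like the following; the jsonpath expression is illustrative and assumes a single MasterDataManagement CR in the project:
oc get mdm -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.items[0].status.mdmStatus}'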
Unable to create online backup when Watson Knowledge Catalog and IBM Match 360 are installed in the same operand project
Applies to: 4.7.0, 4.7.1, and 4.7.2
Fixed in: 4.7.3
- Diagnosing the problem
- Running the cpd-cli oadp checkpoint create command fails, and you see an error similar to the following example:
Error: error running checkpoint exec hooks: error processing configmap "wkc-foundationdb-cluster-aux-checkpoint-cm": component conflict detected for "fdb" defined in configmap "mdm-foundationdb-1691204250918697-aux-checkpoint-cm"
[ERROR] <timestamp> RunPluginCommand:Execution error: exit status 1
This problem occurs when both IBM Match 360 and Watson Knowledge Catalog are installed in the same operand project (namespace), and MANTA Automated Data Lineage is enabled for Watson Knowledge Catalog.
- Workaround
- Modify the backup and restore configmap of one of the services, such as mdm-foundationdb-<xxxxxxxxxxxxxxxx>-aux-checkpoint-cm (see the example command after these steps).
- In the aux-meta section, change the component: value to fdb-<service_name>.
- Locate and change the cpdfwk.component: value to the same value from the previous step.
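To make these edits, you can open the configmap with oc edit; the configmap name below is the example name from the first step:
oc edit cm mdm-foundationdb-<xxxxxxxxxxxxxxxx>-aux-checkpoint-cm -n ${PROJECT_CPD_INST_OPERANDS}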
After restoring IBM Match 360 from an online backup, the associated Redis pods can enter a CrashLoopBackOff state
Applies to: 4.7.1, 4.7.2, and 4.7.3
Fixed in: 4.7.4
- Diagnosing the problem
- When you restore IBM Match 360 from backup, the associated Redis pods can fail to come up, showing a status of CrashLoopBackOff.
- Workaround
-
To fix this problem, complete the following steps to clean up the Redis pods:
Required role: To complete this task, you must be a cluster administrator.
- Get all of the Redis recipes in the current namespace:
oc get recipes.redis.databases.cloud.ibm.com -n ${PROJECT_CPD_INST_OPERANDS}
- Delete each of the listed recipes:
oc delete recipes.redis.databases.cloud.ibm.com <RECIPE_NAME> -n ${PROJECT_CPD_INST_OPERANDS}
- Get the Redis CR name:
oc get redissentinel -n ${PROJECT_CPD_INST_OPERANDS}
- Back up the Redis CR and name it redissentinels.yaml:
oc get redissentinels <CR_NAME> -o yaml > redissentinels.yaml
- Delete the current Redis CR:
oc delete redissentinels <CR_NAME>
- Restore the Redis CR from the backup that you just created:
oc apply -f redissentinels.yaml
- Wait for reconciliation to complete. The Redis pods should then come up and enter a running state.
- Refresh the IBM Match 360 configuration UI pod (mdm-config-ui) to restore its Redis connection.
- Get the name of the mdm-config-ui pod:
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep mdm-config-ui
- Delete the mdm-config-ui pod by using the name that you retrieved in the previous step:
oc delete pod <CONFIG-UI-POD-NAME> -n ${PROJECT_CPD_INST_OPERANDS}
- Wait for the mdm-config-ui pod to be recreated and in a running state.
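After the cleanup, you can confirm that the affected pods are running with a check like the following; the name filters are illustrative:
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep -E 'redis|mdm-config-ui'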
Watson Knowledge Catalog profiling operations fail after you restore an online backup
Applies to: 4.7.0 and 4.7.1
Fixed in: 4.7.2
- Diagnosing the problem
- Profiling jobs fail with the following error message:
[ERROR ] {"class_name":"com.ibm.wdp.profiling.impl.messaging.consumer.DataProfileConsumer","method_name":"handleDataProfileFailure","class":"com.ibm.wdp.profiling.impl.messaging.consumer.DataProfileConsumer","method":"handleDataProfileFailure","appname":"wdp-profiling","user":"NONE","thread_ID":"db","trace_ID":"7wmei02iwgd5o70522m5ry8hm","transaction_ID":"NONE","timestamp":"2023-07-10T18:57:40.051Z","tenant":"NONE","session_ID":"NONE","perf":"false","auditLog":"false","loglevel":"SEVERE","message":"THROW. The WDPException is: Internal Server Error Failed to start Humming-Bird job..","msg_ID":"CDIWC2006E","exception":"com.ibm.wdp.service.common.exceptions.WDPException: CDIWC2006E: Internal Server Error Failed to start Humming-Bird job..\n\tat com.ibm.wdp.profiling.impl.messaging.consumer.DataProfileConsumer.handleDataProfileFailure(DataProfileConsumer.java:429)\n\tat com.ibm.wdp.profiling.impl.messaging.consumer.DataProfileConsumer.handleHummingbirdEvents(DataProfileConsumer.java:275)\n\tat com.ibm.wdp.profiling.impl.messaging.consumer.DataProfileConsumer.processDelivery(DataProfileConsumer.java:223)\n\tat com.ibm.wdp.service.common.rabbitmq.ConsumerManager.handle(ConsumerManager.java:308)\n\tat com.rabbitmq.client.impl.recovery.AutorecoveringChannel$4.handleDelivery(AutorecoveringChannel.java:642)\n\tat com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)\n\tat com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:111)\n\tat java.base\/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\n\tat java.base\/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\n\tat java.base\/java.lang.Thread.run(Unknown Source)\n","component_ID":"wdp-profiling","message_details":"THROW. The WDPException is: Internal Server Error Failed to start Humming-Bird job.."}
CDIWC2006E: Internal Server Error Failed to start Humming-Bird job..
- Workaround
- Restart the wdp-profiling-<xxx> pod.
- Get the pod name:
oc get pod -l app=wdp-profiling --no-headers
- Restart the pod:
oc delete pod wdp-profiling-<xxx>
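If you prefer a single step, you can delete the pod by its label instead of looking up the name first; this sketch assumes the same app=wdp-profiling label that the lookup command uses:
oc delete pod -l app=wdp-profiling -n ${PROJECT_CPD_INST_OPERANDS}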
Following an upgrade, some Db2 Data Management Console pods are not running after you restore an online backup
Applies to: 4.7.0 and 4.7.1
Fixed in: 4.7.2
- Diagnosing the problem
-
After you restore an online backup, some Db2 Data Management Console pods are not running even though the custom resource status is Completed.
This problem occurs after Cloud Pak for Data is upgraded to version 4.7.0.
- Workaround
-
To work around the problem, do the following steps:
- Get the list of recipes:
oc get recipes.redis.databases.cloud.ibm.com -n ${PROJECT_CPD_INST_OPERANDS}
- Delete each recipe by running the following command:
oc delete recipes.redis.databases.cloud.ibm.com <recipe_name> -n ${PROJECT_CPD_INST_OPERANDS}
- Get the custom resource name:
oc get redissentinel -n ${PROJECT_CPD_INST_OPERANDS}
- Back up the custom resource:
oc get redissentinels <CR_NAME> -o yaml > redissentinels.yaml
- Delete the custom resource:
oc delete redissentinels <CR_NAME>
- Reapply the custom resource that you previously backed up:
oc apply -f redissentinels.yaml
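When reconciliation finishes, you can verify that the Db2 Data Management Console pods are running; the dmc name filter in this check is an assumption about the pod naming:
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep dmc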
Offline backup and restore with the OADP backup and restore utility issues
- Creating an offline backup in REST mode stalls
- Offline restic backup of Watson Knowledge Catalog fails
- After restoring IBM Match 360 from an offline backup, Redis pods can enter a CrashLoopBackOff state
- Restoring an offline backup can fail when MANTA Automated Data Lineage is enabled
- Cannot access the Cloud Pak for Data user interface after you restore Cloud Pak for Data
- Some Db2 Data Management Console pods are stuck during restore
Creating an offline backup in REST mode stalls
Applies to: 4.7.0 and later
- Diagnosing the problem
- This problem occurs when you try to create an offline backup in REST mode by using a custom --image-prefix value. The offline backup stalls with cpdbr-vol-mnt pods in the ImagePullBackOff state.
- Cause of the problem
- When you specify the --image-prefix option in the cpd-cli oadp backup create command, the option is ignored and the default prefix registry.redhat.io/ubi9 is always used.
- Resolving the problem
- To work around the problem, create the backup in Kubernetes mode instead. To change to this mode, run the following command:
cpd-cli oadp client config set runtime-mode=
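Before you switch modes, you can confirm that you are hitting this issue by looking for the stalled pods; the name filter matches the cpdbr-vol-mnt prefix from the diagnosis:
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep cpdbr-vol-mnt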
Offline restic backup of Watson Knowledge Catalog fails
Applies to: 4.7.0, 4.7.1, and 4.7.2
Fixed in: 4.7.3
- Diagnosing the problem
- The CPD-CLI*.log file has a message that shows that the wkc-unquiesce-job did not complete. A log entry for wkc-unquiesce-job has the following message:
This jobs is waiting for wkc-catalog-api-jobs:
===============================  ======  =====
<timestamp> INFO: Yet to Scale Up 1 objects
wkc-catalog-api-jobs  deploy  0/1
- Workaround
-
To fix this problem, do the following steps:
- Create a JSON file named specpatchreadiness.json with the following contents:
[
  {
    "op": "replace",
    "path": "/spec/containers/0/readinessProbe/exec/command",
    "value": [
      "sh",
      "-c",
      "curl --fail -G -sS -k --max-time 30 https://localhost:8081/actuator/health\n"
    ]
  }
]
- Copy the file to one of the following directories:
  - cpd-cli-workspace/olm-utils-workspace/work/rsi/
  - If you defined $CPD_CLI_MANAGE_WORKSPACE, $CPD_CLI_MANAGE_WORKSPACE/work/rsi
- Log in to the Cloud Pak for Data server:
cpd-cli manage login-to-ocp --server=${OCP_URL} -u ${OCP_USERNAME} -p ${OCP_PASSWORD}
- Install the resource specification injection (RSI) feature:
cpd-cli manage install-rsi --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
- Enable RSI:
cpd-cli manage enable-rsi --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}
- Apply the following patch:
cpd-cli manage create-rsi-patch --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} --patch_type=rsi_pod_spec \
  --patch_name=camsreadinessrsi \
  --description=\"This is spec patch for Catalog readiness Probe fix\" \
  --selector=app:wkc-catalog-api-jobs \
  --state=active \
  --spec_format=json \
  --patch_spec=/tmp/work/rsi/specpatchreadiness.json
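After the patch is applied and the pods restart, you can confirm that the new readiness probe is in place; the jsonpath query below is illustrative:
oc get pod -l app=wkc-catalog-api-jobs -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.items[0].spec.containers[0].readinessProbe.exec.command}'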
After restoring IBM Match 360 from an offline backup, Redis pods can enter a CrashLoopBackOff state
Applies to: 4.7.0, 4.7.1, 4.7.2, and 4.7.3
Fixed in: 4.7.4
- Diagnosing the problem
- After you restore the IBM Match 360 service from an offline backup, the corresponding Redis pods can go into a CrashLoopBackOff state.
- Workaround
- To fix this problem, run the following command to delete and refresh the Redis pods:
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep mdm-redis | awk '{ print $1 }' | xargs oc delete pod -n ${PROJECT_CPD_INST_OPERANDS}
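After the pods are recreated, a check like the following confirms that they reach a Running state; it reuses the mdm-redis name filter from the preceding command:
oc get pod -n ${PROJECT_CPD_INST_OPERANDS} | grep mdm-redis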
Restoring an offline backup can fail when MANTA Automated Data Lineage is enabled
Applies to: 4.7.0
Fixed in: 4.7.1
- Diagnosing the problem
-
Restoring a Cloud Pak for Data offline backup that includes Watson Knowledge Catalog can fail if MANTA Automated Data Lineage is enabled.
- Workaround
-
To work around the problem, shut down the MANTA application before you create the backup, and manually start it after you restore the backup.
To shut down the MANTA application, do the following steps.
- Log in to Red Hat OpenShift Container Platform as a cluster administrator:
oc login ${OCP_URL}
- Change to the appropriate project (namespace). For example:
oc project wkc
- Edit the mantaflow custom resource:
oc edit mantaflow mantaflow-wkc
- Locate spec.
- Update the following line:
shutdown: "force"
- Wait until the MANTA pods are removed.
- Create the offline backup.
After the backup is restored, manually start the MANTA application by doing the following steps.
- Log in to Red Hat OpenShift Container Platform as a cluster administrator:
oc login ${OCP_URL}
- Change to the appropriate project (namespace). For example:
oc project wkc
- Edit the mantaflow custom resource:
oc edit mantaflow mantaflow-wkc
- Locate spec.
- Update the following line:
shutdown: "false"
- Wait until the MANTA pods are up.
Some Db2 Data Management Console pods are stuck during restore
Applies to: 4.7.0 and later
- Diagnosing the problem
- During restore, some Db2 Data Management Console pods remain stuck.
- Workaround
- Delete the Redis pods. The pods will then reconcile to a successful state.
- Get the list of Redis pods:
oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep redis
- Delete each pod by running the following command:
oc delete po <podname>
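The two steps can also be combined into a single command that follows the same pattern used elsewhere in this document; the redis name filter is unchanged:
oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep redis | awk '{print $1}' | xargs oc delete po -n ${PROJECT_CPD_INST_OPERANDS}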
Cannot access the Cloud Pak for Data user interface after you restore Cloud Pak for Data
Applies to: 4.7.0 and 4.7.1
Fixed in: 4.7.2
- Diagnosing the problem
-
When you try to access the Cloud Pak for Data user interface after you restore an offline backup to a different cluster, a 502 bad gateway error page appears. This problem occurs when Cloud Pak for Data is integrated with the Identity Management Service.
- Workaround
-
To resolve the problem, complete the following steps after you restore your Cloud Pak for Data instance.
- Save the NGINX deployment replicas as an environment variable:
NGINX_REPLICAS=`oc get deploy ibm-nginx -o jsonpath='{.spec.replicas}'`
- Scale the NGINX deployment replicas to 0:
oc scale deploy ibm-nginx --replicas=0
- Scale the NGINX deployment replicas back up to the value that you saved in step 1:
oc scale deploy ibm-nginx --replicas=${NGINX_REPLICAS}
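You can then confirm that the replicas are back; this check assumes that you are already switched to the Cloud Pak for Data project (otherwise, add -n ${PROJECT_CPD_INST_OPERANDS}):
oc get deploy ibm-nginx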
Cloud Pak for Data API issues
Calls to the users or currentUserInfo API methods without pagination might crash the zen-metastore-edb pods
Applies to: 4.7.0
- Diagnosing the problem
- When you run the /api/v1/usermgmt/v1/usermgmt/users API method without pagination in a small-scale environment, the zen-metastore-edb pods go into Crash mode or Not ready mode, and you cannot recover the pods.
- Resolving the problem
- When you run the /api/v1/usermgmt/v1/usermgmt/users API method, you must use pagination.
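For illustration only, a paginated call might look like the following sketch; the limit and offset parameter names, the ${CPD_ROUTE} host variable, and the ${TOKEN} bearer token are assumptions, so check the API reference for the exact parameters:
curl -k -H "Authorization: Bearer ${TOKEN}" "https://${CPD_ROUTE}/api/v1/usermgmt/v1/usermgmt/users?limit=100&offset=0"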
Service issues
The following issues are specific to services.