Troubleshooting
Collect specific data about your environment and your Cloud Pak installation before you contact IBM support for assistance with a Cloud Pak for Business Automation issue. Always provide a detailed description of the problem and your environment.
When you run diagnostic commands, run them from an empty directory to package the files more cleanly. Run the commands from the namespace in which you observe the problematic container or component. For more information, see Mustgather: Collecting data to diagnose issues.
The OpenShift MustGather CLI command collects information from your cluster, which can be used to debug issues. You can specify one or more images when you run the command by including the --image argument. When you specify an image, the tool collects data that is related to that image.
The ibm-cp4a-operator locates the Cloud Pak base images and has Ansible® roles to handle the reconciliation logic and declare a set of playbook tasks for each component. The roles declare all the variables and defaults for how the role is executed.
The operator deployment creates a container on your cluster for the operator. The following diagram shows how the operator watches for events, triggers an Ansible role when a custom resource changes, and then reconciles the resources for the deployed applications.

Use the following sections to find the information that you are looking for.
Collecting data
Depending on the type of operator, different logs are more useful. Use the following table to choose the Ansible or Go logs.
Capability | Type of operator | Operator name |
---|---|---|
CP4BA (multi-pattern) | Ansible | ibm-cp4a-operator |
CP4BA FileNet Content Manager | Ansible | ibm-content-operator |
CP4BA Operational Decision Manager | Ansible | ibm-odm-operator |
CP4BA Automation Document Processing | Ansible | ibm-dpe-operator |
CP4BA Automation Decision Service | Go | ibm-ads-operator |
CP4BA Workflow Process Server | Go | ibm-cp4a-wfps-operator |
CP4BA Process Federation Server | Go | ibm-cp4a-pfs-operator |
CP4BA Workflow Runtime and Workstreams Services | Go | ibm-workflow-operator |
CP4BA Business Automation Insights | Ansible | ibm-insights-engine-operator |
The following sections describe how to get additional logs and information about pods, secrets, and events that might help with troubleshooting.
- Getting the logs of the Ansible-based operators
- To get the log of the latest reconciliation for Ansible-based operators, run the
following command:
# <Must set> Set your project name here
export project_name=$your_project_name
# <Must set> Set target operator name here
export operator_name=$operator_name
operator_pod_name=$(kubectl get pod|grep $operator_name | awk '{print $1}')
kubectl exec -i $operator_pod_name -n $project_name -- /bin/bash -c 'cat /tmp/ansible-operator/runner/icp4a.ibm.com/v1/*/*/*/artifacts/latest/stdout' > operator-ansible.log
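The grep in the preceding command can match more than one pod, for example when an evicted copy of the operator pod still exists. The following is a minimal sketch, not part of the product, of a stricter selection. The helper name pick_operator_pod is hypothetical, and the sketch assumes the default column layout of kubectl get pod (NAME, READY, STATUS, ...).

```shell
# pick_operator_pod OPERATOR_NAME
# Hypothetical helper: reads `kubectl get pod` output on stdin and prints the
# first pod whose name starts with "OPERATOR_NAME-" and whose STATUS column
# (third field in the default output) is Running.
pick_operator_pod() {
  awk -v name="$1" '
    index($1, name "-") == 1 && $3 == "Running" { print $1; exit }
  '
}
```

A possible usage is operator_pod_name=$(kubectl get pod -n "$project_name" | pick_operator_pod "$operator_name"). If no pod matches, the variable stays empty, which is worth checking before you run kubectl exec.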
Optional: Export the history of the Ansible logs.
Ansible operators keep a backup of the logs under /logs/$operator_pod_name/ansible-operator/runner/<group>/<version>/<kind>/<namespace>/<name>/artifacts. The log contains information on the first 10 reconciles, including the latest reconcile. The following commands copy the logs to a local directory. Select the operator name for which you want to export the log.
# <Must set> Set your project name here
export project_name=$your_project_name
export deployment_name=$(kubectl get icp4acluster | awk '{print $1}' | grep -v "NAME")

# Export the CP4BA operator's Ansible log to /tmp/$operator_pod_name-log. Not needed when you install from the Content operator.
export operator_pod_name=$(kubectl get pod|grep ibm-cp4a-operator | awk '{print $1}')
kubectl cp $project_name/$operator_pod_name:/logs/$operator_pod_name/ansible-operator/runner/icp4a.ibm.com/v1/ICP4ACluster/$project_name/$deployment_name/artifacts /tmp/$operator_pod_name-log

# Export the Content operator's Ansible log to /tmp/$operator_pod_name-log. Needed only when the Content pattern is involved.
export operator_pod_name=$(kubectl get pod|grep ibm-content-operator | awk '{print $1}')
kubectl cp $project_name/$operator_pod_name:/logs/$operator_pod_name/ansible-operator/runner/icp4a.ibm.com/v1/Content/$project_name/$deployment_name/artifacts /tmp/$operator_pod_name-log

# Export the Foundation operator's Ansible log to /tmp/$operator_pod_name-log. Not needed when you install from the CP4BA operator.
export operator_pod_name=$(kubectl get pod|grep icp4a-foundation-operator | awk '{print $1}')
kubectl cp $project_name/$operator_pod_name:/logs/$operator_pod_name/ansible-operator/runner/icp4a.ibm.com/v1/Foundation/$project_name/$deployment_name/artifacts /tmp/$operator_pod_name-log
Note: If you see "Cannot stat: No such file or directory" when you export the Ansible logs, it means that either no log has been generated by the current operator or the current operator is in its first reconcile.
Optional: Edit the verbosity of the Ansible logs.
If the operator log does not provide the level of detail that you need, you can gather more details by adding an annotation like the following example to your custom resource YAML:
metadata:
  ...
  annotations:
    ansible.sdk.operatorframework.io/verbosity: "3"
spec:
For the verbosity value, the normal rules for Ansible verbosity apply, where higher values mean more output. Acceptable values range from 0 (only the most severe messages are output) to 7 (all debugging messages are output). After you update the custom resource YAML, reapply the YAML for the changes to take effect.
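As an alternative to editing and reapplying the YAML, the same annotation can usually be set with kubectl annotate. The following sketch only builds and prints the command so that you can review it first. The function name verbosity_annotate_cmd is hypothetical, and the resource kind and name in the usage note are placeholders for your own custom resource.

```shell
# verbosity_annotate_cmd CR_KIND CR_NAME LEVEL
# Hypothetical helper: validates LEVEL against the 0 to 7 range described
# above and prints (does not run) the kubectl annotate command that sets the
# Ansible verbosity annotation on the custom resource.
verbosity_annotate_cmd() {
  case "$3" in
    [0-7]) ;;
    *) echo "verbosity must be an integer from 0 to 7" >&2; return 1 ;;
  esac
  printf 'kubectl annotate %s %s ansible.sdk.operatorframework.io/verbosity=%s --overwrite\n' "$1" "$2" "$3"
}
```

For example, verbosity_annotate_cmd icp4acluster icp4adeploy 3 prints a command that you can paste into your shell. The --overwrite flag is included because kubectl annotate refuses to change an annotation that already exists without it.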
- Getting the logs of the Go-based operators
- To get the log for Go-based operators, run the following command:
kubectl logs deployment/$operator_name -n $project_name > operator.log
- Getting information about pending pods
- If some pods are pending, choose one of the pods, and run the following command to get more
information.
kubectl describe pod <podname>
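When several pods are pending, it can be quicker to describe all of them in one pass. The following is a small sketch under the assumption that your current namespace is the one you are troubleshooting. The function name describe_pending_pods is hypothetical.

```shell
# describe_pending_pods
# Hypothetical helper: lists the pods in the current namespace whose phase is
# Pending and runs `kubectl describe` on each one.
describe_pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending -o name |
  while read -r pod; do
    kubectl describe "$pod"
  done
}
```

Redirect the output to a file, for example describe_pending_pods > pending-pods.log, if you want to attach it to a support case.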
- Getting information about secrets
- Kubernetes secrets are used extensively, so output about them might also be
useful.
kubectl get secrets
- Getting information about events
- Kubernetes events are objects that provide more insight into what is happening inside a cluster,
such as what decisions the scheduler makes or why some pods are evicted from a node. To get
information about these events, run the following command.
kubectl get events > events.log
You can also add the verbose parameter to any kubectl command.
kubectl -v=9 get pods
- Enabling Liberty tracing for Liberty-based CP4BA pods
- For FNCM, BAN, and ADP pods, use the following steps to enable a WebSphere® Application Server Liberty logging trace specification:
- Create a custom_server.xml file with a custom Liberty trace specification. A Liberty trace specification can vary and depends on why you are enabling it. The specification might come from IBM support or Liberty support.
- Copy the custom_server.xml file into the target pod under the
/opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides directory. This
directory is mapped to a PVC where the configuration file can be persisted.
WebSphere Application Server Liberty immediately detects the server configuration file and creates a trace.log file in the default directory, or in a custom directory if you specified one in the custom_server.xml file.
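The steps above can be sketched as follows. The trace specification shown here is only a placeholder: *=info is the Liberty default and produces no extra trace, so replace it with the string that IBM support gives you. The pod name in the commented copy command is also a placeholder.

```shell
# Placeholder trace specification (assumption); replace with the real one.
trace_spec='*=info'

# Write a minimal Liberty configuration snippet that sets the trace
# specification through the <logging> element.
cat > custom_server.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<server>
    <logging traceSpecification="${trace_spec}" />
</server>
EOF

# Copy the file into the target pod (not run here; <podname> is a placeholder):
# kubectl cp custom_server.xml <podname>:/opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/custom_server.xml
```

Removing the file from the overrides directory afterward is one way to return to the previous trace settings.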
Resolving issues
- EDB Postgres instance is in a fenced state
- If the EDB Postgres instance (postgres-cp4ba) is in the Fenced status, you see the following message in the logs:
"msg":"Instance is fenced, won't start postgres right now","logging_pod":"postgres-cp4ba-1"
Run the following command to view the logs:
oc logs -n <CP4BA_namespace> -l k8s.enterprisedb.io/cluster=postgres-cp4ba
You can resolve the issue by running the following command, where the minus sign (-) at the end tells kubectl to remove the k8s.enterprisedb.io/fencedInstances annotation.
kubectl annotate cluster.postgresql.k8s.enterprisedb.io postgres-cp4ba k8s.enterprisedb.io/fencedInstances-
For more information, see Fencing in the EDB docs.
- Access routes return a 404 error
- If the URLs for the installed Cloud Pak for Business Automation components in the cp4ba-access-info ConfigMap return a 404 error even though the operator logs show no errors, a Zen extension might not have started properly. To resolve the issue, delete the uncompleted Zen extensions and let the operators restart them. To get the list of installed Zen extensions, run the following command.
oc get zenextension
The following command provides an example of how to delete a Zen extension.
oc delete zenextension icp4adeploy-<component>-zen-extension
- Cannot connect to the web client when accessing Navigator
- If you see an error message that states a client cannot connect to the web client, then refresh
your browser and the connection message goes away.
The "cannot connect" message appears during the relatively short window of time when the back-end Navigator pod is rescheduled, for example, after you update the Navigator admin desktop properties or create a new role or policy. These actions sometimes cause connection errors, but usually a message states that the server is unavailable.
- Re-creating the image pull secret
- If your Docker registry secret expires, you can delete the secret and re-create it:
oc delete secret ibm-entitlement-key -n <namespace>
oc create secret docker-registry ibm-entitlement-key -n <namespace> --docker-server=image-registry.openshift-image-registry.svc:5000 --docker-username=kubeadmin --docker-password=$(oc whoami -t)
- Applying changes by restarting pods
- Sometimes, changes that you make in the custom resource YAML by using the operator or directly
in the environment are not automatically propagated to all pods. For example, modifications to data
source information or changes to Kubernetes secrets are not seen by running pods until the pods are
restarted.
If changes applied by the operator or other modifications that are made in the environment do not provide the expected result, restart the pods by scaling the impacted deployments down to 0 then up to the number that you want to have. Kubernetes (OpenShift) terminates the existing pods and creates new ones.
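The scale-down and scale-up sequence can be sketched as a small shell function. The function name restart_by_scaling is hypothetical, and the deployment name in the usage note is a placeholder. The sketch assumes that the operator does not reset the replica count between the two commands.

```shell
# restart_by_scaling DEPLOYMENT REPLICAS
# Hypothetical helper: scales the deployment in the current namespace down to
# 0 and then back up to REPLICAS, so that Kubernetes terminates the existing
# pods and creates new ones.
restart_by_scaling() {
  deployment=$1
  replicas=$2
  kubectl scale deployment "$deployment" --replicas=0 &&
  kubectl scale deployment "$deployment" --replicas="$replicas"
}
```

For example, restart_by_scaling <deployment_name> 2 restarts a deployment that normally runs two replicas. You can read the current replica count beforehand from the READY column of kubectl get deployment.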
- CrashLoopBackOff status when an ODF storage class is used
- If you install a CP4BA instance that uses an ODF storage class, you might see some pods that
fail to be ready after the OCP cluster is rebooted.
To resolve the issue, manually restart the pods that fail to be ready.
- Directory mount failure prevents pod readiness
- If a pod stays in a CreateContainerError state, and the description of the problem includes text similar to the following message, then remove the failing mounted path.
Warning Failed 43m kubelet Error: container create failed: time="2021-03-03T07:26:47Z" level=warning msg="unable to terminate initProcess" error="exit status 1" time="2021-03-03T07:26:47Z" level=error msg="container_linux.go:366: starting container process caused: process_linux.go:472: container init caused: rootfs_linux.go:60: mounting \"/var/lib/kubelet/pods/473b091d-acff-437b-b568-2383604dac01/volume-subpaths/config-volume/icp4adeploy-cmis-deploy/3\" to rootfs at **\"/var/lib/containers/storage/overlay/d011608f6df4bbfcc26c7d60568915caf7932124e61924b1a75802e6884ea060/merged/opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml\" caused: not a directory"**
The problem occurs when a folder is generated instead of an XML file. A null folder is created to mount the file to the deployment and this raises the error.
You can remove a problematic folder from a deployment in two ways:
- If you can access the persistent volume, go to the mounted path and delete it. You can get the
path to the folder by running the following command.
oc describe pv $pv_name
- If you cannot access the persistent volume, edit the deployment by removing the failed mount.
- Edit the deployment by running the oc edit deployment <deployment_name> command. The following lines show an example mountPath:
- mountPath: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml
  name: config-volume
  subPath: ibm_oidc_rp.xml
- You can then access the pod when it is Running by using the oc exec -it command.
oc exec -it icp4adeploy-cmis-deploy-5cd4774f78-mg6pw bash
- Delete the file with the rm command.
rm /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml
When the folder is removed, you can wait for the operator to reconcile the change or add the removed mount path back manually to fix it.
- Cannot log in to the Zen console
- After installation, you might not be able to log in to the Zen console by using the default cluster administrator cpadmin username. The cause of this problem is that the name cpadmin might also exist in the LDAP directory.
To resolve the login issue, use the following steps.
- Change the name of the cpadmin user in the platform-auth-idp-credentials secret.
- Change the cluster-wide role binding oidc-admin-binding to the new username.
- Log in to the Zen console UI by using the OpenShift Credentials.
- Add new admin users in the console.
For more information, see Changing the Cloud Pak administrator username.
- Issues trying to install after you uninstalled
- If you see issues when you install a new instance on a cluster that you already used for a Cloud
Pak deployment, check if the foundational services dependencies are properly deleted.
For more information, see Uninstallation does not remove all components.
- Profile size does not scale down
- When you decrease the pattern profile size after installation, from large to medium or from medium to small, Cloud Pak foundational services do not scale down with the profile size change. This behavior is expected. For more information about profile sizes, see System requirements.
- Operator pod in OOMKilled status
- If you see the Cloud Pak for Business Automation operator or any operator pod with a status OOMKilled, it means that the resources that are allocated to the operator pod are not enough for the workload. You can modify the csv to give the operator more resources. The following example can be adjusted to get the operator pod up and running again. To find the csv name and identify which one needs to change, run oc get csv -n $operator_namespace.
oc patch csv ibm-cp4a-operator.v24.0.0 --type=json -p '[
{ "op":"replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/cpu", "value": "4" },
{ "op":"replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "8Gi" },
{ "op":"replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/cpu", "value": "1500m" },
{ "op":"replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory", "value": "1600Mi" }
]'
- Business Performance Center dashboards are missing
- If Business Performance Center dashboards do not appear when you log in to the Business Performance Center web page:
- Get the cockpit pod by running the following command:
kubectl get pod |grep insights-engine-cockpit
- Delete the cockpit pod.
- Delete the CP4BA operator pod.
- Business Teams Service (BTS) cannot be installed
- If BTS fails to install and you see the following messages in the logs, then it means that the ibm-bts-oidc-client secret cannot be created.
oc logs ibm-bts-operator-controller-manager-instance-name|tail -10
IAM Client secret not yet found, retry after 5 seconds...
To resolve the issue, create a shell script with an oc exec -it command, and then run the script in the namespace where your foundational services are installed. For more information, see Steps to follow if certificate import is an issue.
- Operator pods get evicted due to the lack of ephemeral storage
- Some of the operator pods might get evicted when the pre-defined ephemeral storage is full. The causes of this problem might include, but are not limited to:
- Debug logging is turned on.
- Retrying logic of the operator is waiting for some resources to be available.
Typically, the pod might show the following error when you get the description of the pod:
Status: Failed
Reason: Evicted
Message: Pod ephemeral local storage usage exceeds the total limit of containers 500Mi.
Follow these steps to resolve the problem:
- Determine the CSV that the operator belongs to. Typically, you can predict the CSV name by looking at the operator's pod name. For example, if the operator's pod name is ibm-dpe-operator-59797bd587-clqv7, then you can find its CSV with the following command:
CSV=$(oc get csv |grep ibm-dpe-operator | awk '{print $1}')
echo $CSV
- Update the ephemeral storage to a specific value. In the following example, the new ephemeral storage size is 800Mi:
oc patch csv "$CSV" --type="json" -p='[{"op": "replace","path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/ephemeral-storage", "value": "800Mi"}]'
After updating the value, the operator pod is restarted.
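Because the JSON patch contains double quotation marks, it is easy to break the quoting when you change the size by hand. The following sketch generates the patch body from a size argument. The function name ephemeral_storage_patch is hypothetical, and the path is the one used in the preceding example.

```shell
# ephemeral_storage_patch SIZE
# Hypothetical helper: prints a JSON patch that replaces the ephemeral-storage
# limit of the first container in the first deployment of a CSV with SIZE.
ephemeral_storage_patch() {
  printf '[{"op":"replace","path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/ephemeral-storage","value":"%s"}]\n' "$1"
}
```

A possible usage is oc patch csv "$CSV" --type=json -p "$(ephemeral_storage_patch 800Mi)". Printing the patch first, with ephemeral_storage_patch 800Mi on its own, lets you check it before you apply it.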
- CP4BA operators cannot connect to the OCP API endpoint
- If you see the following error on your cluster, then change the value of the namespaces for the API Server parameters (sc_api_namespace) in the sc_egress_configuration section of the CP4BA custom resource to "{}".
"stderr": "Error from server (InternalError): an error on the server (\"dial tcp XXX.XX.X.X:443: i/o timeout\") has prevented the request from succeeding", "stderr_lines": ["Error from server (InternalError): an error on the server (\"dial tcp XXX.XX.X.X:443: i/o timeout\") has prevented the request from succeeding"]
For more information, see Shared configuration parameters.
Troubleshooting capabilities
The custom resource can be configured to enable and disable specific logging parameters, log levels, log formats, and where these logs are stored for the various capabilities. If you need more information about specific Cloud Pak capabilities, go to the relevant troubleshooting topics.