Troubleshooting

It is necessary to collect specific information about your deployments to measure their well-being and to diagnose a problem before you open a support case on Cloud Pak for Business Automation.

Before you begin

You must collect specific data about your environment and your Cloud Pak installation before you contact IBM support for assistance with a Cloud Pak for Business Automation issue. You must provide a detailed description of the problem and your environment.

When you run diagnostic commands, run them from an empty directory to package the files more cleanly. Run the commands from the namespace in which you observe the problematic container or component. For more information, see Collecting data to diagnose issues.

The OpenShift must-gather CLI command collects information from your cluster, which can be used to debug issues. You can specify one or more images when you run the command by including the --image argument. When you specify an image, the tool collects data that is related to that image.

A must-gather extension image for all IBM Cloud Paks is also available at: opencloudio/must-gather.

You can collect logs by running the following command:

oc adm must-gather --image=quay.io/opencloudio/must-gather:latest

For more information about collecting the logs, see Collecting support information about the cluster.

About this task

The ibm-cp4a-operator locates the Cloud Pak base images and has Ansible roles to handle the reconciliation logic and declare a set of playbook tasks for each component. The roles declare all the variables and defaults for how the role is executed.

The operator deployment creates a container on your cluster for the operator. The following diagram shows how the operator watches for events, triggers an Ansible role when a custom resource changes, and then reconciles the resources for the deployed applications.

Operator workflow
Getting the operator logs (operator log)

The operator logs contain much more information about the operator than Kubernetes does. To see the logs of the operator container, run the following command.

kubectl logs deployment/ibm-cp4a-operator -c operator > operator.log
Copying the latest logs from the log volume (Ansible log)
If the operator log does not provide the level of detail that you need, you can gather more details by adding an annotation like the following example to your custom resource YAML:
metadata:
 ...
 annotations:
  "ansible.operator-sdk/verbosity": "3"
spec:

For verbosity value, the normal rules for Ansible verbosity apply, where higher values mean more output. Acceptable values range from 0 (only the most severe messages are output) to 7 (all debugging messages are output).

After you update the custom resource YAML, reapply the YAML for the changes to take effect.

For troubleshooting purposes, you can also copy the logs from the log volume /logs/$operator_pod_name/ansible-operator/runner/<group>/<version>/<kind>/<namespace>/<name>/artifacts. The log contains information on the first 10 reconciles and the latest reconcile. The following commands get the operator pod name and make a copy of the logs to a local directory.

deployment_name=$(kubectl get icp4acluster | awk '{print $1}' | grep -v "NAME")
operator_pod_name=$(kubectl get pod|grep ibm-cp4a-operator | awk '{print $1}')
kubectl cp $operator_pod_name:/logs/$operator_pod_name/ansible-operator/runner/icp4a.ibm.com/v1/ICP4ACluster/<namespace>/$deployment_name/artifacts /<local_logpath>
Getting information about pending pods
If some pods are pending, choose one of the pods and run the following command to get more information.
kubectl describe pod <podname> 
Getting information about secrets
Kubernetes secrets are used extensively, so output about them might also be useful.
kubectl get secrets
Getting information about events
Kubernetes events are objects that provide more insight into what is happening inside a cluster, such as what decisions the scheduler makes or why some pods are evicted from a node. To get information about these events, run the following command.
kubectl get events > events.log

You can also add the verbose parameter to any kubectl command.

kubectl -v=9 get pods
Recreating the image pull secret
If your Docker registry secret expires, you can delete the secret and re-create it:
oc delete secret admin.registrykey -n <namespace>
oc create secret docker-registry admin.registrykey --docker-server=image-registry.openshift-image-registry.svc:5000 --docker-username=kubeadmin --docker-password=$(oc whoami -t)
Applying changes by restarting pods
In some cases, changes that you make in the custom resource YAML by using the operator or directly in the environment are not automatically propagated to all pods. For example, modifications to data source information or changes to Kubernetes secrets are not seen by running pods until the pods are restarted.

If changes applied by the operator or other modifications that are made in the environment do not provide the expected result, restart the pods by scaling the impacted deployments down to 0 then up to the number that you want to have Kubernetes (OpenShift) terminate the existing pods and create new ones.

Directory mount failure prevents pod readiness
If a pod stays in a CreateContainerError state, and the description of the problem includes similar text to the following message then remove the failing mounted path.
Warning  Failed  43m  kubelet  Error: container create failed: time="2021-03-03T07:26:47Z" level=warning msg="unable to terminate initProcess" error="exit status 1"
time="2021-03-03T07:26:47Z" level=error msg="container_linux.go:366: starting container process caused: process_linux.go:472: container init caused: rootfs_linux.go:60: mounting \"/var/lib/kubelet/pods/473b091d-acff-437b-b568-2383604dac01/volume-subpaths/config-volume/icp4adeploy-cmis-deploy/3\" to rootfs at **\"/var/lib/containers/storage/overlay/d011608f6df4bbfcc26c7d60568915caf7932124e61924b1a75802e6884ea060/merged/opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml\" caused: not a directory"**

The problem occurs when a folder is generated instead of an XML file. A null folder is created to mount the file to the deployment and this raises the error.

You can remove a problematic folder from a deployment in two ways:

  • If you can access the persistent volume, go to the mounted path and delete it. You can get the path to the folder by running the following command.
    oc describe pv $pv_name
  • If you cannot access the persistent volume, edit the deployment by removing the failed mount.
    1. Edit the deployment by running the oc edit deployment <deployment_name> command. The following lines show an example mountPath:
      - mountPath: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml
                name: config-volume
                subPath: ibm_oidc_rp.xml
    2. You can then access the pod when it is Running by using the oc exec -it command.
      oc exec -it icp4adeploy-cmis-deploy-5cd4774f78-mg6pw bash
    3. Delete the file with the rm command.
      rm /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml

When the folder is removed, you can wait for the operator to reconcile the change or add the removed mount path back manually to fix it.

Cannot log in to the Zen console
After installation, you might not be able to log in to the Zen console by using the default cluster administrator admin username. The cause of this problem is that the name admin also exists in the LDAP directory.

To resolve the login issue, change the username admin in your LDAP to a different username.

Issues trying to install after you uninstalled
If you see issues when you install a new instance on a cluster that you already used for a Cloud Pak deployment, check if the IBM Automation Foundation dependencies are properly deleted.

For more information, see Uninstallation does not remove all components.

What to do next

The custom resource can be configured to enable and disable specific logging parameters, log levels, log formats, and where these logs are stored for the various capabilities. If you need more information about specific Cloud Pak capabilities, go to the relevant troubleshooting topics.