Troubleshooting
If your "starter" deployment does not work as expected, review the following issues and try the suggested mitigations or workarounds.
The troubleshooting information is divided into the following sections:
- BTS operator stuck because the OIDC Client secret is not deployed
- PostgreSQL cannot read the SSL key file
- Directory mount failure prevents pod readiness
- Cluster admin setup script issues
- Case init job failure
BTS operator stuck because the OIDC Client secret is not deployed
If the following message appears repeatedly in the logs of the Business Teams Service operator (the ibm-bts-operator-controller-manager pod):
INFO controllers.BusinessTeamsService
IAM Client secret ibm-bts-oidc-client-secret not yet found, retry after 5 seconds...
and the OIDC Client custom resource (CR) shows the Ready condition with the status False:
status:
  conditions:
  - lastTransitionTime: '2023-10-17T18:09:02Z'
    message: OIDC client registration create failed
    reason: CreateClientFailed
    status: 'False'
    type: Ready
then take the following actions:
- Restart the pods of the Identity Management (IM) service (platform-auth-service).
oc get pod
oc delete pod platform-auth-service-xxxxxxxxxx-xxxxx
- Wait for the pods to be ready.
- Trigger a reconcile of the OIDC Client.
It can be triggered by adding a dummy label, for example, by running the following command.
oc patch client ibm-bts-oidc-client --patch '{"metadata": {"labels": {"dummyLabel": "dummyLabel"}}}' --type=merge
After a couple of minutes, the secret ibm-bts-oidc-client-secret appears and the Business Teams Service pods are created.
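The three actions above could be combined into one small script. The following is a minimal sketch rather than a supported procedure. It assumes that you are logged in with oc and that the current project contains the platform-auth-service pods. The function names, the selection of pods by name prefix, and the polling limit of about 5 minutes are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: restart the IM pods, wait for them, then trigger an OIDC Client reconcile.
# Assumption: the current oc project contains the platform-auth-service pods.

# Return 0 when every platform-auth-service line on stdin (in the format of
# "oc get pod --no-headers") is Running with all of its containers ready.
im_pods_ready() {
  awk '
    $1 ~ /^platform-auth-service/ {
      seen = 1
      split($2, r, "/")
      if (r[1] != r[2] || $3 != "Running") bad = 1
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

restart_im_and_reconcile() {
  # Delete the IM pods so that they are re-created.
  oc get pod -o name | grep '^pod/platform-auth-service' | xargs oc delete || return 1

  # Poll about every 10 seconds, up to 30 times (an arbitrary limit).
  local tries=0 ready=1
  while [ "$tries" -lt 30 ]; do
    if oc get pod --no-headers | im_pods_ready; then
      ready=0
      break
    fi
    tries=$((tries + 1))
    sleep 10
  done
  if [ "$ready" -ne 0 ]; then
    echo "platform-auth-service pods did not become ready" >&2
    return 1
  fi

  # Trigger a reconcile by adding a dummy label, as in the command above.
  oc patch client ibm-bts-oidc-client \
    --patch '{"metadata": {"labels": {"dummyLabel": "dummyLabel"}}}' --type=merge
}

# Example usage:
#   restart_im_and_reconcile
```

The readiness check reads plain text on stdin, so you can try it against saved `oc get pod --no-headers` output before you run anything against the cluster.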
PostgreSQL cannot read the SSL key file
Any of the Cloud Pak for Business Automation pods can intermittently fail to read the PostgreSQL SSL key file. The following message indicates the issue:
SEVERE: CWLPS1105: An error occurred while call the db initialize: org.postgresql.util.PSQLException: Could not read SSL key file /client-cert/tls.key
If you see the message in the logs, take the following actions:
- Delete the secret icp4adeploy-pg-client-cert-secret.
- Delete the configMap icp4adeploy-prereq-config.
The CP4BA operator then re-creates the secret and the configMap, which normally resolves the issue.
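The check and the two deletions could be scripted as follows. This is a sketch under stated assumptions: the function names are hypothetical, you are logged in with oc, and the current project is the one that contains the CP4BA deployment. The resource names are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: detect the SSL key error in a pod log and delete the two resources
# so that the CP4BA operator re-creates them.

# Return 0 if the log text on stdin contains the SSL key read error.
has_ssl_key_error() {
  grep -q 'Could not read SSL key file /client-cert/tls.key'
}

# Delete the secret and the configMap named in the steps above.
recreate_pg_client_cert() {
  oc delete secret icp4adeploy-pg-client-cert-secret
  oc delete configmap icp4adeploy-prereq-config
}

# Example usage (<pod_name> is a placeholder for the affected pod):
#   if oc logs <pod_name> | has_ssl_key_error; then
#     recreate_pg_client_cert
#   fi
```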
Directory mount failure prevents pod readiness
If a pod stays in the CreateContainerError state and the pod description includes text similar to the following message, remove the problematic mounted path.
Warning Failed 43m kubelet Error: container create failed: time="2021-03-03T07:26:47Z" level=warning msg="unable to terminate initProcess" error="exit status 1"
time="2021-03-03T07:26:47Z" level=error msg="container_linux.go:366: starting container process caused: process_linux.go:472: container init caused: rootfs_linux.go:60: mounting \"/var/lib/kubelet/pods/473b091d-acff-437b-b568-2383604dac01/volume-subpaths/config-volume/icp4adeploy-cmis-deploy/3\" to rootfs at \"/var/lib/containers/storage/overlay/d011608f6df4bbfcc26c7d60568915caf7932124e61924b1a75802e6884ea060/merged/opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml\" caused: not a directory"
The problem occurs when a folder is generated instead of an XML file. An empty folder is created at the path where the file is supposed to be mounted, so the mount fails with the "not a directory" error.
You can remove a problematic folder from a deployment in two ways:
- If you can access the persistent volume, go to the mounted path and delete it. You can get the path to the folder by running the following command.
oc describe pv $pv_name
- If you cannot access the persistent volume, edit the deployment by removing the failed mount.
  - Edit the deployment by running the oc edit deployment <deployment_name> command. The following lines show an example mountPath:
    - mountPath: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml
      name: config-volume
      subPath: ibm_oidc_rp.xml
  - You can then access the pod when it is Running by using the oc exec -it command.
    oc exec -it icp4adeploy-cmis-deploy-5cd4774f78-mg6pw bash
  - Delete the file with the rm command.
    rm /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/ibm_oidc_rp.xml
After the folder is removed, wait for the operator to reconcile the change, or manually add the removed mount path back to the deployment.
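To find the affected pods before you choose one of the two removal methods, you could filter the pod list and the pod descriptions for the symptoms described in this section. The following sketch uses hypothetical function names. It assumes that the STATUS value is the third column of the `oc get pod --no-headers` output and that the pod description contains the "caused: not a directory" text from the example message.

```shell
#!/usr/bin/env bash
# Sketch: find pods that fail with the file mount error described above.

# Print the names of pods whose STATUS column is CreateContainerError.
# Reads "oc get pod --no-headers" output on stdin.
pods_in_create_container_error() {
  awk '$3 == "CreateContainerError" { print $1 }'
}

# Return 0 if the pod description on stdin reports the failed file mount.
has_not_a_directory_error() {
  grep -q 'caused: not a directory'
}

# Example usage:
#   oc get pod --no-headers | pods_in_create_container_error | while read -r pod; do
#     if oc describe pod "$pod" | has_not_a_directory_error; then
#       echo "$pod: mount failed because the target is a folder"
#     fi
#   done
```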
Cluster admin setup script issues
The custom resource definition (CRD) can fail to deploy while the cp4a-clusteradmin-setup.sh script runs. If the output contains the following message, the user (XYZ in the example) does not have cluster-admin permission:
Start to create CRD, service account and role ...
Error from server (Forbidden): error when retrieving current configuration of: "/root/git/cert-kubernetes/descriptors/ibm_cp4a_crd.yaml":
customresourcedefinitions.apiextensions.k8s.io "icp4aclusters.icp4a.ibm.com" is forbidden:
User "XYZ" cannot get customresourcedefinitions.apiextensions.k8s.io at the cluster scope:
no RBAC policy matched
To resolve the issue, do the following steps:
- Log out of the current (non-admin) session.
- Log in to OCP with the OCP cluster admin user by using the OpenShift CLI:
oc login -u dbaadmin
Where dbaadmin is the cluster admin user.
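Before you run the script again, you could confirm that the logged-in user can perform the operation that was forbidden in the example output. The following sketch uses hypothetical function names and the oc auth can-i and oc whoami commands. Passing this check shows only that the user can get CRDs at the cluster scope. It does not prove that the user has the full cluster-admin role.

```shell
#!/usr/bin/env bash
# Sketch: verify that the current user can read CRDs at the cluster scope.

# Return 0 if the logged-in user can get custom resource definitions.
can_read_crds() {
  oc auth can-i get customresourcedefinitions.apiextensions.k8s.io >/dev/null 2>&1
}

# Report the result for the user that "oc whoami" returns.
check_cluster_admin_access() {
  if can_read_crds; then
    echo "OK: $(oc whoami) can get CRDs at the cluster scope"
  else
    echo "Forbidden: $(oc whoami) cannot get CRDs; log in as a cluster admin" >&2
    return 1
  fi
}

# Example usage:
#   check_cluster_admin_access
```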
Case init job failure
- If the Case init job restarts several times and still fails, do the following steps.
  - Check the Case init job pod logs by running a command similar to the following command:
    oc logs --previous <case_init_job_pod>
If the result has the following error, the Case init job is running into a Content Platform Engine timeout.
CPE_URL=https://bawps-cpe-svc:9443/wsi/FNCEWS40MTOM
Certificate was added to keystore
log4j:WARN No appenders could be found for logger (filenet_error.api.com.filenet.apiimpl.util.ConfigValueLookup).
log4j:WARN Please initialize the log4j system properly.
CPE URI :https://bawps-cpe-svc:9443/wsi/FNCEWS40MTOM
[Perf Log] No interval found. Auditor disabled.
P8DOMAIN starting setup DOS and TOS executing setupTOS
java.lang.RuntimeException: The case management add-ons cannot be installed in Content Engine. The installation of the AddOn 20.0.0.1 Case Management Target Object Store Extensions into the object store TARGET failed. The installation report follows: <ImportErrors><ClassDefinitions><ReplicableClassDefinition><Id>6d18ffeb-7be8-41ac-9322-38a72743a10d</Id><Name>Health Condition</Name><ExceptionMessage>The database access failed with the following error: ErrorCode 0, Message 'addSync: caught Exception' ObjectStore: "TARGET", SQL: "SELECT security_id FROM OS2USER.TableDefinition WHERE (object_id = ?)"</ExceptionMessage><ExceptionCode>DB_ERROR</ExceptionCode><HRESULT>0x800710d9</HRESULT></ReplicableClassDefinition></ClassDefinitions></ImportErrors>
The case management add-ons cannot be installed in Content Engine. The installation of the AddOn 20.0.0.1 Case Management Target Object Store Extensions into the object store TARGET failed. The installation report follows: <ImportErrors><ClassDefinitions><ReplicableClassDefinition><Id>6d18ffeb-7be8-41ac-9322-38a72743a10d</Id><Name>Health Condition</Name><ExceptionMessage>The database access failed with the following error: ErrorCode 0, Message 'addSync: caught Exception' ObjectStore: "TARGET", SQL: "SELECT security_id FROM OS2USER.TableDefinition WHERE (object_id = ?)"</ExceptionMessage><ExceptionCode>DB_ERROR</ExceptionCode><HRESULT>0x800710d9</HRESULT></ReplicableClassDefinition></ClassDefinitions></ImportErrors>
java.lang.RuntimeException: The case management add-ons cannot be installed in Content Engine. The installation of the AddOn 20.0.0.1 Case Management Target Object Store Extensions into the object store TARGET failed. The installation report follows: <ImportErrors><ClassDefinitions><ReplicableClassDefinition><Id>6d18ffeb-7be8-41ac-9322-38a72743a10d</Id><Name>Health Condition</Name><ExceptionMessage>The database access failed with the following error: ErrorCode 0, Message 'addSync: caught Exception' ObjectStore: "TARGET", SQL: "SELECT security_id FROM OS2USER.TableDefinition WHERE (object_id = ?)"</ExceptionMessage><ExceptionCode>DB_ERROR</ExceptionCode><HRESULT>0x800710d9</HRESULT></ReplicableClassDefinition></ClassDefinitions></ImportErrors>
    at com.ibm.casemgmt.config.ContentEngineHelper.setUpCMTOS(ContentEngineHelper.java:1833)
    at com.ibm.ecm.icm.config.init.repository.ConfigureObjectStore.setupTOS(ConfigureObjectStore.java:99)
    at com.ibm.ecm.icm.config.init.test.ConfigureContentEngine.installAddons(ConfigureContentEngine.java:48)
    at com.ibm.ecm.icm.config.init.test.InitCaseManager.main(InitCaseManager.java:19)
The timeout might be caused by network latency to a remote database or by database access problems. Check the database logs for the source of the problem.
  - Create a z.xml Liberty configuration file to override the timeout, with the following content:
    <server>
      <transaction clientInactivityTimeout="1800s" propogatedOrBMTTranLifetimeTimeout="1800s" totalTranLifetimeTimeout="1800s"/>
    </server>
    Then run:
    podname=$(oc get pod | grep cpe-deploy | awk '{print $1}')
    echo $podname
    oc cp z.xml $podname:/opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides
  - Restart the icp4adeploy-cpe-deploy-nnn pod. You might also need to restart the icp4adeploy-bawins1-baw-case-init-job-nnnnnn pod.
- If the Case init job stops generating new pods, delete the Case init job and let the operator re-create it.
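The log check and the creation of the z.xml file could be scripted as follows. This is a sketch with hypothetical function names. It treats the add-on installation message together with the DB_ERROR exception code as the signature of the failure shown in the example log, which is an assumption based on that one log. The timeout values are the ones given in the step above.

```shell
#!/usr/bin/env bash
# Sketch: detect the Case init failure signature and write the Liberty override file.

# Return 0 if the Case init job log on stdin shows the add-on installation
# failure together with the DB_ERROR exception code.
has_cpe_addon_db_error() {
  local log
  log=$(cat)
  printf '%s\n' "$log" | grep -q 'The case management add-ons cannot be installed in Content Engine' &&
    printf '%s\n' "$log" | grep -q '<ExceptionCode>DB_ERROR</ExceptionCode>'
}

# Write the override file to the given path (default: z.xml).
write_timeout_override() {
  cat > "${1:-z.xml}" <<'EOF'
<server>
  <transaction clientInactivityTimeout="1800s" propogatedOrBMTTranLifetimeTimeout="1800s" totalTranLifetimeTimeout="1800s"/>
</server>
EOF
}

# Example usage (<case_init_job_pod> is a placeholder):
#   if oc logs --previous <case_init_job_pod> | has_cpe_addon_db_error; then
#     write_timeout_override z.xml
#   fi
```

After the file is written, copy it to the Content Platform Engine pod with the oc cp command from the step above and restart the pods.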