Troubleshooting installation issues
You might encounter an issue during WebSphere Automation installation. Learn how to fix the most common installation issues.
- Update to WebSphere Automation 1.8.0 fails due to missing operator in catalog
- Update to WebSphere Automation 1.8.0 fails with wsa-mongo pod stuck in init state
- Update to WebSphere Automation 1.6.4 or 1.6.3 fails due to undefined storage classes
- Installation stalls due to incorrectly configured pull secret
- Installation stalls during an update
- Operator pods crash during all-namespaces installation
Update to WebSphere Automation 1.8.0 fails due to missing operator in catalog
When you update to WebSphere Automation 1.8.0, the WebSphere Automation operator fails with the following error message.
Type:    ResolutionFailed
Status:  True
Reason:  ConstraintsNotSatisfiable
Message: constraints not satisfiable: no operators found in channel stable-v1.22 of package cloud-native-postgresql in the catalog referenced by subscription cloud-native-postgresql, subscription cloud-native-postgresql exists
The cloud-native-postgresql operator that is required for the update is not
available in the specified channel within the catalog, preventing the update process from completing
successfully.
To ensure that the required operator versions are available in the catalog and compatible with the update process, complete the following steps.
- Attempt the update process as described in Updating an installation in an air gap environment.
This initial attempt helps identify whether the failure occurs due to the outdated operator catalog.
- In Updating an installation in an air gap environment, when step 7 fails, proceed to step 8.
- Step 8 in Updating an installation in an air gap environment is to update the prerequisites for WebSphere Automation 1.8.0. Running the prerequisite update script ensures that the cloud-native-postgresql operator catalog is updated to the latest stable version. (A verification sketch follows these steps.)
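After you run the prerequisite update script, you can confirm that the refreshed catalog exposes the required channel and that the subscription resolves. The following is a minimal sketch; it assumes the catalog source is in the openshift-marketplace namespace, and the subscription namespace is a placeholder for the namespace where cloud-native-postgresql is subscribed.
# Confirm that the stable-v1.22 channel is now available for cloud-native-postgresql
oc get packagemanifest cloud-native-postgresql -n openshift-marketplace -o jsonpath='{.status.channels[*].name}'
# Re-check the subscription conditions; ResolutionFailed should no longer be True
oc get subscription cloud-native-postgresql -n <subscription_namespace> -o yaml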
Update to WebSphere Automation 1.8.0 fails with wsa-mongo pod stuck in init state
If you did not run the update-datastore.sh script before updating to WebSphere Automation 1.8.0, you might see a wsa-mongo pod
stuck in the init state.
wsa-mongo-0 1/1 Running 0 43h
wsa-mongo-1 1/1 Running 0 43h
wsa-mongo-2 0/1 Init:1/2 0 13m
First, find the cause of the problem by analyzing the logs of the bootstrap
container of the pod that is stuck.
oc exec -it wsa-mongo-2 -c bootstrap -- /bin/bash
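Alternatively, you can read the bootstrap container output without opening an interactive shell. The following is a minimal sketch that uses the stuck pod from the earlier output; the namespace is a placeholder for your instance namespace.
# View the logs of the bootstrap init container of the stuck pod
oc logs wsa-mongo-2 -c bootstrap -n <instance_namespace>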
Verify that the error is similar to the following:
Invalid featureCompatibilityVersion document in admin.system.version:{ _id: \"featureCompatibilityVersion\", version: \"4.4\" }.
If so, do not run the update-datastore.sh script to rectify the problem. The script
fails with the error Datastore is not ready. Exiting. Instead, complete the following
steps:
- Get the MongoDB admin credentials.
WSA_INSTANCE_NAMESPACE=<instance namespace>
POD_NAME=<any running mongo pod>
credentials_file='/work-dir/credentials.txt'
admin_user=$(oc exec -n "$WSA_INSTANCE_NAMESPACE" "$POD_NAME" -- head -n 1 "$credentials_file")
admin_password=$(oc exec -n "$WSA_INSTANCE_NAMESPACE" "$POD_NAME" -- tail -n 1 "$credentials_file")
admin_args=(-u "$admin_user" -p "$admin_password")
- Find the primary pod.
tls_args=(--tls --tlsCAFile /data/configdb/tls.crt --tlsCertificateKeyFile /work-dir/mongo.pem)
oc exec -n "$WSA_INSTANCE_NAMESPACE" "$POD_NAME" -- mongo --host localhost "${tls_args[@]}" "${admin_args[@]}" --eval "db.isMaster()"
- Verify that the output of the previous step shows ismaster: true if POD_NAME is the primary. If it shows false, check the value under primary:, which shows the full primary pod name. For example: primary: 'wsa-mongo-1.wsa-mongo.websphere-automation.svc.cluster.local:27017'
- Set the primary pod variable. For the example in the previous step:
PRIMARY_POD=wsa-mongo-1
- Set the FCV to 5.0. This command returns 1 if the change is successful.
oc exec -n "$WSA_INSTANCE_NAMESPACE" "$PRIMARY_POD" -- mongo --host localhost "${tls_args[@]}" "${admin_args[@]}" --eval "db.adminCommand({setFeatureCompatibilityVersion: '5.0'}).ok"
- Scale the dataStore replicas to 0.
WSA_AUTOMATION_CR=$(oc get websphereautomation -o name -n "$WSA_INSTANCE_NAMESPACE" | cut -d/ -f2)
oc patch websphereautomation "$WSA_AUTOMATION_CR" -p '{"spec":{"dataStore":{"replicas":0}}}' --type merge -n "$WSA_INSTANCE_NAMESPACE"
- Find and delete the persistent volume claim (PVC) bound to the pod that is stuck in the init state (in this case, wsa-mongo-2).
oc get pvc -n "$WSA_INSTANCE_NAMESPACE"
oc delete pvc <pvc_bound_to_pod_in_init_state> -n "$WSA_INSTANCE_NAMESPACE"
- Scale the dataStore replicas back to 3. (A verification sketch follows these steps.)
oc patch websphereautomation "$WSA_AUTOMATION_CR" -p '{"spec":{"dataStore":{"replicas":3}}}' --type merge -n "$WSA_INSTANCE_NAMESPACE"
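After the replicas are scaled back up, you can verify the recovery. The following is a minimal sketch that reuses the variables set in the previous steps; the FCV check is an optional confirmation and is not part of the documented procedure.
# Confirm that all wsa-mongo pods reach the Running state
oc get pods -n "$WSA_INSTANCE_NAMESPACE" | grep wsa-mongo
# Optionally confirm that the feature compatibility version is now 5.0
oc exec -n "$WSA_INSTANCE_NAMESPACE" "$PRIMARY_POD" -- mongo --host localhost "${tls_args[@]}" "${admin_args[@]}" --eval "db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1})"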
Update to WebSphere Automation 1.6.4 or 1.6.3 fails due to undefined storage classes
During an update from WebSphere Automation 1.6.2 to 1.6.3 or 1.6.4, the update fails with the following message:
Both storageClass and fileStorageClass are not defined
Error found in prep_playrun.yaml from zen-ansible-utils
The problem is in the ZenService custom resource (CR). To view the ZenService CR, run the following command:
oc get zenservice iaf-zen-cpdservice -o yaml
To work around the problem, edit the ZenService custom resource:
oc edit zenservice iaf-zen-cpdservice
Add the following lines to the spec section:
spec:
  blockStorageClass: RWO_STORAGE_CLASS
  fileStorageClass: RWX_STORAGE_CLASS
Replace RWO_STORAGE_CLASS with a storage class that supports the
ReadWriteOnce storage volume type, and replace RWX_STORAGE_CLASS
with a storage class that supports the ReadWriteMany storage volume type.
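To choose suitable values, you can list the storage classes that are available in your cluster. The following is a minimal sketch; the guidance in the comments is general and not specific to your cluster.
# List the available storage classes and their provisioners
oc get storageclass
# Pick a class that supports ReadWriteOnce for blockStorageClass
# and a class that supports ReadWriteMany for fileStorageClass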
Installation stalls due to incorrectly configured pull secret
If the installation starts the process of setting up the data store and does not proceed, the cause of the problem could be an incorrectly configured entitlement key secret or global pull secret. In this instance, the symptoms are as follows:
- In the Red Hat OpenShift console, navigate to .
- Click the WebSphereAutomation tab and select the instance that you are installing.
- Scroll to the end of the Details page to view the Conditions list.
- If the DataStoreReady condition is False, it is possible that the pull secret is not defined.
To confirm that the pull secret is not defined, check the WebSphere Automation Mongo pod:
- Navigate to .
- Filter the pods to the instance namespace.
- Search for wsa-mongo.
- Check the status of the wsa-mongo pod. If the status is Init:ImagePullBackOff, it is likely that the pull secret is not set properly in the cluster.
For more information, see Creating the entitlement key secret or updating the global pull secret.
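You can also confirm from the command line that credentials for the IBM entitled registry (cp.icr.io) are present in the global pull secret. The following is a minimal sketch; it assumes the standard OpenShift global pull secret in the openshift-config namespace.
# Check that the global pull secret contains an entry for cp.icr.io
oc get secret pull-secret -n openshift-config -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | grep cp.icr.io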
To resolve the problem:
- Correctly configure the entitlement key secret or global pull secret.
- Delete any pods in the WebSphere Automation instance namespace that have the Init:ImagePullBackOff status. (A sketch for finding and deleting these pods follows this list.)
- Check the IBM foundational services namespace for any pods that have the Init:ImagePullBackOff status. If so, delete them. New pods are automatically created to replace the deleted ones.
- Confirm that the pods achieve a state of Completed or Running.
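The following is a minimal sketch for finding and deleting the stuck pods; the namespace values are placeholders for your instance and IBM foundational services namespaces.
# Find pods that are stuck pulling images
oc get pods -n <instance_namespace> | grep ImagePullBackOff
oc get pods -n <foundational_services_namespace> | grep ImagePullBackOff
# Delete a stuck pod; a replacement pod is created automatically with the corrected pull secret
oc delete pod <pod_name> -n <namespace>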
Installation stalls during an update
When you update WebSphere Automation, the installation requests more
storage with the ReadWriteMany (RWX) access mode at startup. If the default storage
class does not support RWX mode, the update process might stall because WebSphere Automation waits indefinitely for the storage to be provisioned.
To complete the update process, update the WebSphereSecure custom resource (CR) to use a storage class that supports RWX access mode. Then, delete the existing persistent volume claim (PVC) so the operator creates a new PVC with the specified storage class. Complete the following steps.
- Update the WebSphereSecure CR with the ibmc-file-gold-gid storage class, which supports the RWX access mode, as shown in the following example.
fileStore:
  storage:
    class: ibmc-file-gold-gid
- To bypass the resulting Storage class is immutable error, delete the existing wsa-secure-data PVC by running the following command.
oc delete pvc wsa-secure-data
After some time, the operator re-creates the PVC with the specified storage class that supports RWX access mode and the update completes. (A verification sketch follows these steps.)
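To confirm that the operator re-created the PVC with the new storage class, you can watch the PVC. The following is a minimal sketch; the namespace is a placeholder for your instance namespace.
# Watch the wsa-secure-data PVC until it is re-created and bound
oc get pvc wsa-secure-data -n <instance_namespace> -w
# Verify that the STORAGECLASS column shows ibmc-file-gold-gid and the STATUS column shows Bound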
Operator pods crash during all-namespaces installation
Operator pods might crash during an all-namespaces, or cluster-wide, installation. IBM® Cloud Pak foundational services pods crash with the CrashLoopBackOff error and never become stable or Ready.
The Operator pods crash because your environment lacks adequate resources to complete the installation successfully.
To complete the installation successfully, update the following subscriptions for the operators to modify the default CPU and memory allocated.
- ibm-automation-v1.0-ibm-operator-catalog-openshift-marketplace
- ibm-automation-eventprocessing-v1.0-ibm-operator-catalog-openshift-marketplace
- ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace
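The following is a minimal sketch for locating these subscriptions; it assumes the openshift-operators namespace that is used in the YAML example later in this section.
# List the operator subscriptions to find the ones named above
oc get subscriptions -n openshift-operators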
You can use the spec.config section of the Subscription to modify the default CPU and memory allocations.
spec.config is not in the Subscription by default. You must add it
to the YAML. The following YAML snippet shows the resources.limits and
resources.requests that you can modify to create sufficient resources for the
installation to complete successfully.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace
  namespace: openshift-operators
  labels:
    operators.coreos.com/ibm-automation-core.openshift-operators: ''
spec:
  channel: v1.0
  config:
    resources:
      limits:
        cpu: 200m
        memory: 300Mi
      requests:
        cpu: 150m
        memory: 200Mi
  installPlanApproval: Automatic
  name: ibm-automation-core
  source: ibm-operator-catalog
  sourceNamespace: openshift-marketplace
  startingCSV: ibm-automation-core.v1.0.2
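You can add the config section by editing or patching the Subscription. The following is a minimal sketch that applies the same limits and requests that are shown in the preceding YAML example.
# Open the subscription for editing and add the spec.config section
oc edit subscription ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace -n openshift-operators
# Or apply the same change non-interactively with a merge patch
oc patch subscription ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace -n openshift-operators --type merge -p '{"spec":{"config":{"resources":{"limits":{"cpu":"200m","memory":"300Mi"},"requests":{"cpu":"150m","memory":"200Mi"}}}}}'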
For more information, see Operator pods crashing during installation in the IBM Cloud Paks documentation.