Troubleshooting installation issues

You might encounter an issue during WebSphere Automation installation. Learn how to fix the most common installation issues.

Update to WebSphere Automation 1.8.0 fails due to missing operator in catalog

When you update to WebSphere Automation 1.8.0, the WebSphere Automation operator fails with the following error message.

Type:     ResolutionFailed
Status:   True
Updated:  -
Reason:   ConstraintsNotSatisfiable
Message:  constraints not satisfiable: no operators found in channel stable-v1.22 of package cloud-native-postgresql in the catalog referenced by subscription cloud-native-postgresql, subscription cloud-native-postgresql exists

The cloud-native-postgresql operator that is required for the update is not available in the specified channel within the catalog, preventing the update process from completing successfully.

To ensure that the required operator versions are available in the catalog and are compatible with the update process, complete the following steps.

  1. Attempt the update process as described in Updating an installation in an air gap environment.

    This initial attempt helps identify whether the failure occurs due to the outdated operator catalog.

  2. When step 7 of Updating an installation in an air gap environment fails, proceed to step 8 of that procedure.
  3. Complete step 8, Update the prerequisites for WebSphere Automation 1.8.0.

    Running the prerequisite update script updates the cloud-native-postgresql operator catalog to the latest stable version.
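
After you run the prerequisite update script, you can confirm that the subscription resolves by checking its status conditions. The following check is a sketch; the namespace placeholder is an example, so use the namespace that contains the cloud-native-postgresql subscription.

# Inspect the subscription status after the catalog update (namespace is a placeholder).
oc get subscription cloud-native-postgresql -n <operator-namespace> -o yaml

In the status.conditions section of the output, the ResolutionFailed condition should no longer report ConstraintsNotSatisfiable.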

Update to WebSphere Automation 1.8.0 fails with wsa-mongo pod stuck in init state

If you did not run the update-datastore.sh script before updating to WebSphere Automation 1.8.0, you might see a wsa-mongo pod stuck in the init state.

wsa-mongo-0                                                 1/1     Running     0             43h
wsa-mongo-1                                                 1/1     Running     0             43h
wsa-mongo-2                                                 0/1     Init:1/2    0             13m
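
You can produce a listing like this one from the command line; the namespace value is a placeholder for your WebSphere Automation instance namespace.

# List the wsa-mongo pods to identify the one stuck in the init state.
oc get pods -n <instance-namespace> | grep wsa-mongo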

First, find the cause of the problem by analyzing the logs of the bootstrap container of the pod that is stuck. Open a shell in the bootstrap container:

oc exec -it wsa-mongo-2 -c bootstrap -- /bin/bash

Verify that the error is similar to the following:

Invalid featureCompatibilityVersion document in admin.system.version:{ _id: \"featureCompatibilityVersion\", version: \"4.4\" }.

If so, do not run the update-datastore.sh script to rectify the problem. The script fails with the error Datastore is not ready. Exiting. Instead, complete the following steps:

  1. Get the MongoDB admin credentials.
    WSA_INSTANCE_NAMESPACE=<instance namespace>
    POD_NAME=<any running mongo pod>
    credentials_file='/work-dir/credentials.txt'
    admin_user=$(oc exec -n "$WSA_INSTANCE_NAMESPACE" "$POD_NAME" -- head -n 1 "$credentials_file")
    admin_password=$(oc exec -n "$WSA_INSTANCE_NAMESPACE" "$POD_NAME" -- tail -n 1 "$credentials_file")
    admin_args=(-u "$admin_user" -p "$admin_password")
    
  2. Find the primary pod.
    tls_args=(--tls --tlsCAFile /data/configdb/tls.crt --tlsCertificateKeyFile /work-dir/mongo.pem)
    oc exec -n "$WSA_INSTANCE_NAMESPACE" "$POD_NAME" -- mongo --host localhost "${tls_args[@]}" "${admin_args[@]}" --eval "db.isMaster()"
    
  3. Check the output of the previous step. If POD_NAME is the primary, the output shows ismaster: true. If it shows false, check the value of primary:, which contains the full primary pod name. For example:
    primary:
          'wsa-mongo-1.wsa-mongo.websphere-automation.svc.cluster.local:27017'
  4. Set the primary pod variable. For the example in the previous step:
    PRIMARY_POD=wsa-mongo-1
  5. Set the feature compatibility version (FCV) to 5.0.
    oc exec -n "$WSA_INSTANCE_NAMESPACE" "$PRIMARY_POD" -- mongo --host localhost "${tls_args[@]}" "${admin_args[@]}" --eval 'db.adminCommand({setFeatureCompatibilityVersion: "5.0"}).ok'
    This command returns 1 if the change is successful.
  6. Scale the dataStore replicas to 0.
    WSA_AUTOMATION_CR=$(oc get websphereautomation -o name -n $WSA_INSTANCE_NAMESPACE | cut -d/ -f2) 
    oc patch websphereautomation $WSA_AUTOMATION_CR -p '{"spec":{"dataStore":{"replicas":0}}}' --type merge -n $WSA_INSTANCE_NAMESPACE
  7. Find and delete the persistent volume claim (PVC) bound to the pod that is stuck in the init state (in this case, wsa-mongo-2).
    oc get pvc -n "$WSA_INSTANCE_NAMESPACE"
    oc delete pvc <pvc_bound_to_pod_in_init_state> -n "$WSA_INSTANCE_NAMESPACE"
  8. Scale the dataStore replicas back to 3.
    oc patch websphereautomation $WSA_AUTOMATION_CR -p '{"spec":{"dataStore":{"replicas":3}}}' --type merge -n $WSA_INSTANCE_NAMESPACE
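
After the wsa-mongo pods are running again, you can optionally confirm the new feature compatibility version. The following check is a sketch that reuses the variables set in the earlier steps.

# Confirm that the feature compatibility version is now 5.0
# (reuses WSA_INSTANCE_NAMESPACE, PRIMARY_POD, tls_args, and admin_args from the previous steps).
oc exec -n "$WSA_INSTANCE_NAMESPACE" "$PRIMARY_POD" -- mongo --host localhost "${tls_args[@]}" "${admin_args[@]}" --eval 'db.adminCommand({getParameter: 1, featureCompatibilityVersion: 1})'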

Update to WebSphere Automation 1.6.4 or 1.6.3 fails due to undefined storage classes

During an update from WebSphere Automation 1.6.2 to 1.6.3 or 1.6.4, the update fails with the following message:

Both storageClass and fileStorageClass are not defined
Error found in prep_playrun.yaml from zen-ansible-utils

The problem is in the ZenService custom resource (CR). To view the ZenService CR, run the following command:

oc get zenservice iaf-zen-cpdservice -o yaml

To work around the problem, edit the ZenService custom resource:

oc edit zenservice iaf-zen-cpdservice

Add the following lines to the spec section:

spec:
  blockStorageClass: RWO_STORAGE_CLASS
  fileStorageClass: RWX_STORAGE_CLASS

Replace RWO_STORAGE_CLASS with a storage class that supports the ReadWriteOnce storage volume type, and replace RWX_STORAGE_CLASS with a storage class that supports the ReadWriteMany storage volume type.
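
To see which storage classes are available in your cluster, list them with the following command. Whether a class supports ReadWriteOnce or ReadWriteMany depends on its provisioner, so verify the access modes with your storage provider's documentation before you choose one.

# List the available storage classes and their provisioners.
oc get storageclass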

Installation stalls due to incorrectly configured pull secret

If the installation starts the process of setting up the data store and does not proceed, the cause of the problem could be an incorrectly configured entitlement key secret or global pull secret. To check for this symptom, complete the following steps:

  1. In the Red Hat OpenShift console, navigate to Operators > Installed Operators > WebSphere Automation.
  2. Click the WebSphereAutomation tab and select the instance that you are installing.
  3. Scroll to the end of the Details page to view the Conditions list.
  4. If the DataStoreReady condition is False, it is possible that the pull secret is not defined.

To confirm that the pull secret is not defined, check the WebSphere Automation Mongo pod:

  1. Navigate to Workloads > Pods.
  2. Filter the pods to the instance namespace.
  3. Search for wsa-mongo.
  4. Check the status of the wsa-mongo pod.

    If the status is Init:ImagePullBackOff, it is likely that the pull secret is not set properly in the cluster.
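
You can also confirm an image pull failure from the command line by describing the pod; the pod name and namespace shown here are examples.

# Review the pod events for image pull errors (pod name and namespace are examples).
oc describe pod wsa-mongo-0 -n <instance-namespace>

Image pull errors appear in the Events section at the end of the output.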

For more information, see Creating the entitlement key secret or updating the global pull secret.

To resolve the problem:

  1. Correctly configure the entitlement key secret or global pull secret.
  2. Delete any pods in the WebSphere Automation instance namespace that have the Init:ImagePullBackOff status.
  3. Check the IBM foundational services namespace for any pods that have the Init:ImagePullBackOff status. If any exist, delete them.

    New pods are automatically created to replace the deleted ones.

  4. Confirm that the pods reach the Completed or Running state.
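
If you prefer to work from the command line, the following sketch covers steps 2 and 3. The namespace value is a placeholder; run the loop once for the WebSphere Automation instance namespace and again for the IBM foundational services namespace.

# Delete pods that are stuck in Init:ImagePullBackOff so that they are re-created
# with the corrected pull secret (namespace is a placeholder).
NS=<instance-namespace>
for pod in $(oc get pods -n "$NS" --no-headers | awk '/Init:ImagePullBackOff/ {print $1}'); do
  oc delete pod "$pod" -n "$NS"
done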

Installation stalls during an update

When you update WebSphere Automation, the installation requests more storage with the ReadWriteMany (RWX) access mode at startup. If the default storage class does not support RWX mode, the update process might stall because WebSphere Automation waits for the storage to be provisioned indefinitely.

To complete the update process, update the WebSphereSecure custom resource (CR) to use a storage class that supports RWX access mode. Then, delete the existing persistent volume claim (PVC) so the operator creates a new PVC with the specified storage class. Complete the following steps.

  1. Update the WebSphereSecure CR with the ibmc-file-gold-gid storage class, which supports the RWX access mode, as shown in the following example.
    fileStore:
      storage:
        class: ibmc-file-gold-gid
  2. To bypass the resulting Storage class is immutable error, delete the existing wsa-secure-data PVC by running the following command.
    oc delete pvc wsa-secure-data

    After some time, the operator re-creates the PVC with the specified storage class that supports RWX access mode and the update completes.
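
To make the same change from the command line, you can use a patch similar to the following sketch. The webspheresecure resource name and the namespace placeholder are assumptions based on the WebSphereSecure CR kind, so verify them in your cluster before you run the commands.

# Find the WebSphereSecure CR in the instance namespace (resource name is an assumption).
WSA_SECURE_CR=$(oc get webspheresecure -n <instance-namespace> -o name | cut -d/ -f2)

# Point the fileStore at a storage class that supports RWX access mode.
oc patch webspheresecure "$WSA_SECURE_CR" -n <instance-namespace> --type merge \
  -p '{"spec":{"fileStore":{"storage":{"class":"ibmc-file-gold-gid"}}}}'

# Delete the existing PVC so that the operator re-creates it with the new storage class.
oc delete pvc wsa-secure-data -n <instance-namespace>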

Operator pods crash during all-namespaces installation

Operator pods might crash during an all-namespaces, or cluster-wide, installation. IBM® Cloud Pak foundational services pods crash with the CrashLoopBackOff error and never become stable or Ready.

The Operator pods crash because your environment lacks adequate resources to complete the installation successfully.

To complete the installation successfully, update the following subscriptions for the operators to modify the default CPU and memory allocated.

  • ibm-automation-v1.0-ibm-operator-catalog-openshift-marketplace
  • ibm-automation-eventprocessing-v1.0-ibm-operator-catalog-openshift-marketplace
  • ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace
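
To find the exact subscription names in your cluster, you can list the subscriptions. The openshift-operators namespace matches the example that follows; adjust it if your operators are installed in a different namespace.

# List the operator subscriptions, their packages, and their channels.
oc get subscriptions -n openshift-operators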

You can use the spec.config section of the Subscription to modify the default CPU and memory that are allocated. The spec.config section is not in the Subscription by default; you must add it to the YAML. The following YAML snippet shows the resources.limits and resources.requests values that you can modify to provide sufficient resources for the installation to complete successfully.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace
  namespace: openshift-operators
  labels:
    operators.coreos.com/ibm-automation-core.openshift-operators: ''
spec:
  channel: v1.0
  config:
    resources:
      limits:
        cpu: 200m
        memory: 300Mi
      requests:
        cpu: 150m
        memory: 200Mi
  installPlanApproval: Automatic
  name: ibm-automation-core
  source: ibm-operator-catalog
  sourceNamespace: openshift-marketplace
  startingCSV: ibm-automation-core.v1.0.2
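
As an alternative to editing the Subscription YAML in the console, you can apply the same config section with oc patch. The following command is a sketch that uses the values from the preceding example; repeat it for each of the listed subscriptions.

# Add resource limits and requests to the operator Subscription.
oc patch subscription ibm-automation-core-v1.0-ibm-operator-catalog-openshift-marketplace \
  -n openshift-operators --type merge \
  -p '{"spec":{"config":{"resources":{"limits":{"cpu":"200m","memory":"300Mi"},"requests":{"cpu":"150m","memory":"200Mi"}}}}}'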

For more information, see Operator pods crashing during installation in the IBM Cloud Paks documentation.