Known limitations

The following are possible issues you may encounter when using IBM Cloud Pak® for Integration (with applicable solutions):

Issues when using the Platform UI

Symptom: You experience issues when you use the Platform UI, for example, you see an Application not available message even though the associated pods are running.

Solution: Try restarting the pods.

  1. Find the pods that are in the same namespace as your Platform UI instance. You can view a list of pods in a namespace by running the following command:

    oc get pods -n <namespace>
  2. Use the oc delete command to delete the pods with the following names, where the asterisk (*) is a wildcard (see the example command after these steps):

    • ibm-nginx-*

    • usermgmt-*

    • *-ibm-integration-platform-navigator-*

    Wait for the pods to be re-created.
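
    For example, all three groups of pods can be deleted with a single command (a sketch; it assumes a POSIX shell and uses the pod name prefixes listed above):

    oc delete -n <namespace> $(oc get pods -n <namespace> -o name | grep -E 'ibm-nginx-|usermgmt-|ibm-integration-platform-navigator-')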

If the problem remains, check whether another known limitation in this topic could be the cause.

OLM causes Operator failures on OCP 4.10.38 and later

Symptom: Users may observe operators in an unknown state, with a log error stating that a dependency is not satisfied, and multiple restarts of the catalog-operator-* pod in the openshift-operator-lifecycle-manager namespace.

Cause: Operator Lifecycle Manager (OLM) in OpenShift 4.10.38 is limited to processing a single operator request at a time.

Solution: In the openshift-operator-lifecycle-manager namespace, restart the catalog-operator-* pod, then the olm-operator-* pod. Alternatively, delete all of the pods in the openshift-operator-lifecycle-manager namespace, which triggers them to be re-created.
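
For example, the pods can be restarted by deleting them with the following commands (a sketch; it assumes the standard OLM pod labels app=catalog-operator and app=olm-operator):

oc delete pod -n openshift-operator-lifecycle-manager -l app=catalog-operator
oc delete pod -n openshift-operator-lifecycle-manager -l app=olm-operator

To restart everything in the namespace instead, run oc delete pods --all -n openshift-operator-lifecycle-manager.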

Unable to access Cloud Pak for Integration user interfaces

Symptom: When users try to access the Cloud Pak for Integration UI routes, they get the message, Application is not available.

Cause: Network traffic has not been allowed between the deployed instance of Platform Navigator and the Ingress Controller, as required. For more information about this policy, see https://docs.openshift.com/container-platform/4.6/networking/network_policy/multitenant-network-policy.html

Solution:

  • Using the CLI

    1. Log in to the Red Hat OpenShift cluster CLI as a Cluster Administrator.

    2. Confirm the endpointPublishingStrategy type of the IngressController:

      oc get --namespace openshift-ingress-operator ingresscontrollers/default \
       --output jsonpath='{.status.endpointPublishingStrategy.type}'; echo
    3. If the type value from the previous step is HostNetwork, ingress traffic must be enabled through the default namespace. Add the following label to the default namespace: network.openshift.io/policy-group=ingress

      oc label namespace default 'network.openshift.io/policy-group=ingress'
  • Using the Red Hat OpenShift web console

    1. Log in to the Red Hat OpenShift web console for the cluster as a Cluster Administrator.

    2. In the navigation pane, click Home > Search. Click to expand Project, select the openshift-ingress-operator namespace, then search for the resource IngressController.

    3. To confirm the value of spec.endpointPublishingStrategy, click to open the default IngressController and view the YAML.

    4. If the value of spec.endpointPublishingStrategy.type is HostNetwork, ingress traffic must be enabled through the default namespace. In the left navigation pane, click Home > Search. Search for the resource namespace, select the default namespace, and click Edit Labels.

    5. Add the label network.openshift.io/policy-group=ingress, then click Save.

Expired leaf certificates not automatically refreshed

Symptom: User gets an Application Unavailable message when attempting to access the Platform UI or other capabilities in Cloud Pak for Integration.

In addition, the management-ingress pod logs show an error:

Error: exit status 1
2021/01/23 16:56:00 [emerg] 44#44: invalid number of arguments in "ssl_certificate" directive in /tmp/nginx-cfg123456789:123
nginx: [emerg] invalid number of arguments in "ssl_certificate" directive in /tmp/nginx-cfg123456789:123
nginx: configuration file /tmp/nginx-cfg123456789 test failed

Cause: When a self-signed CA certificate is refreshed, the leaf certificates are not automatically refreshed, resulting in unavailable services.

Solution: For information on how to refresh these certificates, see Replacing default keys and certificates.

Cannot publish an unenforced API to any catalog in the UI (event endpoints only)

Symptoms: If you have Event Endpoint Management with only event endpoints enabled, you cannot publish an unenforced API to any catalog in the UI.

After you create an AsyncAPI document to describe an event source and choose not to enforce access control for your API through an Event Gateway Service, there are no catalogs to select when you publish the draft API by clicking the Menu icon for more options and clicking Publish.

Causes: When publishing a product to a catalog, the UI provides a list of catalogs and spaces to choose from. The list is filtered to include only catalogs and spaces that have registered gateways that support the enforced APIs in the product.

When an unenforced draft API with no x-ibm-configuration.gateway value is published, the draft product that is created for it has a default gateway type of datapower-api-gateway. This gateway type does not match any catalog in Event Endpoint Management because only the gateway type of event-gateway is available when only event endpoints are enabled.

Solution: Edit the newly created product to manually set the gatewayType as follows:

  1. In the API Manager UI, click the Develop button in the navigation bar.

  2. Click Products to list the draft products.

  3. Click the product created when publishing the API to open the product editor.

  4. In the product editor, click Source to show the product document source.

  5. Replace datapower-api-gateway with event-gateway in the gateways list (see the example after these steps).

  6. Click Save.
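
For illustration, after step 5 the gateways list in the product source might look like the following (a sketch; surrounding fields in the product document are omitted):

  gateways:
    - event-gateway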

The product can now be published to a catalog with a registered Event Gateway.

Clients fail to connect to the Event Gateway Service

Symptoms: After signing up to use an enforced Kafka AsyncAPI in the Developer Portal, an application developer sees their client application failing to connect to the Event Gateway Service. The connection is closed by the gateway service after a 30-second delay.

The logs for the Event Gateway Service contain the following messages at the time the client application attempts to connect, followed by the error shown:

...
INFO  Events Gateway - [start:196] Kafka Server starting.
INFO  com.ibm.eem.Kafka Server - [startGatewayVerticle:65] Broker port: <BOOTSTRAP SERVER PORT> Broker hostname: <BOOTSTRAP SERVER>
...
ERROR com.ibm.eem.core.Request - [abortAndCloseConnection:187] Timed out after waiting 30000(ms) for a reply. address: __vertx.reply.<ID>, repliedAddress: ConnectionManager
...

Causes: The Event Gateway Service is attempting to connect to the Kafka cluster using an incorrect SSL configuration. For example, it is trying to connect to Kafka brokers without using TLS when TLS is required.

Solution: In the AsyncAPI editor, review and correct the Gateway > Invoke settings for this API to match the TLS settings of your Kafka cluster.

If the cluster is using TLS:

  • Set Security Protocol to SASL_SSL or SSL, depending on whether you are using SASL authentication and authorization or not.

  • Provide the Transport CA certificate for the Event Gateway Service to use to establish trust with the Kafka cluster. This is only required if the cluster is not using a certificate issued by a well-known authority.

If the Kafka cluster does not require TLS:

  • Set Security Protocol to PLAINTEXT or SASL_PLAINTEXT, depending on whether you are using SASL authentication and authorization or not.

After updating the API with the correct settings, save the API and republish the products that include it. This updates the configuration used by the Event Gateway Service and allows clients to connect successfully.

Error validating CRs against a new CRD schema in OLM when upgrading DataPower

Symptom: An error occurs when you attempt to uninstall and reinstall the DataPower Operator on OpenShift 4.8 or higher. When OLM is validating existing CRs against a new CRD schema, the conversion webhook is not found.

Solution: Reinstall the DataPower Operator using the following steps:

  1. Uninstall the failed DataPower Operator.

  2. Edit the DataPowerService CRD in the cluster.

  3. Remove the spec.conversion.webhook section.

  4. Set spec.conversion.strategy to None (see the sketch after these steps).

  5. Apply changes to the DataPowerService CRD.

  6. Reinstall the DataPower Operator.
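
For reference, after steps 3 to 5 the conversion section of the DataPowerService CRD should look like the following (a sketch that follows the standard Kubernetes CRD schema; you can open the CRD with a command such as oc edit crd datapowerservices.datapower.ibm.com, although the exact CRD name may differ in your cluster):

  spec:
    conversion:
      strategy: None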

Unable to deploy API Connect using the Platform UI with Portal Site Name set

Symptom: If you attempt to deploy API Connect using the Platform UI with a Portal Site Name set, the create button is disabled.

Solution: To deploy API Connect with a Portal Site Name set, create your API Connect CR using the OpenShift Console or the OpenShift command-line interface (oc CLI).

Kubernetes garbage collection might cause pods to go into a CrashLoopBackOff state

Symptom: When the IBM Platform Navigator Operator is upgraded from 2020.4 to 2022.2, the pod enters a CrashLoopBackOff state.

Cause: The Kubernetes garbage collection might not remove the package lock, which can cause the pod to enter a CrashLoopBackOff state.

Solution: Delete the configMap associated with the operator. This releases the lock and allows the upgrade to proceed.
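
For example, the ConfigMap can be located and deleted with commands along these lines (a sketch; the namespace and ConfigMap name are placeholders to replace with values from your cluster):

oc get configmaps -n <operator-namespace>
oc delete configmap <operator-lock-configmap> -n <operator-namespace>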

The Automation Assets UI can fail to load after a new installation or an upgrade from 2020.4 to 2022.2

Symptom: After the new installation or upgrade, the Automation Assets user interface is inaccessible. The page renders only a version and platform string (for example, 3.5.0.0 (20220429_2020) x86_64).

Cause: The Automation Assets user interface may be inaccessible if the Zen component does not process the required configMaps.

Solution: Run the following commands to restart the zen-watcher pod associated with this deployment. This triggers the configMap to be loaded.

oc get pods -n <namespace>
oc delete pod <zen-watcher-pod-name> -n <namespace>

OpenShift version support warning remains after upgrading Automation Assets

Symptom: After Automation Assets has been upgraded from version 2020.4 (with OpenShift versions prior to 4.10) to version 2022.2 (with OpenShift version 4.10), the following warning remains in the status after the custom resource has reconciled:

An EUS instance (2020.4.1-6-eus) has been installed but the OCP version (4.7.45) does not qualify for the extended support duration. This instance is supported with a regular CD release.

Upgrading Automation Assets to version 2022.2 can result in an outage during upgrade

Symptom: Upgrading Automation Assets to version 2022.2 from a prior version can result in an outage of the Automation Assets for a period of time during the upgrade.

Automation Assets may show as CrashLoopBackOff for a period of time before successfully deploying

Symptom: When installing Automation Assets, the pod can enter the CrashLoopBackOff status during the installation process. After some time, the installation can complete successfully. The following error is shown in the logs during this time:

Error starting server Error: connect ECONNREFUSED
...
statusText: 'ECONNREFUSED',
body: 'Response not received - no connection was made to the service.'

During upgrade, new API pod for Automation assets running in single replica mode gets stuck in Creating state

Symptom: Attempting to upgrade an Automation assets instance that is running in single replica mode leaves the new Automation assets API pod in a Creating state. The pod events show messages similar to:

Generated from attachdetach-controller
Multi-Attach error for volume "[PVC-NAME]" Volume is already used by pod(s) [POD-NAME]

Cause: The Automation assets API Deployment uses the RollingUpdate strategy, but the new pod cannot mount the existing PVC because the original pod still has it mounted.

Solution: Locate the Automation assets API Deployment (which is called <INSTANCE-NAME>-ibm-integration-asset-repository-api), click the YAML tab, and find the strategy section:

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 40%
      maxSurge: 100%

Replace it with:

  strategy:
    type: Recreate

Save your changes.
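
If you prefer the command line, a merge patch along these lines should make the same change (a sketch; the Deployment name and namespace are placeholders, and setting rollingUpdate to null removes parameters that are not valid with the Recreate strategy):

oc patch deployment <INSTANCE-NAME>-ibm-integration-asset-repository-api -n <namespace> \
  --type merge -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'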

Event Streams shows the Kafka cluster is not ready during upgrade to 2022.2

Symptom: When upgrading Event Streams from operator version 2021.4 to 2022.2 (instance version 10.5.0 to 11.0.2) the Kafka cluster shows the error Kafka cluster is not ready.

Cause: The Kafka cluster can show the error if the connection to the Zookeeper component drops.

Solution: Edit the custom resource for the Event Streams instance, updating the instance spec.version field to 11.0.2.
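
For example, the version field can be updated with a patch along these lines (a sketch; it assumes the EventStreams custom resource kind, with placeholders for the instance name and namespace):

oc patch eventstreams <instance-name> -n <namespace> --type merge -p '{"spec":{"version":"11.0.2"}}'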

Unable to delete a namespace due to the OperandRequest remaining

Symptom: An attempt to delete a namespace fails. The logs for the ODLM show "No permission to update OperandRequest".

Cause: The OperandRequest for the operand-deployment-lifecycle-manager (ODLM) might not be deleted, causing the namespace to remain in the 'Terminating' state.

Solution: Remove all finalizers from the OperandRequest by deleting the finalizers: field and its entries, such as finalizer.request.ibm.com in the following snippet:

  apiVersion: operator.ibm.com/v1alpha1
  kind: OperandRequest
  metadata:
    finalizers: 
    - finalizer.request.ibm.com
    generation: 1
    labels:
      ibm-common-services.common-service/config: "true"
      ibm-common-services.common-service/registry: "true"
    name: common-service
    namespace: cp1
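
For example, the finalizers can be removed with a JSON patch along these lines (a sketch; the resource name and namespace are taken from the snippet above and should be replaced with the values from your cluster):

oc patch operandrequest common-service -n cp1 --type json \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]'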

Installations on IBM Power or Z platforms can fail to complete due to Zen storage requirements

Cause: When installing the Platform UI on IBM Power or Z platforms, the Zen component requires RWX storage in addition to the default RWO storage.

Solution: Follow the steps in https://www.ibm.com/docs/en/cloud-paks/cp-integration/2022.2?topic=ui-deploying-platform-rwo-storage in the documentation for Cloud Pak for Integration 2022.2.

Installation may result in multiple iam-config-job pods

Symptom: When Cloud Pak for Integration 2022.2 is installed, the Cloud Pak foundational services deploys an iam-config-job pod which may enter an error state with the following message in the pod logs:

error: unable to upgrade connection: container not found ("usermgmt-container")
Could not copy ca.crt to pod

This results in the pod being recreated, though the previous pod remains and has to be manually deleted.
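
For example, the leftover pod can be identified and removed with commands along these lines (a sketch; the namespace and pod name are placeholders):

oc get pods -n <foundational-services-namespace> | grep iam-config-job
oc delete pod <failed-iam-config-job-pod> -n <foundational-services-namespace>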

Platform UI continuously changes from Ready back to Pending even when the UI is accessible

Symptom: When you install Platform UI 2022.2.1, you might notice that the status of the resource keeps reverting to Pending even when the resource is accessible, or hours after creation completed successfully.

Cause: This can happen due to a known limitation where the ZenService object does not stop reconciling.

Solution: Add the annotation "integration.ibm.com/reconcile-zen-service": "false" to the Platform UI YAML, for example:

metadata:
  name: integration-quickstart
  namespace: integration
  annotations:
    "integration.ibm.com/reconcile-zen-service": "false"

With this annotation, the Cloud Pak operator no longer reconciles the ZenService object. Note that this also disables some features, such as replica and TLS control.

Upgrade of an operator fails

Symptom: Upgrade of an operator fails with this error:

Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline

Cause: Uncertain. A possible cause is that the first time the operator bundle was pulled from the openshift-marketplace, the extract job failed (probably due to an issue accessing the remote image or a similar problem) and corrupted the ConfigMap; the operator manifest was most likely also corrupted. Once that happens, any attempt to use the same job and ConfigMap to install another instance of the operator (for example, in another namespace) will fail.

Solution:

  1. Find the corresponding job and ConfigMap (usually with the same name) in the openshift-marketplace and grep for the operator name or keyword in its contents:

    oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("<operator_name_keyword>")) | .metadata.name'
  2. Delete the job and corresponding ConfigMap (which has the same name as the job) found in the previous step:

    oc delete job <job_string> -n openshift-marketplace
    oc delete configmap <configmap_string> -n openshift-marketplace
  3. Try installing the operator again. If the installation is successful, the process is complete and you can end here. If you are still unable to install the operator, continue with the next step.

  4. Uninstall the failed operator installation using the procedure in Uninstalling the operators and catalog sources.

  5. Delete the install plan, subscription, and CSV that are in the same namespace as the operator:

    oc delete ip <operator_installplan_name> -n <user_namespace>
    oc delete sub <operator_subscription_name> -n <user_namespace>
    oc delete csv <operator_csv_name> -n <user_namespace>
  6. Retry installing the operator. The installation should complete successfully. If not, collect a new must-gather by using the script in Troubleshooting and note the operator InstallPlan error messages for IBM support.

A docker run command returns a permission denied error

Symptom: When running a docker run command, you get the following error:

docker run "/kube/config": open /kube/config: permission denied

Cause: Read-write permissions are needed for KUBECONFIG (~/.kube/config).

Solution: Run the following to give the user read-write permissions to the file:

chmod +rw ~/.kube/config

User is unable to generate an upgrade plan by using the CLI

Symptom: When following the instructions for "Generating an upgrade plan using the CLI" in Upgrading from 2020.4 or Upgrading from 2021.4 for an online (connected) installation, you are unable to generate the upgrade plan.

Cause: You may not have the correct configuration or permissions for Docker or Podman, or there is an error in the KUBECONFIG command.

Solution:

  • On your online (connected) cluster, run the following oc adm must-gather command:

    oc adm must-gather --image=icr.io/cpopen/ibm-integration-upgrade-must-gather:v2 -- /usr/bin/gather --namespace cp4i --to 2022.2