Known limitations
OLM causes Operator failures on OCP 4.10.38 and later
Unable to access Cloud Pak for Integration user interfaces
Expired leaf certificates not automatically refreshed
Cannot publish an unenforced API to any catalog in the UI (event endpoints only)
Clients fail to connect to the Event Gateway Service
Error validating CRs against a new CRD schema in OLM when upgrading DataPower
Unable to deploy API Connect using the Platform UI with Portal Site Name set
Kubernetes garbage collection might cause pods to go into a CrashLoopBackOff state
The Automation Assets UI can fail to load after a new installation or an upgrade from 2020.4 to 2022.2
OpenShift version support warning remains after upgrading Automation Assets
Upgrading Automation Assets to version 2022.2 can result in an outage during upgrade
Automation Assets may show as CrashLoopBackOff for a period of time before successfully deploying
Event Streams shows the Kafka cluster is not ready during upgrade to 2022.2
Unable to delete a namespace due to the OperandRequest remaining
Installations on IBM Power or Z platforms can fail to complete due to Zen storage requirements
Installation may result in multiple iam-config-job pods
Platform UI continuously changes from Ready back to Pending even when the UI is accessible
Upgrade of an operator fails
OLM causes Operator failures on OCP 4.10.38 and later
Operator Lifecycle Manager (OLM) in OpenShift 4.10.38 and later is limited to processing a single operator request at a time. As a result, users may observe operators in an unknown state, with a log error stating that a dependency is not satisfied, and multiple restarts of the catalog-operator-* pod in the openshift-operator-lifecycle-manager namespace.
Solution: Go to the openshift-operator-lifecycle-manager namespace and restart the catalog-operator-* pod, then the olm-operator-* pod. Alternatively, delete all of the pods in the openshift-operator-lifecycle-manager namespace, which triggers them to be recreated.
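If you prefer to do this from the CLI, deleting the pods causes their deployments to recreate them. This is a sketch that assumes the standard OLM deployment labels (app=catalog-operator and app=olm-operator):
oc delete pod -n openshift-operator-lifecycle-manager -l app=catalog-operator
oc delete pod -n openshift-operator-lifecycle-manager -l app=olm-operator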
Unable to access Cloud Pak for Integration user interfaces
Issue: When users try to access the Cloud Pak for Integration UI routes, they get the message Application is not available.
Cause: Network traffic has not been allowed between the deployed instance of Platform Navigator and the Ingress Controller, as required. For more information about this policy, see https://docs.openshift.com/container-platform/4.6/networking/network_policy/multitenant-network-policy.html
Solution:
Using the CLI
Log in to the Red Hat OpenShift cluster CLI as a Cluster Administrator.
Confirm the endpointPublishingStrategy type of the IngressController:
oc get --namespace openshift-ingress-operator ingresscontrollers/default \
  --output jsonpath='{.status.endpointPublishingStrategy.type}'; echo
If the type value from the previous step is HostNetwork, ingress traffic must be enabled through the default namespace. Add the network.openshift.io/policy-group=ingress label to the default namespace:
oc label namespace default 'network.openshift.io/policy-group=ingress'
Using the Red Hat OpenShift web console
Log in to the Red Hat OpenShift web console for the cluster as a Cluster Administrator.
In the navigation pane, click Home > Search. Click to expand Project, select the openshift-ingress-operator namespace, then search for the resource IngressController.
To confirm the value of spec.endpointPublishingStrategy, click to open the default IngressController and view the YAML.
If the value of spec.endpointPublishingStrategy.type is HostNetwork, ingress traffic must be enabled through the default namespace. In the navigation pane, click Home > Search. Search for the resource Namespace, select the default namespace, and click Edit Labels.
Add the label network.openshift.io/policy-group=ingress, then click Save.
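Whichever method you use, you can confirm that the label was applied to the default namespace before retrying the UI route:
oc get namespace default --show-labels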
Expired leaf certificates not automatically refreshed
Error: Self-signed CA certificate refresh does not automatically refresh leaf certificates, resulting in unavailable services. The user gets an Application Unavailable message when attempting to access the Platform UI or other capabilities in Cloud Pak for Integration.
In addition, the management-ingress pod logs show an error:
Error: exit status 1
2021/01/23 16:56:00 [emerg] 44#44: invalid number of arguments in "ssl_certificate" directive in /tmp/nginx-cfg123456789:123
nginx: [emerg] invalid number of arguments in "ssl_certificate" directive in /tmp/nginx-cfg123456789:123
nginx: configuration file /tmp/nginx-cfg123456789 test failed
Solution: For information on how to refresh these certificates, see Replacing default keys and certificates.
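To confirm that an expired leaf certificate is the cause before refreshing, you can inspect the expiry date of the certificate in the relevant TLS secret. This is a sketch; the secret name and namespace depend on your deployment:
oc get secret <tls_secret_name> -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate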
Cannot publish an unenforced API to any catalog in the UI (event endpoints only)
Symptoms: If you have Event Endpoint Management with only event endpoints enabled, you cannot publish an unenforced API to any catalog in the UI.
After creating an AsyncAPI document to describe an event source and selecting not to enforce access control to your API through an Event Gateway Service, there are no catalogs to select when you click the Menu icon for more options and then click Publish to publish the draft API.
Causes: When publishing a product to a catalog, the UI provides a list of catalogs and spaces to choose from. The list is filtered to include only catalogs and spaces that have registered gateways which support the enforced APIs in the product.
When an unenforced draft API with no x-ibm-configuration.gateway value is published, the draft product that is created for it has a default gateway type of datapower-api-gateway. This gateway type does not match any catalog in Event Endpoint Management, because only the event-gateway gateway type is available when only event endpoints are enabled.
Solution: Edit the newly created product to manually set the gatewayType as follows:
In the API Manager UI, click the Develop button in the navigation bar.
Click Products to list the draft products.
Click the product created when publishing the API to open the product editor.
In the product editor, click Source to show the product document source.
Replace datapower-api-gateway with event-gateway in the gateways list (see the sketch after these steps).
Click Save.
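As an illustration, assuming a typical draft product document that lists its gateway types under a top-level gateways array, the change in the source looks like this:
# before
gateways:
  - datapower-api-gateway
# after
gateways:
  - event-gateway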
The product can now be published to a catalog with a registered Event Gateway.
Clients fail to connect to the Event Gateway Service
Symptoms: After signing up to use an enforced Kafka AsyncAPI in the Developer Portal, an application developer sees their client application failing to connect to the Event Gateway Service. The connection is closed by the gateway service after a 30-second delay.
The logs for the Event Gateway Service provide the following messages at the time the client application attempts to connect, and the client then reports the following error:
...
INFO Events Gateway - [start:196] Kafka Server starting.
INFO com.ibm.eem.Kafka Server - [startGatewayVerticle:65] Broker port: <BOOTSTRAP SERVER PORT> Broker hostname: <BOOTSTRAP SERVER>
...
ERROR com.ibm.eem.core.Request - [abortAndCloseConnection:187] Timed out after waiting 30000(ms) for a reply. address: __vertx.reply.<ID>, repliedAddress: ConnectionManager
...
Causes: The Event Gateway Service is attempting to connect to the Kafka cluster using an incorrect SSL configuration. For example, it is trying to connect to Kafka brokers without using TLS when TLS is required.
Solution: In the AsyncAPI editor, review and correct the Gateway > Invoke settings for this API to match the TLS settings of your Kafka cluster.
If the cluster is using TLS:
Set Security Protocol to SASL_SSL or SSL, depending on whether or not you are using SASL authentication and authorization.
Provide the Transport CA certificate for the Event Gateway Service to use to establish trust with the Kafka cluster. This is only required if the cluster is not using a certificate issued by a well-known authority.
If the Kafka cluster does not require TLS:
Set Security Protocol to PLAINTEXT or SASL_PLAINTEXT, depending on whether or not you are using SASL authentication and authorization.
After updating the API to have the correct settings, save the API, and republish the products that include this API. This updates the configuration used by the Event Gateway Service and allows clients to connect successfully.
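If you are unsure whether the Kafka listener expects TLS, you can probe the bootstrap address directly (hypothetical host and port shown); a successful handshake prints the broker certificate chain, while an immediate disconnect suggests a plaintext listener:
openssl s_client -connect <bootstrap_server>:<port> </dev/null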
Error validating CRs against a new CRD schema in OLM when upgrading DataPower
Issue: An error occurs when you attempt to uninstall and reinstall the DataPower Operator on OpenShift 4.8 or higher. When OLM is validating existing CRs against a new CRD schema, the conversion webhook is not found.
Solution: To remediate this, take the following steps (a sketch follows the list):
Uninstall the failed DataPower Operator.
Edit the DataPowerService CRD in the cluster:
Remove the spec.conversion.webhook section.
Set spec.conversion.strategy to None.
Apply the changes to the DataPowerService CRD.
Reinstall the DataPower Operator.
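As a sketch, assuming the CRD uses the usual DataPower naming (datapowerservices.datapower.ibm.com), the edit amounts to leaving the conversion stanza of the CRD as follows:
oc edit crd datapowerservices.datapower.ibm.com
spec:
  conversion:
    strategy: None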
Unable to deploy API Connect using the Platform UI with Portal Site Name set
Issue: If you attempt to deploy API Connect using the Platform UI with a Portal Site Name set, the Create button is disabled.
Solution: To deploy API Connect with a Portal Site Name set, create your API Connect CR using the OpenShift Console or the OpenShift command-line interface (oc CLI).
Kubernetes garbage collection might cause pods to go into a CrashLoopBackOff state
When the IBM Platform Navigator Operator is upgraded from 2020.4 to 2022.2, the Kubernetes garbage collection might not remove the package lock, which can cause the pod to enter a CrashLoopBackOff state.
To remediate this, delete the configMap associated with the operator. This releases the lock and allows the upgrade to proceed.
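The name of the ConfigMap varies by deployment. As a sketch, assuming the lock ConfigMap lives in the namespace where the operator is installed and has "lock" in its name:
oc get configmap -n <operator_namespace> | grep lock
oc delete configmap <lock_configmap_name> -n <operator_namespace>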
The Automation Assets UI can fail to load after a new installation or an upgrade from 2020.4 to 2022.2
After the new installation or upgrade, the Automation Assets user interface may be inaccessible if the Zen component does not process the required configMaps. The page renders only a version and platform string (for example, 3.5.0.0 (20220429_2020) x86_64).
To correct this, run the following commands to restart the zen-watcher pod associated with this deployment. This triggers the configMap to be loaded.
oc get pods -n <namespace>
oc delete pod <zen-watcher-pod-name> -n <namespace>
OpenShift version support warning remains after upgrading Automation Assets
After Automation Assets has been upgraded from version 2020.4 (with OpenShift versions prior to 4.10) to version 2022.2 (with OpenShift version 4.10), the following warning remains in the status after the custom resource has reconciled:
An EUS operand (2020.4.1-6-eus) has been installed but the OCP version (4.7.45) does not qualify for the extended support duration. This operand is supported with a regular CD release.
Upgrading Automation Assets to version 2022.2 can result in an outage during upgrade
Upgrading Automation Assets to version 2022.2 from a prior version can result in an outage of Automation Assets for a period of time during the upgrade.
Automation Assets may show as CrashLoopBackOff for a period of time before successfully deploying
When installing Automation Assets, the pod can enter a CrashLoopBackOff state during the installation process. After some time, the installation can complete successfully. The following error is shown in the logs during this time:
Error starting server Error: connect ECONNREFUSED
...
statusText: 'ECONNREFUSED',
body: 'Response not received - no connection was made to the service.'
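No remediation is typically needed. You can watch the pods until the deployment settles, for example:
oc get pods -n <namespace> -w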
Event Streams shows the Kafka cluster is not ready during upgrade to 2022.2
When upgrading Event Streams from operator version 2021.4 to 2022.2 (operand version 10.5.0 to 11.0.2), the Kafka cluster can show the error Kafka cluster is not ready due to the connection to the ZooKeeper component dropping.
To remediate this, edit the custom resource for the Event Streams instance, updating the operand spec.version field to 11.0.2.
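For example, assuming a standard EventStreams custom resource, edit the instance and set the version field:
oc edit eventstreams <instance_name> -n <namespace>
spec:
  version: 11.0.2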
Unable to delete a namespace due to the OperandRequest remaining
An attempt to delete a namespace might fail: the OperandRequest for the operand-deployment-lifecycle-manager (ODLM) might not be deleted, causing the namespace to remain in the 'Terminating' state. The logs for the ODLM show "No permission to update OperandRequest".
To remediate this, remove the finalizers from the OperandRequest.
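As a sketch, assuming the OperandRequest name and namespace from your deployment, the finalizers can be cleared with a merge patch:
oc patch operandrequest <operandrequest_name> -n <namespace> --type=merge -p '{"metadata":{"finalizers":[]}}'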
Installations on IBM Power or Z platforms can fail to complete due to Zen storage requirements
When installing the Platform UI on IBM Power or Z platforms, the Zen component requires ReadWriteMany (RWX) storage in addition to the default ReadWriteOnce (RWO) storage. To remediate this, follow the steps in https://www.ibm.com/docs/en/cloud-paks/cp-integration/2022.2?topic=ui-deploying-platform-rwo-storage in the documentation for Cloud Pak for Integration 2022.2.
Installation may result in multiple iam-config-job pods
When Cloud Pak for Integration 2022.2 is installed, the Cloud Pak foundational services deploy an iam-config-job pod, which may enter an error state with the following message in the pod logs:
error: unable to upgrade connection: container not found ("usermgmt-container")
Could not copy ca.crt to pod
This results in the pod being recreated, though the previous pod remains and has to be manually deleted.
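The errored pod can be removed with a standard delete. This sketch assumes the foundational services namespace is ibm-common-services, which is the common default:
oc delete pod <iam_config_job_pod_name> -n ibm-common-services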
Platform UI continuously changes from Ready back to Pending even when the UI is accessible
When you install Platform UI 2022.2.1, you might notice that the status of the resource keeps reverting to Pending even when the resource is accessible, or hours after creation completed successfully. This can happen due to a known limitation in which the ZenService object does not stop reconciling. To remediate this, apply the annotation "integration.ibm.com/reconcile-zen-service": "false" to the Platform UI YAML, for example:
metadata:
  name: integration-quickstart
  namespace: integration
  annotations:
    "integration.ibm.com/reconcile-zen-service": "false"
With this annotation, the Cloud Pak operator no longer reconciles the ZenService object. Note that this also disables some features, such as replica and TLS control.
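The same annotation can be applied from the CLI. This sketch assumes the Platform UI instance is a PlatformNavigator resource named integration-quickstart in the integration namespace, matching the example above:
oc annotate platformnavigator integration-quickstart -n integration integration.ibm.com/reconcile-zen-service=false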
Upgrade of an operator fails
Issue: Upgrade of an operator fails with this error:
Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
Cause: Uncertain, but the cause may be that the first time the operator bundle was pulled from the openshift-marketplace namespace, the extract job failed (probably due to an issue accessing the remote image or a similar issue) and corrupted the ConfigMap. The operator manifest was most likely also corrupted. Once that happens, any attempt to use the same job and ConfigMap to install another instance of the operator, for example in another namespace, will fail.
Solution:
Find the corresponding job and ConfigMap (usually with the same name) in the openshift-marketplace namespace and grep for the operator name or keyword in its contents:
oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value | contains("<operator_name_keyword>")) | .metadata.name'
Delete the job and corresponding ConfigMap (which has the same name as the job) found in the previous step:
oc delete job <job_string> -n openshift-marketplace
oc delete configmap <configmap_string> -n openshift-marketplace
Try installing the operator again. If the installation is successful, the process is complete and you can end here. If you are still unable to install the operator, continue with the next step.
Uninstall the failed operator installation using the procedure in Uninstalling the operators and catalog sources.
Delete the install plan, subscription, and CSV that are in the same namespace as the operator:
oc delete ip <operator_installplan_name> -n <user_namespace>
oc delete sub <operator_subscription_name> -n <user_namespace>
oc delete csv <operator_csv_name> -n <user_namespace>
Retry installing the operator. The installation should complete successfully. If not, collect a new must-gather by using the script in Troubleshooting and note the operator InstallPlan error messages for IBM support.