Upgrading on OpenShift and Cloud Pak for Integration in an online environment
Perform an online (connected to the internet) upgrade of IBM® API Connect on OpenShift Container Platform or IBM Cloud Pak for Integration.
Before you begin
- If you are upgrading an air-gapped (disconnected from the internet) installation, see Upgrading on OpenShift in an air-gapped environment.
- The Gateway subsystem remains available during the upgrade of the Management, Portal, and Analytics subsystems.
Procedure
- Back up the current deployment in case the upgrade fails and you need to perform a rollback.
-
Ensure your IBM API Connect deployment is ready to upgrade:
- Your API Connect release (operand) supports a direct upgrade to this release.
- The DataPower operator version is correct for the currently deployed version of API Connect.
For information on the operator and operand version used with each API Connect release, see Operator, operand, and CASE versions.
For information on upgrade paths and supported versions of DataPower Gateway, see Upgrade considerations on OpenShift and Cloud Pak for Integration.
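Before starting, it can help to confirm which versions are actually deployed. A minimal check from the CLI, assuming your API Connect resources are in the apic namespace (adjust to your environment):

```shell
# List the operator subscriptions and the channels they track
# (the namespace "apic" is an assumption for your environment)
oc get subscriptions -n apic

# The default output columns for the top-level CR include VERSION and RECONCILED VERSION
oc get apiconnectcluster -n apic
```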
-
Cloud Pak for Integration only: Update the operator channels.
- If needed, upgrade the IBM Cloud Pak foundational services (previously called Common Services) operator to channel v3.
- If needed, complete the following steps to upgrade the Platform Navigator operator (supports the Automation Platform UI) to channel v5:
- Upgrade the Platform Navigator operator to channel v5 and wait for the Platform Navigator pods to restart.
- Edit the yaml of the Platform Navigator instance and make the following changes:
- Add the license ID value for the license that you purchased; for example, L-RJON-BXUPZ2.
- Add the RWX storage class; for example, rook-cephfs.
- Change the operand version to 2021.4.1.
For example:
apiVersion: integration.ibm.com/v1beta1
kind: PlatformNavigator
metadata:
  name: cp4i-navigator
  namespace: apic
spec:
  license:
    accept: true
    license: L-RJON-BXUPZ2
  mqDashboard: true
  replicas: 3
  storage:
    class: rook-cephfs
  version: 2021.4.1
- Again, wait for the Platform Navigator pods to upgrade, and for the Platform Navigator to go to a Green or Ready state.
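The channel updates for Cloud Pak for Integration can also be made from the CLI instead of the OperatorHub UI. A sketch, assuming the subscription names ibm-common-service-operator and ibm-integration-platform-navigator and the namespace apic; confirm the real names in your cluster with oc get subscriptions:

```shell
# Switch the foundational services operator to channel v3
# (subscription name and namespace are assumptions; verify with "oc get subscriptions")
oc patch subscription ibm-common-service-operator -n apic \
  --type merge -p '{"spec":{"channel":"v3"}}'

# Switch the Platform Navigator operator to channel v5
# (subscription name is an assumption)
oc patch subscription ibm-integration-platform-navigator -n apic \
  --type merge -p '{"spec":{"channel":"v5"}}'
```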
Update the operator channels for DataPower and API Connect.
- If needed, upgrade the DataPower operator channel to v1.5.
Attention: Upgrade the DataPower operator before upgrading the IBM API Connect operator, to ensure that dependencies are satisfied.
- If you previously chose automatic subscriptions, the operator version will upgrade automatically.
- If you previously chose manual subscriptions and the operator channel is already on the previous version, OpenShift (OLM) will notify you that an upgrade is available. You must manually approve the upgrade before proceeding.
Wait for the operator to update, for the pods to restart, and for a Ready status.
Known issue: When upgrading the DataPower operator, you might see messages appear in the log on the datapower-operator pod indicating that the pod is waiting for the lock to be removed:
{"level":"info","ts":"2021-03-08T19:29:53.432Z","logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":"2021-03-08T19:29:57.971Z","logger":"leader","msg":"Leader pod has been deleted, waiting for garbage collection to remove the lock."}
If you see these messages, the DataPower operator cannot be upgraded until you resolve the problem as explained in the DataPower operator documentation.
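If you use manual subscription approval, the pending operator upgrade can also be approved from the CLI. A sketch, assuming the namespace apic; the install plan name comes from the first command's output:

```shell
# List install plans and whether they are approved (namespace is an assumption)
oc get installplan -n apic

# Approve a pending install plan by name (replace install-xxxxx with the reported name)
oc patch installplan install-xxxxx -n apic \
  --type merge -p '{"spec":{"approved":true}}'
```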
-
If your IBM API Connect operator channel is not at v2.5, update it now.
- If you previously chose automatic subscriptions, the operator version will upgrade automatically.
- If you previously chose manual subscriptions and the operator channel is already on the previous version, OpenShift (OLM) will notify you that an upgrade is available. You must manually approve the upgrade before proceeding.
Known issues:
- The certificate manager was upgraded in Version 10.0.4.0, and you might encounter an upgrade error if the CRD for the new certificate manager is not found. For information on the error messages that indicate this problem, and steps to resolve it, see Upgrade error when the CRD for the new certificate manager is not found in the Troubleshooting installation and upgrade on OpenShift topic.
- A null value for the backrestStorageType property in the pgcluster CR causes an error during the operator upgrade from versions earlier than 10.0.4.0. For information on the error messages that indicate this problem, and steps to resolve it, see Operator upgrade fails with error from API Connect operator and Postgres operator in the Troubleshooting installation and upgrade on OpenShift topic.
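As with the DataPower operator, the API Connect operator channel can be switched from the CLI. A sketch, assuming the subscription name ibm-apiconnect and the namespace apic; confirm the real name with oc get subscriptions:

```shell
# Move the API Connect operator subscription to channel v2.5
# (subscription name and namespace are assumptions for your environment)
oc patch subscription ibm-apiconnect -n apic \
  --type merge -p '{"spec":{"channel":"v2.5"}}'
```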
- Check for certificate errors, and then recreate issuers and certificates if needed.
In 10.0.4, API Connect upgraded its certificate manager, which might cause some errors during the upgrade. Complete the following steps to check for certificate errors and correct them.
- Check the new API Connect operator's log for an error similar to the following
example:
{"level":"error","ts":1634966113.8442025,"logger":"controllers.AnalyticsCluster","msg":"Failed to set owner reference on certificate request","analyticscluster":"apic/<instance-name>-a7s","certificate":"<instance-name>-a7s-ca","error":"Object apic/<instance-name>-a7s-ca is already owned by another Certificate controller <instance-name>-a7s-ca",
To correct this problem, delete all issuers and certificates generated with certmanager.k8s.io/v1alpha1. For certificates used by route objects, you must also delete the route and secret objects.
- Run the following commands to delete the issuers and certificates that were generated
with certmanager.k8s.io/v1alpha1:
oc delete issuers.certmanager.k8s.io <instance-name>-self-signed <instance-name>-ingress-issuer <instance-name>-mgmt-ca <instance-name>-a7s-ca <instance-name>-ptl-ca
oc delete certs.certmanager.k8s.io <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca
In the examples, <instance-name> is the instance name of the top-level apiconnectcluster. When you delete the issuers and certificates, the new certificate manager generates replacements; this might take a few minutes.
- Verify that the new CA certs are refreshed and ready.
Run the following command to verify the certificates:
oc get certs <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca
The CA certs are ready when AGE is "new" and the READY column shows True.
- Delete the remaining old certificates, routes, and secret objects corresponding to those routes.
Run the following commands:
oc get certs.certmanager.k8s.io | awk '/<instance-name>/{print $1}' | xargs oc delete certs.certmanager.k8s.io
oc delete certs.certmanager.k8s.io postgres-operator
oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete routes
Note: The following command deletes the secrets for the routes. Do not delete any other secrets.
oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete secrets
- Verify that no old issuers or certificates from your top-level instance remain.
Run the following commands:
oc get issuers.certmanager.k8s.io | grep <instance-name>
oc get certs.certmanager.k8s.io | grep <instance-name>
Both commands should report that no resources were found.
- Use apicops to validate the certificates.
- Run the following command:
apicops upgrade:stale-certs -n <APIC_namespace>
- Delete any stale certificates that are managed by cert-manager. If a certificate failed the validation and it is managed by cert-manager, you can delete the stale certificate secret and let cert-manager regenerate it. Run the following command:
kubectl delete secret <stale-secret> -n <APIC_namespace>
- Restart the corresponding pod so that it can pick up the new secret. To determine which pod to restart, see the following topics:
For information on the apicops tool, see The API Connect operations tool: apicops.
- Run the following command to delete the Postgres pods, which refreshes the new
certificate:
oc get pod -n <namespace> --no-headers=true | grep postgres | grep -v backup | awk '{print $1}' | xargs oc delete pod -n <namespace>
- If needed, delete the portal-www, portal-db, and portal-nginx pods to ensure they use the new secrets.
If you have the Developer Portal deployed, then the portal-www, portal-db, and portal-nginx pods might require deleting to ensure that they pick up the newly generated secrets when restarted. If the pods are not showing as "ready" in a timely manner, then delete all the pods at the same time (this will cause down time).
Run the following commands to get the name of the portal CR and delete the pods:
oc project <APIC_namespace>
oc get ptl
oc delete po -l app.kubernetes.io/instance=<name_of_portal_CR>
- If needed, renew the internal certificates for the analytics subsystem.
If you see analytics-storage-* or analytics-mq-* pods in the CrashLoopBackOff state, then renew the internal certificates for the analytics subsystem and force a restart of the pods.
- Switch to the project/namespace where analytics is deployed and run the following command to get the name of the analytics CR (AnalyticsCluster):
oc project <APIC_namespace>
oc get a7s
You need the CR name for the remaining steps.
- Renew the internal certificates (CA, client, and server) by running the following commands:
oc get certificate <name_of_analytics_CR>-ca -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
oc get certificate <name_of_analytics_CR>-client -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
oc get certificate <name_of_analytics_CR>-server -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
- Force a restart of all analytics pods by running the following command:
oc delete po -l app.kubernetes.io/instance=<name_of_analytics_CR>
-
Ensure that the operators and operands are healthy before proceeding.
Operators: Verify that the OpenShift UI indicates that all operators are in the Succeeded state without any warnings.
Operands:
- To verify whether operands are healthy, run the following command:
oc get apic
Make sure that the apiconnectcluster custom resource reports READY.
- Cloud Pak for Integration only: Wait until the IBM API Connect capability shows READY (green check) in the Automation UI.
Known issue: Status toggles between Ready and Warning
There is a known issue where the IBM API Connect operator toggles the overall status of the IBM API Connect deployment in Platform Navigator between Ready and Warning. Look at the full list of conditions; when Ready is True, you can proceed to the next step even if Warning is also true.
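As a CLI alternative to watching the UI, you can inspect the status conditions on the top-level CR directly. A sketch, assuming the namespace apic; condition names vary by release, so treat the output as informational:

```shell
# Print each status condition of the top-level CR as type=status
# (the namespace "apic" is an assumption for your environment)
oc get apiconnectcluster -n apic \
  -o jsonpath='{range .items[*].status.conditions[*]}{.type}={.status}{"\n"}{end}'
```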
-
Update the operand version:
- OpenShift: Complete the following steps:
- Edit the top-level apiconnectcluster CR by running the following command:
oc -n <APIC_namespace> edit apiconnectcluster
- Change the version setting to 10.0.4.0-ifix3.
- In the spec.gateway section, delete the template override section, if it exists. You cannot perform an upgrade if the CR contains an override.
- Save and close the CR.
- Cloud Pak for Integration: Open the Automation Platform UI and complete the following steps:
- In the Automation Platform UI, click the Integration instances tab.
- Click the options menu at the end of the current row, and then click Change version.
- Click Select a new channel or version, and then select 10.0.4.0-ifix3 in the Channel field.
Selecting the new channel ensures that both DataPower Gateway and IBM API Connect are upgraded.
- Click Save to save your selections and start the upgrade.
In the instances table, the Status column for the instance displays the "Upgrading" message. The upgrade is complete when the Status is "Ready" and the Version displays the new version number.
Known issue: Webhook error for incorrect license.
If you did not update the license ID in the CR, then when you change the operand version, the following webhook error might display:
admission webhook "vapiconnectcluster.kb.io" denied the request: APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version version. Please refer license document https://ibm.biz/apiclicenses
To resolve the error, visit https://ibm.biz/apiclicenses and select the appropriate License IDs for your deployment. Update the CR with the new license value as in the following example:
spec:
  license:
    accept: true
    use: production
    license: L-RJON-BZ5LJ5
Then, apply the updated CR.
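On OpenShift, the version change can also be applied without an interactive editor. A sketch, assuming the instance name apic-instance and the namespace apic:

```shell
# Set the operand version on the top-level CR in one step
# (instance name and namespace are assumptions for your environment)
oc patch apiconnectcluster apic-instance -n apic \
  --type merge -p '{"spec":{"version":"10.0.4.0-ifix3"}}'
```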
- Resolve gateway peering issues by completing the following steps.
Due to the ingress issuer changes, the gateway pods must be scaled down and then back up. This process will cause 5 to 10 minutes of down time.
- Run the following command to verify that the management, portal, and analytics subsystems are Running:
oc get apic --all-namespaces
The response looks like the following example, with the gateway pods showing as Pending.

NAME                                                                READY   STATUS    VERSION    RECONCILED VERSION   AGE
analyticscluster.analytics.apiconnect.ibm.com/<instance-name>-a7s   8/8     Running   10.0.4.0   10.0.4.0-2221        7d1h

NAME                                                   READY   STATUS    VERSION    RECONCILED VERSION   AGE
apiconnectcluster.apiconnect.ibm.com/<instance-name>   6/7     Pending   10.0.4.0   10.0.3.0-ifix1-351   47h

NAME                                                    PHASE     READY   SUMMARY                           VERSION    AGE
datapowerservice.datapower.ibm.com/<instance-name>-gw   Pending   True    StatefulSet replicas ready: 1/1   10.0.4.0   46h

NAME                                                    PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
datapowermonitor.datapower.ibm.com/<instance-name>-gw   Pending                false          false              46h

NAME                                                           READY   STATUS    VERSION    RECONCILED VERSION   AGE
gatewaycluster.gateway.apiconnect.ibm.com/<instance-name>-gw   0/2     Pending   10.0.4.0   10.0.4.0-2221        46h

NAME                                                                   READY   STATUS    VERSION    RECONCILED VERSION   AGE
managementcluster.management.apiconnect.ibm.com/<instance-name>-mgmt   16/16   Running   10.0.4.0   10.0.4.0-2221        47h

NAME                                                                              STATUS     MESSAGE                                                     AGE
managementdbupgrade.management.apiconnect.ibm.com/<instance-name>-mgmt-up-pxl77   Complete   Fresh install is Complete (DB Schema/data are up-to-date)   46h
managementdbupgrade.management.apiconnect.ibm.com/management-up-87fcz             Complete   Upgrade is Complete (DB Schema/data are up-to-date)         8h

NAME                                                          READY   STATUS    VERSION    RECONCILED VERSION   AGE
portalcluster.portal.apiconnect.ibm.com/<instance-name>-ptl   3/3     Running   10.0.4.0   10.0.4.0-2221        46h
- Scale down the gateway firmware containers by editing the top-level APIConnectCluster CR and setting the replica count to 0.
OpenShift:
- Run the following command to edit the CR:
oc edit apiconnectcluster <apic-cr-name>
- In the spec.gateway section, set the replicaCount setting to 0:
...
spec:
  gateway:
    replicaCount: 0
...
If the setting is not already included in the CR, add it now as shown in the example.
- Save and exit the CR.
Cloud Pak for Integration:
- In the Automation Platform UI, edit the API Connect instance.
- Click Advanced.
- In the Gateway subsystem section, set the Advance Replica count field to 0.
- Wait for the gateway firmware pods to scale down and terminate.
Do not proceed to the next step until the pods are terminated.
- Reset the replica count to its original value.
If the replica count setting was not used previously, then:
- OpenShift: Delete the setting from the CR.
- Cloud Pak for Integration: Clear the Advance Replica count field.
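On OpenShift, the scale-down and scale-up in this step can also be driven with oc patch rather than an interactive edit. A sketch, assuming the instance name apic-instance and the namespace apic:

```shell
# Scale the gateway firmware pods down to zero
# (instance name and namespace are assumptions for your environment)
oc patch apiconnectcluster apic-instance -n apic \
  --type merge -p '{"spec":{"gateway":{"replicaCount":0}}}'

# After the gateway pods have terminated, restore the original count (3 is an example)
oc patch apiconnectcluster apic-instance -n apic \
  --type merge -p '{"spec":{"gateway":{"replicaCount":3}}}'
```

With a JSON merge patch, setting replicaCount to null removes the field entirely, which matches the instruction to delete the setting if it was not used previously.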
-
Ensure the upgrade is completed and the status of the top-level CR is READY:
- OpenShift: Run the following command:
oc get apiconnectcluster
and make sure it reports READY.
- CP4I: Make sure that the IBM API Connect capability reports READY in the Automation Platform UI.
Known issue: Management subsystem remains in Pending state after upgrade
There is a known problem where the management subsystem can remain in the Pending state after an upgrade if it was originally deployed in 10.0.1.0 with custom internal certificates. You can work around this problem as explained in Workaround: Management subsystem status remains Pending when upgrading with custom internal certificates.
If the management upgrade fails, the apiconnect-operator performs a rollback procedure if possible.
- Upgrade the OpenShift cluster to OpenShift 4.10.
API Connect 10.0.5 requires that your cluster be on OpenShift 4.10 before you begin that upgrade.
Upgrading OpenShift requires that you move to interim minor releases instead of upgrading directly from 4.6 to 4.10. For more information, see the Red Hat OpenShift documentation. In the "Documentation" banner, select the version of OpenShift that you want to upgrade to, and then expand the "Updating clusters" section in the navigation list.
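As a rough sketch of that interim-release path from the CLI (take the exact channels and versions from the Red Hat documentation; stable-4.7 is only an example of the first hop):

```shell
# Point the cluster at the next minor release channel (channel name is an example)
oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.7"}}'

# Start the update to the latest release available in that channel
oc adm upgrade --to-latest=true

# Watch progress; repeat the two steps above for each interim minor release up to 4.10
oc get clusterversion
```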