Upgrading on OpenShift in an online environment
Perform an online (connected to the internet) upgrade of IBM® API Connect on Red Hat OpenShift Container Platform using either the top-level APIConnectCluster CR or individual subsystem CRs.
Before you begin
- If you are upgrading an air-gapped (disconnected from the internet) installation, see Air-gapped upgrade.
- The Gateway subsystem remains available during the upgrade of the Management, Portal, and Analytics subsystems.
- If you are upgrading to a version of API Connect that supports a newer version of Red Hat OpenShift, complete the API Connect upgrade before upgrading Red Hat OpenShift.
- Upgrading from 10.0.5.2 or earlier: If you did not verify that your portal customizations are compatible with Drupal 10, do that now.
In API Connect 10.0.5.3, the Developer Portal moved from Drupal 9 to Drupal 10 (this upgrade also requires PHP 8.1). The upgrade tooling will update your Developer Portal sites; however, if you have any custom modules or themes, it is your responsibility to ensure their compatibility with Drupal 10 and PHP 8.1 before starting the upgrade. Review the Guidelines on upgrading your Developer Portal from Drupal 9 to Drupal 10 to ensure that any customizations to the Developer Portal are compatible with Drupal 10 and PHP 8.1.
Procedure
- Ensure that you have completed all of the steps in Preparing to upgrade on OpenShift, including reviewing the Upgrade considerations on OpenShift.
Do not attempt an upgrade until you have reviewed the considerations and prepared your deployment.
- Ensure your API Connect deployment is ready to upgrade:
- Your API Connect release (operand) supports a direct upgrade to this release.
For information on the operator and operand version that is used with each API Connect release, see Operator, operand, and CASE versions.
- The DataPower operator version is correct for the currently deployed version of API Connect.
For information on upgrade paths and supported versions of DataPower Gateway, see Upgrade considerations on OpenShift.
- Your deployment is running on a version of Red Hat OpenShift that is supported by both the current version of API Connect and the target version of API Connect.
For information, see Supported versions of OpenShift.
- Update the operator channels for DataPower and API Connect.
- If you previously chose automatic subscriptions, the operator version upgrades automatically when you update the operator channel.
- If you previously chose manual subscriptions and the operator channel is already on the previous version, OpenShift OLM notifies you that an upgrade is available. You must manually approve the upgrade before proceeding.
- Both the API Connect and DataPower channels must be changed before either operator upgrades. The upgrade of both operators begins when the channel is changed for both operators.
- Update the DataPower operator channel to v1.6. From Operators > Installed Operators, select the IBM DataPower Gateway operator, then select the Subscription tab. Click the channel version underneath Update channel to update it.
Known issue: If you see messages in the datapower-operator pod log indicating that the pod is waiting for a lock to be removed:
{"level":"info","ts":"2021-03-08T19:29:53.432Z","logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":"2021-03-08T19:29:57.971Z","logger":"leader","msg":"Leader pod has been deleted, waiting for garbage collection to remove the lock."}
The DataPower operator cannot be upgraded until this problem is resolved; see DataPower operator documentation.
- Update the API Connect operator channel to v3.9. From Operators > Installed Operators, select the IBM API Connect operator, then select the Subscription tab. Click the channel version underneath Update channel to update it.
If you are upgrading from 10.0.4-ifix3 and the API Connect operator does not begin its upgrade within a few minutes, perform the following workaround to delete the ibm-ai-wmltraining subscription and the associated csv:
- Run the following command to get the name of the subscription:
oc get subscription -n <APIC_namespace> --no-headers=true | grep ibm-ai-wmltraining | awk '{print $1}'
- Run the following command to delete the subscription:
oc delete subscription <subscription-name> -n <APIC_namespace>
- Run the following command to get the name of the csv:
oc get csv -n <APIC_namespace> --no-headers=true | grep ibm-ai-wmltraining | awk '{print $1}'
- Run the following command to delete the csv:
oc delete csv <csv-name> -n <APIC_namespace>
Deleting the subscription and csv triggers the API Connect operator upgrade.
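The lookup-and-delete workaround for the ibm-ai-wmltraining subscription and csv can be scripted. The following is a minimal sketch rather than an IBM-supplied procedure: the helper name names_matching and the NS variable are illustrative assumptions, and the oc calls appear only as comments so that they can be reviewed before anything is run against a cluster.

```shell
# names_matching (hypothetical helper): reads `oc get ... --no-headers`
# output on stdin and prints the first column (the resource name) of every
# line that contains the fixed string given as $1.
names_matching() {
  awk -v pat="$1" 'index($0, pat) > 0 { print $1 }'
}

# Intended use (NS is an assumed variable holding <APIC_namespace>).
# Note that -n is passed to oc, not to awk:
#   sub=$(oc get subscription -n "$NS" --no-headers=true | names_matching ibm-ai-wmltraining)
#   oc delete subscription "$sub" -n "$NS"
#   csv=$(oc get csv -n "$NS" --no-headers=true | names_matching ibm-ai-wmltraining)
#   oc delete csv "$csv" -n "$NS"
```

If the helper prints nothing, no matching subscription or csv exists in that namespace and there is nothing to delete.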
- Wait for the operators to update, for the pods to restart, and for the instances to display the Ready status.
- Ensure that the operators and operands are healthy before proceeding.
- Operators: Verify that the OpenShift UI indicates that all operators are in the Succeeded state without any warnings.
- If you are using a top-level CR: To verify that your API Connect cluster is healthy, run the following command:
oc get apiconnectcluster -n <APIC_namespace>
Confirm that the apiconnectcluster CR reports all pods as READY.
oc get ManagementCluster -n apicupgrade
NAME         READY   STATUS    VERSION    RECONCILED VERSION   AGE
management   17/17   Running   10.0.5.8   10.0.5.8-95          70m
- If you are using individual subsystem CRs: To verify the health of each subsystem, run the following commands:
oc get ManagementCluster -n <mgmt_namespace>
oc get GatewayCluster -n <gway_namespace>
oc get PortalCluster -n <portal_namespace>
oc get AnalyticsCluster -n <mgmt_namespace>
Check that all pods are READY, for example:
oc get PortalCluster -n apic
NAME     READY   STATUS    VERSION    RECONCILED VERSION   AGE
portal   3/3     Running   10.0.5.9   10.0.5.9-95          57m
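The health checks in this step can also be run from the command line. The sketch below is an assumption, not part of the documented procedure: it checks operators through the ClusterServiceVersion phase instead of the OpenShift UI, and the helper names csv_not_succeeded and not_fully_ready and the NS variable are hypothetical. Both helpers only parse text, so empty output means nothing was flagged.

```shell
# csv_not_succeeded (hypothetical helper): reads "NAME PHASE" lines on stdin
# and prints the name of every ClusterServiceVersion whose phase is not
# Succeeded.
csv_not_succeeded() {
  awk '$2 != "Succeeded" { print $1 }'
}

# not_fully_ready (hypothetical helper): reads `oc get <Kind>` output,
# including the header row, on stdin and prints the name of every resource
# whose READY column (x/y) does not show x equal to y.
not_fully_ready() {
  awk 'NR > 1 { n = split($2, r, "/"); if (n != 2 || r[1] != r[2]) print $1 }'
}

# Intended use (NS is an assumed variable holding the namespace):
#   oc get csv -n "$NS" --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | csv_not_succeeded
#   oc get apiconnectcluster -n "$NS" | not_fully_ready
#   oc get ManagementCluster -n "$NS" | not_fully_ready
```

Run not_fully_ready once per resource kind, because each `oc get` invocation prints its own header row.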
- If you are using a top-level CR: Update the top-level apiconnectcluster CR:
The spec section of the apiconnectcluster looks like the following example:
apiVersion: apiconnect.ibm.com/v1beta1
kind: APIConnectCluster
metadata:
  labels:
    app.kubernetes.io/instance: apiconnect
    app.kubernetes.io/managed-by: ibm-apiconnect
    app.kubernetes.io/name: apiconnect-production
  name: prod
  namespace: <APIC_namespace>
spec:
  allowUpgrade: true
  license:
    accept: true
    use: production
    license: L-GVEN-GFUPVE
  profile: n12xc4.m12
  version: 10.0.5.9
  storageClassName: rook-ceph-block
- Edit the apiconnectcluster CR by running the following command:
oc -n <APIC_namespace> edit apiconnectcluster
- If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): In the spec section, add a new allowUpgrade attribute and set it to true:
spec:
  allowUpgrade: true
The allowUpgrade attribute enables the upgrade to 10.0.5.x. Because the upgrade to 10.0.5.x deletes your analytics data, the attribute is required to prevent an accidental upgrade.
- In the spec section, update the API Connect version: Change the version setting to 10.0.5.9.
- If you are upgrading to a version of API Connect that requires a new license, update the license value now.
For the list of licenses, see API Connect licenses.
- In the spec.gateway section of the CR, delete any template or dataPowerOverride sections.
You cannot perform an upgrade if the CR contains an override.
- Save and close the CR to apply your changes.
The response looks like the following example:
apiconnectcluster.apiconnect.ibm.com/prod configured
Note: If you see an error message when you attempt to save the CR, check if it is one of the following known issues:
- Webhook error for incorrect license. If you did not update the license ID in the CR, then when you save your changes, the following webhook error might display:
admission webhook "vapiconnectcluster.kb.io" denied the request: APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version version. Please refer license document https://ibm.biz/apiclicenses
To resolve the error, see API Connect licenses for the list of the available license IDs and select the appropriate license ID for your deployment. Update the CR with the new license value, and then save and apply your changes again.
- Webhook error: Original PostgreSQL primary not found. Take the following actions to complete the upgrade and fix the cause of the error message:
- Edit your apiconnectcluster CR and add the following annotation:
...
metadata:
  annotations:
    apiconnect-operator/db-primary-not-found-allow-upgrade: "true"
...
- Continue with the upgrade. When the upgrade is complete, the management CR reports the warning:
Original PostgreSQL primary not found. Run apicops upgrade:pg-health-check to check the health of the database and to ensure pg_wal symlinks exist. If database health check passes please perform a management database backup and restore to restore the original PostgreSQL primary pod
- Take a new management database backup.
- Immediately restore from the new backup taken in the previous step. Taking and restoring a management backup establishes a new Postgres primary, which eliminates the CR warning message. Be careful to restore from the backup that is taken after the upgrade, and not from a backup taken before the upgrade.
- Webhook error: Original postgres primary is running as replica. Complete a Postgres failover; see Postgres failover steps. After you apply the Postgres failover steps, the upgrade resumes automatically.
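The version and allowUpgrade changes to the apiconnectcluster CR can also be applied without an interactive editor. The following sketch assumes that a JSON merge patch through `oc patch` is acceptable in your environment; the helper name upgrade_patch and the NAME and NS variables are hypothetical. It does not update the license value or remove template or dataPowerOverride sections, so those edits still need to be made separately, and the same webhook errors can be returned.

```shell
# upgrade_patch (hypothetical helper): prints a JSON merge patch that sets
# spec.version to $1 and, when $2 is "true", also sets spec.allowUpgrade.
upgrade_patch() {
  if [ "$2" = "true" ]; then
    printf '{"spec":{"allowUpgrade":true,"version":"%s"}}\n' "$1"
  else
    printf '{"spec":{"version":"%s"}}\n' "$1"
  fi
}

# Intended use (NAME and NS are assumed variables for the CR name and
# <APIC_namespace>):
#   oc patch apiconnectcluster "$NAME" -n "$NS" --type merge \
#     -p "$(upgrade_patch 10.0.5.9 true)"
```

Printing the patch first, for example with `upgrade_patch 10.0.5.9 true`, makes it easy to review exactly what will be sent before it is applied.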
- Run the following command to verify that the upgrade is completed and the status of the top-level CR is READY:
oc get apiconnectcluster -n <APIC_namespace>
Important: If you need to restart the deployment, wait until all portal sites complete the upgrade.
After the portal subsystem upgrade is complete, each portal site is upgraded. You can monitor the site upgrade progress from the MESSAGE column in the oc get ptl output. You can still use the portal while sites are upgrading, although a maintenance page is shown for any sites that are being upgraded. When the site upgrades are complete, the oc get ptl output shows how many sites the portal is serving:
NAME     READY   STATUS    VERSION     RECONCILED VERSION   MESSAGE           AGE
portal   3/3     Running   <version>   <version>            Serving 2 sites   22h
On two data center disaster recovery deployments, the sites are not upgraded until both data centers are upgraded.
- Delete all the old Postgres client certificates. To check whether there are any old Postgres client certificates, run the following command:
oc -n <namespace> get certs | grep db-client
For example, if you see that both -db-client-apicuser and apicuser exist, apicuser is no longer in use. Remove the old certificates by running one of the following commands, depending on how many old certificates remain in your system:
oc -n <namespace> delete certs apicuser pgbouncer primaryuser postgres replicator
or:
oc -n <namespace> delete certs apicuser pgbouncer postgres replicator
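Because the two delete commands differ only in which old certificates are still present, the cleanup can be driven by what the cluster reports. This is a minimal sketch under stated assumptions: the helper name old_pg_certs and the NS variable are hypothetical, and the list of obsolete names is taken from the delete commands on this page.

```shell
# old_pg_certs (hypothetical helper): reads `oc get certs --no-headers`
# output on stdin and prints only the certificate names that exactly match
# one of the old Postgres client certificate names listed on this page.
old_pg_certs() {
  awk '$1 ~ /^(apicuser|pgbouncer|primaryuser|postgres|replicator)$/ { print $1 }'
}

# Intended use (NS is an assumed variable holding the namespace):
#   old=$(oc -n "$NS" get certs --no-headers | old_pg_certs)
#   [ -n "$old" ] && oc -n "$NS" delete certs $old
```

The match is anchored on the whole name, so current certificates whose names merely end in one of these strings, such as a name ending in -db-client-apicuser, are not selected.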
- If you are using individual subsystem CRs: Start with the Management subsystem. Update the Management CR as follows:
- Edit the ManagementCluster CR:
oc edit ManagementCluster -n <mgmt_namespace>
- If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): In the spec section, add a new allowUpgrade attribute and set it to true:
spec:
  allowUpgrade: true
The allowUpgrade attribute enables the upgrade to 10.0.5.x. Because the upgrade to 10.0.5.x deletes your analytics data, the attribute is required to prevent an accidental upgrade.
- In the spec section, update the API Connect version: Change the version setting to 10.0.5.9.
- If you are upgrading to a version of API Connect that requires a new license, update the license value now.
For the list of licenses, see API Connect licenses.
- Save and close the CR to apply your changes.
The response looks like the following example:
managementcluster.management.apiconnect.ibm.com/management edited
Note: If you see an error message when you attempt to save the CR, check if it is one of the following known issues:
- Webhook error for incorrect license. If you did not update the license ID in the CR, then when you save your changes, the following webhook error might display:
admission webhook "vapiconnectcluster.kb.io" denied the request: APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version version. Please refer license document https://ibm.biz/apiclicenses
To resolve the error, see API Connect licenses for the list of the available license IDs and select the appropriate license ID for your deployment. Update the CR with the new license value, and then save and apply your changes again.
- Webhook error: Original PostgreSQL primary not found. Take the following actions to complete the upgrade and fix the cause of the error message:
- Edit your ManagementCluster CR and add the following annotation:
...
metadata:
  annotations:
    apiconnect-operator/db-primary-not-found-allow-upgrade: "true"
...
- Continue with the upgrade. When the upgrade is complete, the management CR reports the warning:
Original PostgreSQL primary not found. Run apicops upgrade:pg-health-check to check the health of the database and to ensure pg_wal symlinks exist. If database health check passes please perform a management database backup and restore to restore the original PostgreSQL primary pod
- Take a new management database backup.
- Immediately restore from the new backup taken in the previous step. Taking and restoring a management backup establishes a new Postgres primary, which eliminates the CR warning message. Be careful to restore from the backup that is taken after the upgrade, and not from a backup taken before the upgrade.
- Webhook error: Original postgres primary is running as replica. Complete a Postgres failover; see Postgres failover steps. After you apply the Postgres failover steps, the upgrade resumes automatically.
- Wait until the Management subsystem upgrade is complete before proceeding to the next subsystem. Check the status of the upgrade with oc get ManagementCluster -n <mgmt_namespace>, and wait until all pods are running at the new version. For example:
oc get ManagementCluster -n <mgmt_namespace>
NAME         READY   STATUS    VERSION    RECONCILED VERSION   AGE
management   18/18   Running   10.0.5.9   10.0.5.9-1281        97m
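Checking that a subsystem has reached the new version can be done by reading the VERSION column of the oc get output. The sketch below is illustrative only: the helper name not_at_version and the NS variable are hypothetical, and it assumes the column layout shown in the example output on this page, with a single-word STATUS value.

```shell
# not_at_version (hypothetical helper): reads `oc get <Kind>` output,
# including the header row, on stdin and prints the name of every resource
# whose VERSION column (the fourth field) differs from the target in $1.
not_at_version() {
  awk -v want="$1" 'NR > 1 && $4 != want { print $1 }'
}

# Intended use (NS is an assumed variable holding <mgmt_namespace>):
#   oc get ManagementCluster -n "$NS" | not_at_version 10.0.5.9
```

Empty output means every listed resource reports the target version; the READY column still needs to be checked separately.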
- Management subsystem only: If needed, delete old Postgres client certificates.
Skip this step for the portal, analytics, and gateway subsystems.
If you are upgrading from 10.0.1.x or 10.0.4.0-ifix1, or if you previously installed any of those versions before upgrading to 10.0.5.x, there might be old Postgres client certificates. To verify, run the following command:
oc -n <namespace> get certs | grep db-client
For example, if you see that both -db-client-apicuser and apicuser exist, apicuser is no longer in use. Remove the old certificates by running one of the following commands, depending on how many old certificates remain in your system:
- oc -n <namespace> delete certs apicuser pgbouncer primaryuser postgres replicator
- oc -n <namespace> delete certs apicuser pgbouncer postgres replicator
- Repeat the process for the remaining subsystem CRs in your preferred order: GatewayCluster, PortalCluster, AnalyticsCluster.
Important:
- In the GatewayCluster CR, delete any template or dataPowerOverride sections. You cannot perform an upgrade if the CR contains an override.
- If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): The allowUpgrade attribute set in the Management CR must also be set in the AnalyticsCluster CR. It is not required for gateway or portal CRs.
Important: If you need to restart the deployment, wait until all portal sites complete the upgrade.
After the portal subsystem upgrade is complete, each portal site is upgraded. You can monitor the site upgrade progress from the MESSAGE column in the oc get ptl output. You can still use the portal while sites are upgrading, although a maintenance page is shown for any sites that are being upgraded. When the site upgrades are complete, the oc get ptl output shows how many sites the portal is serving:
NAME     READY   STATUS    VERSION     RECONCILED VERSION   MESSAGE           AGE
portal   3/3     Running   <version>   <version>            Serving 2 sites   22h
On two data center disaster recovery deployments, the sites are not upgraded until both data centers are upgraded.
- Upgrading to 10.0.5.5: Verify that the GatewayCluster upgraded correctly.
When upgrading to 10.0.5.5 on OpenShift, the rolling update might fail to start on gateway operand pods due to a gateway peering issue, even though the reconciled version on the gateway CR (incorrectly) displays as 10.0.5.5. Complete the following steps to check for this issue and correct it if needed.
- Check the productVersion of each gateway pod to verify that it is 10.0.5.7 (the version of DataPower Gateway that was released with API Connect 10.0.5.5) by running one of the following commands:
oc get po -n apic_namespace <gateway_pods> -o yaml | yq .metadata.annotations.productVersion
or
oc get po -n apic_namespace <gateway_pods> -o custom-columns="productVersion:.metadata.annotations.productVersion"
where:
- apic_namespace is the namespace where API Connect is installed
- <gateway_pods> is a space-delimited list of the names of your gateway peering pods
- If any pod returns an incorrect value for the version, resolve the issue as explained in Incorrect productVersion of gateway pods after upgrade.
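When there are several gateway pods, the productVersion check is easier to read if each value is printed next to its pod name and compared with the expected version. The following sketch is an assumption layered on the custom-columns command above: it adds a NAME column, and the helper name wrong_product_version is hypothetical.

```shell
# wrong_product_version (hypothetical helper): reads "NAME productVersion"
# rows, including the header row, on stdin and prints each pod whose
# productVersion differs from the expected value in $1.
wrong_product_version() {
  awk -v want="$1" 'NR > 1 && $2 != want { print $1 " " $2 }'
}

# Intended use (placeholders as defined on this page):
#   oc get po -n apic_namespace <gateway_pods> \
#     -o custom-columns="NAME:.metadata.name,productVersion:.metadata.annotations.productVersion" \
#     | wrong_product_version 10.0.5.7
```

A pod that has no productVersion annotation is also reported, because its column value does not equal the expected version.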
- (Optional) If you upgraded from 10.0.5.4 or earlier, delete the DataPowerService CRs so that they are regenerated with random passwords for gateway-peering.
Starting with API Connect 10.0.5.5, GatewayCluster pods are configured by default to secure the gateway-peering sessions with a unique, randomly generated password. However, GatewayCluster pods created prior to API Connect 10.0.5.5 are configured to use a single, hard-coded password and upgrading to 10.0.5.5 or later does not replace the hard-coded password.
After upgrading to API Connect 10.0.5.5 or later, you can choose to secure the gateway-peering sessions by running the following command to delete the DataPowerService CR that was created by the GatewayCluster:
oc delete dp <gateway_cluster_name>
This action prompts the API Connect Operator to recreate the DataPowerService CR with the unique, randomly generated password. This is a one-time change and does not need to be repeated for subsequent upgrades.
- If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): Enable analytics as explained in Enabling Analytics after upgrading.
- If you are upgrading to 10.0.5.3 (or later) from an earlier 10.0.5.x release: Review and configure the new inter-subsystem communication features: Optional post-upgrade steps for upgrade to 10.0.5.3 from earlier 10.0.5 release.
- Restart all nats-server pods by running the following command:
oc -n <namespace> delete po -l app.kubernetes.io/name=natscluster
What to do next
Update your toolkit CLI by downloading it from IBM Fix Central or from the Cloud Manager UI; see Installing the toolkit.
If you are upgrading from v10.0.5.1 or earlier to v10.0.5.2: The change in deployment profile CPU and memory limits that are introduced in 10.0.5.2 (see New deployment profiles and CPU licensing) can result in a change in the performance of your Management component. If you notice any obvious reduction in performance of the Management UI or toolkit CLI where you have multiple concurrent users, open a support case.