Upgrading on OpenShift in an online environment

Perform an online (connected to the internet) upgrade of IBM® API Connect on Red Hat OpenShift Container Platform using either the top-level APIConnectCluster CR or individual subsystem CRs.

Before you begin

  • If you are upgrading an air-gapped (disconnected from the internet) installation, see Air-gapped upgrade.
  • The Gateway subsystem remains available during the upgrade of the Management, Portal, and Analytics subsystems.
  • If you are upgrading to a version of API Connect that supports a newer version of Red Hat OpenShift, complete the API Connect upgrade before upgrading Red Hat OpenShift.
  • Upgrading from 10.0.5.2 or earlier: If you did not verify that your portal customizations are compatible with Drupal 10, do that now.

    In API Connect 10.0.5.3, the Developer Portal moved from Drupal 9 to Drupal 10 (this upgrade also requires PHP 8.1). The upgrade tooling will update your Developer Portal sites; however, if you have any custom modules or themes, it is your responsibility to ensure their compatibility with Drupal 10 and PHP 8.1 before starting the upgrade. Review the Guidelines on upgrading your Developer Portal from Drupal 9 to Drupal 10 to ensure that any customizations to the Developer Portal are compatible with Drupal 10 and PHP 8.1.

Procedure

  1. Ensure that you have completed all of the steps in Preparing to upgrade on OpenShift, including reviewing the Upgrade considerations on OpenShift.

    Do not attempt an upgrade until you have reviewed the considerations and prepared your deployment.

  2. Ensure your API Connect deployment is ready to upgrade:
    • Your API Connect release (operand) supports a direct upgrade to this release.

      For information on the operator and operand version that is used with each API Connect release, see Operator, operand, and CASE versions.

    • The DataPower operator version is correct for the currently deployed version of API Connect.

      For information on upgrade paths and supported versions of DataPower Gateway, see Upgrade considerations on OpenShift.

    • Your deployment is running on a version of Red Hat OpenShift that is supported by both the current version of API Connect and the target version of API Connect.

      For information, see Supported versions of OpenShift.

  3. Update the operator channels for DataPower and API Connect.
    • If you previously chose automatic subscriptions, the operator version upgrades automatically when you update the operator channel.

    • If you previously chose manual subscriptions and the operator channel is already on the previous version, OpenShift OLM notifies you that an upgrade is available. You must manually approve the upgrade before proceeding.
    • Both the API Connect and DataPower operator channels must be updated; the upgrade of both operators begins only after the channel is changed for both.
    1. Upgrade the DataPower operator channel to v1.6. From Operators > Installed Operators, select the IBM DataPower Gateway operator, then select the Subscription tab. Click the channel version underneath Update channel to update it.
      Known issue:

      If you see messages in the datapower-operator pod log indicating that the pod is waiting for a lock to be removed:

      {"level":"info","ts":"2021-03-08T19:29:53.432Z","logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":"2021-03-08T19:29:57.971Z","logger":"leader","msg":"Leader pod has been deleted, waiting for garbage collection to remove the lock."}

      The DataPower operator cannot be upgraded until this problem is resolved; see the DataPower operator documentation.

    2. Update the API Connect operator channel to v3.9. From Operators > Installed Operators, select the IBM API Connect operator, then select the Subscription tab. Click the channel version underneath Update channel to update it.

      If you are upgrading from 10.0.4-ifix3 and the API Connect operator does not begin its upgrade within a few minutes, perform the following workaround to delete the ibm-ai-wmltraining subscription and the associated csv:

      1. Run the following command to get the name of the subscription:
        oc get subscription -n <APIC_namespace> --no-headers=true | grep ibm-ai-wmltraining | awk '{print $1}'
      2. Run the following command to delete the subscription:
        oc delete subscription <subscription-name> -n <APIC_namespace>
      3. Run the following command to get the name of the csv:
        oc get csv -n <APIC_namespace> --no-headers=true | grep ibm-ai-wmltraining | awk '{print $1}'
      4. Run the following command to delete the csv:
        oc delete csv <csv-name> -n <APIC_namespace>

      Deleting the subscription and csv triggers the API Connect operator upgrade.
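
      The four workaround commands above can be combined into a single guarded sketch; the namespace placeholder is yours to fill in, and each name should be verified before deletion:

      ```shell
      # Sketch: remove the ibm-ai-wmltraining subscription and CSV in one pass.
      NS=<APIC_namespace>

      # Find and delete the subscription, if present.
      SUB=$(oc get subscription -n "$NS" --no-headers=true | grep ibm-ai-wmltraining | awk '{print $1}')
      [ -n "$SUB" ] && oc delete subscription "$SUB" -n "$NS"

      # Find and delete the associated csv, if present.
      CSV=$(oc get csv -n "$NS" --no-headers=true | grep ibm-ai-wmltraining | awk '{print $1}')
      [ -n "$CSV" ] && oc delete csv "$CSV" -n "$NS"
      ```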

    Wait for the operators to update, for the pods to restart, and for the instances to display the Ready status.
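
    One way to confirm that the operator upgrades completed is to check the phase of each ClusterServiceVersion; both operators should report Succeeded. This is a sketch; the exact CSV names vary by release:

    ```shell
    # List operator CSVs with their phase; look for Succeeded on both
    # the API Connect and DataPower operator entries.
    oc get csv -n <APIC_namespace> \
      -o custom-columns="NAME:.metadata.name,PHASE:.status.phase"
    ```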

  4. Ensure that the operators and operands are healthy before proceeding.
    • Operators: Verify that the OpenShift UI indicates that all operators are in the Succeeded state without any warnings.
    • If you are using a top-level CR: To verify that your API Connect cluster is healthy, run the following command:
      oc get apiconnectcluster -n <APIC_namespace>
      Confirm that the apiconnectcluster CR reports all pods as READY, for example:
      NAME   READY   STATUS    VERSION    RECONCILED VERSION   AGE
      prod   17/17   Running   10.0.5.8   10.0.5.8-95          70m
      
    • If you are using individual subsystem CRs: To verify the health of each subsystem, run the following commands:
      oc get ManagementCluster -n <mgmt_namespace>
      oc get GatewayCluster -n <gway_namespace>
      oc get PortalCluster -n <portal_namespace>
      oc get AnalyticsCluster -n <analytics_namespace>
      Check that all pods are READY, for example:
      oc get PortalCluster -n apic
      NAME     READY   STATUS    VERSION    RECONCILED VERSION   AGE
      portal   3/3     Running   10.0.5.9   10.0.5.9-95          57m
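    If all four subsystems share one namespace, the health checks can be scripted as a loop; this is a sketch, so adjust the namespace per subsystem if your deployment splits them:

    ```shell
    # Check the READY status of each subsystem CR in turn.
    for kind in ManagementCluster GatewayCluster PortalCluster AnalyticsCluster; do
      echo "== $kind =="
      oc get "$kind" -n <APIC_namespace>
    done
    ```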
  5. If you are using a top-level CR: Update the top-level apiconnectcluster CR:

    The spec section of the apiconnectcluster looks like the following example:

    apiVersion: apiconnect.ibm.com/v1beta1
    kind: APIConnectCluster
    metadata:
      labels:
        app.kubernetes.io/instance: apiconnect
        app.kubernetes.io/managed-by: ibm-apiconnect
        app.kubernetes.io/name: apiconnect-production
      name: prod
      namespace: <APIC_namespace>
    spec:
      allowUpgrade: true
      license:
        accept: true
        use: production
        license: L-GVEN-GFUPVE
      profile: n12xc4.m12
      version: 10.0.5.9
      storageClassName: rook-ceph-block
    1. Edit the apiconnectcluster CR by running the following command:
      oc -n <APIC_namespace> edit apiconnectcluster
    2. If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): In the spec section, add a new allowUpgrade attribute and set it to true:
      spec:
        allowUpgrade: true

      The allowUpgrade attribute enables the upgrade to 10.0.5.x. Because the upgrade to 10.0.5.x deletes your analytics data, the attribute is required to prevent an accidental upgrade.
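
      Instead of editing the CR interactively, the allowUpgrade attribute can also be set with a merge patch; this is a sketch, and the instance name prod is taken from the example CR above:

      ```shell
      # Set spec.allowUpgrade without opening an editor.
      # 'prod' is the example instance name; substitute your own.
      oc -n <APIC_namespace> patch apiconnectcluster prod \
        --type=merge -p '{"spec":{"allowUpgrade":true}}'
      ```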

    3. In the spec section, update the API Connect version:
      Change the version setting to 10.0.5.9.
    4. If you are upgrading to a version of API Connect that requires a new license, update the license value now.

      For the list of licenses, see API Connect licenses.

    5. In the spec.gateway section of the CR, delete any template or dataPowerOverride sections.

      You cannot perform an upgrade if the CR contains an override.

    6. Save and close the CR to apply your changes.
      The response looks like the following example:
      apiconnectcluster.apiconnect.ibm.com/prod configured
      Note: If you see an error message when you attempt to save the CR, check if it is one of the following known issues:
      • Webhook error for incorrect license.
        If you did not update the license ID in the CR, then when you save your changes, the following webhook error might display:
        admission webhook "vapiconnectcluster.kb.io" denied the request: 
        APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: 
        Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version version. 
        Please refer license document https://ibm.biz/apiclicenses
        To resolve the error, see API Connect licenses for the list of the available license IDs and select the appropriate license ID for your deployment. Update the CR with the new license value, and then save and apply your changes again.
      • Webhook error: Original PostgreSQL primary not found. Take the following actions to complete the upgrade and fix the cause of the error message:
        1. Edit your apiconnectcluster CR and add the following annotation:
          ...
          metadata:
            annotations:
              apiconnect-operator/db-primary-not-found-allow-upgrade: "true"
              ...
        2. Continue with the upgrade. When the upgrade is complete, the management CR reports the warning:
          Original PostgreSQL primary not found. Run apicops upgrade:pg-health-check to check the health of the database and to ensure pg_wal symlinks exist. If database health check passes please perform a management database backup and restore to restore the original PostgreSQL primary pod
        3. Take a new management database backup.
        4. Immediately restore from the new backup taken in the previous step. The action of taking and restoring a management backup results in the establishment of a new Postgres primary, eliminating the CR warning message. Be careful to restore from the backup that is taken after the upgrade, and not from a backup taken before upgrade.
      • Webhook error: Original postgres primary is running as replica. Complete a Postgres failover; see Postgres failover steps. After you apply the Postgres failover steps, the upgrade resumes automatically.
    7. Run the following command to verify that the upgrade is completed and the status of the top-level CR is READY:
      oc get apiconnectcluster -n <APIC_namespace>
      Important: If you need to restart the deployment, wait until all portal sites complete the upgrade.
      After the portal subsystem upgrade is complete, each portal site is upgraded. You can monitor the site upgrade progress from the MESSAGE column in the oc get ptl output. You can still use the portal while sites are upgrading, although a maintenance page is shown for any sites that are being upgraded. When the site upgrades are complete, the oc get ptl output shows how many sites the portal is serving:
      NAME     READY   STATUS    VERSION          RECONCILED VERSION   MESSAGE              AGE
      portal   3/3     Running   <version>        <version>            Serving 2 sites      22h

      On two data center disaster recovery deployments, the sites are not upgraded until both data centers are upgraded.
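
      To follow the site upgrades as they progress, you can watch the portal CR until the MESSAGE column reports the expected number of sites; a sketch:

      ```shell
      # Watch the portal CR; press Ctrl+C once MESSAGE shows "Serving N sites".
      oc get ptl -n <portal_namespace> -w
      ```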

    8. Delete all the old Postgres client certificates.
      To check whether there are any old Postgres client certificates, run the following command:
      oc -n <namespace> get certs | grep db-client

      For example, if you see that both -db-client-apicuser and apicuser exist, then apicuser is no longer in use. Remove the old certificates by running one of the following commands, depending on which old certificates are left in your system:

      oc -n <namespace> delete certs  apicuser pgbouncer primaryuser postgres replicator

      or:

      oc -n <namespace> delete certs  apicuser pgbouncer postgres replicator
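
      The check-then-delete sequence can be sketched as a guarded script that only deletes the stale certificates it actually finds, which avoids choosing between the two command variants above:

      ```shell
      # Delete legacy Postgres client certs only if stale copies are present.
      NS=<namespace>
      OLD=$(oc -n "$NS" get certs -o name \
        | grep -E '/(apicuser|pgbouncer|primaryuser|postgres|replicator)$')
      # Only issue the delete if at least one stale cert was found.
      [ -n "$OLD" ] && oc -n "$NS" delete $OLD
      ```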
  6. If you are using individual subsystem CRs: Start with the Management subsystem. Update the Management CR as follows:
    1. Edit the ManagementCluster CR:
      oc edit ManagementCluster -n <mgmt_namespace>
    2. If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): In the spec section, add a new allowUpgrade attribute and set it to true
      spec:
        allowUpgrade: true

      The allowUpgrade attribute enables the upgrade to 10.0.5.x. Because the upgrade to 10.0.5.x deletes your analytics data, the attribute is required to prevent an accidental upgrade.

    3. In the spec section, update the API Connect version:
      Change the version setting to 10.0.5.9.
    4. If you are upgrading to a version of API Connect that requires a new license, update the license value now.

      For the list of licenses, see API Connect licenses.

    5. Save and close the CR to apply your changes.
      The response looks like the following example:
      managementcluster.management.apiconnect.ibm.com/management edited
      Note: If you see an error message when you attempt to save the CR, check if it is one of the following known issues:
      • Webhook error for incorrect license.
        If you did not update the license ID in the CR, then when you save your changes, the following webhook error might display:
        admission webhook "vapiconnectcluster.kb.io" denied the request: 
        APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: 
        Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version version. 
        Please refer license document https://ibm.biz/apiclicenses
        To resolve the error, see API Connect licenses for the list of the available license IDs and select the appropriate license ID for your deployment. Update the CR with the new license value, and then save and apply your changes again.
      • Webhook error: Original PostgreSQL primary not found. Take the following actions to complete the upgrade and fix the cause of the error message:
        1. Edit your ManagementCluster CR and add the following annotation:
          ...
          metadata:
            annotations:
              apiconnect-operator/db-primary-not-found-allow-upgrade: "true"
              ...
        2. Continue with the upgrade. When the upgrade is complete, the management CR reports the warning:
          Original PostgreSQL primary not found. Run apicops upgrade:pg-health-check to check the health of the database and to ensure pg_wal symlinks exist. If database health check passes please perform a management database backup and restore to restore the original PostgreSQL primary pod
        3. Take a new management database backup.
        4. Immediately restore from the new backup taken in the previous step. The action of taking and restoring a management backup results in the establishment of a new Postgres primary, eliminating the CR warning message. Be careful to restore from the backup that is taken after the upgrade, and not from a backup taken before upgrade.
      • Webhook error: Original postgres primary is running as replica. Complete a Postgres failover; see Postgres failover steps. After you apply the Postgres failover steps, the upgrade resumes automatically.
    6. Wait until the Management subsystem upgrade is complete before proceeding to the next subsystem. Check the status of the upgrade with: oc get ManagementCluster -n <mgmt_namespace>, and wait until all pods are running at the new version. For example:
      oc get ManagementCluster -n <mgmt_namespace>
      NAME         READY   STATUS    VERSION    RECONCILED VERSION   AGE
      management   18/18   Running   10.0.5.9   10.0.5.9-1281        97m
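      A simple polling loop can stand in for repeated manual checks while the Management subsystem upgrades; a sketch, with the target release hard-coded:

      ```shell
      # Poll the ManagementCluster CR until the reconciled version
      # reports the target release, then print the final status.
      until oc get ManagementCluster -n <mgmt_namespace> | grep -q '10\.0\.5\.9'; do
        sleep 30
      done
      oc get ManagementCluster -n <mgmt_namespace>
      ```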
    7. Management subsystem only: If needed, delete old Postgres client certificates.

      Skip this step for the portal, analytics, and gateway subsystems.

      If you are upgrading from 10.0.1.x or 10.0.4.0-ifix1, or if you previously installed any of those versions before upgrading to 10.0.5.x, there might be old Postgres client certificates. To verify, run the following command:
      oc -n <namespace> get certs | grep db-client
      For example, if you see that both -db-client-apicuser and apicuser exist, then apicuser is no longer in use. Remove the old certificates by running one of the following commands, depending on which old certificates are left in your system:
      • oc -n <namespace> delete certs  apicuser pgbouncer primaryuser postgres replicator
      • oc -n <namespace> delete certs  apicuser pgbouncer postgres replicator
    8. Repeat the process for the remaining subsystem CRs in your preferred order: GatewayCluster, PortalCluster, AnalyticsCluster.
      Important:
      • In the GatewayCluster CR, delete any template or dataPowerOverride sections. You cannot perform an upgrade if the CR contains an override.
      • If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): The allowUpgrade attribute set in the Management CR must also be set in the AnalyticsCluster CR. It is not required for gateway or portal CRs.
      Important: If you need to restart the deployment, wait until all portal sites complete the upgrade.
      After the portal subsystem upgrade is complete, each portal site is upgraded. You can monitor the site upgrade progress from the MESSAGE column in the oc get ptl output. You can still use the portal while sites are upgrading, although a maintenance page is shown for any sites that are being upgraded. When the site upgrades are complete, the oc get ptl output shows how many sites the portal is serving:
      NAME     READY   STATUS    VERSION          RECONCILED VERSION   MESSAGE              AGE
      portal   3/3     Running   <version>        <version>            Serving 2 sites      22h

      On two data center disaster recovery deployments, the sites are not upgraded until both data centers are upgraded.

  7. Upgrading to 10.0.5.5: Verify that the GatewayCluster upgraded correctly.

    When upgrading to 10.0.5.5 on OpenShift, the rolling update might fail to start on gateway operand pods due to a gateway peering issue, even though the reconciled version on the gateway CR (incorrectly) displays as 10.0.5.5. Complete the following steps to check for this issue and correct it if needed.

    1. Check the productVersion of each gateway pod to verify that it is 10.0.5.7 (the version of DataPower Gateway that was released with API Connect 10.0.5.5) by running one of the following commands:
      oc get po -n <apic_namespace> <gateway_pods> -o yaml | yq .metadata.annotations.productVersion
      or
      oc get po -n <apic_namespace> <gateway_pods> -o custom-columns="productVersion:.metadata.annotations.productVersion"
      
      where:
      • <apic_namespace> is the namespace where API Connect is installed
      • <gateway_pods> is a space-delimited list of the names of your gateway peering pods
    2. If any pod returns an incorrect value for the version, resolve the issue as explained in Incorrect productVersion of gateway pods after upgrade.
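    To check every gateway pod in one pass without listing each name, the lookup can be driven by a label selector. This is a sketch; the label shown is an assumption, so confirm it matches the labels on your gateway pods:

    ```shell
    # List each gateway pod with its productVersion annotation.
    # The label selector below is illustrative; verify it against
    # 'oc get po --show-labels' for your deployment.
    oc get po -n <apic_namespace> -l app.kubernetes.io/name=datapower \
      -o custom-columns="NAME:.metadata.name,VERSION:.metadata.annotations.productVersion"
    ```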
  8. (Optional). If you upgraded from 10.0.5.4 or earlier, delete the DataPowerService CRs so that they will be regenerated with random passwords for gateway-peering.

    Starting with API Connect 10.0.5.5, GatewayCluster pods are configured by default to secure the gateway-peering sessions with a unique, randomly generated password. However, GatewayCluster pods created prior to API Connect 10.0.5.5 are configured to use a single, hard-coded password and upgrading to 10.0.5.5 or later does not replace the hard-coded password.

    After upgrading to API Connect 10.0.5.5 or later, you can choose to secure the gateway-peering sessions by running the following command to delete the DataPowerService CR that was created by the GatewayCluster:

    oc delete dp <gateway_cluster_name>

    This action prompts the API Connect Operator to recreate the DataPowerService CR with the unique, randomly generated password. This is a one-time change and does not need to be repeated for subsequent upgrades.

  9. If upgrading from v10.0.4-ifix3, or upgrading from v10.0.1.7-eus (or higher): Enable analytics as explained in Enabling Analytics after upgrading.
  10. If you are upgrading to 10.0.5.3 (or later) from an earlier 10.0.5.x release: Review and configure the new inter-subsystem communication features: Optional post-upgrade steps for upgrade to 10.0.5.3 from earlier 10.0.5 release.
  11. Restart all nats-server pods by running the following command:
    oc -n <namespace> delete po -l app.kubernetes.io/name=natscluster
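
    After deleting the pods, you can confirm that replacement nats-server pods were recreated and reach the Running state; a sketch:

    ```shell
    # Verify that new nats-server pods are back up and Running.
    oc -n <namespace> get po -l app.kubernetes.io/name=natscluster
    ```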

What to do next

Update your toolkit CLI by downloading it from IBM Fix Central or from the Cloud Manager UI; see Installing the toolkit.

If you are upgrading from v10.0.5.1 or earlier to v10.0.5.2: The change in deployment profile CPU and memory limits that are introduced in 10.0.5.2 (see New deployment profiles and CPU licensing) can result in a change in the performance of your Management component. If you notice any obvious reduction in performance of the Management UI or toolkit CLI where you have multiple concurrent users, open a support case.