Upgrading on OpenShift and Cloud Pak for Integration in an online environment

Perform an online (connected to the internet) upgrade of IBM® API Connect on OpenShift Container Platform or IBM Cloud Pak for Integration.

Before you begin

Attention: If you plan to upgrade to API Connect 10.0.5, first upgrade API Connect to 10.0.4-ifix3, and then upgrade OpenShift to 4.10 as explained in this procedure.

Procedure

  1. Back up the current deployment in case the upgrade fails and you need to perform a rollback.
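
    For example, if management database backups are configured for your deployment, you can trigger a manual backup with a ManagementBackup CR. The following is a minimal sketch; the metadata values are placeholders, and clusterName must match the name of your management cluster:

    apiVersion: management.apiconnect.ibm.com/v1beta1
    kind: ManagementBackup
    metadata:
      name: mgmt-backup-before-upgrade
      namespace: <APIC_namespace>
    spec:
      type: full
      crType: create
      clusterName: <instance-name>-mgmt
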
  2. Ensure your IBM API Connect deployment is ready to upgrade:
    • Your API Connect release (operand) supports a direct upgrade to this release.
    • The DataPower operator version is correct for the currently deployed version of API Connect.

    For information on the operator and operand version used with each API Connect release, see Operator, operand, and CASE versions.

    For information on upgrade paths and supported versions of DataPower Gateway, see Upgrade considerations on OpenShift and Cloud Pak for Integration.
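
    To confirm the currently deployed operand version before you start, list the top-level CR; the RECONCILED VERSION column shows the version that is actually running:

    oc get apiconnectcluster -n <APIC_namespace>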

  3. Cloud Pak for Integration only: Update the operator channels.
    1. If needed, upgrade the IBM Cloud Pak foundational services (previously called Common Services) operator to channel v3.
    2. If needed, complete the following steps to upgrade the Platform Navigator operator (supports the Automation Platform UI) to channel v5:
      1. Upgrade the Platform Navigator operator to channel v5 and wait for the Platform Navigator pods to restart.
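
        If you prefer the CLI to the OpenShift web console for channel changes, you can patch the operator subscription, as in the following sketch (the subscription name here is an assumption; confirm it with oc get subscription -n <namespace>):

        oc -n <namespace> patch subscription ibm-integration-platform-navigator --type merge -p '{"spec":{"channel":"v5"}}'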

      2. Edit the YAML of the Platform Navigator instance and make the following changes:
        • Add the license ID value for the license that you purchased; for example, L-RJON-BXUPZ2.
        • Add the RWX storage class; for example, rook-cephfs.
        • Change the operand version to 2021.4.1.

        For example:

        apiVersion: integration.ibm.com/v1beta1
        kind: PlatformNavigator
        metadata:
          name: cp4i-navigator
          namespace: apic
        spec:
          license:
            accept: true
            license: L-RJON-BXUPZ2
          mqDashboard: true
          replicas: 3
          storage:
            class: rook-cephfs
          version: 2021.4.1

      3. Wait again for the Platform Navigator pods to upgrade, and for the Platform Navigator to reach a green (Ready) state.
  4. Update the operator channels for DataPower and API Connect.
    1. If needed, upgrade the DataPower operator channel to v1.5.
      Attention: Upgrade the DataPower operator before upgrading the IBM API Connect operator, to ensure that dependencies are satisfied.
      • If you previously chose automatic subscriptions, the operator version will upgrade automatically.

      • If you previously chose manual subscriptions and the operator channel is already on the previous version, OpenShift (OLM) will notify you that an upgrade is available. You must manually approve the upgrade before proceeding.

        Wait for the operator to update, for the pods to restart, and for a Ready status.
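
        With manual approval, you can also locate and approve the pending InstallPlan from the CLI; a sketch, where the InstallPlan name is a placeholder that you look up with the first command:

        oc get installplan -n <namespace>
        oc patch installplan <installplan-name> -n <namespace> --type merge -p '{"spec":{"approved":true}}'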

      Known issue:

      When upgrading the DataPower operator, you might see messages appear in the log on the datapower-operator pod indicating that the pod is waiting for the lock to be removed:

      {"level":"info","ts":"2021-03-08T19:29:53.432Z","logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":"2021-03-08T19:29:57.971Z","logger":"leader","msg":"Leader pod has been deleted, waiting for garbage collection to remove the lock."}

      If you see these messages, the DataPower operator cannot be upgraded until you resolve the problem as explained in the DataPower operator documentation.

    2. If your IBM API Connect operator channel is not at v2.5, update it now.
      • If you previously chose automatic subscriptions, the operator version will upgrade automatically.

      • If you previously chose manual subscriptions and the operator channel is already on the previous version, OpenShift (OLM) will notify you that an upgrade is available. You must manually approve the upgrade before proceeding.
      Known issues:
      • The certificate manager was upgraded in Version 10.0.4.0, and you might encounter an upgrade error if the CRD for the new certificate manager is not found. For information on the error messages that indicate this problem, and the steps to resolve it, see Upgrade error when the CRD for the new certificate manager is not found in the Troubleshooting installation and upgrade on OpenShift topic.
      • A null value for the backrestStorageType property in the pgcluster CR causes an error during the operator upgrade from versions earlier than 10.0.4.0. For information on the error messages that indicate this problem, and the steps to resolve it, see Operator upgrade fails with error from API Connect operator and Postgres operator in the Troubleshooting installation and upgrade on OpenShift topic.
  5. Check for certificate errors, and then recreate issuers and certificates if needed.

    In 10.0.4, API Connect upgraded its certificate manager, which might cause some errors during the upgrade. Complete the following steps to check for certificate errors and correct them.

    1. Check the new API Connect operator's log for an error similar to the following example:
      {"level":"error","ts":1634966113.8442025,"logger":"controllers.AnalyticsCluster","msg":"Failed to set owner reference on certificate request","analyticscluster":"apic/<instance-name>-a7s","certificate":"<instance-name>-a7s-ca","error":"Object apic/<instance-name>-a7s-ca is already owned by another Certificate controller <instance-name>-a7s-ca",
      

      To correct this problem, delete all issuers and certificates generated with certmanager.k8s.io/v1alpha1. For certificates used by route objects, you must also delete the route and secret objects.

    2. Run the following commands to delete the issuers and certificates that were generated with certmanager.k8s.io/v1alpha1:
      oc delete issuers.certmanager.k8s.io <instance-name>-self-signed <instance-name>-ingress-issuer  <instance-name>-mgmt-ca <instance-name>-a7s-ca <instance-name>-ptl-ca
      oc delete certs.certmanager.k8s.io <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca

      In the examples, <instance-name> is the instance name of the top-level apiconnectcluster.

      When you delete the issuers and certificates, the new certificate manager generates replacements; this might take a few minutes.

    3. Verify that the new CA certs are refreshed and ready.

      Run the following command to verify the certificates:

      oc get certs <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca
      

      The CA certs are ready when the AGE column shows that they were recently created and the READY column shows True.

    4. Delete the remaining old certificates, routes, and secret objects corresponding to those routes.

      Run the following commands:

      oc get certs.certmanager.k8s.io | awk '/<instance-name>/{print $1}'  | xargs oc delete certs.certmanager.k8s.io
      oc delete certs.certmanager.k8s.io postgres-operator
      oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete routes
      Note: The following command deletes the secrets for the routes. Do not delete any other secrets.
      oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete secrets
    5. Verify that no old issuers or certificates from your top-level instance remain.

      Run the following commands:

      oc get issuers.certmanager.k8s.io | grep <instance-name>
      oc get certs.certmanager.k8s.io | grep <instance-name>
      

      Both commands should report that no resources were found.

  6. Use apicops to validate the certificates.
    1. Run the following command:
      apicops upgrade:stale-certs -n <APIC_namespace>
    2. Delete any stale certificates that are managed by cert-manager.
      If a certificate failed the validation and it is managed by cert-manager, you can delete the stale certificate secret, and let cert-manager regenerate it. Run the following command:
      kubectl delete secret <stale-secret> -n <APIC_namespace>
    3. Restart the corresponding pod so that it can pick up the new secret.
      To determine which pod to restart, see the documentation topics for the affected subsystem.
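
      Deleting the pod forces a restart; for example:

      kubectl delete pod <pod-name> -n <APIC_namespace>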

    For information on the apicops tool, see The API Connect operations tool: apicops.

  7. Run the following command to delete the Postgres pods so that they are recreated with the new certificate:
    oc get pod -n <namespace> --no-headers=true | grep postgres | grep -v backup | awk '{print $1}' | xargs oc delete pod -n <namespace>
  8. If needed, delete the portal-www, portal-db and portal-nginx pods to ensure they use the new secrets.

    If you have the Developer Portal deployed, the portal-www, portal-db, and portal-nginx pods might need to be deleted so that they pick up the newly generated secrets when they restart. If the pods are not showing as "ready" in a timely manner, delete all of the pods at the same time (this causes downtime).

    Run the following commands to get the name of the portal CR and delete the pods:

    oc project <APIC_namespace>
    oc get ptl
    oc delete po -l app.kubernetes.io/instance=<name_of_portal_CR>
    
  9. If needed, renew the internal certificates for the analytics subsystem.

    If you see analytics-storage-* or analytics-mq-* pods in the CrashLoopBackOff state, then renew the internal certificates for the analytics subsystem and force a restart of the pods.

    1. Switch to the project/namespace where analytics is deployed and run the following command to get the name of the analytics CR (AnalyticsCluster):
      oc project <APIC_namespace>
      oc get a7s

      You need the CR name for the remaining steps.

    2. Renew the internal certificates (CA, client, and server) by running the following commands:
      oc get certificate <name_of_analytics_CR>-ca -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
      oc get certificate <name_of_analytics_CR>-client -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
      oc get certificate <name_of_analytics_CR>-server -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
      
    3. Force a restart of all analytics pods by running the following command:
      oc delete po -l app.kubernetes.io/instance=<name_of_analytics_CR>
      
  10. Ensure that the operators and operands are healthy before proceeding.

    Operators: Verify that the OpenShift UI indicates that all operators are in the Succeeded state without any warnings.
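
    As a CLI alternative, you can confirm that every ClusterServiceVersion in the namespace reports the Succeeded phase:

    oc get csv -n <APIC_namespace>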

    Operands:

    • To verify whether operands are healthy, run the following command: oc get apic

      Make sure that the apiconnectcluster custom resource reports READY.

    • Cloud Pak for Integration only: Wait until the IBM API Connect capability shows READY (green check) in the Automation UI.
      Known issue: Status toggles between Ready and Warning

      There is a known issue where the IBM API Connect operator toggles the overall status of the IBM API Connect deployment in Platform Navigator between Ready and Warning. Look at the full list of conditions; when Ready is True, you can proceed to the next step even if Warning is also True.

  11. Update the operand version:
    • OpenShift: Complete the following steps:
      1. Edit the top-level apiconnectcluster CR by running the following command:
        oc -n <APIC_namespace> edit apiconnectcluster
      2. Change the version setting to 10.0.4.0-ifix3.

      3. In the spec.gateway section, delete the template override section, if it exists. You cannot perform an upgrade if the CR contains an override.
      4. Save and close the CR.
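
      If the CR contains no template override, you can also apply the version change non-interactively with a patch; a sketch:

        oc -n <APIC_namespace> patch apiconnectcluster <instance-name> --type merge -p '{"spec":{"version":"10.0.4.0-ifix3"}}'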
    • Cloud Pak for Integration: Open the Automation Platform UI and complete the following steps:
      1. In the Automation Platform UI, click the Integration instances tab.

      2. Click the menu icon at the end of the current row, and then click Change version.

      3. Click Select a new channel or version, and then select 10.0.4.0-ifix3 in the Channel field.

        Selecting the new channel ensures that both DataPower Gateway and IBM API Connect are upgraded.

      4. Click Save to save your selections and start the upgrade.

        In the instances table, the Status column for the instance displays the "Upgrading" message. The upgrade is complete when the Status is "Ready" and the Version displays the new version number.

    Known issue: Webhook error for incorrect license.

    If you did not update the license ID in the CR, then when you change the operand version, the following webhook error might be displayed:

    admission webhook "vapiconnectcluster.kb.io" denied the request: 
    APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: 
    Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version version. 
    Please refer license document https://ibm.biz/apiclicenses

    To resolve the error, visit https://ibm.biz/apiclicenses and select the appropriate License IDs for your deployment. Update the CR with the new license value as in the following example:

    spec:
      license:
        accept: true
        use: production
        license: L-RJON-BZ5LJ5

    Then, apply the updated CR.

  12. Resolve gateway peering issues by completing the following steps.

    Due to the ingress issuer changes, the gateway pods must be scaled down and then back up. This process causes 5 to 10 minutes of downtime.

    1. Run the following command to verify that the management, portal, and analytics subsystems are Running:
      oc get apic --all-namespaces

      The response looks like the following example, with the gateway pods showing as Pending.

      NAME                                                        READY   STATUS    VERSION      RECONCILED VERSION   AGE
      analyticscluster.analytics.apiconnect.ibm.com/<instance-name>-a7s   8/8     Running   10.0.4.0     10.0.4.0-2221        7d1h
      
      NAME                                           READY      STATUS    VERSION          RECONCILED VERSION   AGE
      apiconnectcluster.apiconnect.ibm.com/<instance-name>   6/7        Pending   10.0.4.0         10.0.3.0-ifix1-351   47h
      
      NAME                                            PHASE     READY   SUMMARY                           VERSION      AGE
      datapowerservice.datapower.ibm.com/<instance-name>-gw   Pending   True    StatefulSet replicas ready: 1/1   10.0.4.0     46h
      
      NAME                                            PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
      datapowermonitor.datapower.ibm.com/<instance-name>-gw   Pending                false          false              46h
      
      NAME                                                   READY   STATUS    VERSION          RECONCILED VERSION   AGE
      gatewaycluster.gateway.apiconnect.ibm.com/<instance-name>-gw   0/2     Pending   10.0.4.0         10.0.4.0-2221        46h
      
      NAME                                                           READY   STATUS    VERSION          RECONCILED VERSION   AGE
      managementcluster.management.apiconnect.ibm.com/<instance-name>-mgmt   16/16   Running   10.0.4.0         10.0.4.0-2221        47h
      
      NAME                                                                      STATUS     MESSAGE                                                     AGE
      managementdbupgrade.management.apiconnect.ibm.com/<instance-name>-mgmt-up-pxl77   Complete   Fresh install is Complete (DB Schema/data are up-to-date)   46h
      managementdbupgrade.management.apiconnect.ibm.com/management-up-87fcz     Complete   Upgrade is Complete (DB Schema/data are up-to-date)         8h
      
      NAME                                                  READY   STATUS    VERSION        RECONCILED VERSION   AGE
      portalcluster.portal.apiconnect.ibm.com/<instance-name>-ptl   3/3     Running   10.0.4.0       10.0.4.0-2221        46h
    2. Scale down the gateway firmware containers by editing the top-level APIConnectCluster CR and setting the replica count to 0.

      OpenShift:

      1. Run the following command to edit the CR:
        oc edit apiconnectcluster <apic-cr-name>
      2. In the spec.gateway section, set the replicaCount setting to 0:
        ...
        spec:
          gateway:
            replicaCount: 0
        ...

        If the setting is not already included in the CR, add it now as shown in the example.

      3. Save and exit the CR.
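
      Equivalently, you can set the replica count non-interactively with a patch; a sketch:

        oc patch apiconnectcluster <apic-cr-name> --type merge -p '{"spec":{"gateway":{"replicaCount":0}}}'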

      Cloud Pak for Integration:

      1. In the Automation Platform UI, edit the API Connect instance.
      2. Click Advanced.
      3. In the Gateway subsystem section, set the Advance Replica count field to 0.
    3. Wait for the gateway firmware pods to scale down and terminate.

      Do not proceed to the next step until the pods are terminated.
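
      One way to watch them drain, assuming the gateway pods carry the instance name in their pod names:

      oc get pods -n <APIC_namespace> | grep <instance-name>-gw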

    4. Reset the replica count to its original value.

      If the replica count setting was not used previously, then:

      • OpenShift: Delete the setting from the CR.
      • Cloud Pak for Integration: Clear the Advance Replica count field.
  13. Ensure the upgrade is completed and the status of the top-level CR is READY:
    • OpenShift: Run the following command:
      oc get apiconnectcluster

      and make sure it reports READY.

    • CP4I: Make sure that the IBM API Connect capability reports READY in the Automation Platform UI.
    Known issue: Management subsystem remains in Pending state after upgrade

    There is a known problem where the management subsystem can remain in the Pending state after an upgrade if it was originally deployed in 10.0.1.0 with custom internal certificates. You can work around this problem as explained in Workaround: Management subsystem status remains Pending when upgrading with custom internal certificates.

    If the management upgrade fails, the apiconnect-operator performs a rollback procedure if possible.

  14. Upgrade the OpenShift cluster to OpenShift 4.10.

    API Connect 10.0.5 requires that your cluster be on OpenShift 4.10 before you begin that upgrade.

    Upgrading OpenShift requires that you move to interim minor releases instead of upgrading directly from 4.6 to 4.10. For more information, see the Red Hat OpenShift documentation. In the "Documentation" banner, select the version of OpenShift that you want to upgrade to, and then expand the "Updating clusters" section in the navigation list.
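
    As an illustration only (exact channels and steps depend on your starting version; follow the Red Hat documentation), a single interim hop such as 4.6 to 4.7 typically looks like this:

    oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.7"}}'
    oc adm upgrade --to-latest=true
    oc get clusterversion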

What to do next

Optionally, you can install the newer Toolkit and Local Test Environment from IBM Fix Central. For a link to the latest files on Fix Central, see What's new in the latest release.