Upgrading the Gateway subsystem on native Kubernetes

Upgrade the Gateway subsystem to the latest version of API Connect.

Before you begin

  1. Complete all steps in Upgrading subsystems on native Kubernetes prior to the step that links to this topic.
  2. Ensure your upgrade path is supported.
  3. To ensure operator compatibility, upgrade the API Connect management subsystem before you upgrade the DataPower gateway subsystem. This requirement applies to all upgrade scenarios.
  4. When you upgrade a cluster of gateway pods in Kubernetes, a small number of API transactions might fail. During the upgrade, Kubernetes removes a pod from the load balancer configuration, deletes the pod, and then starts a new pod; these steps are repeated for each pod. Transactions that are still in progress when a pod is stopped fail with socket hang-ups.

    The number of transactions that fail depends on the rate of incoming transactions and the time needed to complete each transaction. Typically, failures are a very small percentage of total traffic. This behavior is expected during an upgrade. If the failure level is not acceptable, schedule the upgrade during an off-hours maintenance window.

    Note also that DataPower Gateway supports long-lived connections, such as GraphQL subscriptions and other WebSocket connections. These connections might not be preserved during the upgrade, so workloads that use long-lived connections are more likely to see failed API transactions.

    You can limit the number of failed API transactions during the upgrade by using the DataPower Operator's lifecycle property to configure a preStop container lifecycle hook on the gateway pods. During the rolling update of the gateway StatefulSet, the hook makes each pod sleep for a configured period before SIGTERM is delivered to the container, which gives in-flight transactions time to complete. This does not guarantee that no in-flight API calls fail, but it does protect transactions that can complete within the configured time window. For more information, see Delaying SIGTERM with preStop in the DataPower documentation.

  5. When upgrading a high-availability cluster, ensure that you meet the following requirements:
    • Gateways must be updated one at a time.
    • Before starting the upgrade, a single gateway must be running as primary for all gateway-peering definitions.
    • When upgrading multiple gateways, the primary gateway must be upgraded last.
    • Ensure that the pod with a name like gwv6-0 or gwv5-0 is the primary, because it is the last pod to be upgraded.

    To determine which gateway is running as primary, use either the show gateway-peering-status command in the DataPower CLI or the Gateway Peering Status display in the Web GUI, in the API Connect application domain. To open the DataPower CLI of a gateway pod, run the following command:

    kubectl attach -it <podname>

    To move the primary to the DataPower Gateway that you are currently working on, issue the following command:

    gateway-peering-switch-primary <peering-object-name>
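As a rough planning aid for item 4, the number of transactions exposed to a rolling restart can be estimated with Little's law: at any moment about (rate x average duration) transactions are in flight. If load is spread evenly across N pods, each pod holds about 1/N of them when it is stopped, so restarting every pod once exposes roughly rate x duration transactions in total. The sketch below assumes a steady request rate and even load balancing; the function names are hypothetical and the result is an estimate, not a guarantee. The second helper checks a candidate preStop sleep against the pod's terminationGracePeriodSeconds, because Kubernetes starts the grace-period countdown before the preStop hook runs.

```shell
#!/bin/sh
# Hypothetical helper: rough count of API transactions in flight across one
# rolling restart of all gateway pods. Assumes a steady request rate and even
# load balancing (Little's law: in-flight = arrival rate x average duration).
#   $1 = incoming transactions per second, across all gateway pods
#   $2 = average transaction duration in milliseconds
estimate_at_risk() {
  awk -v rate="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", rate * ms / 1000 }'
}

# Hypothetical helper: check that a candidate preStop sleep fits inside the
# pod's termination window. Kubernetes starts the terminationGracePeriodSeconds
# countdown before the preStop hook runs, so the sleep must be shorter than the
# grace period to leave time for the container to handle SIGTERM.
#   $1 = preStop sleep in seconds
#   $2 = terminationGracePeriodSeconds of the gateway pods
prestop_fits() {
  if [ "$1" -lt "$2" ]; then
    echo "ok: $(( $2 - $1 ))s left for SIGTERM handling"
  else
    echo "too long: sleep $1s >= grace period $2s"
    return 1
  fi
}
```

For example, 500 transactions per second with an average duration of 200 ms gives `estimate_at_risk 500 200`, which prints 100: about 100 transactions exposed across the whole rolling update before any preStop mitigation.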
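Item 5 relies on how a StatefulSet rolling update orders its pods: with the default RollingUpdate strategy, Kubernetes replaces pods one at a time from the highest ordinal down to 0, so the pod whose name ends in -0 is always replaced last. The following sketch prints that order for a given StatefulSet name and replica count, which can help you confirm which pod should hold the primary role before you start. The function name is hypothetical, and the sketch assumes the default strategy with no partition set.

```shell
#!/bin/sh
# Hypothetical helper: print the order in which a StatefulSet RollingUpdate
# replaces its pods (highest ordinal first, ordinal 0 last). Assumes the
# default RollingUpdate strategy with no partition.
#   $1 = StatefulSet name (for example gwv6)
#   $2 = number of replicas
rolling_update_order() {
  i=$(( $2 - 1 ))
  order=""
  while [ "$i" -ge 0 ]; do
    order="$order $1-$i"
    i=$(( i - 1 ))
  done
  # Strip the leading space added by the first iteration.
  echo "${order# }"
}
```

For example, `rolling_update_order gwv6 3` prints `gwv6-2 gwv6-1 gwv6-0`. The replica count can be read from the cluster with `kubectl -n "$NAMESPACE" get statefulset gwv6 -o jsonpath='{.spec.replicas}'`, assuming the gateway StatefulSet is named gwv6 in your deployment.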

About this task

When you upgrade to a new mod release, change the version in each CR to the latest mod release version. The operator detects the change and starts the upgrade.

Procedure

  1. Update the gateway CR for the new version of API Connect.
    1. Run the following command to edit the CR:
      kubectl -n $NAMESPACE edit gw <CLUSTERNAME>

      where CLUSTERNAME is the name specified in the subsystem CR at installation time.

    2. Update the API Connect version in the CR:

      For example:

      version: 10.0.5.8

    3. If you are upgrading to a version of API Connect that requires a new license, update the license value now.

      For example:

      license: L-GVEN-GFUPVE

      For the list of licenses, see API Connect licenses.

    4. Delete any template or dataPowerOverride sections. You cannot perform an upgrade if the CR contains an override.
    5. Save and apply the CR. If you are using the vi editor, enter :wq

      When you save the updated CR, the upgrade starts automatically.

      Known issue: Webhook error for incorrect license.

      If you did not update the license ID in the CR, a webhook error similar to the following might be displayed when you save your changes:

      admission webhook "vmanagementcluster.kb.io" denied the request: 
      ManagementCluster.management.apiconnect.ibm.com "management" is invalid: 
      spec.license.license: Invalid value: "L-RJON-BYGHM4": 
      License L-RJON-BYGHM4 is invalid for the chosen version 10.0.5.8. 
      Please refer license document https://ibm.biz/apiclicenses

      To resolve the error, see API Connect licenses for the list of available license IDs and select the appropriate license ID for your deployment. Update the license value in the CR as described in step 1.3, and then save and apply your changes again.

  2. Run the following command to verify that the upgrade was successful:

    kubectl get apic -n <namespace>

    Example output after upgrading the gateway subsystem:

    NAME                                            READY   STATUS    VERSION     RECONCILED VERSION   AGE
    gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   <version>   <version-build>      100m
    
  3. Optional: If you upgraded from 10.0.5.4 or earlier, delete the DataPowerService CRs so that they are regenerated with random passwords for gateway-peering.

    Starting with API Connect 10.0.5.5, GatewayCluster pods are configured by default to secure the gateway-peering sessions with a unique, randomly generated password. However, GatewayCluster pods created before API Connect 10.0.5.5 are configured to use a single, hard-coded password, and upgrading to 10.0.5.5 or later does not replace it.

    After upgrading to API Connect 10.0.5.5 or later, you can choose to secure the gateway-peering sessions by running the following command to delete the DataPowerService CR that was created by the GatewayCluster:

    kubectl delete dp <gateway_cluster_name>

    This action prompts the API Connect Operator to recreate the DataPowerService CR with the unique, randomly generated password. This is a one-time change and does not need to be repeated for subsequent upgrades.

  4. Optional: If you upgraded to 10.0.5.3 or later from an earlier 10.0.5.x release, review and configure the new inter-subsystem communication features. See Optional post-upgrade steps for upgrade to 10.0.5.3 (or later) from earlier 10.0.5 release.
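As a non-interactive alternative to editing the CR in steps 1.2 and 1.3, the same two fields can be set with kubectl patch and a JSON merge patch. This sketch assumes that the version is stored at spec.version and the license ID at spec.license.license (the path named in the webhook error in step 1.5); the helper names are hypothetical. Any template or dataPowerOverride sections still need to be removed as described in step 1.4.

```shell
#!/bin/sh
# Hypothetical helper: build a JSON merge patch that sets the API Connect
# version and license ID on a GatewayCluster CR. The field paths
# (spec.version, spec.license.license) are assumptions based on the CR layout.
#   $1 = target version, $2 = license ID
gw_upgrade_patch() {
  printf '{"spec":{"version":"%s","license":{"license":"%s"}}}\n' "$1" "$2"
}

# Hypothetical helper: apply the patch. As with saving the CR in kubectl edit,
# the updated CR is what prompts the operator to start the upgrade.
#   $1 = namespace, $2 = gateway cluster name, $3 = version, $4 = license ID
gw_upgrade_apply() {
  kubectl -n "$1" patch gw "$2" --type merge -p "$(gw_upgrade_patch "$3" "$4")"
}
```

For example, `gw_upgrade_apply "$NAMESPACE" gw1 10.0.5.8 L-GVEN-GFUPVE`. Because the patch is an update to the same CR, an incorrect license ID is rejected by the same admission webhook as in the known issue above.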
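To script the check in step 2, the output of kubectl get apic can be filtered for rows that are not yet fully ready. The sketch below assumes the column order shown in the example output (NAME, READY, STATUS, VERSION, and so on), treats a row as healthy when READY is n/n and STATUS is Running, and, when you pass a version, also requires the VERSION column to match. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get apic` output on stdin and exit 0 only
# when every resource row reports READY n/n, STATUS Running and, if given, the
# expected VERSION. Assumes the column order NAME READY STATUS VERSION ...
#   $1 = expected version (optional)
check_apic_ready() {
  awk -v want="$1" '
    BEGIN { bad = 0 }
    $1 == "NAME" || NF == 0 { next }   # skip header and blank separator lines
    {
      split($2, r, "/")
      if (r[1] != r[2] || $3 != "Running" || (want != "" && $4 != want)) {
        print "not ready: " $1 " (" $2 ", " $3 ", " $4 ")"
        bad = 1
      }
    }
    END { exit bad }'
}
```

For example, to wait until the upgrade settles: `until kubectl get apic -n "$NAMESPACE" | check_apic_ready 10.0.5.8 >/dev/null; do sleep 30; done`. Polling keeps the check simple; the interval is an arbitrary choice.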
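Step 3 applies only when you upgraded from a release earlier than 10.0.5.5. The following sketch compares dotted version strings field by field and prints the delete command from step 3 only in that case; otherwise it prints nothing. It prints rather than runs the command so that you can review it first. The helper names are hypothetical, and the namespace flag is an addition to the command shown in step 3.

```shell
#!/bin/sh
# Hypothetical helper: succeed when dotted version $1 is strictly older than $2.
# Compares numeric fields left to right; missing fields count as 0.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1   # equal versions are not older
  }'
}

# Hypothetical helper: print the step 3 command when the previous release
# predates 10.0.5.5, the first release that uses random peering passwords.
#   $1 = version upgraded from, $2 = gateway cluster name, $3 = namespace
peering_reset_cmd() {
  if version_lt "$1" 10.0.5.5; then
    printf 'kubectl -n %s delete dp %s\n' "$3" "$2"
  fi
}
```

For example, `peering_reset_cmd 10.0.5.4 gw1 apic` prints `kubectl -n apic delete dp gw1`, while `peering_reset_cmd 10.0.5.5 gw1 apic` prints nothing.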