Upgrading to 10.0.1.8-eus on Kubernetes

Upgrade your API Connect deployment to 10.0.1.8-eus from 10.0.1.6-ifix1 or earlier on native Kubernetes.

Before you begin

Procedure

  1. Prepare for upgrading the management subsystem:
    1. Verify that the pgcluster is healthy:
      1. Get the name of the pgcluster:
        kubectl get pgcluster -n <APIC_namespace>

        The response displays the name of the postgres cluster running in the specified namespace.

      2. Check the status of the pgcluster:
        kubectl get pgcluster <pgcluster_name> -n <APIC_namespace> -o yaml | grep status -A 2 | tail -n3

        The response for a healthy pgcluster looks like the following example, where the state is Initialized:

        status:
            message: Cluster has been initialized
            state: pgcluster Initialized
      Important: If the pgcluster returns any other state, it is not healthy and an upgrade will fail.
      • If there are any ongoing backup or restore jobs, wait until they complete and then check the status again. Do not proceed with the upgrade until the status is Initialized.
      • If all of the background jobs complete but the pgcluster remains unhealthy, contact IBM Support for assistance.
  2. Back up the current deployment.
    • Wait until the backup completes before starting the upgrade.
    • Do not start an upgrade if a backup is scheduled to run within a few hours.
    • Do not perform maintenance tasks (such as rotating keys and certificates, restoring from a backup, or starting a new backup) while the upgrade process is running.
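    For reference, an on-demand backup of the management subsystem can be triggered by creating a ManagementBackup CR; the following is a minimal sketch that assumes backups were already configured for the deployment and that the management CR is named management (the backup and file names are placeholders):

      apiVersion: management.apiconnect.ibm.com/v1beta1
      kind: ManagementBackup
      metadata:
        name: pre-upgrade-backup
        namespace: <namespace>
      spec:
        crType: create
        clusterName: management

    Apply the file with kubectl apply -f pre-upgrade-backup.yaml and monitor it with kubectl get managementbackup -n <namespace> until the backup reports as complete. Back up the other subsystems according to the backup methods configured for your deployment.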
  3. If you used any microservice image overrides in the management CR during a fresh install, remove the image overrides prior to upgrade.
    Important: Even if you do not remove them, any image overrides that remain in the management CR are removed automatically by the operator during the upgrade. You can apply them again after the upgrade is complete.
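    For illustration only, such an override typically appears under the spec.template section of the management CR; the microservice name and image location in the following sketch are placeholders, and the exact nesting depends on how the override was originally applied:

      kubectl edit mgmt <management-cr-name> -n <namespace>

      spec:
        template:
        - name: <microservice-name>
          containers:
          - name: <microservice-name>
            image: <registry>/<image-name>:<tag>

    Remove any such entries, save the CR, and reapply them after the upgrade if needed.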
  4. Run the pre-upgrade health check:
    1. Verify that the apicops utility is installed by running the following command to check the current version of the utility:
      apicops --version

      If the response indicates that apicops is not available, install it now. See The API Connect operations tool: apicops in the API Connect documentation.

    2. Run the following command to set the KUBECONFIG environment variable:
      export KUBECONFIG=/path/to/kubeconfig
    3. Run the following command to execute the pre-upgrade script:
      apicops version:pre-upgrade -n <namespace>

      If the system is healthy, the results will not include any errors.

  5. Obtain the API Connect 10.0.1.8-eus files from IBM Fix Central.

    From the IBM Fix Central site, download the Docker image-tool file of the API Connect subsystems. Next, you will load the image-tool file into your local Docker registry. If necessary, you can populate a remote container registry with repositories and then push the images from the local registry to the remote registry.

    You will also download the Kubernetes operators, API Connect Custom Resource (CR) templates, and Certificate Manager, for use during deployment configuration.

    The following files are used for deployment on native Kubernetes:

    IBM® API Connect <version> for Containers
    Docker images for all API Connect subsystems
    IBM® API Connect <version> Operator Release Files for Containers
    Kubernetes operators and API Connect Custom Resource (CR) templates
    IBM® API Connect <version> Security Signature Bundle File
    Checksum files that you can use to verify the integrity of your downloads.
    Note: There is no need to download or install the Toolkit or the Local Test Environment now; use the latest versions of the tools when you upgrade to the latest version of API Connect in the next task.
  6. Upload the image files that you obtained from Fix Central in Step 5.
    1. Load the image-tool image into your Docker local registry:
      docker load < apiconnect-image-tool-<version>.tar.gz 

      Ensure that the registry has sufficient disk space for the files.

    2. If your Docker registry requires repositories to be created before images can be pushed, create the repositories for each of the images listed by the image tool. If your Docker registry does not require creation of repositories, skip this step and go to Step 6.c.
      1. Run the following command to get a list of the images from image-tool:
        docker run --rm apiconnect-image-tool-<version> version --images
      2. From the output of each entry of the form <image-name>:<image-tag>, use your Docker registry repository creation command to create a repository for <image-name>.
        For example, with AWS ECR, run the following command for each <image-name>:
        aws ecr create-repository --repository-name <image-name>
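        If your registry requires many repositories, the creation can be scripted; the following is a minimal sketch for AWS ECR that feeds the image list from the image tool into the repository creation command (adjust the creation command for your registry's CLI):

        docker run --rm apiconnect-image-tool-<version> version --images | \
          cut -d: -f1 | sort -u | \
          while read -r image; do
            aws ecr create-repository --repository-name "$image"
          done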
    3. Upload the image:
      • If you do not need to authenticate with the docker registry, use:
        docker run --rm apiconnect-image-tool-<version> upload <registry-url>
      • Otherwise, if your docker registry accepts authentication with username and password arguments, use:
        docker run --rm apiconnect-image-tool-<version> upload <registry-url> --username <username> --password <password>
      • Otherwise, such as with IBM Container Registry, if you need the image-tool to use your local Docker credentials, first authenticate with your Docker registry, then upload images with the command:
        docker run --rm -v ~/.docker:/root/.docker --user 0 apiconnect-image-tool-<version> upload <registry-url>

  7. If necessary, replace the pgbouncer image before upgrading the management subsystem. Follow the instruction that applies to your deployment scenario:
    Upgrading from 10.0.1.2-ifix1-eus, 10.0.1.2-ifix2-eus, or later
    Do not replace pgbouncer. Go to step 8.
    Upgrading from 10.0.1.2-eus, without any ifixes, and the 10.0.1.2-eus installation completed without errors
    Replace pgbouncer. Go to step 7.a.
    Upgrading from 10.0.1.2-eus, without any ifixes, and the 10.0.1.2-eus installation encountered an error
    • If the error was server DNS lookup failed, replace pgbouncer. Go to step 7.a.
    • If any other error was encountered, do not replace pgbouncer. Contact IBM Support for assistance.
    1. Obtain the pgbouncer image from the registry where you pushed the latest (10.0.1.8-eus) images.
      <registry-name>/ibm-apiconnect-management-crunchy-pgbouncer@sha256:4a5caaf4e5cd4056ccb3de7d39b8e343b0c4ebce7cae694ccbfbe80924d98752

      The <registry-name> value is the registry that you pushed the images to in Step 6.

    2. Get the pgbouncer deployment name:
      kubectl get deploy -n <namespace> | grep pgbouncer
    3. Replace the container image section with the new pgbouncer image:
      kubectl edit deploy <pgbouncer-deployment-name> -n <namespace>
    4. Restart the pgbouncer pod.
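      If the deployment does not roll the pod automatically after the image edit, one way to restart it is to delete the pod so that the Deployment controller recreates it from the updated template (the pod name can be found with kubectl get pods -n <namespace> | grep pgbouncer):
        kubectl delete pod <pgbouncer-pod-name> -n <namespace>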
    5. Next, verify that the version of the new pgbouncer is correct. Get the pgbouncer pod name:
      kubectl get pods -n <namespace> | grep 'pgbouncer'
    6. Exec into the pgbouncer pod:
      kubectl exec -it <pgbouncer-pod> -n <namespace> -- bash
    7. Run pgbouncer --version and make sure that the reported version matches the following output:
      bash-4.4$ pgbouncer --version
      PgBouncer 1.15.0
      libevent 2.1.8-stable
      adns: evdns2
      tls: OpenSSL 1.1.1g FIPS  21 Apr 2020
      systemd: yes
    8. Get the logs for the newly started pgbouncer microservice, and verify that there are no DNS lookup errors:
      kubectl logs <pgbouncer-pod-name> -n <namespace> | grep 'server DNS lookup failed' 

      The message server DNS lookup failed should not be included in the logs.

    9. If your original 10.0.1.2-eus installation (with the old pgbouncer) did not get an error, continue with Step 8.

      If your original 10.0.1.2-eus installation encountered the message server DNS lookup failed in the pgbouncer logs, you must also delete the following microservices:

      1. apim: Get the pod name and delete the microservice:
        kubectl get pods -n <namespace> | grep 'apim'
        kubectl delete pod <apim-pod> -n <namespace>
      2. lur: Get the pod name and delete the microservice:
        kubectl get pods -n <namespace> | grep 'lur'
        kubectl delete pod <lur-pod> -n <namespace>
      3. task manager: Get the pod name and delete the microservice:
        kubectl get pods -n <namespace> | grep 'task'
        kubectl delete pod <task-manager-pod> -n <namespace>
  8. Download and decompress IBM® API Connect <version> Operator Release Files for Containers.
  9. Upgrade cert-manager to version 1.7.1.
    API Connect 10.0.1.6-ifix1-eus and earlier use cert-manager version 0.10.1, which cannot be upgraded directly to 1.7.1. Complete the following steps to delete cert-manager 0.10.1 and install version 1.7.1.
    1. Create a directory named helper_files.
    2. Extract the contents of the helper_files.zip from the release_files.zip file into the new helper_files directory.
    3. Run the following command to back up the old certificates and issuers to a file called backup.yaml:
      kubectl get --all-namespaces -oyaml issuer,clusterissuer,cert,secret > backup.yaml
    4. Run the following commands to delete the old certificates and issuers:
      kubectl delete issuers.certmanager.k8s.io selfsigning-issuer ingress-issuer -n <namespace>
      kubectl delete certs.certmanager.k8s.io ingress-ca -n <namespace>
      kubectl delete certs.certmanager.k8s.io portal-admin-client gateway-client-client analytics-client-client analytics-ingestion-client -n <namespace>
    5. Run the following command to delete the old cert-manager v0.10.1 resources, including the namespace (the old cert-manager CRDs are not deleted):
      kubectl delete -f helper_files/cert-manager-0.10.1-deploy.yaml

      Wait a few minutes for the old version of cert-manager to be deleted.

    6. Run the following command to verify that cert-manager v0.10.1 was successfully deleted:
      kubectl get ns cert-manager

      The following message indicates that cert-manager was not found:

      Error from server (NotFound): namespaces "cert-manager" not found

      Do not proceed until you are sure that cert-manager v0.10.1 was deleted.

    7. Run the following command to install cert-manager 1.7.1:
      kubectl apply -f helper_files/cert-manager-1.7.1.yaml
    8. Run the following command to install the new issuers:
      kubectl apply -f helper_files/ingress-issuer-v1.yaml -n <namespace>
  10. Apply the new CRDs from the Operator Release Files for Containers that you decompressed:
    kubectl apply -f ibm-apiconnect-crds.yaml
  11. Apply the new DataPower Operator YAML into the namespace where the DataPower Operator is running.
    1. If the operator is not running in the default namespace, open the ibm-datapower.yaml file in a text editor and replace all references to default with the name of your namespace. You do not need to take this action when using Operator Lifecycle Manager (OLM).
    2. Open ibm-datapower.yaml in a text editor. Locate the image: key in the containers section of the deployment YAML file immediately after imagePullSecrets:. Replace the value of the image: key with the location of the datapower-operator image, either uploaded to your own registry or pulled from a public registry.
    3. kubectl apply -f ibm-datapower.yaml -n <namespace>

      The Gateway CR goes to Pending state when the operator is updated. The state of the Gateway CR will change to Running after installation of the API Connect operator in the next step.

    Note: There is a known labels issue when upgrading the DataPower operator, where applying the DataPower YAML results in the following error:
    The Deployment "datapower-operator" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"datapower-operator", "app.kubernetes.io/managed-by":"datapower-operator", "app.kubernetes.io/name":"datapower-operator", "name":"datapower-operator"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

    To work around this issue, see DataPower Operator Upgrades in the DataPower documentation.

  12. Clean up the webhook configuration before deploying the 10.0.1.8-eus API Connect operator:
    kubectl get mutatingwebhookconfiguration,validatingwebhookconfiguration | grep ibm-apiconnect
    kubectl delete mutatingwebhookconfiguration/ibm-apiconnect-<namespace>-mutating-webhook-configuration
    kubectl delete validatingwebhookconfiguration/ibm-apiconnect-<namespace>-validating-webhook-configuration
  13. Apply the new API Connect operator YAML into the namespace where the API Connect operator is running.
    • Single-namespace deployment:
      1. If the operator is not running in the default namespace, open the ibm-apiconnect.yaml file in a text editor and replace all references to default with the name of your namespace. You do not need to take this action when using Operator Lifecycle Manager (OLM).
      2. Open ibm-apiconnect.yaml in a text editor. Replace the value of each image: key with the location of the apiconnect operator images (from the ibm-apiconnect container and the ibm-apiconnect-init container), either uploaded to your own registry or pulled from a public registry.
      3. kubectl apply -f ibm-apiconnect.yaml -n <namespace>
    • Multi-namespace deployment:
      1. Locate and open the newly downloaded ibm-apiconnect-distributed.yaml in a text editor of your choice. Find and replace each occurrence of $OPERATOR_NAMESPACE with the namespace that you want to use for the deployment; in this example, the namespace is operator.
      2. Also in ibm-apiconnect-distributed.yaml, locate the image: keys in the containers sections of the deployment YAML, immediately after imagePullSecrets:. Replace the REPLACE-DOCKER-REGISTRY placeholder value of each image: key with the Docker registry host location of the apiconnect operator image (either uploaded to your own registry or pulled from a public registry).
      3. Install ibm-apiconnect-distributed.yaml with the following command:
        kubectl apply -f ibm-apiconnect-distributed.yaml

    When the API Connect operator deployment is updated, it detects that the existing version 10 pods for all subsystems have labels that no longer match, and corrects the labels. While the labels are being fixed, most of the microservices (for all subsystems) are recreated, and the subsystem CRs move into Pending state and then back into Running state. The management subsystem microservices, except for the postgres and NATS components, are recreated at the end of the process.

  14. Use apicops v10 version 0.10.57+ to validate the certificates.
    1. Run the following command:
      apicops upgrade:stale-certs -n <namespace>
    2. Delete any stale certificates that are managed by cert-manager.
      If a certificate failed the validation and it is managed by cert-manager, you can delete the stale certificate secret, and let cert-manager regenerate it. Run the following command:
      kubectl delete secret <stale-secret> -n <namespace>
    3. Restart the corresponding pod so that it can pick up the new secret.
      To determine which pod to restart, see the certificate reference information in the API Connect documentation.
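      For example, if the affected pod is managed by a Deployment, it can be restarted with a rolling restart; a sketch where the deployment name is a placeholder:
        kubectl rollout restart deployment <deployment-name> -n <namespace>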

    For information on the apicops tool, see The API Connect operations tool: apicops.

  15. Delete the Postgres pods so that they pick up the new certificate.

    If the managementcluster CR is not ready, run the following command to delete the Postgres pods and refresh the certificate:

    kubectl get pod -n <namespace> --no-headers=true | grep postgres | grep -v backup | awk '{print $1}' | xargs kubectl delete pod -n <namespace>
  16. Run the following command to delete the old gateway certificates:
    kubectl delete certs.certmanager.k8s.io gateway-service gateway-peering -n <namespace>
  17. Verify that the API Connect operator recreated the necessary microservices as part of the label updates in step 13:

    The microservices must be available for upgrading the subsystems (operands).

    kubectl get apic -n <namespace>
  18. Upgrade the subsystems (operands) by updating the CRs.
    Important: When you save each updated CR, the version change is detected by the operator, which triggers the subsystem upgrade. To ensure a successful upgrade, update the subsystem CRs in the following required sequence:
    1. Management
    2. Portal
    3. Analytics
    4. Gateway

    For each CR, make the following changes:

    1. Update the endpoint ingress-issuer annotation for the new version of cert-manager.

      Change the annotations setting from:

          annotations:
              certmanager.k8s.io/issuer: ingress-issuer

      to:

          annotations:
              cert-manager.io/issuer: ingress-issuer
    2. In the Gateway CR, remove the template override section, if it exists.

      You cannot perform an upgrade if the CR contains an override.

    3. Update the version to reflect the new release of API Connect:

      For example:

      version: 10.0.1.8-eus

      When you save each updated CR, the version change is detected by the operator, which triggers the subsystem upgrade.

      Note: On the version change, you might see the following webhook error message:
      admission webhook "vapiconnectcluster.kb.io" denied the request: APIConnectCluster.apiconnect.ibm.com "<instance-name>" is invalid: spec.license.license: Invalid value: "L-RJON-BYGHM4": License L-RJON-BYGHM4 is invalid for the chosen version 10.0.1.7-eus. Please refer license document https://ibm.biz/apiclicenses

      To correct the webhook error, visit https://ibm.biz/apiclicenses and select the appropriate License ID for the new version of API Connect, and then re-apply your CR with the newer license value. The updated CR spec should look like the following example:

      spec:
        license:
          accept: true
          use: production
          license: license-value
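    As an illustration of the combined edits in this step, the relevant parts of an updated Management CR might look like the following sketch; the endpoint shown (cloudManagerEndpoint) is only one example, the hostname, secret name, and license ID are placeholders, and the annotation change applies to every endpoint that previously used certmanager.k8s.io/issuer:

      spec:
        version: 10.0.1.8-eus
        license:
          accept: true
          use: production
          license: <license-id>
        cloudManagerEndpoint:
          annotations:
            cert-manager.io/issuer: ingress-issuer
          hosts:
          - name: admin.<your-domain>
            secretName: cm-endpoint

    Edit the CR in place with a command such as kubectl edit mgmt <management-cr-name> -n <namespace>, or update your CR YAML file and re-apply it.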
  19. Verify that the upgraded subsystems report as Running.

    Run the following command:

    kubectl get apic --all-namespaces

    The Management, Analytics, and Portal subsystems should report as Running. The Gateway subsystem will not be running until you complete the next step to correct peering issues.

    Example response:

    NAME                                                      READY   STATUS    VERSION        RECONCILED VERSION   AGE
    analyticscluster.analytics.apiconnect.ibm.com/analytics   8/8     Running   10.0.1.8-eus   10.0.1.8-eus-5352    121m
    
    NAME                                     PHASE     READY   SUMMARY                           VERSION        AGE
    datapowerservice.datapower.ibm.com/gw1   Running   True    StatefulSet replicas ready: 1/1   10.0.1.8-eus   100m
    
    NAME                                     PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
    datapowermonitor.datapower.ibm.com/gw1   Running                false          false              100m
    
    NAME                                            READY   STATUS    VERSION        RECONCILED VERSION   AGE
    gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   10.0.1.8-eus   10.0.1.8-eus-5352    100m
    
    NAME                                                  READY   STATUS    VERSION        RECONCILED VERSION   AGE
    managementcluster.management.apiconnect.ibm.com/m1    16/16   Running   10.0.1.8-eus   10.0.1.8-eus-5352    162m
    
    NAME                                             READY   STATUS    VERSION        RECONCILED VERSION   AGE
    portalcluster.portal.apiconnect.ibm.com/portal   3/3     Running   10.0.1.8-eus   10.0.1.8-eus-5352    139m
  20. After the subsystem upgrades complete, scale the Gateway pods down and back up to correct peering issues caused by the ingress issuer change.
    1. Scale down the Gateway firmware containers:
      1. Edit the Gateway subsystem CR:
        kubectl edit gw <gateway-cr-name>
      2. Set the replicaCount to 0 (you might need to add this setting) and save the change:
        ...
        spec:
          replicaCount: 0
        ...
    2. Wait for Gateway firmware pods to scale down and terminate.

      Do not proceed until the pods have terminated.
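      To check progress, you can list the gateway firmware pods until none remain; a sketch that assumes the pod names include the Gateway CR name:
        kubectl get pods -n <namespace> | grep <gateway-cr-name>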

    3. Scale up the Gateway firmware containers back to the original value:
      1. Edit the Gateway subsystem CR:
        kubectl edit gw <gateway-cr-name>
      2. Set the replicaCount back to its original value (or delete the setting) and save the change. For example:
        ...
        spec:
          replicaCount: 3
        ...
    4. Run the following command and verify that all subsystems (including Gateway) now report the STATUS as Running and the RECONCILED VERSION as 10.0.1.8-eus:
      kubectl get apic --all-namespaces

      For example:

      NAME                                                      READY   STATUS    VERSION              RECONCILED VERSION      AGE
      analyticscluster.analytics.apiconnect.ibm.com/analytics   8/8     Running   10.0.1.8-eus   10.0.1.8-eus-5352   121m
      
      NAME                                     PHASE     READY   SUMMARY                           VERSION    AGE
      datapowerservice.datapower.ibm.com/gw1   Running   True    StatefulSet replicas ready: 1/1   10.0.1.8-eus   100m
      
      NAME                                     PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
      datapowermonitor.datapower.ibm.com/gw1   Running                false          false              100m
      
      NAME                                            READY   STATUS    VERSION              RECONCILED VERSION      AGE
      gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   10.0.1.8-eus   10.0.1.8-eus-5352  100m
      
      NAME                                                 READY   STATUS    VERSION              RECONCILED VERSION      AGE
      managementcluster.management.apiconnect.ibm.com/m1   16/16   Running   10.0.1.8-eus   10.0.1.8-eus-5352   162m
      
      
      NAME                                             READY   STATUS    VERSION              RECONCILED VERSION      AGE
      portalcluster.portal.apiconnect.ibm.com/portal   3/3     Running   10.0.1.8-eus   10.0.1.8-eus-5352   139m

    If the Gateway pods appear to be out-of-sync with the Management subsystem after upgrading, see Gateway pods not in sync with Management after upgrade.

  21. Clean up the remaining stale certificates and issuers (used by the older version of cert-manager):

    Complete the following steps for each of the namespaces used in your deployment (for example, mgmt, ptl, a7s, and gtw).

    1. Run the following commands to get lists of stale certificates and issuers:
      kubectl get certs.certmanager.k8s.io -n <namespace>
      kubectl get issuers.certmanager.k8s.io -n <namespace>
    2. Run the following commands to delete the resources:
      kubectl delete certs.certmanager.k8s.io <list-of-stale-certificates> -n <namespace>
      kubectl delete issuers.certmanager.k8s.io <list-of-stale-issuers> -n <namespace>
    3. Repeat the process for the remaining namespaces.
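    If you prefer to script the check across namespaces, the following sketch loops over the example namespace names used above (substitute your own namespace names):

      for ns in mgmt ptl a7s gtw; do
        kubectl get certs.certmanager.k8s.io -n "$ns"
        kubectl get issuers.certmanager.k8s.io -n "$ns"
      done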

What to do next

After the upgrade to 10.0.1.8-eus is complete, upgrade to the latest version.