Upgrading a 2DCDR deployment on Kubernetes and Red Hat OpenShift from V10.0.5

How to upgrade a two data center disaster recovery deployment to V10.0.8 from V10.0.5 on Kubernetes and Red Hat OpenShift.

To upgrade API Connect in a 2DCDR deployment, use the upgrade instructions for your platform, but with the extra considerations and steps documented in this topic.
Note: For OpenShift® users: The steps that are detailed in this topic use the Kubernetes kubectl command. On OpenShift, use the equivalent oc command in its place.

Before you begin

Key points applicable to both management and portal subsystems

  • Your API Connect deployments must be upgraded to the same API Connect release, down to the interim fix level.
  • Both data centers must be upgraded in the same maintenance window.
  • Upgrade steps might result in an update to the ingress-ca X.509 certificate. Extra steps must be taken at various points in the upgrade process to ensure that the ingress-ca secret in both data centers contains the same ingress-ca X.509 certificate.
Note: If you are not using cert-manager and you customized your certificates, the ingress-ca certificate might have a different name. Ensure that the CA certificate that is used by both data centers is the same during all stages of the upgrade.
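
If you customized your certificates and are not sure which CA certificate backs the ingress-ca secret, the following commands are one way to check. This is a sketch only: it assumes that cert-manager is in use and that the default Certificate and secret name ingress-ca applies; substitute your own names if they differ.

  # Show which issuer signs the ingress-ca certificate, and the cert-manager annotations on its secret.
  kubectl get certificate ingress-ca -n <namespace> -o yaml | grep -A2 issuerRef
  kubectl get secret ingress-ca -n <namespace> -o yaml | grep cert-manager.io/

These commands only help you to identify which certificate and secret to compare; they do not replace the fingerprint checks in the upgrade steps that follow.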

Upgrading to V10.0.8: Steps for management upgrades

  1. Upgrade the warm-standby data center first.
    Follow the upgrade steps for your platform.
    When the warm-standby upgrade is complete, the warm-standby management CR reports an HA status error:
    kubectl get mgmt -A
    returns:
    NAMESPACE   NAME   READY   STATUS    VERSION    RECONCILED VERSION   MESSAGE                                            AGE
    ns1         m2     2/19    Blocked   10.0.8.1   10.0.5.5-6227        HA status Error - see HAStatus in CR for details   83m
    
    Confirm the blocked status with the following command:
    kubectl get mgmt <management CR name> -o json | jq -r '.status.conditions[] | select(.status=="True")'
    returns:
    {
      "lastTransitionTime": "2024-05-21T16:05:34Z",
      "message": "HA status Error - see HAStatus in CR for details",
      "reason": "na",
      "status": "True",
      "type": "Blocked"
    }
    Check the HA status with the following command:
    kubectl get mgmt <management CR name> -o json | jq -r '.status.haStatus[] | select(.status=="True")'
    returns:
    {
      "lastTransitionTime": "2024-05-21T16:05:34Z",
      "message": "Remote HAMode is Empty (Not received from peer). Expected it to be in either active or setup complete phase",
      "reason": "na",
      "status": "True",
      "type": "Error"
    }

    Do not proceed until the warm-standby reports this status. A polling sketch that waits for this condition is shown after these steps.

  2. Verify that both data centers have the same ingress-ca X.509 certificate in their ingress-ca secret. Run the following command in both data centers and check that the output is the same:
    openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
    If you do not have the openssl command available, you can instead run only the kubectl part, which produces a larger output:
    kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
    If the outputs are different, follow these steps to synchronize the certificates: Synchronizing the ingress-ca X.509 certificate across data centers. A script sketch that runs this comparison against both data centers is shown after these steps.
  3. Upgrade the active data center, following the upgrade steps for your platform.
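
The following optional sketch polls the warm-standby management CR until the Blocked condition described in step 1 is reported, so that you do not have to rerun the status commands manually. It uses the same <management CR name> placeholder as the commands above, plus a <namespace> placeholder for your API Connect namespace.

  # Poll every 30 seconds until the warm-standby management CR reports the Blocked condition.
  while true; do
    kubectl get mgmt <management CR name> -n <namespace> -o json \
      | jq -e '.status.conditions[] | select(.type=="Blocked" and .status=="True")' \
      && break
    sleep 30
  done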
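
As referenced in step 2, the following sketch compares the SHA-256 fingerprint of the ingress-ca certificate across both data centers in one pass. The kubectl context names dc1-context and dc2-context are hypothetical placeholders; replace them with your own contexts (or run the loop body separately against each cluster) and set <namespace> to your API Connect namespace.

  # Print the ingress-ca SHA-256 fingerprint for each data center; the two results must match.
  for ctx in dc1-context dc2-context; do
    echo "== $ctx =="
    kubectl --context "$ctx" get secret ingress-ca -n <namespace> -o jsonpath='{.data.tls\.crt}' \
      | base64 -d \
      | openssl x509 -noout -fingerprint -sha256
  done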

Steps for portal upgrade

  1. Verify that both data centers have the same ingress-ca X.509 certificate in their ingress-ca secret. Run the following command in both data centers and check that the output is the same:
    openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
    If you do not have the openssl command available, you can instead run only the kubectl part, which produces a larger output:
    kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
    If the outputs are different, follow these steps to synchronize the certificates: Synchronizing the ingress-ca X.509 certificate across data centers.
  2. Start the upgrade of your warm-standby data center by following the upgrade documentation for your platform. Stop at the point where the portal CR is updated with the new API Connect version.
  3. Verify that both data centers still have the same ingress-ca X.509 certificate, repeating step 1. If they are different, then follow these steps: Synchronizing the ingress-ca X.509 certificate across data centers.
  4. Complete the upgrade of your warm-standby data center by updating the portal subsystem CR, following the remaining upgrade steps for your platform. Do not wait for the warm-standby to reach READY state before starting the upgrade on the active data center (in certain circumstances the warm-standby portal does not reach READY state until the active data center is upgraded).

    For example, suppose that both data centers have the portal cluster (PTL) CR in the Warning state with a message that says "Full file synchronization running". You can move both data centers out of the Warning state by upgrading the active data center. A sketch for inspecting the portal CR conditions is shown after these steps.

  5. Start the upgrade of your active data center by following the upgrade documentation for your platform. Stop at the point where the portal CR is updated with the new API Connect version.
  6. Verify that both data centers still have the same ingress-ca X.509 certificate, repeating step 1. If they are different, then follow these steps: Synchronizing the ingress-ca X.509 certificate across data centers.
  7. Upgrade the portal subsystem in your active data center by updating the portal subsystem CR, following the remaining upgrade steps for your platform.
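
As referenced in the example above, the following sketch is one way to inspect the portal CR conditions, for example to see a "Full file synchronization running" warning or to confirm when the portal reports READY. It mirrors the jq commands shown for the management CR earlier in this topic and assumes that the portal CR exposes the same status.conditions structure; <portal CR name> and <namespace> are placeholders for your deployment.

  # List the portal CR conditions that are currently True, together with their messages.
  kubectl get ptl <portal CR name> -n <namespace> -o json \
    | jq -r '.status.conditions[] | select(.status=="True") | "\(.type): \(.message)"'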

How to upgrade when one data center is down

If API Connect is still running on the failed data center, follow the previously documented steps to upgrade both data centers before you bring the failed data center back online.

If the failed data center is expected to be down for a long time, you can convert the active data center to a stand-alone data center by following the steps in Removing a two data center deployment, but note the following points:
  1. Ensure that the network links to the failed data center are removed.
  2. Ensure that the failed data center is set to warm-standby in the multiSiteHA section. Do not proceed to the next step until the data center completes the transition to warm-standby. View the status of the management and portal CRs to confirm that HA Mode reports passive.
  3. Remove the multiSiteHA section from the failed data center, and verify that the failed data center resets itself to become an empty stand-alone API Connect deployment (all data is deleted).
  4. Before you restore the network links between the data centers, do the following operations:
    • Upgrade API Connect on the failed data center to the same version as the active data center.
    • Add the multiSiteHA sections to both data centers, setting the failed data center to be warm-standby (a verification sketch follows this list).
      Important: Do not set the failed data center to be active in the multiSiteHA section, because doing so overwrites the data on your working data center with the empty database of your failed data center.
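
If you want to confirm the multiSiteHA configuration before you restore the network links, the following sketch is one way to do it. It assumes that the warm-standby mode is exposed as passive at spec.multiSiteHA.mode in the management and portal CRs, matching the HA Mode value mentioned in step 2; adjust the field path if your CR layout differs. <management CR name>, <portal CR name>, and <namespace> are placeholders.

  # Run these in the recovered (previously failed) data center; both commands should print: passive
  kubectl get mgmt <management CR name> -n <namespace> -o jsonpath='{.spec.multiSiteHA.mode}{"\n"}'
  kubectl get ptl <portal CR name> -n <namespace> -o jsonpath='{.spec.multiSiteHA.mode}{"\n"}'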