Upgrading a two data center deployment on Kubernetes and OpenShift
How to upgrade a two data center disaster recovery (DR) deployment on Kubernetes and OpenShift.
To upgrade API Connect in a 2DCDR deployment,
use the upgrade instructions for your platform, but with the extra considerations and steps
documented in this topic.
Note: For OpenShift users: The steps that are detailed in this topic use the Kubernetes kubectl command. On OpenShift, use the equivalent oc command in its place.

Before you begin
- Ensure that API Connect is running in both data centers and that replication is working: Verifying replication between data centers.
- Ensure that you have recent backups of Management and Portal subsystems: Backup and restore requirements for a two data center deployment.
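As a quick sanity check, and assuming that your backups are taken through the API Connect backup custom resources (the ManagementBackup and PortalBackup kinds used here are an assumption; adjust the commands to however your backups are actually configured), you can list recent backups in each data center:
   kubectl get managementbackup -n <namespace> --sort-by=.metadata.creationTimestamp
   kubectl get portalbackup -n <namespace> --sort-by=.metadata.creationTimestamp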
Key points applicable to both Management and Portal subsystems
- Your API Connect deployments must be upgraded to the same API Connect release, down to the interim fix level.
- Both data centers must be upgraded in the same maintenance window.
- Pre-upgrade operations in the upgrade documentation for your platform might result in an update to the ingress-ca X.509 certificate. Extra steps must then be taken during the 2DCDR deployment upgrade to ensure that both data centers always use the same ingress-ca X.509 certificate.
Note: If you are not using cert-manager and you customized your certificates, the ingress-ca certificate might have a different name. Ensure that the CA certificate that is used by both data centers is the same during all stages of the upgrade.
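In that situation, one way to locate the CA secret that is actually in use is to list the TLS-type secrets in the namespace and identify the one that your certificate chain is issued from; this is a generic Kubernetes query, not an API Connect-specific step:
   kubectl get secrets -n <namespace> --field-selector type=kubernetes.io/tls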
Steps for Management upgrades
1. Remove the multiSiteHA section from the management CRs in both data centers, converting them both to stand-alone deployments.
   Note: When you remove the multiSiteHA section from the warm-standby data center, all data is deleted from it.
2. Verify that the management subsystem in both data centers is running as a stand-alone, and that all the pods are running:
   kubectl get mgmt -n <namespace>
   NAME                         READY   STATUS    VERSION    RECONCILED VERSION   AGE
   <management instance name>   17/17   Running   <version>  <version>-xxxx       14d
   If one of the data centers returns 8/8 pods, it is still running as a warm-standby; wait a few minutes for it to complete the transition to a stand-alone data center. Do not move on until both data centers show the same number of pods. Run kubectl describe and confirm that the multiSiteHA section is gone from the spec section, and that the Status section shows an ha mode of active in both data centers:
   kubectl describe mgmt -n <namespace>
   ...
   Status
     Ha Mode:  active
   ...
3. Upgrade both data centers, following the steps for your platform.
4. Verify that both data centers have the same ingress-ca X.509 certificate in their ingress-ca secret. Run the following command in both data centers and check that the output is the same:
   openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
   If you do not have the openssl command available, you can instead run only the kubectl part, which produces a larger output:
   kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
   If the outputs are different, follow these steps to synchronize the certificates: Synchronizing ingress-ca X.509 certificates.
5. Add the multiSiteHA sections to the management CRs, setting one of them to be the active and the other to be the warm-standby. A sketch of what a multiSiteHA section looks like is shown after this list.
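For reference, the following is only an illustrative sketch of the shape of a multiSiteHA section for the active data center. The host names, secret names, and issuer name are placeholders, and the exact fields depend on your original 2DCDR configuration, so reuse the values from your own deployment rather than copying this:
   spec:
     multiSiteHA:
       mode: active                    # set to passive on the warm-standby data center
       replicationEndpoint:
         annotations:
           cert-manager.io/issuer: ingress-issuer          # placeholder issuer name
         hosts:
         - name: mgr-replication.dc1.example.com           # placeholder FQDN for this data center
           secretName: dc1-mgmt-replication                # placeholder secret name
       replicationPeerFQDN: mgr-replication.dc2.example.com   # placeholder FQDN of the peer data center
       tlsClient:
         secretName: dc1-mgmt-replication-client           # placeholder client TLS secret name
Setting mode: active in one data center and mode: passive in the other is what re-establishes the active/warm-standby relationship.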
Steps for Portal upgrades
1. Verify that both data centers have the same ingress-ca X.509 certificate in their ingress-ca secret. Run the following command in both data centers and check that the output is the same:
   openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
   If you do not have the openssl command available, you can instead run only the kubectl part, which produces a larger output:
   kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
   If the outputs are different, follow these steps to synchronize the certificates: Synchronizing ingress-ca X.509 certificates.
2. Start the upgrade of your warm-standby data center by following the upgrade documentation for your platform. Stop at the point where the portal CR is updated with the new API Connect version.
3. Verify that both data centers still have the same ingress-ca X.509 certificate, repeating step 1. If they are different, then follow these steps: Synchronizing ingress-ca X.509 certificates.
4. Complete the upgrade of your warm-standby data center by updating the portal subsystem CR, following the remaining upgrade steps for your platform. Do not wait for the warm-standby to reach READY state before starting the upgrade on the active data center (in certain circumstances the warm-standby portal does not reach READY state until the active data center is upgraded). For example, both data centers might show the portal cluster (PTL) CR in Warning state with a message that says "Full file synchronization running"; both data centers move out of the Warning state only after the active data center is upgraded. A sketch of how to watch the portal CR status is shown after this list.
5. Start the upgrade of your active data center by following the upgrade documentation for your platform. Stop at the point where the portal CR is updated with the new API Connect version.
6. Verify that both data centers still have the same ingress-ca X.509 certificate, repeating step 1. If they are different, then follow these steps: Synchronizing ingress-ca X.509 certificates.
7. Upgrade the portal subsystem in your active data center by updating the portal subsystem CR, following the remaining upgrade steps for your platform.
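While you wait, a simple way to watch the portal subsystem state in each data center is to query the portal CR directly. The instance name and column values shown here are placeholders, and the READY count differs between the active and warm-standby data centers:
   kubectl get ptl -n <namespace>
   NAME                     READY   STATUS    VERSION    RECONCILED VERSION   AGE
   <portal instance name>   3/3     Running   <version>  <version>-xxxx       14d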
Synchronizing the ingress-ca X.509 certificate across data centers
Follow these steps to extract your ingress-ca X.509 certificate from your source data center and prepare it for application on your target data center:
1. Determine which data center has the ingress-ca Kubernetes cert-manager certificate object; this is your source data center:
   kubectl get certificates -n <namespace> | grep ingress-ca
2. Extract the ingress-ca secret from your source data center to a file called new-ca-issuer-secret.yaml:
   kubectl get secret ingress-ca -o yaml -n <namespace> > new-ca-issuer-secret.yaml
3. Edit the new-ca-issuer-secret.yaml file and remove the creationTimestamp, resourceVersion, uid, namespace, and managedFields properties. Remove the labels and annotations sections completely. The resulting contents should include the ingress-ca X.509 certificate and the secret name (an optional scripted alternative to this manual edit is shown after these steps):
   apiVersion: v1
   data:
     ca.crt: <long cert string>
     tls.crt: <long cert string>
     tls.key: <long cert string>
   kind: Secret
   metadata:
     name: ingress-ca
   type: kubernetes.io/tls
4. Copy the new-ca-issuer-secret.yaml file to the target data center.
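Alternatively, if the yq (v4) command-line YAML processor happens to be available, the same cleanup can be scripted rather than done by hand; this is an optional convenience and assumes yq v4 syntax:
   kubectl get secret ingress-ca -n <namespace> -o yaml \
     | yq 'del(.metadata.creationTimestamp) | del(.metadata.resourceVersion) | del(.metadata.uid) | del(.metadata.namespace) | del(.metadata.managedFields) | del(.metadata.labels) | del(.metadata.annotations)' \
     > new-ca-issuer-secret.yaml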
Follow these steps to apply the extracted ingress-ca X.509 certificate on your target data center:
1. Apply the new-ca-issuer-secret.yaml file:
   kubectl apply -f new-ca-issuer-secret.yaml -n <namespace>
2. Regenerate all ingress-ca end-entity certificates:
   kubectl get secrets -n <namespace> -o custom-columns='NAME:.metadata.name,ISSUER:.metadata.annotations.cert-manager\.io/issuer-name' --no-headers=true | grep ingress-issuer | awk '{ print $1 }' | xargs kubectl delete secret -n <namespace>
   All affected pods should automatically restart. For more information on regenerating certificates, see: Renewing certificates with cert-manager.
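To confirm that the synchronization worked, you can re-run the fingerprint comparison from the upgrade steps in both data centers and check that the output now matches:
   openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)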
How to upgrade when one data center is down
If API Connect is still running on the failed data center, follow the steps that are documented previously to upgrade both data centers, before you bring the failed data center back online.
If the failed data center is expected to be down for a long time, you can convert the active data
center to a stand-alone data center following these steps: Removing a two data center deployment, but note the following points:
- Ensure that the network links to the failed data center are removed.
- Ensure that the failed data center is set to warm-standby in the multiSiteHA section. Do not proceed to the next step until the data center completes the transition to warm-standby. View the status of the management and portal CRs to confirm that HA Mode reports passive (a quick way to check this is shown in the sketch after this list).
- Remove the multiSiteHA section from the failed data center, and ensure that the failed data center resets itself to become an empty stand-alone API Connect deployment (all data is deleted).
- Before you restore the network links between the data centers, do the following:
  - Upgrade API Connect on the failed data center to the same version as the active data center.
  - Add the multiSiteHA sections to both data centers, setting the failed data center to be warm-standby.
    Important: Do not set the failed data center to be active in the multiSiteHA section, because doing so overwrites the data on your working data center with the empty database of your failed data center.
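For the warm-standby and HA mode checks above, a minimal way to confirm the reported mode is to filter the describe output of the management and portal CRs, as in the examples earlier in this topic:
   kubectl describe mgmt -n <namespace> | grep "Ha Mode"
   kubectl describe ptl -n <namespace> | grep "Ha Mode"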