Upgrading a two data center deployment on Kubernetes and OpenShift

How to upgrade a two data center disaster recovery (DR) deployment on Kubernetes and OpenShift.

To upgrade API Connect in a 2DCDR deployment, use the upgrade instructions for your platform, but with the extra considerations and steps documented in this topic.
Note: For OpenShift users: The steps that are detailed in this topic use the Kubernetes kubectl command. On OpenShift, use the equivalent oc command in its place.

Before you begin

Key points applicable to both Management and Portal subsystems

  • The API Connect deployments in both data centers must be upgraded to the same API Connect release, down to the interim fix level.
  • Both data centers must be upgraded in the same maintenance window.
  • Pre-upgrade operations in the upgrade documentation for your platform might result in an update to the ingress-ca X.509 certificate. Extra steps must then be taken during the 2DCDR deployment upgrade to ensure that both data centers always use the same ingress-ca X.509 certificate.
Note: If you are not using cert-manager and you customized your certificates, the ingress-ca certificate might have a different name. Ensure that the CA certificate that is used by both data centers is the same during all stages of the upgrade.
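
To confirm that both data centers are at the same level, you can compare the reported versions of the subsystem CRs in each data center. A minimal check, assuming the standard mgmt and ptl short names for the management and portal CRs:
    # Run in each data center and compare the VERSION and RECONCILED VERSION columns.
    kubectl get mgmt,ptl -n <namespace>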

Steps for Management upgrades

  1. Remove the multiSiteHA section from the management CRs in both data centers, converting them both to stand-alone deployments (an example multiSiteHA section is shown after these steps).
    Note: When you remove the multiSiteHA section from the warm-standby data center, all data is deleted from it.
  2. Verify that the management subsystem in both data centers is running as a stand-alone, and that all the pods are running:
    kubectl get mgmt -n <namespace>
    
    NAME                         READY   STATUS    VERSION    RECONCILED VERSION   AGE
    <management instance name>   17/17   Running   <version>   <version>-xxxx        14d
    If one of the data centers returns 8/8 pods, it is still running as a warm-standby; wait a few minutes for it to complete the transition to a stand-alone data center. Do not move on until both data centers show the same number of pods. Run kubectl describe and confirm that the multiSiteHA section is gone from the spec section, and that the Status section shows an ha mode of active in both data centers:
    kubectl describe mgmt -n <namespace>
    ...
    Status
      Ha Mode:                   active
    ...
    
  3. Upgrade both data centers, following the upgrade steps for your platform.
  4. Verify that both data centers have the same ingress-ca X.509 certificate in their ingress-ca secret. Run the following command in both data centers and check that the output is the same:
    openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
    If you do not have the openssl command available, you can instead run only the kubectl part, which produces a larger output:
    kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
    If the outputs are different, follow the steps in Synchronizing the ingress-ca X.509 certificate across data centers.
  5. Add the multiSiteHA sections back to the management CRs, setting one data center to be the active and the other to be the warm-standby (see the example multiSiteHA section after these steps).
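
For reference, this is a sketch of the kind of multiSiteHA section that step 1 removes and step 5 adds back. The hostnames and secret names here are placeholders, not values from your deployment; use the replication endpoint and TLS secret values that your 2DCDR deployment was originally configured with:
    multiSiteHA:
      mode: active            # set to "passive" on the warm-standby data center
      replicationEndpoint:
        annotations:
          cert-manager.io/issuer: ingress-issuer
        hosts:
        - name: mgrreplication.dc1.example.com         # placeholder endpoint hostname
          secretName: mgmt-replication-server          # placeholder secret name
      replicationPeerFQDN: mgrreplication.dc2.example.com   # placeholder peer endpoint
      tlsClient:
        secretName: mgmt-replication-client            # placeholder secret name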

Steps for Portal upgrades

  1. Verify that both data centers have the same ingress-ca X.509 certificate in their ingress-ca secret. Run the following command in both data centers and check that the output is the same:
    openssl x509 -noout -fingerprint -sha256 -in <(kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
    If you do not have the openssl command available, you can instead run only the kubectl part, which produces a larger output:
    kubectl get secret ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
    If the outputs are different, follow the steps in Synchronizing the ingress-ca X.509 certificate across data centers. (A jsonpath-based alternative for this check is shown after these steps.)
  2. Start the upgrade of your warm-standby data center by following the upgrade documentation for your platform. Stop at the point where the portal CR is updated with the new API Connect version.
  3. Verify that both data centers still have the same ingress-ca X.509 certificate by repeating step 1. If they are different, follow the steps in Synchronizing the ingress-ca X.509 certificate across data centers.
  4. Complete the upgrade of your warm-standby data center by updating the portal subsystem CR, following the remaining upgrade steps for your platform. Do not wait for the warm-standby to reach READY state before starting the upgrade on the active data center (in certain circumstances the warm-standby portal does not reach READY state until the active data center is upgraded).

    For example, both data centers might show the portal cluster (PTL) CR in Warning state, with a message that says "Full file synchronization running". Upgrading the active data center moves both data centers out of the Warning state.

  5. Start the upgrade of your active data center by following the upgrade documentation for your platform. Stop at the point where the portal CR is updated with the new API Connect version.
  6. Verify that both data centers still have the same ingress-ca X.509 certificate by repeating step 1. If they are different, follow the steps in Synchronizing the ingress-ca X.509 certificate across data centers.
  7. Upgrade the portal subsystem in your active data center by updating the portal subsystem CR, following the remaining upgrade steps for your platform.
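
As an alternative to the grep and awk pipeline in the verification steps above, you can extract the certificate with a jsonpath query. This is a minimal equivalent, assuming a shell with openssl available:
    # Print the SHA-256 fingerprint of the ingress-ca certificate in this data center.
    kubectl get secret ingress-ca -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -fingerprint -sha256
Run it in both data centers and confirm that the fingerprints match.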

Synchronizing the ingress-ca X.509 certificate across data centers

Follow these steps to extract your ingress-ca X.509 certificate from your source data center and prepare it to be applied to your target data center:
  1. Determine which data center has the ingress-ca Kubernetes cert-manager Certificate object; this is your source data center:
    kubectl get certificates -n <namespace> | grep ingress-ca
  2. Extract the ingress-ca secret from your source data center to a file called new-ca-issuer-secret.yaml:
    kubectl get secret ingress-ca -o yaml -n <namespace>  > new-ca-issuer-secret.yaml
  3. Edit the new-ca-issuer-secret.yaml file and remove the creationTimestamp, resourceVersion, uid, namespace, and managedFields properties. Remove the labels and annotations sections completely. (Alternatively, see the yq sketch after these steps to automate this edit.) The resulting contents should include the ingress-ca X.509 certificate and the secret name:
    apiVersion: v1
    data:
      ca.crt: <long cert string>
      tls.crt: <long cert string>
      tls.key: <long cert string>
    kind: Secret
    metadata:
      name: ingress-ca
    type: kubernetes.io/tls
    
  4. Copy the new-ca-issuer-secret.yaml file to the target data center.
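
If you have yq (v4, the mikefarah build) available, steps 2 and 3 can be combined into a single command instead of editing the file by hand; a sketch, run against the source data center:
    # Export the ingress-ca secret with the server-generated metadata stripped out.
    kubectl get secret ingress-ca -n <namespace> -o yaml | yq 'del(.metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.namespace, .metadata.managedFields, .metadata.labels, .metadata.annotations)' > new-ca-issuer-secret.yaml
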
Follow these steps to apply the extracted ingress-ca X.509 certificate on your target data center:
  1. To apply the new-ca-issuer-secret.yaml file, run:
    kubectl apply -f new-ca-issuer-secret.yaml -n <namespace>
  2. Regenerate all ingress-ca end-entity certificates:
    kubectl get secrets -n <namespace> -o custom-columns='NAME:.metadata.name,ISSUER:.metadata.annotations.cert-manager\.io/issuer-name' --no-headers=true | grep ingress-issuer | awk '{ print $1 }' | xargs kubectl delete secret -n <namespace>
    All affected pods should restart automatically. For more information about regenerating certificates, see Renewing certificates with cert-manager.
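
To confirm that cert-manager has reissued the deleted end-entity certificates, check that the Certificate objects in the namespace report Ready again, for example:
    # Each certificate should return to READY=True once its secret is reissued.
    kubectl get certificates -n <namespace>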

How to upgrade when one data center is down

If API Connect is still running on the failed data center, follow the steps that are documented previously to upgrade both data centers before you bring the failed data center back online.

If the failed data center is expected to be down for a long time, you can convert the active data center to a stand-alone data center by following the steps in Removing a two data center deployment, but note the following points:
  1. Ensure that the network links to the failed data center are removed.
  2. Ensure that the failed data center is set to warm-standby in the multiSiteHA section. Do not proceed to the next step until the data center completes the transition to warm-standby. View the status of the management and portal CRs to confirm that HA Mode reports passive (see the status check example after these steps).
  3. Remove the multiSiteHA section from the failed data center, and ensure that the failed data center resets itself to become an empty stand-alone API Connect deployment (all data is deleted).
  4. Before you restore the network links between the data centers, do the following:
    • Upgrade API Connect on the failed data center to the same version as on the active data center.
    • Add the multiSiteHA sections to both data centers, setting the failed data center to be warm-standby.
      Important: Do not set the failed data center to be active in the multiSiteHA section, because doing so overwrites the data on your working data center with the empty database of your failed data center.
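
To check the HA mode that step 2 refers to, you can query the CR status directly. A sketch, assuming that the status field is named haMode in the CR (it is rendered as Ha Mode in the kubectl describe output shown earlier):
    # Expect "passive" on the failed data center once the transition to warm-standby completes.
    kubectl get mgmt,ptl -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.haMode}{"\n"}{end}'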