Upgrading a two data center deployment
How to upgrade a two data center disaster recovery (DR) deployment on Kubernetes and OpenShift.
For general information about two data center disaster recovery in API Connect, see A two data center deployment strategy on Kubernetes and OpenShift.
Key points when upgrading a two data center DR deployment
Both data centers must be using the same version of API Connect, including down to the specific fix pack and iFix level, as appropriate. Each data center is effectively an identical deployment instance, so they must be kept on identical code level bases. Therefore, both data centers must be upgraded in the same maintenance window.
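As a quick pre-check of that requirement, the RECONCILED VERSION column reported by kubectl get mgmt in each data center can be compared. A minimal sketch; the context names, namespace, and column position are assumptions based on the example output format used in this topic:

```shell
# Compare the RECONCILED VERSION column of `kubectl get mgmt` output from
# two data centers. In a live deployment you would capture real output, e.g.:
#   kubectl --context dc1 get mgmt -n mgmt_namespace
# (context and namespace names here are illustrative placeholders).
reconciled_version() {
  # Print the 5th whitespace-separated column of the first data row.
  awk 'NR==2 { print $5 }'
}

# Sample output in the format used in this topic, so the parsing can be
# demonstrated without a cluster:
sample_dc1='NAME         READY   STATUS    VERSION              RECONCILED VERSION       AGE
management   7/7     Running   10.0.1.2-ifix2-eus   10.0.1.2-ifix2-100-eus   4h43m'
sample_dc2="$sample_dc1"

v1=$(printf '%s\n' "$sample_dc1" | reconciled_version)
v2=$(printf '%s\n' "$sample_dc2" | reconciled_version)

if [ "$v1" = "$v2" ]; then
  echo "OK: both data centers at $v1"
else
  echo "MISMATCH: $v1 vs $v2" >&2
  exit 1
fi
```

A mismatch here means the two sites are not on identical code levels and should be brought level before any further maintenance.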
Upgrade the subsystems on the passive (standby) data center first, and then upgrade them on the active (primary) data center. This ordering is important because the replication between the primary and the standby subsystems might fail if the passive subsystem is older than the active subsystem. Use the following instructions to upgrade your subsystems, paying particular attention to the detailed instructions for the Management subsystem:
- The Management subsystem cluster:
  - Apply the v10.0.1.next CRDs on the passive cluster. For example:
      kubectl apply -f ibm-apiconnect-crds.yaml
  - Update the passive ibm-apiconnect operator image to v10.0.1.next (update only the operator). For example:
      kubectl apply -f ibm-apiconnect.yaml
    Note: If your two data center deployment used custom namespaces during the installation, you must use those same namespaces in all of the upgrade files that require namespaces, for example the ibm-apiconnect operator yaml file.
  - Wait for the Management subsystem status to become Running. The new operator reconciles the old version operand, which might take some time. For example:
      kubectl get mgmt -n passive_mgmt_cluster
      NAME         READY   STATUS    VERSION              RECONCILED VERSION       AGE
      management   7/7     Running   10.0.1.2-ifix2-eus   10.0.1.2-ifix2-100-eus   4h43m
  - Apply the v10.0.1.next CRDs on the active cluster. For example:
      kubectl apply -f ibm-apiconnect-crds.yaml
  - Update the active ibm-apiconnect operator image to v10.0.1.next (update only the operator). For example:
      kubectl apply -f ibm-apiconnect.yaml
    Note: If your two data center deployment used custom namespaces during the installation, you must use those same namespaces in all of the upgrade files that require namespaces, for example the ibm-apiconnect operator yaml file.
  - Wait for the Management subsystem status to become Running. The new operator reconciles the old version operand, which might take some time. For example:
      kubectl get mgmt -n active_mgmt_cluster
      NAME         READY   STATUS    VERSION              RECONCILED VERSION       AGE
      management   7/7     Running   10.0.1.2-ifix2-eus   10.0.1.2-ifix2-100-eus   4h43m
  - Remove the multi-site HA CR section from the passive Management subsystem. As soon as the multi-site HA configuration is removed, the passive Management subsystem converts itself to an independent cluster where you can see all the microservices.
  - Wait for the Management subsystem status to become Running on the passive Management subsystem. For example:
      kubectl get mgmt -n passive_mgmt_cluster
      NAME         READY   STATUS    VERSION              RECONCILED VERSION       AGE
      management   7/7     Running   10.0.1.2-ifix2-eus   10.0.1.2-ifix2-100-eus   4h43m
  - Remove the multi-site HA CR section from the active Management subsystem. As soon as the multi-site HA configuration is removed, the active Management subsystem converts itself to an independent cluster where you can see all the microservices.
  - Wait for the Management subsystem status to become Running on the active Management subsystem. For example:
      kubectl get mgmt -n active_mgmt_cluster
      NAME         READY   STATUS    VERSION              RECONCILED VERSION       AGE
      management   7/7     Running   10.0.1.2-ifix2-eus   10.0.1.2-ifix2-100-eus   4h43m
    At this point you have two independent Management clusters.
  - Now upgrade both clusters to 10.0.1.<next>-eus in parallel. For example:
      kubectl -n $NAMESPACE edit mgmt mgmt_cluster
    and set:
      version: 10.0.1.5-eus
  - For both clusters, wait for the Management subsystem status to become Running. For example:
      kubectl get mgmt -n mgmt_cluster
      NAME         READY   STATUS    VERSION        RECONCILED VERSION   AGE
      management   16/16   Running   10.0.1.5-eus   10.0.1.5-xxx-eus     5h46m
  - Now add the multi-site HA CR section back into the previously active Management subsystem.
  - Wait for the Management subsystem status to become Running on the active Management subsystem. For example:
      kubectl get mgmt -n active_mgmt_cluster
      NAME         READY   STATUS    VERSION        RECONCILED VERSION   AGE
      management   17/17   Running   10.0.1.5-eus   10.0.1.5-xxxx-eus    5h56m
  - Add the multi-site HA CR section back into the previously passive Management subsystem.
  - Wait for the Management subsystem status to become Running on the passive Management subsystem. For example:
      kubectl get mgmt -n passive_mgmt_cluster
      NAME         READY   STATUS    VERSION        RECONCILED VERSION   AGE
      management   7/7     Running   10.0.1.5-eus   10.0.1.5-xxxx-eus    5h46m
    You should see a limited set of microservices on the passive Management subsystem, for example 7/7.
  - Run the following command on the active Management subsystem to check that the API Manager service is ready for traffic:
      kubectl describe ServiceName
    The service is ready for traffic when the Ha mode part of the Status object is set to active. For example:
      Status:
        ...
        Ha mode:  active
        ...
- The Gateway subsystem cluster.
- The Portal subsystem cluster; passive followed by active.
  Note: If you upgrade from a version earlier than 10.0.1.1-eus to Version 10.0.1.1 or later, you must upgrade the passive and active sites in parallel. With this method, both sites go down, and then they restart their pods at the new versions in parallel so that they can cluster together.
- The Analytics subsystem cluster.
You must verify that each subsystem has been updated successfully on both the passive and the active data centers before upgrading the next subsystem.
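One way to script that verification is to parse the READY and STATUS columns of the kubectl get output for each subsystem. A minimal sketch; the sample output is taken from the Management example above, and the resource short names and namespaces in the comments are illustrative placeholders:

```shell
# Return success only when a subsystem's `kubectl get` output shows every
# pod ready (READY column "n/n") and STATUS "Running".
subsystem_ready() {
  awk 'NR==2 { ok = (split($2, r, "/") == 2 && r[1] == r[2] && $3 == "Running") }
       END   { exit ok ? 0 : 1 }'
}

# Live usage (resource short name and namespace are placeholders), e.g.:
#   kubectl get mgmt -n passive_mgmt_cluster | subsystem_ready && echo "ready"
# Demonstrated here with sample output in the format shown above:
sample='NAME         READY   STATUS    VERSION        RECONCILED VERSION   AGE
management   16/16   Running   10.0.1.5-eus   10.0.1.5-xxx-eus     5h46m'

if printf '%s\n' "$sample" | subsystem_ready; then
  echo "subsystem ready"
else
  echo "subsystem not ready"
fi
```

Running the same check against the output from both data centers, for each subsystem in turn, gives a simple gate before moving to the next subsystem.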
The Developer Portal upgrade takes place in two stages. First, the system is upgraded, which updates the internal deployment scripts, packages, and so on. Then, as soon as the system upgrade is complete, the individual Portal sites are upgraded to the new level of the Portal UI code. This second stage, the Portal site upgrades, does not start until the first stage is complete and all of the pods (in both data centers) have been upgraded to the new version. The upgrade of the sites then starts automatically, a few at a time, spread across both data centers so that the upgrade can run in parallel.
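While the site upgrades run, you can poll the Portal subsystem status in both data centers until it returns to Running. A minimal sketch; the get_status stub stands in for a real kubectl call, and the context names, namespace, and 60-attempt/10-second timeout are assumptions:

```shell
# Poll a data center until its Portal subsystem reports Running, with a
# bounded number of attempts. get_status is a stub standing in for a real
# call such as:
#   kubectl --context "$1" get ptl -n portal_cluster --no-headers
# (context and namespace names are illustrative placeholders).
get_status() {
  echo 'portal   3/3   Running   10.0.1.5-eus   10.0.1.5-eus   2h'
}

wait_running() {
  ctx=$1
  tries=0
  while [ "$tries" -lt 60 ]; do
    # STATUS is the third column of the headerless output.
    if get_status "$ctx" | awk '{ exit ($3 == "Running") ? 0 : 1 }'; then
      echo "$ctx: portal Running"
      return 0
    fi
    tries=$((tries + 1))
    sleep 10
  done
  echo "$ctx: timed out waiting for portal" >&2
  return 1
}

wait_running dc1 && wait_running dc2
```

Because the site upgrades are spread across both data centers, it is worth waiting for both sides to settle rather than checking only one.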
How to upgrade when one data center is down
In this scenario, one data center is down and you have failed over to the other data center in your deployment. You are now running a two data center deployment, but only one data center is online and in active mode. However, a critical fix must be applied before you can recover the failed data center. In this instance, you must apply the upgrade to the remaining active data center. Then, when you are ready to bring the failed data center back online, you must first remove the multi-site HA section from the failed data center's CR, and then add the multi-site HA section back in again, making that data center passive. This action triggers a re-sync with the active data center, and ensures that the passive data center contains the latest data.
- Remove the multiSiteHA section from the subsystem CR on the passive data center.
- Apply the updated CR file to the subsystem on the passive data center, for example:
    kubectl apply -f subsystem_cr.yaml -n <namespace>
  Where subsystem_cr.yaml is the file name of the subsystem CR, and <namespace> is the target installation namespace in the Kubernetes cluster for the passive data center.
- Add the multiSiteHA section back into the subsystem CR on the passive data center, for example:
    multiSiteHA:
      mode: passive
      replicationEndpoint:
        annotations:
          cert-manager.io/issuer: ingress-issuer
        hosts:
        - name: mgrreplicationraleigh.cluster2.example.com
          secretName: raleigh-mgr-replication-worker-1
      replicationPeerFQDN: mgrreplicationdallas.cluster1.example.com
      tlsClient:
        secretName: mgr-replication-client
- Apply the updated CR file to the subsystem on the passive data center again, for example:
    kubectl apply -f subsystem_cr.yaml -n <namespace>
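Once the re-sync completes, you may want to confirm the HA role that each data center now reports. A minimal sketch that extracts the Ha mode value from kubectl describe output; the describe target and namespace in the comment are placeholders, and the sample fragment mirrors the Status object shown earlier in this topic:

```shell
# Extract the "Ha mode" value from the Status section of `kubectl describe`
# output. Live usage (resource and namespace are placeholders), e.g.:
#   kubectl describe mgmt management -n mgmt_namespace | ha_mode
ha_mode() {
  # Split each line on a colon plus any following spaces, and print the
  # value part of the "Ha mode:" line.
  awk -F': *' '/Ha mode:/ { print $2 }'
}

# Demonstrated with a fragment in the format of the Status object shown
# earlier in this topic:
sample='Status:
  Ha mode:  passive'

mode=$(printf '%s\n' "$sample" | ha_mode)
echo "HA mode: $mode"
```

After the recovery steps above, the recovered data center should report passive while the surviving data center continues to report active.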