Recovering from a failover of a two data center deployment

What to do after completing a 2DCDR failover.

Ensure that you have read and understand the concepts of 2DCDR. See Two data center deployment strategy on Kubernetes and OpenShift and Key concepts of 2DCDR and failure scenarios.

This topic describes what to do after you have completed a failover of your 2DCDR deployment, as described in How to failover API Connect from the active to the warm-standby data center.

If, after the failover operation, the failed data center was successfully updated to warm-standby, verify that replication is working. See Verifying replication between data centers. If replication is working, you can either:

If your failed data center could not be updated to warm-standby, ensure that the network links between the data centers are disabled. If the network links remain enabled, a split-brain could occur if your failed data center recovers before you are able to set it to warm-standby, because both data centers would then be active at the same time.

When you are able to recover the failed data center, ensure that API Connect on that data center is set to warm-standby before you restore network connectivity to the active data center.
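The ordering rule above can be expressed as a simple pre-check. The following is a minimal sketch only: the function name is hypothetical, and it assumes that each site's 2DCDR mode is available as the string "active" or "passive" (warm-standby). How you read the mode from each cluster is not shown.

```python
def safe_to_restore_links(active_site_mode: str, recovered_site_mode: str) -> bool:
    """Return True only if restoring the network links cannot cause a split-brain.

    Assumption: modes are the strings "active" and "passive" (warm-standby).
    The links are safe to restore only when the surviving site is active and
    the recovered site has already been set to warm-standby.
    """
    return active_site_mode == "active" and recovered_site_mode == "passive"


if __name__ == "__main__":
    # Recovered site already set to warm-standby: links can be restored.
    print(safe_to_restore_links("active", "passive"))
    # Recovered site still thinks it is active: keep the links disabled.
    print(safe_to_restore_links("active", "active"))
```

A check like this belongs before any step that re-enables connectivity, so that the links stay down whenever the recovered site's mode is unknown or still active.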

If you expect your failed data center to be down for a long time, then convert your active data center to a stand-alone deployment:
  • Remove the multiSiteHA section from the spec sections of your management and portal CRs.
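As an illustration of the conversion to stand-alone, the following sketch removes the multiSiteHA section from a CR that has been loaded into a Python dictionary. The function name and the sample CR content are hypothetical. Loading the CR from the cluster and applying the result are outside the scope of the sketch. The removed section is returned so that it can be kept for when 2DCDR is re-enabled.

```python
import copy
from typing import Dict, Optional, Tuple


def to_standalone(cr: Dict) -> Tuple[Dict, Optional[Dict]]:
    """Return (stand-alone copy of the CR, removed multiSiteHA section or None).

    The input CR is not modified.
    """
    standalone = copy.deepcopy(cr)
    # pop() returns None if the CR has no spec or no multiSiteHA section.
    removed = standalone.get("spec", {}).pop("multiSiteHA", None)
    return standalone, removed


if __name__ == "__main__":
    # Hypothetical, heavily trimmed CR used only for illustration.
    management_cr = {
        "spec": {
            "otherSettings": "unchanged",
            "multiSiteHA": {"mode": "active"},
        }
    }
    standalone_cr, saved_section = to_standalone(management_cr)
    print(standalone_cr)
    print(saved_section)
```

Apply the same change to both the management CR and the portal CR.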
The steps to restore 2DCDR when the failed data center is recovered depend on the state of API Connect on the recovered data center.
  • If API Connect is still working on the recovered data center, re-enable 2DCDR as follows:
    1. Add the multiSiteHA section back to your working stand-alone deployment. Ensure that it is set to active.
    2. Ensure that the multiSiteHA section is set to warm-standby (passive) on the recovered data center.
    3. Restore the network links between data centers.
  • If the API Connect installation on the failed data center is not recoverable, then reinstall API Connect on this data center, and re-enable 2DCDR as follows:
    1. Add the multiSiteHA section back to your working stand-alone deployment. Ensure that it is set to active.
    2. Ensure that the multiSiteHA section is set to warm-standby (passive) on the recovered data center.
    3. Copy the ingress-ca X.509 certificate from the active data center and apply it on your reinstalled warm-standby data center. For more information, see Check the ingress-ca X.509 certificates match.
    4. Restore the network links between data centers.
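The two procedures above differ only in the certificate step, which the following sketch makes explicit. It is an illustration under stated assumptions: the function names are hypothetical, the CR is represented as a Python dictionary, and the mode is assumed to be stored in a field called mode inside the multiSiteHA section with the values "active" and "passive".

```python
import copy
from typing import Dict, List


def restore_plan(reinstalled: bool) -> List[str]:
    """Return the ordered steps to re-enable 2DCDR on a recovered data center."""
    steps = [
        "Add multiSiteHA back to the stand-alone deployment, set to active",
        "Set multiSiteHA to warm-standby (passive) on the recovered data center",
    ]
    if reinstalled:
        # Only needed when API Connect was reinstalled on the failed data center.
        steps.append(
            "Copy the ingress-ca X.509 certificate from the active data center"
        )
    # Restoring the links is always the last step.
    steps.append("Restore the network links between data centers")
    return steps


def with_multi_site_ha(cr: Dict, section: Dict, mode: str) -> Dict:
    """Return a copy of the CR with the multiSiteHA section added in the given mode.

    Assumption: the mode lives in a "mode" field with values "active"/"passive".
    """
    if mode not in ("active", "passive"):
        raise ValueError("mode must be 'active' or 'passive'")
    updated = copy.deepcopy(cr)
    new_section = copy.deepcopy(section)
    new_section["mode"] = mode
    updated.setdefault("spec", {})["multiSiteHA"] = new_section
    return updated


if __name__ == "__main__":
    for number, step in enumerate(restore_plan(reinstalled=True), start=1):
        print(f"{number}. {step}")
```

Keeping the link restoration as the final step in both plans reflects the requirement that the recovered data center is in warm-standby before it can reach the active data center.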
Note: If your original active data center was in a failed state for some time, then after it is recovered to the warm-standby state, the data from your active data center takes time to replicate across. The time taken depends on the size of your management and portal databases and on how many changes were made to them while your original active data center was in a failed state.