Recovering from a failover of a two data center deployment
How to recover a two data center disaster recovery deployment on Kubernetes and OpenShift.
Before you begin
Ensure that you understand the concepts of two data center disaster recovery in API Connect. For more information, see A two data center deployment strategy on Kubernetes and OpenShift.
About this task
After a failure has been resolved, the failed data center can be brought back online and re-linked to the currently active data center. Do this as soon as possible to reinstate disaster recovery. Otherwise, if another failure occurs before the failed data center is brought back online, there could be a complete outage.
The failed data center can be brought back online as the new active primary data center, or it can be brought back as a passive secondary data center while the currently active data center remains the primary one. This decision depends on your company's policies about recovering from a failure.
The amount of data that is sent when recovering from a failover depends on how long the outage lasted and how much activity there was during that period.
Operational state | Description |
---|---|
progressing to active | Pods are progressing to the active state, but none are capable of serving traffic. |
progressing to active (ready for traffic) | At least one pod of each type is ready for traffic. The dynamic router can be linked to this service. |
active | All of the pods are ready and in the correct disaster recovery state for the active data center. |
progressing to passive | Pods are progressing to the passive state, but none are capable of serving traffic. |
progressing to passive (ready for traffic) | At least one pod of each type is ready for traffic. |
passive | All of the pods are ready and in the correct disaster recovery state for the passive data center. |
progressing to down | Pods are moving from passive state to down. |
down | The multi-site HA mode is deleted from the passive data center. |
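As a rough illustration of how the operational states in the table might be interpreted in a script, the following sketch maps a state string to the pod readiness that the table describes. The function name `pod_readiness` and its output labels are hypothetical and not part of API Connect. How the state string is extracted from the service status is not covered here and is left as an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of API Connect): map an operational state
# string from the table to the pod readiness that the table describes.
#   all     - all pods are ready
#   some    - at least one pod of each type is ready for traffic
#   none    - no pods are capable of serving traffic
#   n/a     - the table does not describe pod readiness for this state
#   unknown - the string is not one of the listed states
pod_readiness() {
  case "$1" in
    "active"|"passive")
      echo "all" ;;
    "progressing to active (ready for traffic)"|"progressing to passive (ready for traffic)")
      echo "some" ;;
    "progressing to active"|"progressing to passive")
      echo "none" ;;
    "progressing to down"|"down")
      echo "n/a" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, `pod_readiness "progressing to active"` prints `none`, which indicates that the data center cannot serve traffic yet.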
To check the status of a service, run the following command:
kubectl describe ServiceName
Where ServiceName is the name of the API Manager or Developer Portal service.
- If the active data center is offline, do not remove the multi-site HA CR section from the passive data center, because this action deletes all the data from the databases. To revert to a single data center topology, you must remove the multi-site HA CR section from the active data center, and then redeploy the passive data center. If the active data center is offline, you must first change the passive data center to be active, and then remove the multi-site HA CR section from the now active data center. The passive site must be uninstalled and redeployed as a clean install before it can be used again. For more information, see Removing a two data center deployment.
- If the passive data center is offline for more than 24 hours, disk space can become an issue on the active data center, so you must revert your deployment to a single data center topology. To do so, remove the multi-site HA CR section from the active data center. After the passive site has been redeployed, the multi-site HA CR section can be reapplied to the active site. For more information, see Removing a two data center deployment.
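The 24-hour guideline in the last note can be checked with simple arithmetic. The following is a minimal sketch, assuming that you record the time at which the passive data center went offline as Unix epoch seconds. The function name `passive_outage_status` and its output labels are hypothetical and not part of API Connect.

```shell
#!/bin/sh
# Hypothetical helper (not part of API Connect): compare the length of a
# passive data center outage with the 24-hour guideline.
# Arguments: $1 = outage start, $2 = current time (both Unix epoch seconds).
# Prints "over-limit" when the outage is longer than 24 hours,
# otherwise "within-limit".
passive_outage_status() {
  limit=$((24 * 60 * 60))   # 24 hours expressed in seconds
  elapsed=$(( $2 - $1 ))    # outage duration in seconds
  if [ "$elapsed" -gt "$limit" ]; then
    echo "over-limit"
  else
    echo "within-limit"
  fi
}

# Example call, assuming outage_start holds the recorded start time:
#   passive_outage_status "$outage_start" "$(date +%s)"
```

An `over-limit` result corresponds to the case in which the note says to revert the deployment to a single data center topology.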