Demote your active data center to warm-standby, and then promote your warm-standby data
center to active.
About this task
The steps in this topic cover how to complete a failover in the following scenarios:
- When the active management and portal subsystems are in a healthy state.
- When the problem that requires the failover on the active management or portal subsystem does not
prevent you from running the apicup commands that are necessary to convert the active to warm-standby.
If the active site is not accessible, or is not responsive to apicup commands, refer to
Failover steps when active site is inaccessible.
For both the management and portal subsystems, the order in which failover is done is critical to
prevent an active-active scenario that can cause a split-brain. Always demote the active data center
to warm-standby first, confirm that the demotion was successful, and only then promote the original
warm-standby to active.
In the steps provided in this topic, the active data center is called DC1, and the warm-standby data center is called
DC2.
Procedure
- Management subsystem failover
- Set DC1 to be warm-standby.
- Set the multi-site-ha-mode property to passive in DC1:
apicup subsys set <DC1 management> multi-site-ha-mode=passive
- Apply the update to DC1:
apicup subsys install <DC1 management> --accept-dr-data-deletion
Note: When an active management subsystem is converted to warm-standby, all contents of its
management database are deleted (to be replaced by the contents from the other data center when it
becomes the active). The --accept-dr-data-deletion flag is acknowledgment that you accept this
temporary loss of data.
- Monitor the progress of the conversion to warm-standby:
apicup subsys health-check <DC1 management> -v
- Set DC2 to be the active data center.
- Confirm that DC2 is ready for promotion to active:
apicup subsys health-check <DC2 management> -v
The output should show:
...
ReadyForPromotion
...
- Set the multi-site-ha-mode property to active in DC2:
apicup subsys set <DC2 management> multi-site-ha-mode=active
- Apply the update to DC2:
apicup subsys install <DC2 management> --skip-health-check
- Monitor the failover process:
apicup subsys health-check <subsystem name>
When failover is complete, the health-check command returns no output in either data center. Use the
-v flag to see more information:
apicup subsys health-check <subsystem name> -v
- Update your dynamic router to redirect all management subsystem traffic to DC2 instead of DC1.
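The management failover order above (demote DC1, wait until DC2 reports ReadyForPromotion, then promote DC2) can be sketched as a small shell function. This is a minimal illustration rather than a supported script. The function names and the POLL_SECONDS variable are hypothetical, the subsystem names are placeholders that you pass in, and the sketch assumes that the apicup commands for both data centers can be run from the directory where the script executes. If each data center has its own project directory, run the DC1 and DC2 commands in their respective directories instead.

```shell
# Sketch only. ready_for_promotion, failover_management, and POLL_SECONDS are
# hypothetical names; the apicup commands are the ones shown in the steps above.

# Succeeds when the verbose health check output for the subsystem ($1)
# contains ReadyForPromotion.
ready_for_promotion() {
  apicup subsys health-check "$1" -v 2>&1 | grep -q 'ReadyForPromotion'
}

# failover_management <DC1 management> <DC2 management>
failover_management() {
  # 1. Demote DC1 first, to avoid an active-active (split-brain) state.
  apicup subsys set "$1" multi-site-ha-mode=passive || return 1
  apicup subsys install "$1" --accept-dr-data-deletion || return 1

  # 2. Do not promote until DC2 reports that it is ready.
  #    Note: this loop has no timeout; interrupt it if DC2 never becomes ready.
  until ready_for_promotion "$2"; do
    sleep "${POLL_SECONDS:-30}"
  done

  # 3. Promote DC2.
  apicup subsys set "$2" multi-site-ha-mode=active || return 1
  apicup subsys install "$2" --skip-health-check
}
```

A call might look like `failover_management dc1-mgmt dc2-mgmt`, where both names are placeholders. Gating the promotion on ReadyForPromotion keeps the two data centers from being active at the same time.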
- Developer Portal service failover
The following instructions show how to fail over the Developer Portal service from DC1 to DC2. If you
have multiple portal services, repeat these steps for each Developer Portal service that you want to
fail over.
- Set DC1 to be warm-standby.
- Set the multi-site-ha-mode property to passive in DC1:
apicup subsys set <DC1 portal> multi-site-ha-mode=passive
- Apply the update to DC1:
apicup subsys install <DC1 portal> --skip-health-check
- Monitor the progression to warm-standby:
- Log in to one of the portal VMs in DC1 using SSH:
ssh apicadm@<portal hostname>
- Switch to root user:
sudo -i
- Run kubectl describe ptl, and look for HA Mode in the Status section of the output:
kubectl describe ptl
...
Status:
...
HA Mode: progressing to passive
When Status.HA Mode shows progressing to passive, set the portal in DC2 to be active, as described in step 2.
Warning: The Spec section of the output also has an HA mode property. Take care to check the HA mode
in the Status section, not the one in the Spec section.
- Set DC2 to be active.
- Set the multi-site-ha-mode property to active in DC2:
apicup subsys set <DC2 portal> multi-site-ha-mode=active
- Apply the update in DC2:
apicup subsys install <DC2 portal> --skip-health-check
- Monitor the failover process:
apicup subsys health-check <subsystem name>
When failover is complete, the health-check command returns no output in either data center. Use the
-v flag to see more information:
apicup subsys health-check <subsystem name> -v
- Update your dynamic router to redirect all traffic to DC2 instead of DC1.
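Because the Spec and Status sections of the kubectl describe ptl output both contain an HA mode property, it can help to filter the output so that only the Status value is shown. The following is a minimal sketch, not part of the product: status_ha_mode is a hypothetical helper name, and the filter assumes that section headers such as Spec: and Status: start in the first column while their properties are indented, which is the usual layout of kubectl describe output.

```shell
# Sketch only. status_ha_mode is a hypothetical helper.
# Reads `kubectl describe ptl` output on stdin and prints the HA Mode value
# found in the Status section, ignoring the same-named property under Spec.
status_ha_mode() {
  awk '
    # An unindented line starts a new top-level section.
    /^[A-Za-z]/ { in_status = ($1 == "Status:") }
    # Inside Status, print the value of the first HA Mode line and stop.
    in_status && /^[ \t]+HA Mode:/ {
      sub(/^[ \t]+HA Mode:[ \t]*/, "")
      print
      exit
    }
  '
}
```

On a portal VM in DC1, after switching to the root user, this could be used as `kubectl describe ptl | status_ha_mode`. If your output is laid out differently, read the Status section directly instead.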
Results
How long it takes to complete the failover varies, and depends on hardware speed, network
latency, and the size of the databases. However, here are some approximate timings:
For the management subsystem:
- warm-standby to active: approximately 5 minutes
- active to warm-standby: approximately 15 minutes
For the portal subsystem:
- warm-standby to active: 15 - 40 minutes
- active to warm-standby: approximately 10 minutes
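The approximate timings above can serve as rough upper bounds when you poll for completion. The following sketch relies on the behavior described in the monitoring steps, where apicup subsys health-check returns no output once failover is complete. The helper name wait_until_quiet and its timeout and polling arguments are hypothetical, and the timeout you choose is an assumption that depends on your hardware, network, and database size.

```shell
# Sketch only. wait_until_quiet is a hypothetical helper.
# Usage: wait_until_quiet <subsystem name> <timeout seconds> [poll seconds]
# Polls the health check until it prints nothing, or fails after the timeout.
wait_until_quiet() {
  deadline=$(( $(date +%s) + $2 ))
  while [ -n "$(apicup subsys health-check "$1" 2>&1)" ]; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "health-check for $1 still reports output after $2 seconds" >&2
      return 1
    fi
    sleep "${3:-30}"
  done
}
```

For example, `wait_until_quiet <DC2 portal> 2400` allows up to 40 minutes for a portal promotion. If the function times out, run the health check with the -v flag to see more information before taking further action.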