Failover steps when active site is responsive

Demote your active data center to warm-standby, and then promote your warm-standby data center to active.

About this task

The steps in this topic cover how to complete a failover in the following scenarios:
  • When the active management and portal subsystems are in a healthy state.
  • When the problem that requires the failover does not prevent you from running the apicup commands that are needed to convert the active management or portal subsystem to warm-standby.
If the active site is not accessible, or is not responsive to apicup commands, then refer to Failover steps when active site is inaccessible.

For both the management and portal subsystems, the order in which failover is done is critical to prevent an active-active scenario, which can cause split-brain. First demote the active data center to warm-standby, then confirm that the demotion was successful, and only then promote the original warm-standby data center to active.

In the steps provided in this topic, the active data center is called DC1, and the warm-standby data center is called DC2.

Procedure

  • Management subsystem failover
    1. Set DC1 to be warm-standby.
      1. Set the multi-site-ha-mode property to passive in DC1:
        apicup subsys set <DC1 management> multi-site-ha-mode=passive
      2. Apply the update to DC1:
        apicup subsys install <DC1 management> --accept-dr-data-deletion
        Note: When an active management subsystem is converted to warm-standby, all contents of its management database are deleted (to be replaced by the contents from the other data center when that data center becomes active). The --accept-dr-data-deletion flag is your acknowledgment that you accept this temporary loss of data.
      3. Monitor the progress of the conversion to warm-standby:
        apicup subsys health-check <DC1 management> -v
      Important: If the management subsystem does not successfully convert to warm-standby, then do not proceed any further. Treat DC1 as an inaccessible data center, and refer to Failover steps when active site is inaccessible.
    2. Set DC2 to be the active data center.
      1. Confirm that DC2 is ready for promotion to active:
        apicup subsys health-check <DC2 management> -v
        Output should show:
        ...
        ReadyForPromotion
        ...
      2. Set the multi-site-ha-mode property to active in DC2:
        apicup subsys set <DC2 management> multi-site-ha-mode=active
      3. Apply the update to DC2:
        apicup subsys install <DC2 management> --skip-health-check
    3. Monitor the failover process:
      apicup subsys health-check <subsystem name>
      When failover is complete, the health-check command returns no output in either data center. Use the -v flag to see more information:
      apicup subsys health-check <subsystem name> -v
    4. Update your dynamic router to redirect all management subsystem traffic to DC2 instead of DC1.
  • Developer Portal service failover

    The following instructions show how to fail over from DC1 to DC2 for the Developer Portal service. If you have multiple portal services, then you must repeat these steps for each Developer Portal service that you want to fail over.

    1. Set DC1 to be warm-standby.
      1. Set the multi-site-ha-mode property to passive in DC1:
        apicup subsys set <DC1 portal> multi-site-ha-mode=passive
      2. Apply the update to DC1:
        apicup subsys install <DC1 portal> --skip-health-check
      3. Monitor the progression to warm-standby:
        1. Log in to one of the portal VMs in DC1 by using SSH:
          ssh apicadm@<portal hostname>
        2. Switch to root user:
          sudo -i
        3. Run kubectl describe ptl, and look for HA Mode in the Status section of the output:
          kubectl describe ptl
          
          ...
          Status:
            ...
            HA Mode: progressing to passive

          When Status.HA Mode shows progressing to passive, set the portal in DC2 to be active, as described in step 2.

          Warning: The Spec section of the output also has an HA mode property. Take care to check the HA mode in the Status section, not in the Spec section.
    2. Set DC2 to be active.
      1. Set the multi-site-ha-mode property to active in DC2:
        apicup subsys set <DC2 portal> multi-site-ha-mode=active
      2. Apply the update in DC2:
        apicup subsys install <DC2 portal> --skip-health-check
    3. Monitor the failover process:
      apicup subsys health-check <subsystem name>
      When failover is complete, the health-check command returns no output in either data center. Use the -v flag to see more information:
      apicup subsys health-check <subsystem name> -v
    4. Update your dynamic router to redirect all traffic to DC2 instead of DC1.
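
The readiness check in step 2 of the management subsystem failover can be scripted so that promotion is refused unless ReadyForPromotion is reported. The following is a minimal sketch, not a supported tool. The function names and the DC2_MGMT variable are hypothetical, and the sketch assumes that apicup is run from the project directory for DC2 and that finding the string ReadyForPromotion in the verbose health-check output is a sufficient test. It does not replace confirming first that DC1 was successfully converted to warm-standby.

```shell
#!/bin/sh
# Sketch only: gate the promotion of DC2 on its verbose health-check output.
# Function names are illustrative and are not part of apicup.

# Returns 0 only if the verbose health-check output mentions ReadyForPromotion.
ready_for_promotion() {
    apicup subsys health-check "$1" -v 2>&1 | grep -q 'ReadyForPromotion'
}

# Sets the subsystem to active and applies the update, but only when the
# readiness check passes. Returns 1 without changing anything otherwise.
promote_if_ready() {
    if ready_for_promotion "$1"; then
        apicup subsys set "$1" multi-site-ha-mode=active &&
        apicup subsys install "$1" --skip-health-check
    else
        echo "Not promoting $1: ReadyForPromotion was not reported" >&2
        return 1
    fi
}

# Example (hypothetical variable holding your DC2 management subsystem name):
#   promote_if_ready "$DC2_MGMT"
```

Because the function returns a nonzero status when the check fails, a calling script can stop before any property is changed in DC2.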

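When you monitor the portal progression to warm-standby, it is easy to read the HA mode from the Spec section by mistake. The following sketch filters the kubectl describe ptl output so that only the HA Mode value from the Status section is printed. The function name is hypothetical, and the sketch assumes that section names such as Spec: and Status: start in the first column and that their properties are indented, as in the sample output shown in the portal steps.

```shell
#!/bin/sh
# Sketch only: print the HA Mode value found in the Status section.
# Assumes top-level sections (Spec:, Status:, ...) start in column 1.
status_ha_mode() {
    awk '
        /^Status:/ { in_status = 1; next }   # entering the Status section
        /^[^ \t]/  { in_status = 0 }         # any other top-level key ends it
        in_status && /^[ \t]+HA Mode:/ {
            sub(/^[ \t]+HA Mode:[ \t]*/, "") # keep only the value
            print
            exit
        }
    '
}

# Example, on a portal VM as root:
#   kubectl describe ptl | status_ha_mode
```

If the Status section has no HA Mode property, the function prints nothing, so an empty result should be treated as "not yet known" rather than as success.
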
Results

How long it takes to complete the failover varies, and depends on hardware speed, network latency, and the size of the databases. However, here are some approximate timings:

For the management subsystem:
  • warm-standby to active: approximately 5 minutes
  • active to warm-standby: approximately 15 minutes
For the portal subsystem:
  • warm-standby to active: 15 to 40 minutes
  • active to warm-standby: approximately 10 minutes
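
Because these timings are approximate, any automated monitoring of the failover needs a generous timeout. The following sketch polls the health-check command until it returns no output, which this topic describes as the sign that failover is complete, or until a timeout expires. The function name, the default interval, and the timeout in the example are hypothetical choices and are not values recommended by the product.

```shell
#!/bin/sh
# Sketch only: wait until health-check returns no output, or time out.
# Usage: wait_for_quiet_health_check <subsystem> <timeout_seconds> [interval_seconds]
wait_for_quiet_health_check() {
    wq_name=$1
    wq_remaining=$2
    wq_interval=${3:-30}
    while :; do
        # Capture stderr too, so that an apicup error counts as "not complete".
        wq_out=$(apicup subsys health-check "$wq_name" 2>&1)
        if [ -z "$wq_out" ]; then
            return 0
        fi
        if [ "$wq_remaining" -le 0 ]; then
            echo "Timed out waiting for $wq_name: $wq_out" >&2
            return 1
        fi
        sleep "$wq_interval"
        wq_remaining=$((wq_remaining - wq_interval))
    done
}

# Example: allow up to 60 minutes for a portal subsystem (hypothetical name):
#   wait_for_quiet_health_check "$DC2_PORTAL" 3600
```

Use a nonzero interval in practice. With an interval of 0 the remaining time never decreases, so the loop ends only when the health check becomes quiet.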