Switching to the passive hub cluster

About this task

Use this procedure when the active hub cluster is down or unreachable.

Procedure

  1. Restore the backups on the passive hub cluster. For information, see Restoring a hub cluster from backup.
    Important: Recovering a failed hub to its passive instance only restores applications and their DR protected state to the last scheduled backup. Any application that was DR protected after the last scheduled backup must be protected again on the new hub.
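    For example, assuming the restore resources were created in the default open-cluster-management-backup namespace, you can list them to find the restore name and namespace used in the next step:
      oc get restore.cluster.open-cluster-management.io -n open-cluster-management-backup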
  2. During the restore procedure, you can increase the AppliedManifestWork eviction grace period to avoid eviction of resources when ManifestWorks are not regenerated correctly.
    1. Verify that the restore is complete.
      oc -n <restore-namespace> wait restore <restore-name> --for=jsonpath='{.status.phase}'=Finished --timeout=120s
    2. After the restore is completed, check on the hub cluster whether a global KlusterletConfig exists (see the example commands after this list).
      • If a global KlusterletConfig exists, edit it and set the appliedManifestWorkEvictionGracePeriod parameter to a larger value, for example, 24 hours or more.
      • If a global KlusterletConfig does not exist, create the KlusterletConfig using the following YAML:
        apiVersion: config.open-cluster-management.io/v1alpha1
        kind: KlusterletConfig
        metadata:
          name: global
        spec:
          appliedManifestWorkEvictionGracePeriod: "24h"

        The configuration will be propagated to all the managed clusters automatically.
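
      For example, the following commands are one way to check for, edit, or create the global KlusterletConfig; the file name global-klusterletconfig.yaml is only an assumed name for the YAML shown above:
        oc get klusterletconfig global
        oc edit klusterletconfig global
        oc apply -f global-klusterletconfig.yaml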

  3. Verify that the Primary and Secondary managed clusters are successfully imported into the RHACM console and that they are accessible. If any of the managed clusters are down or unreachable, they are not successfully imported.
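
    In addition to checking the RHACM console, you can verify the import from the CLI; for example, the following command lists the managed clusters, and the JOINED and AVAILABLE columns should report True for each successfully imported cluster:
      oc get managedclusters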

    Wait until DRPolicy validation succeeds before performing any DR operation.

    Note: Submariner is automatically installed once the managed clusters are imported on the passive hub.
  4. Verify that the DRPolicy is created successfully. Run this command on the hub cluster for each of the DRPolicy resources created, where <drpolicy_name> is replaced with a unique name.
    oc get drpolicy <drpolicy_name> -o jsonpath='{.status.conditions[].reason}{"\n"}'

    Example output:

    Succeeded
  5. Refresh the RHACM console to make the DR monitoring dashboard tab accessible, if it was enabled on the active hub cluster.
  6. After all components are recovered, edit the global KlusterletConfig on the new hub and remove the appliedManifestWorkEvictionGracePeriod parameter and its value.
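    For example, you can either run oc edit klusterletconfig global and delete the line, or remove it with a JSON patch; the path below assumes the field is set at spec.appliedManifestWorkEvictionGracePeriod as in the YAML shown earlier:
      oc patch klusterletconfig global --type=json -p='[{"op": "remove", "path": "/spec/appliedManifestWorkEvictionGracePeriod"}]'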
  7. If only the active hub cluster is down, restore the hub by performing hub recovery and restoring the backups on the passive hub. If the managed clusters are still accessible, no further action is required.
  8. If the primary managed cluster is down along with the active hub cluster, you must fail over the workloads from the primary managed cluster to the secondary managed cluster. For failover instructions based on your workload type, see Subscription-based application failover between managed clusters or ApplicationSet-based application failover between managed clusters.
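    As a rough CLI sketch only (the linked failover procedures are the authoritative steps), the failover action is driven by the workload's DRPlacementControl resource; assuming the DRPC name, namespace, and secondary cluster name from the example output in the next step, setting the action could look like this:
      oc patch drpc cephfs-sub-busybox-placement-1-drpc -n busybox --type merge -p '{"spec":{"action":"Failover","failoverCluster":"ocp4bos2"}}'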
  9. Verify that the failover is successful. When the Primary managed cluster is down, the PROGRESSION status for the workload remains in the Cleaning Up phase until the down managed cluster is back online and successfully imported into the RHACM console.
    On the passive hub cluster, run the following command to check the PROGRESSION status.
    oc get drpc -o wide -A

    Example output:
    NAMESPACE              NAME                                    AGE    PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION        PEER READY
    [...]
    busybox                cephfs-sub-busybox-placement-1-drpc     103m   ocp4bos1            ocp4bos2            Failover       FailedOver     Cleaning Up   2024-04-15T09:12:23Z                   False
    busybox                cephfs-sub-busybox-placement-1-drpc     102m   ocp4bos1                                               Deployed       Completed     2024-04-15T07:40:09Z   37.200569819s   True
    [...]