Switching to passive hub cluster
About this task
Use this procedure when the active hub cluster is down or unreachable.
Procedure
- Restore the backups on the passive hub cluster. For information, see Restoring a hub cluster from backup. Important: Recovering a failed hub to its passive instance restores applications and their DR protected state only up to the last scheduled backup. Any application that was DR protected after the last scheduled backup needs to be protected again on the new hub.
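The exact restore resources come from the linked procedure. Purely for orientation, a Restore custom resource used for hub activation can look similar to the following sketch; the resource name, namespace, and backup selections here are assumptions and must be taken from Restoring a hub cluster from backup for your environment:
# Illustrative only: a Restore resource for the RHACM backup operator.
# The name, namespace, and spec values below are assumptions; use the values
# from the "Restoring a hub cluster from backup" procedure.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  name: restore-acm
  namespace: open-cluster-management-backup
spec:
  cleanupBeforeRestore: CleanupRestored
  veleroManagedClustersBackupName: latest
  veleroCredentialsBackupName: latest
  veleroResourcesBackupName: latest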
- During the restore procedure, to avoid eviction of resources when ManifestWorks are not regenerated correctly, you can enlarge the AppliedManifestWork eviction grace period.
- Verify that the restore is complete.
oc -n <restore-namespace> wait restore <restore-name> --for=jsonpath='{.status.phase}'=Finished --timeout=120s
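If the wait command times out, a generic way to inspect the restore status and any error messages is to print the full resource; the exact status fields depend on the backup operator version:
# Inspect the Restore resource if the wait times out. <restore-name> and
# <restore-namespace> are the same placeholders used above.
oc get restore <restore-name> -n <restore-namespace> -o yaml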
- After the restore is completed, on the hub cluster, check for an existing global KlusterletConfig.
  - If a global KlusterletConfig exists, edit it and set the appliedManifestWorkEvictionGracePeriod parameter to a larger value, for example, 24 hours or more (a patch sketch follows below).
  - If a global KlusterletConfig does not exist, create the KlusterletConfig using the following YAML:
apiVersion: config.open-cluster-management.io/v1alpha1
kind: KlusterletConfig
metadata:
  name: global
spec:
  appliedManifestWorkEvictionGracePeriod: "24h"
The configuration will be propagated to all the managed clusters automatically.
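For the case where the global KlusterletConfig already exists, one possible non-interactive alternative to editing it is a merge patch such as the following sketch; the resource is cluster-scoped, and the field path should be verified against your version:
# Sketch: raise the eviction grace period on an existing global KlusterletConfig.
oc patch klusterletconfig global --type=merge \
  -p '{"spec":{"appliedManifestWorkEvictionGracePeriod":"24h"}}'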
- Verify that the Primary and Secondary managed clusters are successfully imported into the RHACM console and that they are accessible. If any of the managed clusters are down or unreachable, they are not successfully imported.
Wait until DRPolicy validation succeeds before performing any DR operation.
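In addition to the console check, a quick CLI cross-check on the passive hub is to list the managed clusters; imported clusters should report as joined and available (the exact columns can vary by RHACM version):
# List the managed clusters known to the new hub and their availability.
oc get managedclusters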
Note: Submariner is automatically installed once the managed clusters are imported on the passive hub.
- Verify that the DRPolicy is created successfully. Run this command on the Hub cluster for each of the DRPolicy resources created, where <drpolicy_name> is replaced with a unique name.
oc get drpolicy <drpolicy_name> -o jsonpath='{.status.conditions[].reason}{"\n"}'
Example output:
Succeeded
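If several DRPolicy resources exist, a small shell loop (a convenience sketch, not part of the documented procedure) can run the same check against each of them:
# Print the validation reason for every DRPolicy on the hub.
for p in $(oc get drpolicy -o name); do
  echo -n "$p: "
  oc get "$p" -o jsonpath='{.status.conditions[].reason}{"\n"}'
done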
- Refresh the RHACM console to make the DR monitoring dashboard tab accessible if it was enabled on the Active hub cluster.
- Once all components are recovered, edit the global KlusterletConfig on the new hub and remove the appliedManifestWorkEvictionGracePeriod parameter and its value (see the patch sketch after this step).
- If only the active hub cluster is down, restore the hub by performing hub recovery and restoring the backups on the passive hub. If the managed clusters are still accessible, no further action is required.
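For the step above that removes the eviction grace period, a JSON patch like the following sketch is one way to drop the parameter from the global KlusterletConfig, assuming it was set as shown earlier:
# Sketch: remove the eviction grace period once all components are recovered.
oc patch klusterletconfig global --type=json \
  -p '[{"op":"remove","path":"/spec/appliedManifestWorkEvictionGracePeriod"}]'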
- If the primary managed cluster is down, along with the active hub cluster, you need to fail over the workloads from the primary managed cluster to the secondary managed cluster. For failover instructions, based on your workload type, see Subscription-based application failover between managed clusters or ApplicationSet-based application failover between managed clusters.
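The linked procedures describe the supported, console-driven failover flow. Purely as an illustration of what the resulting request looks like at the API level, the workload's DRPlacementControl can be patched to request failover; the resource name, namespace, and cluster name below are placeholders, and the field names should be verified against your ODF and RHACM versions:
# Illustration only: request failover by patching the workload's DRPlacementControl.
oc patch drpc <drpc-name> -n <namespace> --type=merge \
  -p '{"spec":{"action":"Failover","failoverCluster":"<secondary-cluster-name>"}}'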
- Verify that the failover is successful. When the Primary managed cluster is down, the PROGRESSION status for the workload remains in the Cleaning Up phase until the down managed cluster is back online and successfully imported into the RHACM console.
On the passive hub cluster, run the following command to check the PROGRESSION status:
oc get drpc -o wide -A
Example output:
NAMESPACE   NAME                                  AGE    PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION        PEER READY
[...]
busybox     cephfs-sub-busybox-placement-1-drpc   103m   ocp4bos1           ocp4bos2          Failover       FailedOver     Cleaning Up   2024-04-15T09:12:23Z                   False
busybox     cephfs-sub-busybox-placement-1-drpc   102m   ocp4bos1                                            Deployed       Completed     2024-04-15T07:40:09Z   37.200569819s   True
[...]
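To follow a single workload rather than the full list, the PROGRESSION value can also be read directly from the DRPlacementControl status (a sketch; the resource name and namespace are placeholders):
# Print the current progression for one DRPlacementControl.
oc get drpc <drpc-name> -n <namespace> -o jsonpath='{.status.progression}{"\n"}'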