Subscription-based application failover between managed clusters

Fail over a subscription-based application from a primary managed cluster to a secondary managed cluster to maintain application availability during a disaster or cluster failure.

Before you begin

If your setup has active and passive RHACM hub clusters, see Hub recovery using Red Hat Advanced Cluster Management.
If the primary cluster is in a state other than Ready, check the actual status of the cluster, because the displayed status might take some time to update.
1. Navigate to RHACM console > Infrastructure > Clusters > Cluster list tab.
2. Check the status of both the managed clusters individually before performing a failover operation.
However, a failover operation can still be run as long as the cluster you are failing over to is in a Ready state.
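The same readiness check can be approximated from the command line. The following is a minimal sketch, not an official procedure: the helper name `unavailable_clusters` and the sample cluster names are hypothetical, and the commented `oc` query assumes that each managed cluster reports a `ManagedClusterConditionAvailable` condition.

```shell
# Hypothetical helper: print managed clusters whose Available condition is not "True".
# Input: one "<name><TAB><status>" pair per line on stdin.
unavailable_clusters() {
  awk -F'\t' '$2 != "True" { print $1 }'
}

# On the hub cluster, the input could come from a query such as (assumption):
#   oc get managedclusters -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].status}{"\n"}{end}' | unavailable_clusters

# Made-up sample data so that the sketch runs without a cluster.
printf 'cluster-east\tTrue\ncluster-west\tUnknown\n' | unavailable_clusters
```

With the sample data, only `cluster-west` is listed. An empty result suggests that every managed cluster reports as available.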
Run the following command on the Hub cluster to check whether lastGroupSyncTime is within an acceptable data loss window when compared to the current time.
```
oc get drpc -o yaml -A | grep lastGroupSyncTime
```
Example output:
```
[...]
lastGroupSyncTime: "2023-07-10T12:40:10Z"
```
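To compare the reported timestamp with the current time, a small helper can convert both to seconds. This is a sketch under stated assumptions: the function names `sync_age_seconds` and `within_window` are hypothetical, the 600-second window is an arbitrary illustration rather than a recommended value, and `date -d` requires GNU date.

```shell
# Hypothetical helper: seconds elapsed between a lastGroupSyncTime value ($1)
# and a reference time ($2, defaults to the current UTC time).
# Assumes GNU date; BSD/macOS date uses different flags.
sync_age_seconds() {
  last_sync="$1"
  reference="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  echo $(( $(date -u -d "$reference" +%s) - $(date -u -d "$last_sync" +%s) ))
}

# Hypothetical helper: succeed only if the sync age is at most $2 seconds.
# $3 is an optional reference time, passed through to sync_age_seconds.
within_window() {
  [ "$(sync_age_seconds "$1" "$3")" -le "$2" ]
}

# Timestamp from the example output, compared against a fixed reference time.
sync_age_seconds "2023-07-10T12:40:10Z" "2023-07-10T12:45:10Z"   # prints 300

# 600 seconds is an illustrative window only; choose one that matches your
# acceptable data loss.
if within_window "2023-07-10T12:40:10Z" 600 "2023-07-10T12:45:10Z"; then
  echo "within window"
else
  echo "outside window"
fi
```

Omit the reference time to compare against the current time, for example `sync_age_seconds "2023-07-10T12:40:10Z"`.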

About this task

Failover is a process that transitions an application from a primary cluster to a secondary cluster in the event of a primary cluster failure. While failover allows the application to run on the secondary cluster with minimal interruption, making an uninformed failover decision can have adverse consequences, such as complete data loss if a replication failure from the primary to the secondary cluster went unnoticed. If a significant amount of time has gone by since the last successful replication, it’s best to wait until the failed primary is recovered.

lastGroupSyncTime is a critical metric that reflects the time of the last successful replication for all PVCs associated with an application. In essence, it indicates the synchronization health between the primary and secondary clusters. Before initiating a failover from one cluster to another, check this metric and initiate the failover only if lastGroupSyncTime is within a reasonable time in the past.

Note: During the course of failover, the Ceph-RBD mirror deployment on the failover cluster is scaled down to ensure a clean failover for volumes that are backed by Ceph-RBD as the storage provisioner.

Procedure

Fail over the protected application using either of the following paths:
- Option 1: From the Applications page
  
  On the Hub cluster, navigate to Applications. Click Actions (⋮) and select Failover application.
- Option 2: From the Protected Applications page
  
  On the Hub cluster, navigate to Data Services > Disaster recovery > Protected Applications > Actions (⋮) > Failover.
After the Failover application popup is shown, select the Policy and the Target cluster to which the associated application will fail over in a disaster.
Click the Select subscription group dropdown to verify the default selection or modify this setting.
By default, the subscription group that replicates for the application resources is selected.
Check the status of the Failover readiness.
- If the status is Ready with a green tick, the target cluster is ready for failover to start. Proceed to click Initiate.
- If the status is Unknown or Not ready, then wait until the status changes to Ready.
Important:
If there are data inconsistencies caused by synchronization delays, a warning message appears stating Inconsistent data on target cluster. This alerts you to the possibility of data loss if the failover is initiated. The message is no longer displayed when data synchronization is complete.
Click Initiate.
All the system workloads and their available resources are now transferred to the target cluster.
In the DR status column, you can view the progress of the failover. The modal shows the progress of the failover as Preparing > Failover > Restoring > Clean up.
Verify that the activity status shows as FailedOver for the application.
1. Navigate to the Applications > Overview tab.
2. In the Data policy column, click the policy link for the application you applied the policy to.
3. On the Data Policy popover page, click the View more details link.
4. Verify that you can see one or more policy names and the ongoing activities (Last sync time and Activity status) associated with the policy in use with the application.
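The FailedOver status can also be cross-checked from the Hub cluster command line. The following is a minimal sketch, not part of the documented procedure: the helper name `all_failed_over` and the sample resource names are hypothetical, and the commented `oc` query assumes that the DRPC resource exposes its state in `.status.phase`.

```shell
# Hypothetical helper: succeed only if every "<namespace/name><TAB><phase>"
# line read from stdin reports the FailedOver phase.
all_failed_over() {
  awk -F'\t' '
    $2 != "FailedOver" { bad = 1; print "not failed over yet: " $1 " (" $2 ")" }
    END { exit bad + 0 }
  '
}

# On the Hub cluster, the input could come from a query such as (assumption):
#   oc get drpc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' | all_failed_over

# Made-up sample data so that the sketch runs without a cluster.
printf 'app-ns/app-drpc\tFailingOver\n' | all_failed_over || echo "failover still in progress"
```

To limit the check to a single application, replace `-A` with `-n` and the namespace of its DRPC.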