Relocating a disaster recovery protected discovered application

Before you begin

  • Ensure that the application namespace is created in both managed clusters (for example, busybox-discovered).

About this task

This section describes how to relocate a discovered application that is disaster recovery protected.

Procedure

  1. Disable fencing on the Hub cluster.
    1. Edit the DRCluster resource for the cluster that is being unfenced, replacing <drcluster_name> with the name of that resource.
      oc edit drcluster <drcluster_name>
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRCluster
      metadata:
      [...]
      spec:
        cidrs:
        [...]
        ## Modify this line
        clusterFence: Unfenced
        [...]
      [...]

      Example output:

      drcluster.ramendr.openshift.io/ocp4perf1 edited
      Note: Once the managed cluster is fenced, all communication from applications to the OpenShift Data Foundation external storage cluster will fail and some Pods will be in an unhealthy state (for example: CreateContainerError, CrashLoopBackOff) on the cluster that is now fenced.
    2. Gracefully reboot OpenShift Container Platform nodes that were Fenced. A reboot is required to resume the I/O operations after unfencing to avoid any further recovery orchestration failures. Reboot all nodes of the cluster by following the steps in the procedure, Rebooting a node gracefully.
      Note: Make sure that all the nodes are initially cordoned and drained before you reboot and perform uncordon operations on the nodes.
    3. After all OpenShift nodes are rebooted and are in a Ready status, verify that all Pods are in a healthy state by running this command on the Primary managed cluster (or whichever cluster has been Unfenced):
      oc get pods -A | grep -Ev 'Running|Completed'
      NAMESPACE                                          NAME                                                              READY   STATUS      RESTARTS       AGE
      The query should return no Pods (only the header row) before you proceed to the next step.
      Important: If there are Pods still in an unhealthy status because of severed storage communication, troubleshoot and resolve before continuing. Because the storage cluster is external to OpenShift, it also has to be properly recovered after a site outage for OpenShift applications to be healthy.

      Alternatively, you can use the OpenShift Web Console dashboards and Overview tab to assess the health of applications and the external ODF storage cluster. The detailed Fusion Data Foundation dashboard is found by navigating to Storage > Data Foundation.

    4. Verify that the Unfenced cluster is in a healthy state. Validate the fencing status in the Hub cluster for the Primary managed cluster, replacing <drcluster_name> with the name of its DRCluster resource.
      oc get drcluster.ramendr.openshift.io <drcluster_name> -o jsonpath='{.status.phase}{"\n"}'
      Unfenced
    5. Log in to your Ceph cluster and verify that the IPs that belong to the OpenShift Container Platform cluster nodes are no longer in the blocklist.
      ceph osd blocklist ls

      Ensure that you do not see the IPs added during fencing.
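
The Pod health and blocklist checks in the substeps above can also be scripted. The following is a minimal sketch, not part of the product: the function names unhealthy_pods and ip_in_blocklist are hypothetical, and the blocklist helper assumes that the first field of each entry printed by ceph osd blocklist ls contains the node address. Both helpers read command output on standard input.

```shell
#!/bin/sh
# Sketch only: both helpers read command output on stdin, so they can be
# exercised without cluster access. Function names are hypothetical.

# Print rows of `oc get pods -A` output whose STATUS column (field 4) is
# neither Running nor Completed. The header row is skipped.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed"'
}

# Exit 0 if the address given as $1 appears in `ceph osd blocklist ls` output.
# Assumption: the first field of each entry contains the address. A match
# must not be part of a longer address (10.0.0.1 must not match 10.0.0.10).
ip_in_blocklist() {
  awk -v ip="$1" '
    {
      n = index($1, ip)
      if (n == 0) next
      before = (n > 1) ? substr($1, n - 1, 1) : ""
      after = substr($1, n + length(ip), 1)
      if (before !~ /[0-9.]/ && after !~ /[0-9.]/) found = 1
    }
    END { exit (found ? 0 : 1) }'
}
```

With these helpers loaded, piping oc get pods -A into unhealthy_pods should print nothing on a healthy cluster, and piping ceph osd blocklist ls into ip_in_blocklist with a node IP as the argument should return a non-zero exit status for every node of the unfenced cluster.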

  2. In the RHACM console, navigate to the Disaster Recovery > Protected applications tab.
  3. At the end of the application row, open the Actions menu and select Relocate.
  4. In the Relocate application modal window, review the status of the application and the target cluster.
  5. Click Initiate.
  6. Check the progression status of Relocate until the result is WaitOnUserToCleanup, replacing <drpc_name> with the name of the DRPC. The DRPC name can be identified by the unique Name configured in prior steps (for example, busybox-rbd).
    oc get drpc <drpc_name> -n openshift-dr-ops -o jsonpath='{.status.progression}{"\n"}'
  7. Remove the busybox application from the Secondary managed cluster before the Relocate to the Primary managed cluster is completed.
    1. Navigate to the cloned repository for busybox and run the following commands on the Secondary managed cluster where you relocated from. Use the same directory that was used to create the application (for example, odr-metro-rbd).
      cd ~/ocm-ramen-samples/
      git branch
      * main
      
      oc delete -k workloads/deployment/odr-metro-rbd -n busybox-discovered
      persistentvolumeclaim "busybox-pvc" deleted
      deployment.apps "busybox" deleted
  8. After deleting the application, navigate to the Protected applications tab and verify that the busybox resources are both in Healthy status.
  9. Verify that the busybox application is running on the Primary managed cluster.
    oc get pods,pvc -n busybox-discovered
    NAME                           READY   STATUS    RESTARTS   AGE
    pod/busybox-796fccbb95-qmxjf   1/1     Running   0          2m46s
    
    
    NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
    persistentvolumeclaim/busybox-pvc   Bound    pvc-b20e4129-902d-47c7-b962-040ad64130c4   1Gi        RWO            ocs-storagecluster-ceph-rbd   <unset>                 2m57s
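
Rather than rerunning the progression query from step 6 by hand, the check can be wrapped in a small polling loop. The following is a minimal sketch under stated assumptions: wait_for_value is a hypothetical helper, not an oc feature, and the attempt count and interval are arbitrary values to adjust for your environment.

```shell
#!/bin/sh
# wait_for_value EXPECTED ATTEMPTS INTERVAL COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until its output equals EXPECTED,
# or gives up after ATTEMPTS tries, sleeping INTERVAL seconds between tries.
wait_for_value() {
  expected=$1
  attempts=$2
  interval=$3
  shift 3
  i=0
  current=""
  while [ "$i" -lt "$attempts" ]; do
    # Command substitution strips the trailing newline from the output.
    current=$("$@" 2>/dev/null)
    if [ "$current" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out; last value: $current" >&2
  return 1
}
```

To use it for step 6, call wait_for_value with WaitOnUserToCleanup as the expected value, followed by the attempt count, the interval in seconds, and the oc get drpc command from that step with your DRPC name filled in. The function returns 0 as soon as the progression matches, so the cleanup in step 7 can follow immediately.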