Disaster recovery mechanism for the GDR solution

After you plan how the GDR solution integrates into your current environment, review the following flow chart, which shows the high-level steps that are involved in the GDR implementation.

The following flow chart provides a summary of the entire GDR mechanism for disaster recovery:
Figure 1. GDR solution: Disaster recovery mechanism (flow chart steps: Installing GDR, Setting up the KSYS subsystem, Discovering resources, Verifying the configuration, Recovering the virtual machines during a disaster)

1. Installation

The controlling system (KSYS) is the fundamental component in the GDR solution. Therefore, the KSYS filesets must be installed first.

The KSYS runs in an AIX® 7.2.1 (or later) logical partition in the disaster recovery site. It controls the entire cloud environment for the GDR solution.

To manage the servers and data replication, the KSYS must be connected to all the managed servers through the HMC, and it must have out-of-band storage system connectivity to all associated primary and secondary disks.

2. Configuration

After the KSYS is installed and a one-node KSYS cluster is set up, you must configure all the other entities by using the KSYS interface.

You must complete the following procedures by using the ksysmgr command:

  1. Create a one-node cluster for the KSYS node.
  2. Create sites.
  3. Add HMCs to the corresponding sites.
  4. Add hosts to the corresponding sites.
  5. Identify host pairs across the sites.
  6. Create host groups.
  7. Add storage agents to the corresponding sites.
  8. Add contact details for error notification.
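
The relationships implied by steps 2 through 5 can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class and method names (SiteConfig, add_site, add_host, pair_hosts) are hypothetical and are not part of ksysmgr, and the rule that a site must exist before a host is added to it is inferred from the order of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class SiteConfig:
    """Hypothetical model of sites, hosts, and host pairs (not a ksysmgr API)."""
    sites: set = field(default_factory=set)
    hosts: dict = field(default_factory=dict)   # host name -> site name
    pairs: list = field(default_factory=list)   # (host_a, host_b) tuples

    def add_site(self, name):
        self.sites.add(name)

    def add_host(self, host, site):
        # Assumption: a host can only be added to a site that already exists.
        if site not in self.sites:
            raise ValueError(f"site {site!r} must be created before host {host!r}")
        self.hosts[host] = site

    def pair_hosts(self, host_a, host_b):
        for host in (host_a, host_b):
            if host not in self.hosts:
                raise ValueError(f"host {host!r} has not been added to a site")
        # Host pairs are identified across the sites, so both hosts
        # must belong to different sites.
        if self.hosts[host_a] == self.hosts[host_b]:
            raise ValueError("a host pair must span two sites")
        self.pairs.append((host_a, host_b))
```

For example, after adding two sites and one host in each, pair_hosts accepts the cross-site pair and rejects an attempt to pair two hosts that belong to the same site.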

3. Discovery

After the initial configuration is complete, the KSYS discovers all the hosts from all the host groups that are managed by the HMCs in both sites and displays the status.

During discovery, the KSYS subsystem monitors the discovery of all logical partitions (LPARs) or virtual machines (VMs) in all the managed hosts in the active site. The KSYS collects the configuration information for each LPAR, displays the status, and logs the status in the log files in the /var/ksys/log/ directory.

The KSYS discovers the disks of each VM and checks whether the VMs are currently configured for storage device mirroring. If the disks are not properly configured for mirroring, the KSYS notifies you about the volumes that are not mirrored. All volumes of a VM must be mirrored. Disks can be virtualized by using N_Port ID virtualization (NPIV), virtual SCSI (vSCSI), or a combination of these modes.
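
The mirroring check described above amounts to comparing each VM's disks against the set of mirrored volumes. The following sketch shows that comparison. The function name, the input shapes, and the disk identifiers are assumptions made for illustration; the actual KSYS implementation is not documented here.

```python
def unmirrored_volumes(vm_disks, mirrored):
    """Return {vm_name: [disks without a mirror]} for VMs that have gaps.

    vm_disks: dict mapping a VM name to the list of its disk identifiers.
    mirrored: set of disk identifiers that are configured for mirroring.
    Both inputs are hypothetical stand-ins for discovery results.
    """
    report = {}
    for vm, disks in vm_disks.items():
        missing = sorted(d for d in disks if d not in mirrored)
        if missing:
            # Every volume of a VM must be mirrored, so any gap is reported.
            report[vm] = missing
    return report
```

An empty result means that every volume of every VM is mirrored. Any entry in the result names a VM that does not yet meet the requirement.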

The HMC collects information about the hosts, VIOS, and logical partitions that can be managed by the KSYS. For example, the HMC collects information about the system processor, system memory, hardware, and worldwide port name (WWPN) of the physical Fibre Channel adapter. The HMC also checks for VIOS capability for disaster recovery operations, and it collects information about the host state, LPAR state, VIOS state, and IP addresses of the host, VIOS, and LPAR. The HMC provides all this information to the KSYS during the discovery phase.

4. Verification

In addition to the configuration validations that you initiate, the KSYS verifies and validates the environment periodically. The KSYS also verifies the configuration as part of the recovery process. In the verification phase, the KSYS fetches information from the HMC to check whether the backup site is capable of hosting the VMs during a disaster. The KSYS also verifies storage replication-related details and the accessibility of the target disks. The verification is successful only if the storage area network (SAN) zones are configured properly on the target side.

If the verification fails as part of the recovery process, the failure is considered a recovery failure.
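
As a rough illustration of the verification phase, the following sketch collects failures instead of stopping at the first one. It assumes that "capable of hosting the VMs" can be approximated by comparing total processor and memory requirements against the backup site's capacity. That criterion, the function name, and the dictionary keys are assumptions and do not describe the actual KSYS checks.

```python
def verify_backup_site(vms, backup_capacity, accessible_target_disks):
    """Return a list of failure messages; an empty list means success.

    vms: list of dicts with hypothetical keys "name", "cpu", "mem_gb",
         and "target_disks".
    backup_capacity: dict with hypothetical keys "cpu" and "mem_gb".
    accessible_target_disks: set of target disks reachable at the backup site.
    """
    failures = []

    # Assumed capacity check: total demand must fit in the backup site.
    need_cpu = sum(vm["cpu"] for vm in vms)
    need_mem = sum(vm["mem_gb"] for vm in vms)
    if need_cpu > backup_capacity["cpu"]:
        failures.append(f"processors: need {need_cpu}, have {backup_capacity['cpu']}")
    if need_mem > backup_capacity["mem_gb"]:
        failures.append(f"memory: need {need_mem} GB, have {backup_capacity['mem_gb']} GB")

    # Target disks must be accessible, which depends on correct SAN zoning.
    for vm in vms:
        for disk in vm["target_disks"]:
            if disk not in accessible_target_disks:
                failures.append(f"{vm['name']}: target disk {disk} is not accessible")

    return failures
```

Returning every failure in one pass lets an administrator correct all reported problems before the next verification run.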

5. Recovery

When a planned or unplanned outage occurs, you must manually initiate the recovery by using the ksysmgr command, which moves the virtual machines to the backup site. If you initiate a planned recovery, the storage replication direction is reversed from the current active site to the previously active site. If you initiate an unplanned recovery, the storage is failed over to the backup site, and you must manually resynchronize the storage after the previously active site becomes operational.
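
The difference in storage handling between the two recovery types can be summarized in a small lookup. The function name and the step descriptions are illustrative only. They restate the behavior described above and do not correspond to ksysmgr options.

```python
def storage_handling(recovery_type):
    """Return (automatic_steps, manual_steps) for a recovery type.

    recovery_type: "planned" or "unplanned" (hypothetical labels).
    """
    if recovery_type == "planned":
        # Replication is reversed; no manual storage step follows.
        return (["reverse the storage replication direction"], [])
    if recovery_type == "unplanned":
        # Storage fails over, and resynchronization is left to the administrator.
        return (
            ["fail over the storage to the backup site"],
            ["resynchronize the storage after the previously active site is operational"],
        )
    raise ValueError(f"unknown recovery type: {recovery_type!r}")
```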

6. Cleanup

After the disaster recovery phase, in the case of a planned recovery, the KSYS automatically cleans up the source site of all the disk mapping and adapter information. In the case of an unplanned recovery, you must manually clean up the source site when the HMC and hosts in the previously active site become operational. If the VMs in the previously active site are still in the active state, the VMs are first powered off, and then the cleanup operations are performed in the source site.
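
The cleanup rules above can be sketched as a function that returns who starts the cleanup and the ordered steps. The function name, parameters, and step strings are hypothetical. The sketch assumes that a planned recovery always finds the source site operational.

```python
def cleanup_plan(recovery_type, source_operational, active_vms):
    """Return (initiator, ordered_steps) for cleaning up the source site.

    recovery_type: "planned" or "unplanned" (hypothetical labels).
    source_operational: True when the HMC and hosts in the previously
        active site are operational.
    active_vms: names of VMs still in the active state on the source site.
    """
    if recovery_type == "planned":
        initiator = "KSYS (automatic)"
    elif recovery_type == "unplanned":
        initiator = "administrator (manual)"
        if not source_operational:
            # Manual cleanup must wait until the source site is operational.
            return (initiator, [])
    else:
        raise ValueError(f"unknown recovery type: {recovery_type!r}")

    # VMs that are still active are powered off before the cleanup itself.
    steps = [f"power off {vm}" for vm in active_vms]
    steps.append("remove disk mapping and adapter information")
    return (initiator, steps)
```

An empty step list for an unplanned recovery indicates that the cleanup has to be retried later, after the previously active site is back.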