Bidirectional support for disaster recovery using IBM Geographically Dispersed Resiliency for Power Systems
Overview of IBM Geographically Dispersed Resiliency
IBM® Geographically Dispersed Resiliency for Power Systems uses virtual machine (VM) restart technology to restart VMs on a backup site if there is a disaster or planned system maintenance. All of this is controlled by a single management system that is separate from the production systems. VMs are defined as hosts in IBM Geographically Dispersed Resiliency for Power Systems and are paired across the two sites. Each site must have the resources available to run the host if there is a planned or an unplanned move. IBM Geographically Dispersed Resiliency for Power Systems is managed by a single controller system logical partition (LPAR) called KSYS. The KSYS management system allows the administrator to perform move operations and disaster recovery (DR) tests, and it handles all the complexity of communicating with the different components of the IBM Geographically Dispersed Resiliency for Power Systems environment to perform the necessary tasks.
This article demonstrates the working of bidirectional support using IBM Geographically Dispersed Resiliency disaster recovery solution introduced by IBM for Power Systems™ servers.
For bidirectional support, two KSYS configurations are required; each KSYS handles one direction.
Let us consider the following examples:
- KSYSA: India is the active site and Austin is the backup site; KSYSA manages KSYSA_VM1 and KSYSA_VM2.
- KSYSB: Austin is the active site and India is the backup site; KSYSB manages KSYSB_VM1 and KSYSB_VM2.
- Site India: KSYSA_VM1, KSYSA_VM2, KSYSA (active), and KSYSB (non-active replica of KSYSB).
- Site Austin: KSYSB_VM1, KSYSB_VM2, KSYSB (active), and KSYSA (non-active replica of KSYSA).
Refer to How to safeguard the KSYS node in IBM Geographically Dispersed Resiliency for Power Systems for more details about KSYS node replication.
Figure 1 shows the hardware setup for demonstrating bidirectional support using IBM Geographically Dispersed Resiliency.
Figure 1. Hardware support
Figure 2. KSYS and VM on production HMC and DR HMC
Configuration of KSYS cluster for both KSYSA and KSYSB
Let us consider two KSYS nodes, KSYSA and KSYSB. Here, KSYSA has a cluster named KSYSACluster and KSYSB has a cluster named KSYSBCluster.
Perform the following steps to create a cluster on the KSYSA and KSYSB nodes.
Step 1: Create the KSYS clusters and add the KSYS nodes to it.
KSYS clusters KSYSACluster and KSYSBCluster are created on the KSYS nodes KSYSA and KSYSB respectively.
- KSYS cluster KSYSACluster on ksysnode KSYSA
- KSYS cluster KSYSBCluster on ksysnode KSYSB
Run the following command to create the KSYS cluster:
ksysmgr add ksyscluster <cluster_name> ksysnodes=<node_name> sync=yes
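For example, using the names from this article, the invocations would look like the following (an illustrative sketch; adjust the node names to your environment):

```shell
# On the KSYSA node: create the cluster and synchronize the configuration
ksysmgr add ksyscluster KSYSACluster ksysnodes=KSYSA sync=yes

# On the KSYSB node
ksysmgr add ksyscluster KSYSBCluster ksysnodes=KSYSB sync=yes
```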
Figure 3. Creating a cluster on the KSYSA KSYS node
Figure 4. Creating a cluster on the KSYSB KSYS node
Step 2: Add sites to the KSYS cluster.
As per the KSYS configuration, an active site (referred to as the production site) and a backup site (referred to as the remote site) are created.
Let's consider the following site names:
- KSYSACluster: With India as the production site and Austin as the remote site
- KSYSBCluster: With Austin as the production site and India as the remote site
Run the following command to create these sites:
ksysmgr add site <site_name> sitetype=<active|backup>
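With the site names used in this article, the commands might look as follows (illustrative; each pair of commands runs on its respective KSYS node):

```shell
# On KSYSA: India is the active (production) site, Austin is the backup (remote) site
ksysmgr add site India sitetype=active
ksysmgr add site Austin sitetype=backup

# On KSYSB: the roles are reversed
ksysmgr add site Austin sitetype=active
ksysmgr add site India sitetype=backup
```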
Figure 5. Adding site for KSYSA configuration
Figure 6. Adding site for KSYSB configuration
Step 3: Add the Hardware Management Console (HMC) to the KSYS cluster.
Let us consider vmhmc8 as the production site HMC and vmhmc1 as the remote site HMC for KSYSA. For KSYSB, vmhmc1 is the production site HMC and vmhmc8 is the remote site HMC. The following figures show the addition of vmhmc1 and vmhmc8 for the KSYSA and KSYSB sites.
Run the following command to add the HMC:
ksysmgr add hmc <name> hostname=<hmc_name> login=<username> password=<password> site=<site_name>
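For the KSYSA configuration described here, the commands would look like the following sketch (the login and password values are placeholders for your HMC credentials):

```shell
# On KSYSA: vmhmc8 manages the production site (India),
# vmhmc1 manages the remote site (Austin)
ksysmgr add hmc vmhmc8 hostname=vmhmc8 login=<username> password=<password> site=India
ksysmgr add hmc vmhmc1 hostname=vmhmc1 login=<username> password=<password> site=Austin
```

On KSYSB, the same two HMCs are added with the site roles reversed.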
Figure 7. Adding HMC vmhmc8 to the production site for KSYSA configuration
Figure 8. Adding HMC vmhmc1 to the production site for KSYSB configuration
Figure 9. Adding HMC vmhmc1 to the remote site for KSYSA configuration
Figure 10. Adding HMC vmhmc8 to the remote site for KSYSB configuration
Step 4: Add a host to the KSYS cluster.
For the KSYS cluster KSYSACluster, let us consider host why2_9117-MMD-105E61P_32CPU256G of vmhmc8 (shown in Figure 7) as the production site host and host rar1m5 of vmhmc1 (shown in Figure 9) as the remote site host. For the KSYS cluster KSYSBCluster, host rar1m5 of vmhmc1 (shown in Figure 8) is the production site host and why2_9117-MMD-105E61P_32CPU256G of vmhmc8 (shown in Figure 10) is the remote site host.
Run the following command to add the host:
ksysmgr add host <hostname> site=<site_name> uuid=<uuid_of_cec>
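On KSYSA, the invocations might look as follows (illustrative; the UUID values are placeholders that you obtain from the HMC for each managed system):

```shell
# Production site host (site India)
ksysmgr add host why2_9117-MMD-105E61P_32CPU256G site=India uuid=<uuid_of_cec>

# Remote site host (site Austin)
ksysmgr add host rar1m5 site=Austin uuid=<uuid_of_cec>
```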
Figure 11. Adding production site host and remote site host to KSYSACluster
Figure 12. Adding production site host and remote site host to KSYSBCluster
Step 5: Pair the host from the production site with the remote site.
For the KSYSACluster production site host, why2_9117-MMD-105E61P_32CPU256G is paired with rar1m5, and for the KSYSBCluster production site host, rar1m5 is paired with why2_9117-MMD-105E61P_32CPU256G.
Run the following command to pair the hosts:
ksysmgr pair host <active_site_host> pair=<backup_site_host>
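Using the hosts from this article, the pairing commands would look like the following sketch (each command runs on its respective KSYS node):

```shell
# On KSYSA: production host paired with the remote host
ksysmgr pair host why2_9117-MMD-105E61P_32CPU256G pair=rar1m5

# On KSYSB: the direction is reversed
ksysmgr pair host rar1m5 pair=why2_9117-MMD-105E61P_32CPU256G
```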
Figure 13. Production host why2_9117-MMD-105E61P_32CPU256G paired with remote host rar1m5 for KSYSACluster
Figure 14. Production host rar1m5 paired with remote host why2_9117-MMD-105E61P_32CPU256G for KSYSBCluster
Step 6: Add a storage agent for handling disk replication.
Let us consider the storage agents, SA_L (as local storage agent) and SA_R (as remote storage agent) for both the KSYS nodes, KSYSA and KSYSB.
Run the following command to add the storage agent:
ksysmgr add storage_agent <name> login=<username> password=<password> site=<sitename_associated> serialnumber=<storage_no> storagetype=<type_of_storage> ip=<ip_of_storage>
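For the storage agents named in this article, the commands might look as follows (illustrative; credentials, serial numbers, storage type, and IP addresses are placeholders for your storage environment):

```shell
# On KSYSA: local storage agent at site India, remote at site Austin
ksysmgr add storage_agent SA_L login=<username> password=<password> site=India serialnumber=<storage_no> storagetype=<type_of_storage> ip=<ip_of_storage>
ksysmgr add storage_agent SA_R login=<username> password=<password> site=Austin serialnumber=<storage_no> storagetype=<type_of_storage> ip=<ip_of_storage>
```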
Figure 15. Storage agent details for KSYSACluster (left) and KSYSBCluster (right)
The configuration is now ready with respect to KSYS.
Figure 16. Virtual machines to be handled under host why2_9117-MMD-105E61P_32CPU256G for KSYSA
Figure 17. Virtual machines to be handled under host rar1m5 for KSYSB
Step 7: Perform the discovery of KSYS configuration.
Run the following command to discover the active (production) site:
ksysmgr discover site <active_site_name>
After discovery is completed on both KSYS nodes, check for the disk pair and the disk group that are created. The following figures show discovery details, disk pair details, and disk group details for each KSYS node.
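For this setup, discovery would be run as follows (illustrative; each command runs on its respective KSYS node):

```shell
# On KSYSA, India is the active site
ksysmgr discover site India

# On KSYSB, Austin is the active site
ksysmgr discover site Austin
```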
Figure 18. Discovery for KSYSACluster
Figure 19. Discovery for KSYSBCluster
Run the following command to query a disk pair:
ksysmgr query disk_pair
Figure 20. Disk pair details for both KSYSACluster and KSYSBCluster
Run the following command to query a disk group:
ksysmgr query disk_group
Figure 21. Disk group details for both KSYSACluster and KSYSBCluster
Step 8: Verify the site on both KSYS nodes, KSYSA and KSYSB.
Usually, verification is done for the remote site to confirm whether the production site virtual machines will be able to restart at the remote site.
Run the following command to verify the site:
ksysmgr verify site <active_site_name>
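With the site names in this article, verification would look like the following sketch:

```shell
# On KSYSA
ksysmgr verify site India

# On KSYSB
ksysmgr verify site Austin
```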
Figure 22. Verification on KSYSACluster at KSYSA
Figure 23. Verification on KSYSBCluster at KSYSB
Step 9: Move the virtual machine during a disaster or system failure.
After the virtual machines are in the READY_TO_MOVE state, ensure that the virtual machines can be restarted at the remote site during planned and unplanned moves.
You need to run the following command to move a site:
ksysmgr move site from=<active site name> to=<backup site name> dr_type=planned|unplanned
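For the planned move demonstrated in Figure 24, the command on KSYSA would look as follows (illustrative):

```shell
# Planned move of the KSYSA-managed VMs from site India to site Austin
ksysmgr move site from=India to=Austin dr_type=planned
```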
A planned move is needed in case of maintenance. Before a planned move, VMs must be in the READY_TO_MOVE state. In case of an unplanned move, verification is not performed.
Figure 24. Move initiated from the KSYSA site to the KSYSB site with dr_type=planned
The following figure shows that all virtual machines for KSYSA and KSYSB are now restarted at the remote site.
Figure 25. Hosts after the move
The following figure shows that now site Austin becomes the active site and site India becomes the backup site for KSYSACluster.
Figure 26. Sites after the move
Now, we need to unmanage VMs after the move.
Because the VMs KSYSA_VM1 and KSYSA_VM2 have been moved to host rar1m5, they automatically get discovered by KSYSBCluster. So, we need to unmanage them from KSYSBCluster.
Because rar1m5 has now become an active host for KSYSACluster, unmanage all the VMs except KSYSA_VM1 and KSYSA_VM2 from KSYSACluster.
Run the following command to unmanage the VM:
ksysmgr unmanage vm name=<vm name> host=<host name>
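For example, on KSYSB the moved VMs would be unmanaged as follows (illustrative):

```shell
# On KSYSB: stop managing the VMs that arrived from site India
ksysmgr unmanage vm name=KSYSA_VM1 host=rar1m5
ksysmgr unmanage vm name=KSYSA_VM2 host=rar1m5
```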
After this, we can run the move command (with dr_type=planned) to move our VMs KSYSA_VM1 and KSYSA_VM2 from rar1m5 back to why2_9117-MMD-105E61P_32CPU256G and restore the original configuration.
Next, let us consider an unplanned move. Suppose site India's central processor complex (CPC) is down.
Figure 27. CPC why2_9117-MMD-105E61P_32CPU256G is down
As the CPC goes down, KSYS will initiate an unplanned move.
Figure 28. Move initiated from the KSYSA site to the KSYSB site with dr_type=unplanned
Bring up the KSYSB (replica) node on rar1m5 so that it can monitor the VMs KSYSB_VM1 and KSYSB_VM2.
Now, KSYSA_VM1 and KSYSA_VM2 have been moved to host rar1m5, so they automatically get discovered by KSYSBCluster and we need to unmanage them from KSYSBCluster.
rar1m5 has now become an active host for KSYSACluster, so unmanage all the VMs except KSYSA_VM1 and KSYSA_VM2 from KSYSACluster.
Run the following command to unmanage the VM:
ksysmgr unmanage vm name=<vm name> host=<host name>
After site India's CPC is powered on, perform a cleanup operation on site India to remove the copies of KSYSA_VM1 and KSYSA_VM2 from why2_9117-MMD-105E61P_32CPU256G.
Use the following command to clean up the site:
ksysmgr cleanup site <sitename>
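For this scenario, the cleanup command would look as follows (illustrative):

```shell
# Remove the stale copies of the moved VMs from the site India host
ksysmgr cleanup site India
```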
The following figure shows that all virtual machines for KSYSA and KSYSB are now restarted at rar1m5.
Figure 29. Hosts after move
Figure 29 shows that KSYSA and KSYSB are in the Not Activated state on the active site because of the unplanned move, and the respective VMs have moved to the DR site. So, KSYSA and KSYSB on the DR site will monitor their respective VMs. After rectifying the issue that caused the unplanned move, the original active site will be online again, and the host associated with it will also be online. To get the VMs back to the initial position with bidirectional support, perform the following steps.
- Notice that on the DR site, KSYSA and KSYSB monitor their respective VMs.
- Move the VMs associated with KSYSA from the DR site to the Active site with a planned move.
- After moving the VMs, bring down KSYSB on the DR site.
- Reverse the replication path at storage for KSYSB.
- Activate KSYSB on the active site.
This brings the VMs back to the proper state, where they can be monitored bidirectionally as shown in Figure 16 and Figure 17.
Note: You can implement the same with two hosts per site, where each host pair is monitored by one KSYS node.
Site A                              Site B
HOST1 ----------- KSYS1 ----------- HOST3
HOST2 ----------- KSYS2 ----------- HOST4
As shown above, KSYS1 handles HOST1 and HOST3, whereas KSYS2 handles HOST2 and HOST4. In this case, after DR, the VMs migrate to their respective target CECs, so it might not be necessary to manage or unmanage VMs after DR. That approach is useful if the data center has sufficient hardware, whereas the method described in this article is useful when the data center has less hardware.