Disaster Recovery as a Service (DRaaS) offering using IBM Geographically Dispersed Resiliency Solution for Power Systems

What is IBM Geographically Dispersed Resiliency for Power Systems?

Disaster recovery (DR) and high availability (HA) solutions are mainly based on two types of technology: cluster-based technology and virtual machine restart-based technology. Cluster-based HA and DR solutions typically deploy redundant hardware and software components to provide near real-time failover when one or more components in the configuration fail. A virtual machine (VM) restart-based solution relies on out-of-band monitoring and management to restart virtual machines after a hardware failure in the infrastructure. The IBM® Geographically Dispersed Resiliency for Power Systems™ solution is based on virtual machine restart technology.

IBM Geographically Dispersed Resiliency for Power Systems is a disaster recovery solution that is easy to deploy and provides an automated process to recover production site virtual machines after a disaster. Because disaster recovery of applications and services is a key component of business continuity, the IBM Geographically Dispersed Resiliency solution helps customers automate the disaster recovery process during a failure. The solution provides a simple deployment model that uses a controller system (called KSYS) to monitor the entire virtual machine environment, and it also provides flexible failover policies and storage replication management.

You can learn more about Geographically Dispersed Resiliency for Power Systems in the IBM developerWorks wiki document: Why GDR is the ideal DR solution for Power Systems and FAQ.

Brief overview of the disaster recovery as a service (DRaaS) hosting configuration for IBM Geographically Dispersed Resiliency

In a cloud-based recovery model, two different customers can share the same IBM site for recovery through subscriptions to a disaster recovery management service offering from IBM. In the following figure, there are two Geographically Dispersed Resiliency configurations, one per customer, each with its own KSYS node (K-sys1 and K-sys2). Today, each customer typically maintains both a production site and a remote site. This article describes a configuration that can reduce the customer's cost for remote site hardware: because the remote site hardware is provided by a service provider, the customer is charged only for the disaster recovery services consumed. Each customer still owns its production site, but both customers can fail over to the same IBM site and even to the same host (server). If both customers reside in the same city, a disaster might affect them at the same time, and both might request failover of their production sites to the IBM site, which acts as the service provider for both customers simultaneously. In this case, the customers are not aware of whom else they are sharing the recovery infrastructure with. It is the service provider's responsibility to maintain the confidentiality of each customer's configuration.

In a service offering based recovery model, it is not the customer who operates the controller node; instead, the DRaaS provider manages each customer's KSYS node and takes appropriate action based on priority and the service contract agreements. While concurrent failover of multiple customer sites may be possible, it may be necessary for customers to agree to have their sites moved serially, along with other customers, based on the priority in their service agreements. With this solution, the customer does not need to find a recovery server during a disaster or failure; the recovery server resides at the service provider's location or data center, where the virtual machines are restarted.

This model can be implemented by using IBM Geographically Dispersed Resiliency for Power Systems, where each customer runs only its own production hardware at its location and the recovery hardware is available at the service provider location. All DR operations are handled by the service provider through the applicable controller node (KSYS).

Figure 1. Basic configuration for DRaaS

As shown in Figure 1, the production site system is at the customer location and the backup site system is at the service provider data center.

Configuring the KSYS node for each customer, where the recovery location is the same for all customers

Let us consider two KSYS nodes, r7r3m116 and r7r3m108. Here, r7r3m116 has a cluster named cluster01 and r7r3m108 has a cluster named cluster02.

Step 1: Creating the KSYS clusters

KSYS clusters cluster01 and cluster02 are created on the KSYS nodes r7r3m116 and r7r3m108, respectively.

  • KSYS cluster cluster01 on ksysnode r7r3m116
  • KSYS cluster cluster02 on ksysnode r7r3m108

You can use the ksysmgr command to create the KSYS cluster:

 ksysmgr add ksyscluster <cluster_name> ksysnodes=<node_name> sync=yes
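
For example, with the node and cluster names used in this article, the two clusters would be created as follows (illustrative invocations based on the syntax above). On KSYS node r7r3m116:

 ksysmgr add ksyscluster cluster01 ksysnodes=r7r3m116 sync=yes

And on KSYS node r7r3m108:

 ksysmgr add ksyscluster cluster02 ksysnodes=r7r3m108 sync=yes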

Note: In this article, figures with a yellow font refer to the customerA controller node and figures with a white font refer to the customerB controller node. These are two different KSYS clusters with different KSYS nodes.

Figure 2. Creating Cluster on customer A KSYS node
Figure 3. Creating cluster on customer B KSYS node

Step 2: Adding sites to the KSYS cluster

As part of the KSYS configuration, an active site (referred to as the production site) and a backup site (referred to as the remote site) are created. The production site is at the customer location, whereas the disaster recovery site is at the service provider location. The service provider offers DR as a service.

Let us name the production site for cluster01 customerA and the production site for cluster02 customerB. The disaster recovery site, situated at the service provider location, is named service_provider.

You can use the ksysmgr command to create the sites:

 ksysmgr add site <site_name> sitetype=<active|backup>
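
For example, on the customerA KSYS node, the production and backup sites would be added as follows (illustrative, using the site names from this article):

 ksysmgr add site customerA sitetype=active
 ksysmgr add site service_provider sitetype=backup

On the customerB KSYS node, the same commands are run with customerB as the active site.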
Figure 4. Adding site for customerA configuration
Figure 5. Adding site for customerB configuration

Step 3: Adding Hardware Management Consoles (HMCs) to the KSYS cluster

Let us consider vmhmc1 as the production site HMC for customerA and vmhmc5 as the production site HMC for customerB, whereas vmhmc6 is the remote site HMC situated at the service provider location. The following figures show the addition of vmhmc1 and vmhmc5 to the customerA and customerB sites.

Figure 6. Adding HMC vmhmc1 to the production site for customerA
Figure 7. Adding HMC vmhmc5 to the production site for customerB

Now, we will add vmhmc6 on the remote disaster recovery site, which is at the service provider data center. This remote site HMC is the same for all customers, and it is used to handle virtual machines when there is a failure in a customer's production environment. The HMC is maintained by the service provider that offers this service to handle customer virtual machines during disaster recovery. The following figures show the addition of vmhmc6 on the service_provider site. This HMC is added on both KSYS clusters.

Use the following command to add an HMC:

ksysmgr add hmc <name> hostname=<hmc_hostname> login=<username> password=<password> site=<site_name>
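
For example, the HMCs in this article would be added as follows (illustrative; the host names, login, and password values depend on your environment):

 ksysmgr add hmc vmhmc1 hostname=vmhmc1 login=<username> password=<password> site=customerA
 ksysmgr add hmc vmhmc6 hostname=vmhmc6 login=<username> password=<password> site=service_provider

The second command is run on both KSYS nodes, because the service_provider site is common to both clusters.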
Figure 8. Adding vmhmc6 on service_provider site to KSYS node on r7r3m116
Figure 9. Adding vmhmc6 on service_provider site to KSYS node on r7r3m108

Step 4: Adding a host to the KSYS cluster

Let us consider host pbrazos under vmhmc1 (shown in Figure 6) as the production host for KSYS cluster cluster01, and host raichu under vmhmc5 (shown in Figure 7) as the production site host for KSYS cluster cluster02. The remote site host is snorlax, which is at the service provider site. Host snorlax, managed by vmhmc6, resides in the service provider data center and serves during disaster recovery of production site VMs for customers who have registered for the service offering.

Use the following command to add a host, specifying the site to which its HMC belongs:

ksysmgr add host <hostname> site=<site_name> uuid=<uuid_of_cec>
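
For example, for cluster01 (illustrative; the UUID is the managed system UUID, which you can obtain from the HMC):

 ksysmgr add host pbrazos site=customerA uuid=<uuid_of_pbrazos>
 ksysmgr add host snorlax site=service_provider uuid=<uuid_of_snorlax>

For cluster02, host raichu is added to the customerB site, and snorlax is added to the service_provider site in the same way.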
Figure 10. Adding production host and remote host to cluster01
Figure 11. Adding production host and remote host to cluster02

Step 5: Pairing hosts from the production site to the remote site

For cluster01, the production host pbrazos will be paired with snorlax, and for cluster02, the production host raichu will be paired with snorlax. The recovery host snorlax is at the service provider data center. In this configuration, whenever a disaster occurs at a production site, all the virtual machines associated with the production host are restarted on the remote site disaster recovery host maintained by the service provider.

Use the following command to pair hosts:

ksysmgr pair host <active_site_host> pair=<backup_site_host>
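
For example (illustrative, using the host names from this article). On the cluster01 KSYS node:

 ksysmgr pair host pbrazos pair=snorlax

And on the cluster02 KSYS node:

 ksysmgr pair host raichu pair=snorlax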
Figure 12. For cluster01, production host pbrazos paired with remote host snorlax
Figure 13. For cluster02, production host raichu paired with remote host snorlax

Step 6: Adding a storage agent for handling disk replication

Adding a storage agent is required for handling the disk replication state. The disk replication state is used to boot the operating system image that is replicated from the local storage to the remote storage. Let us consider the storage agents salocal_A for customerA and salocal_B for customerB as the local storage agents, and saremote, located at the service provider site, as the remote storage agent for both customerA and customerB.

Use the following command to add the storage agent:

 ksysmgr add storage_agent <name> login=<username> password=<password> site=<sitename_associated> serialnumber=<storage_no> storagetype=<type_of_storage> ip=<ip_of_storage>
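
For example, on the customerA KSYS node (illustrative; the serial number, storage type, and IP address depend on the storage units in your environment):

 ksysmgr add storage_agent salocal_A login=<username> password=<password> site=customerA serialnumber=<storage_no> storagetype=<type_of_storage> ip=<ip_of_storage>
 ksysmgr add storage_agent saremote login=<username> password=<password> site=service_provider serialnumber=<storage_no> storagetype=<type_of_storage> ip=<ip_of_storage>

On the customerB KSYS node, salocal_B is added for the customerB site and saremote is added for the service_provider site in the same way.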
Figure 14. Storage agent details for KSYS clusters

The configuration is now complete with respect to KSYS, and customerA and customerB are registered with the service provider. The virtual machines under the pbrazos and raichu hosts will be handled by the service provider during disaster recovery: if there is any failure on the production site, all virtual machines are restarted on the service provider host snorlax, which acts as the remote service host.

Figure 15. Virtual machines to be handled under host pbrazos for customerA
Figure 16. Virtual machines to be handled under host raichu for customerB

Step 7: Performing discovery of KSYS configuration

After completing the configuration on the KSYS nodes, discovery can be done by both clusters in parallel, but verification must not be done in parallel for both customers at the same time. For cluster01 on KSYS node r7r3m116, site customerA is the production site, whereas for cluster02 on KSYS node r7r3m108, site customerB is the production site. Because discovery always runs against the production (active) site, there is no chance of conflict; therefore, discovery can be run on both KSYS nodes at the same time.

After discovery is completed on both KSYS nodes, check the disk groups and disk pairs that were created. The commands to run discovery on the active (production) site and to query the results are shown below. The following figure shows disk group details and discovery output for each KSYS node.

Command to discover a site:

ksysmgr discover site <active_site_name>

Command to query a disk group:

ksysmgr query disk_group

Command to query a disk pair:

ksysmgr query disk_pair
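
For example, the full sequence on the cluster01 KSYS node would be (illustrative):

 ksysmgr discover site customerA
 ksysmgr query disk_group
 ksysmgr query disk_pair

On the cluster02 KSYS node, the same sequence is run with customerB as the active site.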
Figure 17. Disk group details for both cluster01 and cluster02

Note: After discovery is completed, also verify that the composite groups are created on the respective storage agent using the symcg list command.
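
For example, assuming EMC storage managed through SYMCLI (which the symcg command implies), you can list the composite groups as follows:

 symcg list

The composite group that was created during discovery for each KSYS cluster should appear in the output.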

Step 8: Verification on both KSYS nodes

Usually, verification is done for the remote site to confirm that the production site virtual machines will be able to restart on the remote site. Because the remote site is the same for both customers, make sure that verification is not performed for both KSYS clusters at the same time, so that there is no conflict. This article demonstrates the verification on the r7r3m116 KSYS node with the cluster01 KSYS cluster: it verifies that the snorlax remote site host at the service provider location can restart the virtual machines of the pbrazos production host. After verifying cluster01, the user or administrator can run the verify command on the KSYS node r7r3m108, which has the KSYS cluster cluster02. This verifies, on the same remote site host (snorlax), whether the virtual machines of the production host raichu will be able to restart.

Use the following command to verify the site:

 ksysmgr verify site <active_site_name>
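
For example (illustrative; run serially, as described above). On KSYS node r7r3m116:

 ksysmgr verify site customerA

And only after that completes, on KSYS node r7r3m108:

 ksysmgr verify site customerB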
Figure 18. Verification on cluster01 at customerA location
Figure 19. Verification on cluster02 at customerB location

For a planned move, before moving the virtual machines, make sure that after verification the VMs are in the READY_TO_MOVE state. The following figure shows the state of the virtual machines of the production hosts pbrazos and raichu. The command to check the state is ksysmgr query vm.
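
For example (illustrative):

 ksysmgr query vm

Each virtual machine should report the READY_TO_MOVE state before a planned move is initiated.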

Figure 20. Virtual machine state after verification

The auto discovery and verification time can be changed for each KSYS system as shown below. This is required to avoid conflicts between the discovery and verification processes, because the same site is used as the remote site for both customerA and customerB. The syntax to modify the auto discovery time is given below.

(0) root @ r7r3m116: /
# ksysmgr modify system -?
 ksysmgr modify system
       [auto_discovery_time=<hh:mm>]
               hh - hour:   00 to 23
               mm - minute: 00 to 59
       [lose_vios_redundancy=<yes | no>]
       [auto_reverse_mirror=<yes | no>]
       [notification_level=<low | medium | high | disable>]
       [dup_event_processing=<yes | no>]
       [replication_type=<async | sync>   sites=<A,B>]
     modify => ch*, sets
     system => sys*
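
For example, the two KSYS nodes could be given non-overlapping schedules as follows (illustrative times). On KSYS node r7r3m116:

 ksysmgr modify system auto_discovery_time=01:00

And on KSYS node r7r3m108:

 ksysmgr modify system auto_discovery_time=03:00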

Step 9: Moving virtual machines on disaster or failure notification

Once the virtual machines are in the READY_TO_MOVE state, they can be restarted on the remote site during a disaster or failure. When a failure is detected, KSYS notifies the service provider to move the virtual machines. After receiving a notification, the service provider initiates a move. The move, too, must be performed for each KSYS cluster at a different time: moving virtual machines from the active site to the backup site in parallel is not possible because the remote site host is the same for both KSYS clusters. Only the virtual machines of customers covered by the contract or agreement signed during the service offering can be moved.

This article demonstrates how to move the virtual machines for the KSYS cluster cluster01, which has pbrazos as the production host, to the remote (backup) site. After a move is initiated, all virtual machines of the pbrazos host restart on the service provider host, snorlax. Because this is a planned move, cleanup of the VMs on the customer site is done automatically. After the move for cluster01 is done, the move for cluster02 is initiated. You can use the following command to move a site:

ksysmgr move site from=<active site name> to=<backup site name>
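
For example, to move the customerA site to the service provider site (illustrative):

 ksysmgr move site from=customerA to=service_provider

After that move completes, the corresponding command is run on the cluster02 KSYS node with customerB as the source site.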
Figure 21. Move initiated from customerA site to service provider site
Figure 22. Move initiated from customerB site to service provider site

The following figure shows that all virtual machines for customerA and customerB are now restarted at the remote site (at the service provider location). In case of a concurrent disaster or failure, the movement of VMs is initiated by the service provider based on the agreement between each customer and the service provider.

Figure 23. Service provider host snorlax able to restart virtual machines for both customers

After the production site configuration is rectified, the customer can ask the service provider to move the virtual machines back to the original site. This, in turn, saves the customer the cost of extra disaster recovery hardware.

For automated discovery and verification, the scheduled system times should be set so that no operation is performed by multiple customer controller nodes at the same time. Therefore, it is better to stagger these schedules according to each customer's requirements.

Conclusion

This article briefly explained the use of IBM Geographically Dispersed Resiliency for Power Systems as a cloud disaster recovery management model with a service offering to customers. With this solution, customers need not buy hardware for disaster recovery; it can be handled by the disaster recovery service provider.

