A Global Replication Service Solution Using IBM Power Virtual Server

The Global Replication Service reflects IBM’s commitment to enabling business continuity planning, operational excellence and cost optimization with IBM Power Systems Virtual Server.

Data replication is key to business resiliency because, simply put, data drives decision-making. Data informs and feeds into mission-critical processes, analytics, systems and, ultimately, business insights. Organizations must ensure that their data is constantly available and accessible to users in near real time. As enterprises expand across geographies and platforms, replication enables them to scale in tandem with their growing data requirements while maintaining performance.

Solution overview

IBM Power Systems clients run mission-critical workloads. To guarantee business continuity in uncertain conditions, they need a secure, highly available solution with disaster recovery. Global Replication is a valuable feature for high availability and disaster recovery because it keeps a copy of your data off-site. If the primary instance is destroyed by a catastrophic incident, such as a fire, storm, flood or other natural disaster, your secondary data instance remains secure off-site, allowing you to recover your data. Replicating data off-premises is far less expensive than duplicating and maintaining it in your own data centre.

IBM Power Systems Virtual Server now offers a Global Replication solution that provides replication for your workloads while meeting Recovery Time Objective (RTO) and Recovery Point Objective (RPO) benchmarks.

Global Replication Service (GRS) is based on well-known, industry-standard IBM Storwize Global Mirror Change Volume asynchronous replication technology. GRS on IBM Power Virtual Server exposes a cloud application programming interface (API) and command line interface (CLI) to create and manage replication-enabled volumes.

The benefits of Global Replication on Power Virtual Server include the following:

Maintain a consistent and recoverable copy of the data at the remote site, created with minimal impact to applications at your local site.
Efficiently synchronize the local and remote sites with support for failover and failback modes, helping to reduce the time that is required to switch back to the local site after a planned or unplanned outage.
Replicate more data in less time to remote locations.
Maintain redundant data centres in distant geographies for rapid recovery from disasters.
Eliminate costly dedicated networks for replication and avoid bandwidth upgrades.

IBM also provides an automation toolkit for GRS.

This tutorial focuses on two ways to use the new GRS API/CLI to build a disaster recovery solution:

Setting up replication from scratch
Setting up replication using existing volumes

The feature is currently enabled in two data centres: DAL12 and WDC06.

Setup for Global Replication Service

The data centres for IBM Power Virtual Server are set up with all the configuration required to offer replication capabilities. The supported Tier 1 and Tier 3 storage controllers are preconfigured to use Global Mirror Change Volume (GMCV) replication. GRS provides replication at the storage level by using IBM Storwize GMCV asynchronous replication technology. The initial synchronization copies all of the data from the master volume to the auxiliary volume; after that, only the delta changes are synchronized, with a cycle period of 500 seconds. This means the maximum RPO is around 15 minutes.
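To see how a 500-second cycle relates to the roughly 15-minute figure, here is a back-of-the-envelope sketch. It assumes the commonly cited GMCV rule of thumb that, when each cycle's delta reaches the auxiliary site within the cycle period, the auxiliary copy can lag the master by up to about two cycle periods. The function name is illustrative and not part of any GRS API.

```python
# Rough worst-case RPO estimate for Global Mirror with Change Volumes (GMCV).
# Assumption: each cycle's delta is transferred within the cycle period. A write
# made just after one cycle's snapshot is taken is only captured by the next
# snapshot, which can take up to another period to reach the auxiliary site,
# so the auxiliary copy lags the master by at most about two cycle periods.

CYCLE_PERIOD_S = 500  # GRS cycle period stated in this article


def worst_case_rpo_seconds(cycle_period_s: float) -> float:
    """Return the approximate worst-case RPO for a given GMCV cycle period."""
    if cycle_period_s <= 0:
        raise ValueError("cycle period must be positive")
    return 2 * cycle_period_s


rpo_s = worst_case_rpo_seconds(CYCLE_PERIOD_S)
print(f"worst-case RPO = {rpo_s:.0f} s = {rpo_s / 60:.1f} min")
```

Under that assumption the estimate is about 16.7 minutes, in the same range as the figure above; a delta that takes longer than one period to transfer would stretch it further.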

Each time a replicated volume is created, four volumes are created across the two sites:

Master volume on site1.
Master change volume on site1 to store the delta changes.
Auxiliary volume on site2.
Auxiliary change volume on site2 to update the delta changes.

The solution uses a remote copy consistency group to ensure that data spread across multiple volumes remains consistent while it is copied to the remote site. The consistency group also lets you switch the replication direction in the event of a planned or unplanned outage. The GRS API/CLI can be used to create and manage replicated volumes and consistency groups.

IBM Power Virtual Server has the DAL12 and WDC06 data centres enabled to use the Global Replication Service APIs. This means that if you are using DAL12 as the primary site, your auxiliary volumes are created on WDC06. Similarly, if you are using WDC06 as the primary site, your auxiliary volumes are created on DAL12. The site where volumes are created or enabled for replication is the primary site.
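The four-volume layout and the DAL12/WDC06 pairing can be captured in a few lines. This is a hypothetical data model for illustration only: the BackendVolume class and replicated_volume_layout function are invented names, and the real volumes are created by GRS in the storage backend, not by client code.

```python
from dataclasses import dataclass

# Site pairing described in this article: DAL12 and WDC06 replicate to each other.
PAIRED_SITE = {"DAL12": "WDC06", "WDC06": "DAL12"}


@dataclass(frozen=True)
class BackendVolume:
    role: str  # master, master change, auxiliary or auxiliary change
    site: str  # data centre that holds this volume


def replicated_volume_layout(primary_site: str) -> list[BackendVolume]:
    """Return the four backend volumes created for one replication-enabled volume."""
    try:
        secondary_site = PAIRED_SITE[primary_site]
    except KeyError:
        raise ValueError(f"replication is not enabled in {primary_site}") from None
    return [
        BackendVolume("master", primary_site),
        BackendVolume("master change", primary_site),
        BackendVolume("auxiliary", secondary_site),
        BackendVolume("auxiliary change", secondary_site),
    ]


for vol in replicated_volume_layout("DAL12"):
    print(f"{vol.role:<17} {vol.site}")
```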

Once the volumes are replicated at both the primary and secondary sites, we can use the steps outlined in the disaster recovery workflow section below to bring up the standby VM using the replicated volumes.

Disaster recovery workflow

As an example, let’s say we have an AIX VM running an Oracle DB application workload, with the DAL12 data centre serving as the primary site, and we need to enable global replication for the data volumes so that we can recover the Oracle database.

Below are the steps to enable replication of the application workload running on the primary site and make it ready to trigger failover/failback:

Create/enable the volume replication (this creates replicated volumes on both sites).
Create the volume group (this creates the replicated consistency group in the storage backend).
Update the volume group by adding the replication-enabled volumes to it.
Switch to the secondary site.
Onboard the auxiliary volumes. The auxiliary volumes and volume groups are now visible on site2.
Provision the standby VM and attach the auxiliary volumes.
The VM is ready for failover/failback.
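Because each step depends on the one before it (for example, the volume group must exist before volumes can be added to it), the workflow can be sketched as an ordered checklist. This is a hypothetical model that only enforces step order; SetupChecklist and the step labels are illustrative and do not call the GRS API or CLI.

```python
# Hypothetical checklist model of the replication setup workflow.
# It only tracks the order of the steps; it does not call any IBM Cloud API.

SETUP_STEPS = [
    "enable volume replication",
    "create volume group",
    "add replication-enabled volumes to the volume group",
    "switch to the secondary site",
    "onboard auxiliary volumes",
    "provision standby VM and attach auxiliary volumes",
]


class SetupChecklist:
    def __init__(self) -> None:
        self.completed: list[str] = []

    @property
    def ready(self) -> bool:
        """True once every step is done, i.e. the VM is ready for failover/failback."""
        return len(self.completed) == len(SETUP_STEPS)

    def complete(self, step: str) -> None:
        """Mark a step as done, refusing steps that arrive out of order."""
        expected = None if self.ready else SETUP_STEPS[len(self.completed)]
        if step != expected:
            raise RuntimeError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)


checklist = SetupChecklist()
for step in SETUP_STEPS:
    checklist.complete(step)
print(checklist.ready)  # True
```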

Failover/failback

In case of a disaster (i.e., primary site failure or storage failure), you will lose access to the storage volumes, and they will be marked as ERROR. The replication relationship will be disconnected, and the consistency group will move to the “consistent-disconnected” state. The primary role of the volume group will be blank.
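The symptoms listed above can serve as a simple check before deciding to fail over. This is a hypothetical sketch: ReplicationStatus and primary_site_failed are invented names, the only state string taken from this article is “consistent-disconnected”, and in practice the values would come from the GRS API/CLI rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class ReplicationStatus:
    volume_states: dict[str, str]  # volume name -> state reported for it
    consistency_group_state: str   # state of the remote copy consistency group
    primary_role: str              # volume group primary role; "" when blank


def primary_site_failed(status: ReplicationStatus) -> bool:
    """Return True only when every disaster symptom described above is present."""
    return (
        bool(status.volume_states)
        and all(state == "ERROR" for state in status.volume_states.values())
        and status.consistency_group_state == "consistent-disconnected"
        and status.primary_role == ""
    )


# Hypothetical readings for the Oracle data volumes in the example.
after_outage = ReplicationStatus(
    volume_states={"oradata1": "ERROR", "oradata2": "ERROR"},
    consistency_group_state="consistent-disconnected",
    primary_role="",
)
print(primary_site_failed(after_outage))  # True
```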

In this situation, no new replication operations are allowed, because replication is broken. You can access existing workloads only by enabling read access to the auxiliary replication volumes and powering on the standby VM on the secondary site. This is accomplished by following the steps below:

Access the auxiliary volumes after the primary site fails.
Fail over, or switch the volume group role to secondary.
Fail back to the primary site.
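The three recovery steps can be modelled as a small state machine whose methods must be called in order. The class, its attributes and the role values "aux" and "master" are hypothetical illustrations, not GRS API fields.

```python
class VolumeGroupModel:
    """Hypothetical model of a replicated volume group during a primary-site outage."""

    def __init__(self) -> None:
        self.primary_role = ""        # blank after the primary site fails
        self.aux_read_access = False  # auxiliary volumes not yet usable
        self.failed_over = False

    def access_auxiliary_volumes(self) -> None:
        # Step 1: enable read access so the standby VM can use the auxiliary volumes.
        self.aux_read_access = True

    def fail_over(self) -> None:
        # Step 2: switch the volume group role so the secondary site acts as primary.
        if not self.aux_read_access:
            raise RuntimeError("access the auxiliary volumes first")
        self.primary_role = "aux"
        self.failed_over = True

    def fail_back(self, primary_site_restored: bool) -> None:
        # Step 3: return the primary role to the original site once it is back.
        if not self.failed_over:
            raise RuntimeError("fail over before failing back")
        if not primary_site_restored:
            raise RuntimeError("primary site is not available yet")
        self.primary_role = "master"
        self.failed_over = False


group = VolumeGroupModel()
group.access_auxiliary_volumes()
group.fail_over()
print(group.primary_role)  # aux
group.fail_back(primary_site_restored=True)
print(group.primary_role)  # master
```

Guarding each transition makes the dependency explicit: the role switch is refused until the auxiliary volumes are accessible, and failback is refused until the original site is available again.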

Disabling the replication

Disabling replication means deleting the auxiliary volume from the remote site. Before disabling replication for a volume, make sure that the volume is not associated with any volume group. Because there are two sites, follow the procedure below to disable replication:

Remove the volumes from the volume group on the primary site.
Disable the replication of a volume.
Remove the volumes from the volume group on the secondary site.
Delete the auxiliary volume from the secondary site.
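The ordering matters because replication should not be disabled while a volume still belongs to a volume group. The sketch below is a hypothetical in-memory model, with invented names such as Site and disable_replication and placeholder volume names, that makes that precondition explicit.

```python
class Site:
    """Hypothetical in-memory view of one site's volumes and volume groups."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.groups: dict[str, set[str]] = {}  # volume group -> member volumes
        self.replicated: set[str] = set()      # replication-enabled master volumes
        self.auxiliary: set[str] = set()       # auxiliary volumes held at this site

    def in_any_group(self, volume: str) -> bool:
        return any(volume in members for members in self.groups.values())

    def remove_from_groups(self, volume: str) -> None:
        for members in self.groups.values():
            members.discard(volume)


def disable_volume_replication(site: Site, volume: str) -> None:
    """Refuse to disable replication while the volume is still grouped."""
    if site.in_any_group(volume):
        raise RuntimeError(f"remove {volume} from its volume group first")
    site.replicated.discard(volume)


def disable_replication(volume: str, aux_volume: str, primary: Site, secondary: Site) -> None:
    primary.remove_from_groups(volume)          # 1. leave the group on the primary site
    disable_volume_replication(primary, volume) # 2. disable replication
    secondary.remove_from_groups(aux_volume)    # 3. leave the group on the secondary site
    secondary.auxiliary.discard(aux_volume)     # 4. delete the auxiliary volume


# Placeholder names for the Oracle example.
dal12, wdc06 = Site("DAL12"), Site("WDC06")
dal12.groups["oracle-vg"] = {"oradata1"}
dal12.replicated.add("oradata1")
wdc06.groups["oracle-vg"] = {"aux_oradata1"}
wdc06.auxiliary.add("aux_oradata1")

disable_replication("oradata1", "aux_oradata1", dal12, wdc06)
print(dal12.replicated, wdc06.auxiliary)  # set() set()
```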

Billing and charges

You are charged from the location where you create a replication-enabled volume. There is no charge for the auxiliary volume at the remote site.

A replication-enabled volume of size X GB is charged based on the following two components:

The master volume is charged for twice its size (2X GB), based on its tier, under the existing part numbers for Tier 1 and Tier 3.
The replication capability is charged at $Y/GB under a new part number, “GLOBAL_REPLICATION_STORAGE_GIGABYTE_HOURS”, that is independent of the volume tier.

If a site fails due to a catastrophe, metering is not available from the failed site. The auxiliary volumes are then charged from the remote site for twice their size, based on their tier, and there is no replication capability charge for any replication-enabled volume.
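As a worked example of the two billing components, the function below computes hourly charges for one replication-enabled volume in normal operation and after a site failure. The rates are placeholders, not IBM prices, per-GB-hour metering is an assumption based on the part number name, and volume_charges is an illustrative helper.

```python
def volume_charges(
    size_gb: float,
    tier_rate: float,         # placeholder price per GB-hour for the volume's tier
    replication_rate: float,  # placeholder $Y per GB-hour for replication capability
    primary_site_up: bool = True,
) -> dict[str, object]:
    """Hourly charges for one replication-enabled volume of size_gb (X GB)."""
    storage = 2 * size_gb * tier_rate  # charged at 2x the volume size, by tier
    if primary_site_up:
        # Normal operation: billed where the volume was created, plus replication.
        return {"billed_from": "primary site", "storage": storage,
                "replication": size_gb * replication_rate}
    # Site failure: auxiliary volumes billed from the remote site, no replication cost.
    return {"billed_from": "remote site", "storage": storage, "replication": 0.0}


normal = volume_charges(100, tier_rate=0.5, replication_rate=0.25)
outage = volume_charges(100, tier_rate=0.5, replication_rate=0.25, primary_site_up=False)
print(normal)
print(outage)
```

With these placeholder rates, a 100 GB volume accrues 100.0 in storage and 25.0 in replication charges per hour normally, and only the 100.0 storage charge, now from the remote site, after a failure.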

Conclusion

The introduction of the Global Replication Service reflects our commitment to enabling business continuity planning, data centre efficiency, operational excellence and cost optimization with IBM Power Systems Virtual Server.

Business continuity planning keeps your business running with reliable failover solutions, including backup, high availability and disaster recovery. Data centre optimization accelerates time to value, business expansion and worldwide growth. Operational excellence and cost optimization reduce operational costs, improve service and response times, and ensure off-hours coverage.

Authors

Val Besong

Senior Product Marketer

Chhavi Agarwal

Software Developer

Imranuddin Kazi

STSM

Chief Architect - IBM PowerVC

Anu Jalan

Senior Developer

Power Cloud IaaS Development