Business continuity planning overview
In today’s world, business continuity planning (BCP) is essential to the sustainability of your business. Without a well-thought-out plan in place, your company is highly unlikely to survive and recover from a disaster.
BCP involves creating a strategy by identifying the threats and risks facing a company, with the aim of ensuring that personnel and assets are protected and able to function in the event of a disaster.
Business continuity is a proactive plan to avoid and mitigate the risks associated with a disruption of operations. It details the steps to be taken before, during and after an event to maintain the financial viability of an organization. People often mistake disaster recovery for business continuity planning, but disaster recovery is in fact a reactive plan for responding after an event.
However important BCP is, several major roadblocks stand in the way of successful implementation, such as competing business priorities, prohibitive costs, high complexity, and the willingness of the people involved.
Having a BCP enhances an organization's image with employees, shareholders and customers by demonstrating a proactive attitude. Additional benefits include improved overall organizational efficiency and a clearer picture of how assets and human and financial resources relate to critical services and deliverables.
A BCP helps mitigate risks from potential disasters, including:
- Natural disasters
- Accidents / Sabotage / Cyber attacks
- Power and energy disruptions
- Communications, transportation, safety and service sector failures
Solution Overview
The figure shows the solution overview.
When building this solution, start with the storage foundation layer. Here we use two sites at metro distance, connected over Fibre Channel for the HyperSwap configuration. On both sites, we deploy IBM FlashSystem A9000 or A9000R storage systems, and the solution uses storage consistency groups. First, consistency groups are created on the IBM FlashSystem A9000 storage systems and configured for HyperSwap. Next, the required volumes are created on the storage systems at both sites and also configured for HyperSwap. Note that the HyperSwap configuration is activated automatically as soon as it is created. These volumes are then moved into the consistency group. Finally, the volumes are mapped to the ESXi servers on both sites, which use them in an Active/Preferred and Active/Non-Preferred configuration. For more details on HyperSwap configuration, refer to the IBM Redbook IBM HyperSwap for IBM FlashSystem A9000 and A9000R.
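To make the order of the storage steps easier to follow, here is a minimal sketch that models them in memory. It is not IBM tooling: it makes no XCLI or REST calls, and the classes, volume names (ora_data_01, ora_redo_01) and host names (esxi-a1, esxi-b1) are hypothetical. Two rules are assumptions for illustration only: a HyperSwap consistency group accepts only volumes that are already in HyperSwap configuration, and paths to the system acting as Master are the preferred ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical in-memory model of the storage steps described above.
# No IBM FlashSystem A9000 interface is called; all names are illustrative.


@dataclass
class Volume:
    name: str
    hyperswap: bool = False  # True once the volume is in HyperSwap configuration
    mapped_hosts: set[str] = field(default_factory=set)


@dataclass
class ConsistencyGroup:
    name: str
    hyperswap: bool = False
    volumes: list[Volume] = field(default_factory=list)

    def add(self, vol: Volume) -> None:
        # Assumption: a HyperSwap consistency group only accepts volumes that
        # are already in HyperSwap configuration (volumes first, then move).
        if self.hyperswap and not vol.hyperswap:
            raise ValueError(f"{vol.name} is not in HyperSwap configuration")
        self.volumes.append(vol)


def build_storage_layer(volume_names: list[str], esxi_hosts: list[str]) -> ConsistencyGroup:
    cg = ConsistencyGroup("cg_oracle")       # 1. create the consistency group
    cg.hyperswap = True                      # 2. configure it for HyperSwap
    for name in volume_names:
        vol = Volume(name)                   # 3. create the volume on both systems
        vol.hyperswap = True                 #    and configure it for HyperSwap (active at once)
        cg.add(vol)                          # 4. move it into the consistency group
        vol.mapped_hosts.update(esxi_hosts)  # 5. map it to the ESXi hosts on both sites
    return cg


def path_state(master_site: str, path_site: str) -> str:
    # Assumption: paths to the system currently acting as Master are preferred.
    return "Active/Preferred" if path_site == master_site else "Active/Non-Preferred"


cg = build_storage_layer(["ora_data_01", "ora_redo_01"], ["esxi-a1", "esxi-b1"])
print([v.name for v in cg.volumes])  # ['ora_data_01', 'ora_redo_01']
print(path_state("A", "B"))          # Active/Non-Preferred
```

The guard in `ConsistencyGroup.add` is there to make the ordering explicit: trying to move a plain volume into the HyperSwap group raises an error instead of silently succeeding.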
After the storage configuration is in place, we build the VMware infrastructure. We use two VMware vCenter Server instances, one deployed at each site, both pointing to a common Platform Services Controller (PSC). On both vCenter Servers we create storage policies that use custom tags to associate the datastores created on IBM FlashSystem A9000 HyperSwap volumes, and we create the virtual machines on those datastores.
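The tag-based storage policies can be pictured with a small model: a datastore is compatible with a policy when it carries the policy's tag. This is a hypothetical sketch, not the vSphere API (no pyVmomi or vSphere Automation SDK calls), and the tag name A9000-HyperSwap, policy name HyperSwap-Gold, vCenter names and datastore names are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of tag-based storage policies; no vSphere API is called.
# Tag, policy, vCenter and datastore names are illustrative only.

HYPERSWAP_TAG = "A9000-HyperSwap"


@dataclass(frozen=True)
class Datastore:
    name: str
    tags: frozenset[str]


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    vcenter: str
    required_tag: str

    def compatible(self, datastores: list[Datastore]) -> list[str]:
        # A datastore satisfies the tag-based rule when it carries the required tag.
        return [ds.name for ds in datastores if self.required_tag in ds.tags]


datastores = [
    Datastore("ds_hs_01", frozenset({HYPERSWAP_TAG})),  # on a HyperSwap volume
    Datastore("ds_hs_02", frozenset({HYPERSWAP_TAG})),  # on a HyperSwap volume
    Datastore("ds_local", frozenset()),                 # not tagged
]

# The same tag-based policy is defined on each vCenter Server.
policies = [StoragePolicy("HyperSwap-Gold", vc, HYPERSWAP_TAG) for vc in ("vc-site-a", "vc-site-b")]
for p in policies:
    print(p.vcenter, p.compatible(datastores))
# vc-site-a ['ds_hs_01', 'ds_hs_02']
# vc-site-b ['ds_hs_01', 'ds_hs_02']
```

Because both vCenter Servers use the same tag, both policies resolve to the same set of HyperSwap-backed datastores.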
After installing VMware Site Recovery Manager instances on both sites, we install the IBM Spectrum Accelerate Storage Replication Adapter (SRA) on the Site Recovery Manager Server instances at both sites. The IBM Spectrum Accelerate SRA can be downloaded from VMware’s Site Recovery Manager website.
The IBM FlashSystem A9000 systems at both sites are then configured in VMware Site Recovery Manager. This solution uses storage policy-based protection groups in Site Recovery Manager, which are associated through tags with the datastores created on IBM FlashSystem A9000 HyperSwap volumes. A recovery plan is then created from the storage policy-based protection group. After the recovery plan is created, it is good practice to test it by performing a failover and failback of the virtual machines from both sites, to confirm that Site Recovery Manager is correctly configured and working.
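The idea behind a storage policy-based protection group is that membership follows the storage: VMs on the datastores associated with the policy are protected without being added by hand. The sketch below models that idea and a failover/failback test of the recovery plan. It is hypothetical: no Site Recovery Manager API is used, and the VM, datastore and site names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of a storage policy-based protection group and a
# recovery plan. No Site Recovery Manager API is used; names are illustrative.


@dataclass
class VM:
    name: str
    datastore: str
    site: str


def protection_group(policy_datastores: set[str], vms: list[VM]) -> list[VM]:
    # VMs on datastores associated with the storage policy are protected
    # automatically; nobody adds them to the group by hand.
    return [vm for vm in vms if vm.datastore in policy_datastores]


def run_recovery_plan(group: list[VM], target_site: str) -> list[str]:
    # Move every protected VM to the target site and report which ones moved.
    for vm in group:
        vm.site = target_site
    return [vm.name for vm in group]


vms = [
    VM("ora-rac-a1", "ds_hs_01", "site-a"),
    VM("ora-rac-a2", "ds_hs_02", "site-a"),
    VM("scratch-vm", "ds_local", "site-a"),
]
group = protection_group({"ds_hs_01", "ds_hs_02"}, vms)

# Test the plan: fail over to site B, then fail back to site A.
print(run_recovery_plan(group, "site-b"))  # ['ora-rac-a1', 'ora-rac-a2']
print(run_recovery_plan(group, "site-a"))  # ['ora-rac-a1', 'ora-rac-a2']
print(vms[2].site)                         # site-a (not protected, never moved)
```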
Once the VMware infrastructure is in place, we turn the virtual machines at each site into Oracle RAC cluster nodes. In this solution, we created two Oracle Real Application Clusters (RAC) on each site, built from the virtual machines available locally. The VMs at each site share the data.
The benefits of this configuration can be seen in the failover scenarios described in Table 1.
| # | Scenario | Result |
|---|---|---|
| 1 | System A full failure | System B is Master |
| 2 | System A partial failure (e.g. OOP) | System B is Master; System A local volume is unavailable for I/O |
| 3 | A<->B connectivity failure | System A is Master; System B local volume is unavailable for I/O |
| 4 | System B failure | System A remains Master |
| 5 | Quorum Witness failure | High availability is down |
Table 1: Failover scenarios and expected results
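Table 1 can also be kept as data, for example to drive a runbook or a test checklist. The sketch below encodes only what the table states; fields are left as None where the table gives no entry, and the class and function names are hypothetical. Row 4 ("remains Master") is rendered as "is Master".

```python
from dataclasses import dataclass
from typing import Optional

# Table 1 encoded as data. Fields are None where the table gives no entry.


@dataclass(frozen=True)
class Outcome:
    scenario: str
    master: Optional[str] = None              # system acting as Master
    unavailable_for_io: Optional[str] = None  # system whose local volume stops serving I/O
    note: str = ""


FAILOVER_TABLE = {
    1: Outcome("System A full failure", master="B"),
    2: Outcome("System A partial failure (e.g. OOP)", master="B", unavailable_for_io="A"),
    3: Outcome("A<->B connectivity failure", master="A", unavailable_for_io="B"),
    4: Outcome("System B failure", master="A"),
    5: Outcome("Quorum Witness failure", note="High availability is down"),
}


def expected_result(row: int) -> str:
    """Render the 'Result' column of Table 1 for a given row number."""
    o = FAILOVER_TABLE[row]
    parts = []
    if o.master:
        parts.append(f"System {o.master} is Master")
    if o.unavailable_for_io:
        parts.append(f"System {o.unavailable_for_io} local volume is unavailable for I/O")
    if o.note:
        parts.append(o.note)
    return "; ".join(parts)


for row, outcome in FAILOVER_TABLE.items():
    print(f"{row}. {outcome.scenario}: {expected_result(row)}")
```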
In each case, depending on the severity of the situation, administrators can decide whether to perform a planned or unplanned migration of the VMs from one site to the other.
You do not need an actual disaster to test the setup: the solution works just as expected during planned failovers.
A full overview and demo of the solution is available on YouTube: https://youtu.be/K7tF2dB6LSI
Visit us at Oracle OpenWorld 2017 (OOW-2017, San Francisco), booth #1107, to learn more about this solution.
Blog Authors: Shashank Shingornikar and Mandar Vaidya