Everyone who works in mission-critical environments understands the need for an effective disaster recovery solution. Organizations want disaster recovery operations that are fully automated and repeatable, so that they are always ready for a disaster. They also want to migrate applications seamlessly across sites for planned activities.
What is IBM and VMware’s joint DR solution in a virtualized environment?
An IBM SAN Volume Controller (SVC) stretched cluster, combined with VMware Site Recovery Manager (SRM) and its support for stretched clusters (announcement link), is an ideal combination for disaster recovery when used with the IBM Storwize Family Storage Replication Adapter (SRA). It lets customers survive a wide range of failures transparently by planning for disaster avoidance, disaster recovery, and mobility. The solution also offers planned live migration of applications running on virtual machines across sites by orchestrating cross vCenter vMotion operations, enabling zero-downtime application mobility.
Solution Overview
IBM SVC is an industry-leading storage virtualization solution that can virtualize storage devices from practically all other storage vendors. With a stretched cluster implementation, customers get an active-active configuration in which servers and ESXi hosts can connect to storage cluster nodes at all sites. This helps balance workloads across all nodes of the cluster and provides disaster recovery capabilities in case of site failures.
VMware SRM can be configured with IBM SVC stretched clusters using the IBM Storwize Family SRA. For this solution, the SVC nodes are set up in a stretched cluster configuration, with ESXi servers able to access storage at both sites. A quorum site is set up according to the IBM SVC stretched cluster configuration requirements to resolve tie-break situations if the link between the two main sites fails. A VMware vCenter server at each site manages the ESXi servers at that site. VMware SRM is installed at each site to configure and automate the disaster recovery solution.
How to configure the solution?
Separate documents already describe IBM SVC stretched cluster and VMware Site Recovery Manager individually, along with their benefits and configuration details. The purpose of this blog is to touch on the key steps and guidelines for configuring the two together for planned and unplanned downtime.
What configuration is needed on SVC?
- Configure SVC in stretched cluster mode
SVC has supported stretched cluster configurations for some time now. A stretched cluster implementation allows the two nodes of an I/O group to be separated by the distance between two locations. These two locations (sites) can be two racks in a data center, two buildings on a campus, or two labs within the supported distance. A third site hosts a quorum device that provides an automatic tie-break if the link between the two main sites fails.
- Configure a mirrored volume on an SVC stretched cluster
In SVC, the volume mirroring feature keeps two physical copies of a volume. Each copy can belong to a different pool. In a stretched cluster, a mirrored volume can be configured with its copies on external storage systems at the two physically separated sites.
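To make the two ideas above more concrete, here is a toy Python model of a mirrored volume with one copy per site and of the quorum tie-break. It is only a sketch: the class, the pool names, and the rule "the first live site that reaches the quorum device wins" are simplifying assumptions for illustration, not the actual SVC algorithm or any IBM API.

```python
from dataclasses import dataclass, field


@dataclass
class StretchedCluster:
    """Toy model of two main sites plus a quorum device at a third site."""
    site_up: dict = field(default_factory=lambda: {"site1": True, "site2": True})
    sees_quorum: dict = field(default_factory=lambda: {"site1": True, "site2": True})
    inter_site_link_up: bool = True

    def surviving_sites(self):
        """Return the set of sites that keep serving I/O."""
        alive = [s for s in ("site1", "site2") if self.site_up[s]]
        if len(alive) == 2 and self.inter_site_link_up:
            return set(alive)  # healthy: active-active across both sites
        # Split or site loss (simplified assumption): the first live site
        # that can reach the quorum device wins and continues alone.
        for site in alive:
            if self.sees_quorum[site]:
                return {site}
        return set()


# A mirrored volume keeps one copy per site, each in its own pool.
# Pool names are hypothetical.
mirrored_volume = {"site1": "pool_site1", "site2": "pool_site2"}


def online_copies(cluster, volume):
    """Pools whose copy of the volume is still reachable."""
    return sorted(volume[site] for site in cluster.surviving_sites())


# Example: the inter-site link fails while both sites stay up.
cluster = StretchedCluster(inter_site_link_up=False)
print(online_copies(cluster, mirrored_volume))
```

The point of the model is that a link failure never leaves both sites writing independently: exactly one site continues, and the volume stays online through the copy at that site.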
Are there any special requirements for vCenter and SRM installation to support this solution?
- vCenter installation
SRM stretched cluster support takes advantage of vSphere’s ability to perform vMotion across sites and across vCenter server instances. Therefore, the two vCenter server instances (at the protected and recovery sites) need to be configured in Enhanced Linked Mode to enable cross vCenter vMotion.
- SRM installation at protected and recovery sites
Install an SRM server instance at the protected site and another at the recovery site, and register each instance with the Platform Services Controller at its own site.
Where does the IBM SRA come into the picture?
IBM Storwize Family SRA is a software add-on that integrates with SRM to run failovers. It extends SRM capabilities and uses replication and mirroring as part of SRM’s comprehensive Disaster Recovery Planning (DRP) solution. The SRA is installed at both the protected and recovery sites, where it works with the local SRM instance.
What’s new when creating a vSphere storage policy?
Site Recovery Manager 6.1 adds a new type of protection group: the storage policy-based protection group. Storage policy-based protection groups use vSphere storage profiles to identify protected datastores and virtual machines. They automate the process of protecting and unprotecting virtual machines and of adding and removing datastores from protection groups. To make IBM storage objects easy to identify in the vSphere inventory, you can create an IBM storage tag, build a storage policy from a rule based on that tag, and then associate the stretched datastore with the policy by applying the tag to it.
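As a rough illustration of how a tag-based rule picks out the stretched datastore, here is a small Python sketch. It is a toy model only: the classes, the tag "IBM-SVC-Stretched", and the datastore names are hypothetical and do not correspond to the vSphere API.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    name: str
    tags: set = field(default_factory=set)


@dataclass
class StoragePolicy:
    """A policy with a single tag-based rule (simplifying assumption)."""
    name: str
    required_tag: str

    def compatible_datastores(self, datastores):
        """Names of the datastores that carry the tag the rule asks for."""
        return [d.name for d in datastores if self.required_tag in d.tags]


# Hypothetical inventory: only the stretched datastore carries the IBM tag.
inventory = [
    Datastore("svc_stretched_ds01", {"IBM-SVC-Stretched"}),
    Datastore("local_ds01"),
]
policy = StoragePolicy("ibm-stretched-policy", required_tag="IBM-SVC-Stretched")
print(policy.compatible_datastores(inventory))
```

Tagging a second stretched datastore would bring it under the same policy without editing the policy itself, which is what makes the tag-based approach convenient.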
How to configure SRM for this solution?
- After pairing the sites, register the IBM Storwize Family SRA with the SRM server instances at the primary and recovery sites, and then configure the array managers using the SVC nodes.
- Configure bidirectional Network Mappings, Folder Mappings, Resource Mappings, and Placeholder Datastore Mappings between the protected and recovery sites.
- NEW ⇒ SRM 6.1 allows you to configure storage policy-based protection groups using storage policy mappings. When a storage policy at the protected site is mapped to a storage policy at the recovery site, SRM places the recovered virtual machines in the vCenter server inventory and on datastores at the recovery site according to the mapped recovery-site policy.
- NEW ⇒ A storage policy-based protection group automatically protects the virtual machines that are associated with its storage policy; the policy, in turn, is built on tags that identify the datastores those virtual machines reside on. When a virtual machine is associated with or disassociated from the storage policy, SRM automatically protects or unprotects it.
- Configure a recovery plan using the storage policy-based protection group.
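The two new behaviors in the steps above, policy mappings and automatic protection, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the policy names, VM names, and the `PolicyProtectionGroup` class are hypothetical, and nothing here calls the SRM API.

```python
# Protected-site policy name -> recovery-site policy name (hypothetical names).
policy_mappings = {"ibm-stretched-protected": "ibm-stretched-recovery"}


class PolicyProtectionGroup:
    """Membership is derived from policy association, not managed by hand."""

    def __init__(self, policy_name):
        self.policy_name = policy_name
        self._vm_policy = {}  # VM name -> storage policy currently assigned

    def associate(self, vm, policy_name):
        self._vm_policy[vm] = policy_name

    def disassociate(self, vm):
        self._vm_policy.pop(vm, None)

    @property
    def protected_vms(self):
        """VMs that currently carry this group's storage policy."""
        return sorted(vm for vm, p in self._vm_policy.items()
                      if p == self.policy_name)


def recovery_policy_for(protected_policy, mappings):
    """Policy that decides where a recovered VM is placed at the recovery site."""
    return mappings[protected_policy]


group = PolicyProtectionGroup("ibm-stretched-protected")
group.associate("app-vm-01", "ibm-stretched-protected")  # becomes protected
group.associate("test-vm-07", "some-other-policy")       # not in this group
print(group.protected_vms)
print(recovery_policy_for(group.policy_name, policy_mappings))

group.disassociate("app-vm-01")                          # becomes unprotected
print(group.protected_vms)
```

Note that there is no explicit "protect" call in the model: changing a VM's policy association is the only action, which mirrors the automation described in the NEW items.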
Why test the recovery plan?
Testing a recovery plan makes the environment ready for disaster recovery situations by exercising almost every aspect of the plan. It is strongly recommended to test the recovery plan for both planned migration and disaster recovery situations to avoid surprises.
Okay, I have a recovery plan. What’s next?
Failover and reprotect: Once a recovery plan has been tested successfully, it is ready for either a planned failover or a disaster recovery situation. After failover, the recovery site becomes the primary site. SRM provides a reprotect function that automatically sets up protection in the reverse direction.
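The role swap during failover and reprotect can be pictured as a tiny state machine. The Python sketch below is illustrative only: the `RecoveryPlan` class, its two states, and the site names are assumptions made for this example and are not SRM objects.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    protected_site: str
    recovery_site: str
    state: str = "ready"  # "ready" or "failed_over" (simplified)

    @property
    def active_site(self):
        """Site where the workloads are currently running."""
        if self.state == "failed_over":
            return self.recovery_site
        return self.protected_site

    def failover(self):
        if self.state != "ready":
            raise RuntimeError("plan has already been failed over")
        self.state = "failed_over"

    def reprotect(self):
        if self.state != "failed_over":
            raise RuntimeError("reprotect only makes sense after a failover")
        # Reverse the direction of protection: the old recovery site is
        # now the protected (primary) site, and vice versa.
        self.protected_site, self.recovery_site = (
            self.recovery_site, self.protected_site)
        self.state = "ready"


plan = RecoveryPlan(protected_site="site1", recovery_site="site2")
plan.failover()
print(plan.active_site)
plan.reprotect()
print(plan.protected_site, plan.recovery_site)
```

After reprotect the plan is back in the "ready" state, so the same failover step can later move the workloads back to the original site.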
Hopefully the steps above give an overview of the configuration needed to set up the solution and help you plan accordingly. For additional configuration details, refer to the technical guide Implementing disaster recovery using IBM SAN Volume Controller and VMware Site Recovery Manager.
Disclaimer: These are my personal views and do not necessarily reflect those of my employer.