October 4, 2013 | Written by: Ignacio Macias Jareño
In the first part of this series we took a look at the most common solutions for providing disaster recovery. Now, I’d like to show how this architecture can be changed and what additional advantages it can bring in a cloud solution.
Before reviewing each of the layers and seeing what may change in a cloud environment, I want to remind you of one of the key characteristics of cloud computing, which is rapid elasticity. This means that capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly. The capabilities available for provisioning can be seen as unlimited and can be appropriated in any quantity at any time. In other words, we can automatically or semi-automatically provide new servers into an existing architecture and we can think we have an unlimited amount of resources that can be used at any time.
In a cloud solution the location of our servers may not be limited to two cloud locations; we may be using a private cloud solution and/or public cloud solutions which may belong to one or different cloud providers. If we are using the same public cloud provider, the servers may still be located in different cloud locations.
Web and application layer
The web and application layer approach should be very similar; we should have servers in multiple locations with an application traffic balancing mechanism to distribute the application traffic between the servers and the sites. Depending on the architecture of the application, these layers can be deployed in two or more sites; the minimum should be two to provide a robust disaster recovery solution. What changes in a cloud environment is the sizing of the number of servers per site. Instead of having a fixed number of servers, you need automatic scaling: whenever you need more servers to support the application traffic you automatically provision more, and when the application traffic decreases you deprovision servers. This mechanism makes effective use of your resources: during peak hours you will be able to support all your application traffic requirements, and during non-peak hours you will reduce the number of servers and therefore your infrastructure costs. In a disaster recovery situation, the capacity will initially be that of the remaining sites, and automatic scaling will then grow the number of servers to match the application traffic requirements.
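To make the scaling behavior concrete, here is a minimal Python sketch of the sizing logic. It is only an illustration under simple assumptions: the `Site` class, the `desired_servers` and `rebalance` functions, and the idea that each server handles a fixed number of requests per second are all hypothetical and not tied to any particular cloud provider. Traffic is assumed to be split evenly between the healthy sites.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    """A cloud location running web/application servers (hypothetical model)."""
    name: str
    servers: int
    healthy: bool = True


def desired_servers(requests_per_sec: float, capacity_per_server: float,
                    min_servers: int = 1) -> int:
    """Servers needed to carry the given load, never fewer than min_servers."""
    return max(min_servers, math.ceil(requests_per_sec / capacity_per_server))


def rebalance(sites: list[Site], total_rps: float,
              capacity_per_server: float) -> None:
    """Split traffic evenly across healthy sites and resize each of them."""
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy site left to serve traffic")
    per_site_rps = total_rps / len(healthy)
    for site in healthy:
        # Scale out when traffic grows, scale in when it shrinks.
        site.servers = desired_servers(per_site_rps, capacity_per_server)


# Normal operation: two sites share the load.
sites = [Site("site-a", 2), Site("site-b", 2)]
rebalance(sites, total_rps=1000.0, capacity_per_server=100.0)

# Disaster: site-a is lost, so site-b grows to absorb all the traffic.
sites[0].healthy = False
rebalance(sites, total_rps=1000.0, capacity_per_server=100.0)
```

In a real deployment the resize step would call the provisioning automation of your cloud platform, and the new servers would take some time to become available, which is why the capacity right after a disaster is initially that of the remaining sites.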
For approaches where no application traffic balancing mechanism exists, I would recommend automatic provisioning instead of storage mirroring or a stand-by server. Where storage mirroring was used, you will no longer have to pay the infrastructure costs of the secondary site all the time; with this automation you only pay them when you deploy the server in a disaster recovery situation. If you are using stand-by servers, the question is how long it takes to provision an instance of that server. If this provisioning time is higher than your RTO (recovery time objective), a stand-by image should be used; if not, automatic provisioning should be used, as this saves the infrastructure costs of keeping resources at the secondary site.
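The decision between a stand-by image and automatic provisioning can be written down as a simple rule. The following Python sketch is only an illustration; the `Strategy` enum and the `choose_strategy` function are hypothetical names, and both times are assumed to be measured in minutes.

```python
from enum import Enum


class Strategy(Enum):
    STANDBY_IMAGE = "stand-by image"
    AUTOMATIC_PROVISIONING = "automatic provisioning"


def choose_strategy(provisioning_minutes: float, rto_minutes: float) -> Strategy:
    """Pick a recovery approach by comparing provisioning time with the RTO."""
    if provisioning_minutes > rto_minutes:
        # Provisioning from scratch would miss the recovery time objective,
        # so a stand-by image has to be kept ready at the secondary site.
        return Strategy.STANDBY_IMAGE
    # Provisioning fits within the RTO, so nothing has to run (or be paid
    # for) at the secondary site until a disaster actually happens.
    return Strategy.AUTOMATIC_PROVISIONING
```

For example, `choose_strategy(45, 30)` selects the stand-by image, while `choose_strategy(10, 30)` selects automatic provisioning. The provisioning time should be measured in a real test of your automation rather than estimated.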
When we look at the database layer, our ultimate goal should be to apply the same concepts as in the web and application layer: have an automated way to scale the database tier horizontally. However, this remains an elusive goal in most cases if we are using relational databases, which are the most commonly used type of database.
In my previous post, the three mechanisms proposed for the disaster recovery of this layer were storage mirroring, host-based mirroring and application-based mirroring. Let’s examine each one of them and see how a cloud solution affects them.
Storage mirroring may be possible in private cloud solutions, but it is not commonly available in public cloud solutions. A Fibre Channel connection is required between the sites, and the connectivity between sites in cloud environments is normally limited to an IP network. If you are using a single public cloud provider, this may be resolved by the provider, but if you are using multiple cloud providers the only connectivity option available will be through an IP network, so in most scenarios this option will not be possible.
Host-based mirroring may be possible in both private and public cloud solutions. As mentioned in the previous paragraph, if the host-based mirroring requires Fibre Channel communication, that may be an issue. If it does not require Fibre Channel communication, this solution can be used in cloud environments.
Application-based mirroring is a solution where the replication of the data is done at the application level, and the only requirement between the replicating servers is IP network connectivity. This type of solution allows database servers to have a master database, which is the active database, and various subordinates (servers to which the data is replicated). These subordinates can be at different sites, in different cloud solutions from different providers, or at the same site, so you may have, for example, four copies of the database with the same data in four different cloud solutions. This was not possible in traditional scenarios, where having a third site for a couple of servers carried high infrastructure costs because of floor space, cooling and energy. With cloud computing it becomes possible, as the infrastructure cost of having some servers in a public cloud environment is lower.
Another advantage of this type of solution is that subordinate databases can be used in read-only mode. By adding an additional layer between the application and the database, write requests can be sent to the master and read requests to the subordinates. This makes your database layer work more effectively by offloading application traffic from the master onto the subordinate databases.
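As a rough illustration of such a routing layer, here is a minimal Python sketch. The `ReadWriteRouter` class and the server names are hypothetical, and the rule used to detect a read request (the statement starts with `SELECT`) is a deliberate simplification; the router only decides which server should receive a statement and does not open any database connection.

```python
import itertools


class ReadWriteRouter:
    """Decide which database server should receive a SQL statement."""

    def __init__(self, master: str, subordinates: list[str]):
        self.master = master
        # Rotate through the subordinates so reads are spread evenly.
        self._replicas = itertools.cycle(subordinates) if subordinates else None

    def route(self, sql: str) -> str:
        # Simplifying assumption: only plain SELECT statements are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no subordinate exists, go to the master.
        return self.master


router = ReadWriteRouter("master", ["subordinate-1", "subordinate-2"])
print(router.route("SELECT name FROM customers"))     # a subordinate
print(router.route("INSERT INTO customers VALUES (1)"))  # the master
```

A production routing layer has more to consider. Replication is often asynchronous, so a read sent to a subordinate immediately after a write may return stale data, and statements such as `SELECT ... FOR UPDATE` need to go to the master even though they start with `SELECT`.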
These are not the only mechanisms available for these three layers, and these are not the only ways of resolving disaster recovery, but they do cover a large part of the cases I have seen. Every architecture needs a deep analysis to determine the best disaster recovery strategy for it.
In summary, I have two final recommendations for disaster recovery solutions. First, have an automated way of deploying your servers; this will allow you to benefit from auto scaling and give you the ability to migrate cloud servers between different providers. Second, try to base any replication mechanism on IP networking, as this gives you the highest flexibility when choosing where to place your cloud workload.