How will cloud affect your disaster recovery? (Part one)

Every time a new architecture is deployed, one of the clients' main worries is disaster recovery. We all have an idea of how to solve this in a non-cloud solution, but should we do the same in a cloud environment? Although this is possible, I think new approaches can be used to solve the disaster recovery needs of an application and make the solution more robust by taking advantage of what a cloud offering provides. First, let's see how this is solved in traditional environments.

Traditional solutions

Most clients I see using non-cloud solutions have a two-data-center setup, where one is the primary site and the other is the secondary site. These two data centers are normally located in the same city.

Web and application layer

If we use the three-tier design, which has a web layer, an application layer and a database layer, as the reference architecture for any application, disaster recovery for the web and application layers is normally solved the same way. We have one or more servers of each layer at both sites, all part of the architecture and working in an active/active configuration. This means all the servers in both data centers are serving the application. If a server goes down, the remaining servers receive all the application traffic; if an entire site goes down, the servers at the surviving site receive all of it. To use this approach you need an application traffic balancing mechanism to distribute traffic across the web and application layers.
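As a rough illustration, the active/active behavior can be sketched as a round-robin dispatcher that skips servers that have failed their health checks. The server names and the health-check model here are assumptions for the sketch, not a specific load balancing product:

```python
# Minimal sketch of active/active traffic distribution across two sites.

class Balancer:
    def __init__(self, servers):
        self.servers = list(servers)   # all servers in both data centers
        self.healthy = set(servers)    # servers currently passing health checks
        self._i = 0

    def mark_down(self, server):
        # A failed server (or a whole site) simply drops out of rotation;
        # the remaining servers absorb all application traffic.
        self.healthy.discard(server)

    def route(self):
        # Round-robin over healthy servers only.
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers at either site")
        server = candidates[self._i % len(candidates)]
        self._i += 1
        return server

lb = Balancer(["site1-web1", "site1-web2", "site2-web1", "site2-web2"])
lb.mark_down("site1-web1")
lb.mark_down("site1-web2")   # simulate losing the entire primary site
```

After the primary site is marked down, every call to `route()` lands on a secondary-site server, which is exactly the failover behavior described above.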

In cases where application traffic balancing is not possible, the most common workarounds are storage replication or stand-by images. Storage mirroring works by replicating all the disks a server has from the primary site to the secondary site. In a disaster recovery scenario, we present the replicated disks to hardware that was provisioned previously at the secondary site. Assuming the same hardware, firmware levels and network configuration, we end up with the same server, now running at the secondary site.

A stand-by image solution consists of having an active server at the secondary site with the same configuration as its equivalent at the primary site, but which does not receive any application traffic. Keeping its configuration in sync can be done manually, or in some cases the server can be integrated into an application cluster so that it receives configuration changes and updates but no application traffic. In a disaster recovery scenario, a manual change is needed to make this server start receiving application traffic.

Database layer

At the database layer, these are the three most common techniques I see used to provide the disaster recovery solution:

  • Storage mirroring
  • Host based mirroring
  • Application based mirroring

Storage mirroring works the same as described for the application and web layers, but let's take a more detailed look at how the data is replicated. The replication can be configured in synchronous (sync) mode or asynchronous (async) mode. In synchronous mode, a write is not considered complete until acknowledgment is received from both the primary and secondary storage systems: whenever something is modified, the write does not complete until it is persisted at both sites. In asynchronous mode, the write is acknowledged as soon as it completes on the local storage, so the system does not need to wait for the secondary storage system. Synchronous mode guarantees that every write lands on both sets of disks, but with a performance penalty, as writing to two storage systems takes longer than writing only to the local one.
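The sync/async trade-off can be modeled in a few lines: a synchronous write blocks on the remote site's acknowledgment, while an asynchronous write acknowledges after the local write and replicates in the background. This is only a toy model of the semantics; real arrays implement this in firmware, and the latency figure is an arbitrary assumption:

```python
import queue
import threading
import time

class MirroredVolume:
    """Toy model of a mirrored volume: one local copy, one remote copy."""

    def __init__(self, remote_latency=0.02):
        self.local, self.remote = [], []
        self.remote_latency = remote_latency   # simulated inter-site link latency
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _remote_write(self, block):
        time.sleep(self.remote_latency)        # crossing the inter-site link
        self.remote.append(block)

    def _drain(self):
        # Background replication loop used by async writes.
        while True:
            self._remote_write(self._pending.get())
            self._pending.task_done()

    def write_sync(self, block):
        # Complete only after BOTH sites have the data: safe, but slower.
        self.local.append(block)
        self._remote_write(block)

    def write_async(self, block):
        # Acknowledge after the local write; replicate in the background.
        self.local.append(block)
        self._pending.put(block)
```

A `write_sync` call always pays the full round trip, while `write_async` returns almost immediately, which is precisely why async mode avoids the performance penalty at the cost of a window where the secondary site lags behind.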

To provide connectivity between the storage systems, a Fibre Channel connection is used; this can run over "dark fiber" using DWDM networks or over an IP network. DWDM is faster but costs more than an IP network connection. Synchronous configurations need the DWDM option, as it provides the required low latency. Asynchronous configurations can use either, although DWDM is still the recommended choice because of its lower latency.

Host-based mirroring is very similar to storage mirroring; the difference is that the mirroring is controlled by the operating system or by software, so the host itself replicates the data to the remote site. This mode can work synchronously or asynchronously and shares the same performance penalty as synchronous storage mirroring. The communication channel used to replicate the data can be Fibre Channel, as described for storage mirroring, or an IP network connection between the servers.

Application-based mirroring is when the data mirroring is done at the application level: a server at each site is up and running, and the application running on the OS is in charge of replicating the data. The application has its own mechanism to replicate data to the other server; database servers typically use log shipping to send transaction logs from one database (the primary) to another (the secondary). Log shipping may also be configured in sync or async mode. The communication channel is an IP network connection between the servers, so no Fibre Channel connection is required between the sites.
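The log shipping idea reduces to: the primary records every committed change in an ordered log, periodically ships the entries the secondary has not yet seen, and the secondary replays them in order. A minimal sketch of that mechanism, with a key-value store standing in for real database tables (the class and field names are illustrative assumptions):

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []        # transaction log, in commit order
        self.shipped = 0     # index up to which the log has been shipped

    def execute(self, key, value):
        # Every change is recorded in the log before being applied.
        self.log.append((key, value))
        self.data[key] = value

    def ship_logs(self, secondary):
        # Send only the log entries the secondary has not yet seen.
        batch = self.log[self.shipped:]
        secondary.replay(batch)
        self.shipped = len(self.log)

class Secondary:
    def __init__(self):
        self.data = {}

    def replay(self, entries):
        # Re-apply the shipped transactions in commit order.
        for key, value in entries:
            self.data[key] = value
```

Calling `ship_logs` on a schedule gives async behavior; calling it as part of every commit approximates sync mode, with the corresponding latency cost already discussed for storage mirroring.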

These are not the only mechanisms available for these three layers, but they do cover a large part of the cases I have seen. Every architecture requires a deep analysis to determine the best disaster recovery strategy for it.

In this first post we have taken a look at the common architectures used to provide disaster recovery in non-cloud solutions. In the second post we will see how these architectures can change, and what additional advantages they can bring, in a cloud solution.

Click here to read Part Two of this series.
