I recently got a request from a business partner about how to plan for disasters using IBM technology. As many people know, the short answer is "it depends." What I'll try to do here is build a disaster recovery environment across multiple sites connected by a WAN on the "cheap".
Here's the scenario. A retailer, just like any other company, needs a plan for disaster recovery (failover) and for disaster avoidance / disaster resilience (availability across multiple sites). The retailer currently has two sites connected by a WAN with a 6 Mbps connection. Their application is JEE-based and runs on WebSphere Application Server, DB2 Workgroup and WebSphere MQ. To save a bit, they're looking at a hot-warm (active-standby) configuration instead of the usual hot-hot (active-active) configuration.
In this situation, you can choose from a range of hardware-only, software-only, or blended hardware and software solutions. Let's list the options:
* Hardware: A SAN at each location, using flash copy between the two locations.
* File system: Something like GPFS, replicating the file system across the WAN.
* Middleware: Cluster WebSphere Application Server using WebSphere Application Server Network Deployment. Replicate DB2 Workgroup using the DB2 Workgroup High Availability Disaster Recovery (HADR) feature, WebSphere Replication Server, or IBM InfoSphere Change Data Capture (DataMirror).
Looking at the options, the most cost-effective (and bandwidth-friendly) solution is to replicate just the data between the two sites. So what are the pros and cons of the three database replication options?
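To put the 6 Mbps link from the scenario in perspective, here is a rough back-of-the-envelope check in Python of how much replication traffic it can carry per hour. The 70% usable-bandwidth factor and the two example workloads are assumptions for illustration, not figures from the retailer; plug in your own measured volumes.

```python
"""Back-of-the-envelope check: can a 6 Mbps WAN keep up with replication traffic?"""

LINK_MBPS = 6.0          # WAN link from the scenario (megabits per second)
USABLE_FRACTION = 0.7    # assumption: ~70% of the link is available for replication


def usable_mb_per_hour(link_mbps: float = LINK_MBPS,
                       usable_fraction: float = USABLE_FRACTION) -> float:
    """Megabytes per hour the link can carry for replication traffic."""
    megabits_per_second = link_mbps * usable_fraction
    return megabits_per_second / 8 * 3600  # bits -> bytes, seconds -> hours


def fits_on_link(change_mb_per_hour: float) -> bool:
    """True if the hourly volume of replicated changes fits in the usable bandwidth."""
    return change_mb_per_hour <= usable_mb_per_hour()


if __name__ == "__main__":
    print(f"Usable capacity: {usable_mb_per_hour():.0f} MB/hour")
    # Hypothetical workloads: database changes only vs. a whole-volume delta.
    for label, mb_per_hour in [("DB changes only", 800.0), ("whole volume delta", 5000.0)]:
        print(f"{label:>20}: {mb_per_hour:6.0f} MB/hour -> fits={fits_on_link(mb_per_hour)}")
```

At 6 Mbps and 70% utilization this works out to roughly 1.9 GB per hour, which suggests why shipping only database changes, rather than whole volumes, matters on a link this size.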
DB2 Workgroup High Availability Disaster Recovery (HADR) feature
How does it work: It ships the transaction logs from one database server to another and replays them there to duplicate the data.
Pros: It's free with DB2 Workgroup. It does full database replication only.
Cons: You're talking about failover in minutes. It only replicates between identical systems (same operating system and DB2 version). It's one-way replication. It has no network compression or encryption.
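A toy model may make the HADR mechanism clearer: the primary appends each change to its transaction log, ships the log records to the standby, and the standby replays them in order. This is a simplified Python sketch, not DB2's implementation; the class and method names (`Primary`, `Standby`, `ship_logs`, `replay`) are hypothetical, and real HADR works on log pages identified by log sequence numbers, with configurable synchronization modes.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    lsn: int     # log sequence number, increasing on the primary
    key: str
    value: str


@dataclass
class Standby:
    data: dict = field(default_factory=dict)
    replayed_lsn: int = 0

    def replay(self, records):
        """Re-run shipped log records in LSN order to duplicate the primary's data."""
        for rec in sorted(records, key=lambda r: r.lsn):
            if rec.lsn > self.replayed_lsn:   # skip anything already applied
                self.data[rec.key] = rec.value
                self.replayed_lsn = rec.lsn


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    shipped_lsn: int = 0

    def write(self, key, value):
        """Log the change first, then apply it locally."""
        rec = LogRecord(lsn=len(self.log) + 1, key=key, value=value)
        self.log.append(rec)
        self.data[key] = value

    def ship_logs(self, standby: Standby):
        """Send only log records the standby has not seen yet (one-way)."""
        pending = [r for r in self.log if r.lsn > self.shipped_lsn]
        standby.replay(pending)
        if pending:
            self.shipped_lsn = pending[-1].lsn
```

Because records only flow from `Primary` to `Standby`, nothing in this model sends changes back the other way, which mirrors the one-way limitation listed in the cons.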
WebSphere Replication Server
How does it work: There are multiple ways. It can keep a separate table that "tracks" the changes, or it can move and replay transaction logs. It replicates via SQL replication (socket) or Q replication (WebSphere MQ).
Pros: It does row-level replication. Failover and replication happen in seconds. With Q replication, delivery of the replicated data is handed off to WebSphere MQ. It can replicate between different database vendors. It supports one-way, bidirectional and peer-to-peer replication. It has network compression and encryption features.
Cons: If you really want to do it right, you'll have to buy WebSphere MQ.
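To illustrate the "separate table that tracks the changes" approach at the row level, here is a Python sketch in which a capture step records each changed row in a change table and an apply step replays those rows at the target, remembering how far it has got. It is loosely modeled on the capture/apply idea behind SQL replication; the names (`ChangeRow`, `capture`, `apply_changes`) are hypothetical, and the real product picks up changes from the DB2 log rather than from application calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChangeRow:
    seq: int                 # commit order of the change at the source
    table: str
    key: int
    op: str                  # "I" insert, "U" update, "D" delete
    row: Optional[dict]      # new row image (None for deletes)


def capture(change_table: List[ChangeRow], table: str, key: int,
            op: str, row: Optional[dict]) -> None:
    """Append one row-level change to the change-tracking table."""
    seq = change_table[-1].seq + 1 if change_table else 1
    change_table.append(ChangeRow(seq, table, key, op, row))


def apply_changes(change_table: List[ChangeRow],
                  target: Dict[str, Dict[int, dict]],
                  last_applied: int) -> int:
    """Apply changes newer than last_applied to the target; return the new high-water mark."""
    for change in change_table:
        if change.seq <= last_applied:
            continue
        rows = target.setdefault(change.table, {})
        if change.op == "D":
            rows.pop(change.key, None)
        else:  # inserts and updates both carry the full new row image
            rows[change.key] = dict(change.row)
        last_applied = change.seq
    return last_applied
```

Each change row carries the full new row image, so the apply side never needs the source's logs; that decoupling is one reason this style of replication can target a different schema or database vendor.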
IBM InfoSphere Change Data Capture (DataMirror)
How does it work: Similar to WebSphere Replication Server.
Pros: It can replicate to databases, applications or data warehouses. It's built not just for replication but also for data transformation.
Cons: It's not optimized for replication per se.
Now that we have looked at all the options, here is my take on the best, cheapest solution. Obviously HADR gives you a lot if you can live with its limitations. In this configuration, I would run HADR between site 1 and site 2 and have users both write to and read from site 1 only. I would only have to pay for one DB2 Workgroup server license (site 1, server 1). The problem (assuming I can live with all the limitations of HADR) is how to replicate the data from site 2 back to site 1 after a failover. The sad answer is that it's a manual process.

So the next best option (and the best, IMHO) is a blend of HADR and WebSphere Replication Server. I would keep the same HADR configuration between site 1 and site 2, with writes and reads happening at site 1. I would then use WebSphere Replication Server to sync the data back when site 1 comes back online.
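The failover-and-resync flow in the HADR plus WebSphere Replication Server design can be modeled as a short sequence of steps. In this hypothetical Python sketch, site 1 takes all writes and copies them one way to site 2 (the HADR role); when site 1 fails, site 2 takes over and records what it writes; when site 1 returns, those writes are replayed back to it (the role WebSphere Replication Server plays in this design). All class and function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


class Site:
    """One site's database: its data plus a list of writes made locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.local_writes: List[Tuple[str, str]] = []
        self.online = True

    def write(self, key: str, value: str) -> None:
        if not self.online:
            raise RuntimeError(f"{self.name} is down")
        self.data[key] = value
        self.local_writes.append((key, value))


def hadr_ship(primary: Site, standby: Site) -> None:
    """One-way copy, primary -> standby (a stand-in for HADR; re-sends every write)."""
    for key, value in primary.local_writes:
        standby.data[key] = value


def resync_back(new_primary: Site, returning: Site) -> None:
    """Replay writes made at the new primary onto the returning site
    (the job this design gives to WebSphere Replication Server)."""
    for key, value in new_primary.local_writes:
        returning.data[key] = value


def run_scenario() -> Tuple[Site, Site]:
    site1, site2 = Site("site 1"), Site("site 2")
    site1.write("order-1", "placed")      # normal operation: writes at site 1 only
    hadr_ship(site1, site2)
    site1.online = False                  # disaster at site 1
    site2.write("order-2", "placed")      # takeover: site 2 now accepts writes
    site1.online = True                   # site 1 comes back online
    resync_back(site2, site1)             # sync site 2's changes back to site 1
    return site1, site2
```

Without the `resync_back` step, `order-2` would exist only at site 2, which is the manual catch-up problem that HADR alone leaves you with.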
I would also note that if I wanted a hot-hot (active-active) configuration across two sites connected by a WAN, the setup would be similar, but I would have two DB2 Workgroup servers at each site. Site 1 Server 1 is connected via HADR to Site 1 Server 2. Site 2 Server 1 is connected via HADR to Site 2 Server 2. Site 1 and Site 2 are connected by WebSphere Replication Server. Getting this next level of DR would only cost two additional servers and one more DB2 Workgroup license.
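The server and license arithmetic for the two layouts can be checked with a few lines of Python. The topology mirrors the description above; the rule that only servers actively serving the application need a full DB2 Workgroup license (HADR standbys not counted) is an assumption inferred from the author's counts, so confirm it against your own IBM licensing terms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    site: int
    number: int
    role: str   # "primary" serves the application; "standby" is an HADR standby


def hot_warm() -> List[Server]:
    # One HADR pair spanning the two sites; site 2 holds the standby.
    return [Server(1, 1, "primary"), Server(2, 1, "standby")]


def hot_hot() -> List[Server]:
    # An HADR pair inside each site; the sites are linked by WebSphere Replication Server.
    return [Server(1, 1, "primary"), Server(1, 2, "standby"),
            Server(2, 1, "primary"), Server(2, 2, "standby")]


def licenses(servers: List[Server]) -> int:
    """Assumption: only primaries need a full DB2 Workgroup license."""
    return sum(1 for s in servers if s.role == "primary")


if __name__ == "__main__":
    warm, hot = hot_warm(), hot_hot()
    print("extra servers:", len(hot) - len(warm))            # 2
    print("extra licenses:", licenses(hot) - licenses(warm))  # 1
```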
To summarize, the advantages of combining DB2 HADR and WebSphere Replication Server are:
* Allows the fastest switchover with transaction-consistent data
* Excellent solution for scheduled outages
* Allows flexibility in OS level, DB level, application level and data format
* Can be easily tested and monitored
* Allows database read or write activity on the secondary
* Can supplement other HA solutions
* Allows for lower-cost hardware or platforms
* Low impact on source applications
* Choice of conflict detection options: one site wins vs. time-based
Disadvantages:
* Asynchronous
* Application awareness is required (triggers, GENERATED ALWAYS columns if using peer-to-peer)
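The "one site wins vs. time-based" conflict options from the summary can be sketched as two resolution functions. This Python example is a simplified illustration, not the product's actual logic: each conflicting change is assumed to carry its originating site and a commit timestamp, and the function names and the tie-breaking rule in the time-based version are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    site: str          # site where the update originated
    value: str
    timestamp: float   # when the update was committed (seconds since epoch)


def site_wins(a: Change, b: Change, winning_site: str) -> Change:
    """One site wins: the designated site's update always wins, regardless of timing."""
    if a.site == winning_site:
        return a
    if b.site == winning_site:
        return b
    raise ValueError("neither change came from the winning site")


def latest_wins(a: Change, b: Change) -> Change:
    """Time-based: the most recent update wins; ties go to the lexically smaller site name."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.site <= b.site else b
```

Conflicts like this arise because replication is asynchronous: both sites can update the same row before either has seen the other's change. Whichever rule you pick, both sites have to apply it the same way so they converge on the same value.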
Here are some articles you might want to check out: An Overview of High Availability and Disaster Recovery for DB2 UDB (circa 2003) and Q Replication recovery (circa 2007).
I'd also like to thank fellow IBMers Derek Botti, Peter Inzana, David Tolleson, and Wendy Tam for educating me about all these DR / HA options.
High availability and disaster recovery / avoidance / resilience over a WAN