Double-Take RecoverNow for AIX: Impressions and Corrections
I wrote a post in October 2012 giving my first impressions of Double-Take RecoverNow for AIX. This is a solution for Continuous Data Protection (CDP) that replicates data in real time. RecoverNow for AIX lets you replicate data between physical and virtual servers, and can be used for migrations. So it's a pretty valuable product if your business can't tolerate significant data loss or extensive downtime.
I've since had more of a look at RecoverNow for AIX, and I've run some successful disaster recovery tests. It lets you nominate a file system (or set of file systems) that you want to replicate, and when it comes to a failover to the Disaster Recovery (DR) server, those file systems get mounted ready for you to start the application and database.
I was particularly impressed with the ease of use of RecoverNow for AIX. For those who don't have the time or the inclination to learn how AIX all holds together, it's a product that lets you get going pretty quickly. Of course you need to have a Recovery host available in the first place, but once that OS is installed and storage has been assigned for data volume groups, installing and using RecoverNow doesn't require a lot of time or deep knowledge of either RecoverNow or AIX.
So if you're looking into business continuity for your AIX system, read on.
When DR is a Disaster
In the various industries I hear about, business continuity plans range from excellent to poor to non-existent. In fact, in many – I dare say most – businesses, DR is little more than a tick in the box: an annual test and little hope that the business can recover from a disaster. Or sometimes there's a lot of hope but little evidence that things will work.
In my view, DR is still much neglected and businesses suffer heavy preventable data loss because they are inadequately prepared for even the most basic damage to their data. If that's how they want to run their companies, good luck to them. They'll need it. It all comes down to how much downtime your business can wear and how much data loss it can tolerate.
I've borrowed this picture from Vision Solutions. I think it explains very well how to assess possible availability solutions to avoid data loss.
Double-Take RecoverNow for AIX has a lot of positives as it allows you to fail over to a recovery server very, very quickly. It's also got a superb feature which allows you to create a point-in-time snapshot on the recovery server without breaking the continuous replication from Prod to DR. More on this in a moment.
Whichever DR or CDP solution you go for, it needs to be able to work within your business's Recovery Point Objective (RPO: how much data you can afford to lose) and Recovery Time Objective (RTO: how long you can afford to be down). Although we all want 24x7x365 uptime, the reality is that some businesses, or at least some areas of the business, can quite easily sustain a prolonged outage. Other businesses or areas of a business are more truly mission critical.
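To make the RPO/RTO comparison concrete, here's a minimal sketch in Python of how you might shortlist candidate solutions against your objectives. The solution names and all the numbers are made up for illustration; they are not measurements of any product.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    worst_case_data_loss_s: float  # achievable recovery point, in seconds
    worst_case_recovery_s: float   # achievable recovery time, in seconds


def meets_objectives(solution, rpo_s, rto_s):
    """True if the solution fits within both the RPO and the RTO."""
    return (solution.worst_case_data_loss_s <= rpo_s
            and solution.worst_case_recovery_s <= rto_s)


# Hypothetical candidates with illustrative figures only.
candidates = [
    Solution("nightly tape backup", 24 * 3600, 8 * 3600),
    Solution("continuous replication", 1, 15 * 60),
]

# Example objectives: lose at most 60 seconds of data, be back within an hour.
shortlist = [s.name for s in candidates
             if meets_objectives(s, rpo_s=60, rto_s=3600)]
print(shortlist)
```

The point is simply that both objectives have to be met at once: a solution with a great recovery time is no use if it loses a day of data.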
The Emergency Answer: Triple Zero.
If you've never lived in Australia, you may not know that our emergency number for police, fire and ambulance is 000. Not 911 (although it probably would work). Why triple zero? Well, try asking a manager these three questions about their disaster recovery plans:
Q: How much downtime is acceptable for your business?
Q: How much data can you afford to lose?
I'll spare you the third question other than to say it's to do with their DR budget.
Full disclosure and corrections
As I mentioned in my earlier post, I'm providing technical support for this product for the Australian distributor.
I've also got some corrections to make regarding my earlier post. Mostly these are due to some confusion I had between two different products: Double-Take RecoverNow for AIX and Double-Take for Windows.
The two solutions share a brand name, but they are not built from the same technology, and they don't share the same features and benefits.
Double-Take for Windows
Double-Take for Windows (often referred to simply as Double-Take) is an availability solution for Windows which provides “Plug-and-Play Application Availability for Windows”. This has many of the features I was speaking about in my earlier blog post:
A Unified Console
Management of physical, virtual, cloud and clustered environments
Performs operations on multiple servers at the same time
Policy-based recovery environments
My focus in this post is going to be on Double-Take RecoverNow for AIX and its surgical recovery features.
Database and App Protection
Double-Take RecoverNow for AIX protects databases and applications from data loss. It does this by keeping a journal of file changes, and replicating that journal to a recovery system.
The product has an AIX-feel about it. It can replicate at the logical volume level, it loads AIX filesets into the ODM, and it interacts very nicely with the AIX Logical Volume Manager (LVM). For example, the GUI lists volume groups and the file systems within the VGs that you want to replicate. This is a step ahead of other replication methods such as SAN replication. With RecoverNow for AIX, you can replicate a single file system, or a set of file systems, and you don't have to replicate an entire SAN LUN or even a whole volume group.
You nominate a file system (or a group of them) that needs to be replicated from the source system (usually Production) to the target system. You have to unmount all file systems on the source server that you are planning to replicate, then start RecoverNow and then mount the file systems. In other words, stopping and starting RecoverNow replication on the source server requires an outage.
If you're using raw logical volumes, RecoverNow for AIX works well, too.
RecoverNow for AIX maintains an exact replica of the Prod server's data. That means all disk writes (creations, deletions and updates) are captured. These are ordered by time sequence, which is very important for recovery to a particular point in time, even down to the second. This is a big improvement over approaches such as periodic snapshots, which might only run every five minutes.
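Here's a toy model in Python of why a time-ordered journal gives you point-in-time recovery. It is only a sketch of the general idea, not how RecoverNow stores or replays its journal; the Write record, the block numbers and the timestamps are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    ts: float               # time of the write, in seconds
    block: int              # which block was written (hypothetical addressing)
    data: Optional[bytes]   # new contents, or None for a deletion


def replay_until(journal, cutoff):
    """Rebuild the data image as it stood at time `cutoff` (inclusive)."""
    image = {}
    for w in sorted(journal, key=lambda w: w.ts):
        if w.ts > cutoff:
            break  # everything later than the cutoff is ignored
        if w.data is None:
            image.pop(w.block, None)
        else:
            image[w.block] = w.data
    return image


journal = [
    Write(1.0, 7, b"orders v1"),
    Write(2.0, 7, b"orders v2"),
    Write(3.0, 7, None),  # the accidental deletion
]

before = replay_until(journal, cutoff=2.5)  # state just before the deletion
after = replay_until(journal, cutoff=3.0)   # state including the deletion
print(before, after)
```

Because every write is kept in order, you can stop the replay at any moment you like. A five-minute snapshot schedule can only give you the states at the snapshot times.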
RecoverNow for AIX Usability
From what I've now seen of RecoverNow for AIX, I'd say that overall I've found it pretty cool. There were some parts I thought a bit clunky: mainly to do with the GUI. There were some cosmetics that I think could do with some improvement although this might be just a matter of personal preference.
Vision Solutions Portal
Version 4 of RecoverNow for AIX uses a GUI called the Vision Solutions Portal (VSP). The VSP allows you to do your failover, failback and monitoring using point and click. It's also a breeze to create a view of your file system at a particular time in the last few hours. The VSP is pretty easy to learn and use, especially if you're short of time and don't feel you need to learn the command line interface (CLI).
Using the Vision Solutions Portal (VSP), you can do a failover from a source system to a target system without even logging on to the command line.
It took me a little while to get used to the GUI and some of the terminology, but perhaps it's just a sign that I'm getting old and grumpy.
You can use the same VSP to administer Vision Solutions' MIMIX for IBM i. This would be helpful for shops that run both AIX and IBM i.
Enough of aesthetics. How do you install it?
I created a simple configuration for testing RecoverNow for AIX. I had two AIX hosts, both running AIX 7.1. The production host was the source host, and the Disaster Recovery host was the target. I've also tested RecoverNow on AIX 5.3 and it worked without any difficulties.
The first step was to install the VSP. You can install the VSP on a separate Windows server using an install wizard. The installation was a breeze. It took me 90 seconds.
If you want, you can simply connect to the VSP running on one of your AIX servers. I like the fact you have this as an option. If you're familiar with the Integrated Virtualization Manager (IVM), you'll get the idea. You can run the IVM GUI but it's really just running an interface to the Virtual I/O Server (VIOS). You don't need to have a separate appliance such as a Hardware Management Console (HMC), or load an application onto a Windows environment before you can connect to the VIOS. Similar idea with RecoverNow for AIX. I can imagine you'd use it in a situation where the Windows server is not available.
Installation onto AIX
After you install the VSP, the RecoverNow for AIX agent has to be installed on each of the AIX hosts (usually one production host, and one recovery host).
The basic installation is very straightforward. I installed it onto two logical partitions, each running AIX 7.1, but it's supported on AIX 6.1, and even back to AIX 5.3 ML 4. (The “ML” is for maintenance level, as the terminology was back then before the days of Technology Levels and Service Packs. Hopefully, you're not still running AIX 5.3, although I'm still finding sites who are asking to upgrade from 5.3 to 6.1 or 7.1. They never look back.)
Installing onto AIX is quite easy. The installation can be done from the installation wizard or from SMIT, depending on your preference. You need to be able to access the AIX hosts from the VSP using SSH. If you're connecting via hostname, then the Windows server where you have installed the VSP needs to be able to resolve the names.
Once host name resolution and SSH access are in place, you can log in using root or another user (such as scrt). Non-root users need to be members of the group scrt which gets created as part of the AIX installation.
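If you want to confirm those two prerequisites before you start, a quick check along these lines would do it. This is a generic Python sketch using only the standard library, not part of RecoverNow or the VSP; "localhost" below is just a placeholder for your own AIX host names.

```python
import socket


def can_resolve(hostname):
    """True if the local resolver can turn the name into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def ssh_port_open(hostname, port=22, timeout=3.0):
    """True if a TCP connection to the SSH port succeeds within the timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder host list: substitute the names of your source and target hosts.
for host in ["localhost"]:
    print(host, "resolves:", can_resolve(host),
          "ssh reachable:", ssh_port_open(host, timeout=1.0))
```

An open port 22 only tells you something is listening there. You still need a valid login, as root or as a member of the scrt group.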
There are a few smarts in the installation that I really liked. For example, the installation won't proceed if there's an existing version of RecoverNow for AIX. Also, there are some pre-checks for sufficient spare file system space before the RecoverNow installation proceeds. Insufficient free space in a file system is the usual showstopper I've run into.
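The free-space pre-check is the kind of thing you can also do yourself ahead of time. Here's a minimal sketch in Python of the general idea; it is not the installer's own check, and the paths and sizes are assumptions for illustration rather than RecoverNow's real requirements.

```python
import shutil


def has_free_space(path, required_bytes):
    """True if the file system holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def precheck(requirements):
    """Return the paths that do NOT have enough free space."""
    return [path for path, need in requirements.items()
            if not has_free_space(path, need)]


# Hypothetical requirements: file system path -> bytes needed.
MB = 1024 ** 2
requirements = {"/": 200 * MB, "/tmp": 100 * MB}

short = precheck(requirements)
if short:
    print("Not enough free space in:", ", ".join(short))
else:
    print("All file systems have enough free space.")
```

Checking everything up front and reporting all the failures together beats discovering them one at a time halfway through an install.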
You then create a Replication Group. Here's where you specify the servers you want to be in the group. In my case, it was a single Prod (source) server and a single Recovery (target) server.
Then you get to choose which logical volumes you want in your replication group. If you want to replicate a file system, then select its logical volume. You can have several LVs in a replication group.
As data changes, those changes are captured and written to Replication Group containers. You don't need to wait for the containers to be 100% full before the next replication happens.
There's a lot more detail I could go into. If you're interested, you're best to look at some of the documentation about Double-Take RecoverNow for AIX on the Vision Solutions web site. For now, I'd like to point out some features I've found helpful, and include answers to some questions I had raised about the product.
I started by replicating a single file system from source to target. However, using the VSP, it was very easy to nominate additional file systems and/or raw logical volumes.
Control of Logical Volume layout
I mentioned in my October post my (slight) concern that the software doesn't give you enough control over logical volume layout on the target system. I've since discovered that you can create the logical volumes outside of RecoverNow. Then, when the initialization comes along, it will recognize that the LV is already there and won’t recreate it.
Striping of LVs
I'm no great fan of striping of logical volumes at the operating system level, since I think striping via the SAN gives you much more flexibility. However, if you have to do OS striping, there's a way of doing it with RecoverNow. Get the Vision Solutions Portal (VSP) to create the LV first. You can then take a screen capture of how many LPs are assigned to it. You can then remove the LV using rmlv, and recreate it with the striping layout you want.
This is a workaround that is probably not as messy as it sounds. Anyway, as most people are not using software striping at the Logical Volume level, I don't think it's likely to be an issue.
Nested File Systems
One question I didn't mention in my previous post was about nested file systems. With nested file systems, it's important that the parent file system get mounted before the child. When unmounting, it has to happen in reverse order.
The good news is that RecoverNow handles nested file systems in the correct order, both with mounting and unmounting. Nice.
If you're using the CLI, the commands to start and stop RecoverNow for AIX are rtstart and rtstop respectively. The CLI uses rtmnt to mount file systems and rtumnt to unmount them. These commands ensure that file systems are sorted in the correct order, so that nested file systems get mounted and unmounted correctly.
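The ordering rule itself is easy to picture. Here's a small Python sketch of it, purely to illustrate the idea; it is not the implementation of rtmnt or rtumnt, and the mount points are made up.

```python
def mount_order(mount_points):
    """Parents before children: sort by how deep each mount point is."""
    # A parent always has fewer path components than any of its children,
    # so sorting by depth guarantees it is mounted first.
    return sorted(mount_points, key=lambda p: (p.rstrip("/").count("/"), p))


def unmount_order(mount_points):
    """Children before parents: the exact reverse of the mount order."""
    return list(reversed(mount_order(mount_points)))


# Hypothetical nested file systems, listed in no particular order.
file_systems = ["/app/data/logs", "/app", "/app/data"]

print("mount:  ", mount_order(file_systems))
print("unmount:", unmount_order(file_systems))
```

If you tried to mount /app/data before /app, the later mount of /app would hide it. Unmounting /app first would fail because /app/data is still busy underneath it.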
So then it was simply a matter of initiating a failover. This could be a planned failover, such as you might do for a DR test, or an unplanned failover. With the planned failover, you quiesce the application and database, and let RecoverNow unmount file systems.
Then any final replication of data is done from Prod to the Recovery server, and the file systems on the Recovery server are ready to mount.
When we think of Disaster Recovery, we usually think of the hole-in-the-ground which has wiped out an entire data centre. However, most disasters (thankfully!) are far less exciting. They can still be damaging and have a major impact on business.
The surgical recovery aspect of RecoverNow for AIX is what I find very attractive. For example, suppose a developer logs in to the development server and drops a table. (For those not familiar with database administration terminology, “drop” a table means delete it.) Now the developer realises that he was, in fact, logged into production, and has dropped a critical table on prod. That change may have been replicated already from Prod to the recovery server. No problem.
On the recovery server, create a snapshot of the database file systems from a point in time prior to the accidental drop of the table. Dump the data and send it back to prod, where the table can be rebuilt quickly.
These surgical recovery features of RecoverNow for AIX are very useful. They not only allow minimal or zero data loss; they also let you get your database back to a working status very quickly.
CDP with an AIX feel
I'm pretty happy with what I've seen so far of RecoverNow for AIX. It looks like an AIX solution, and it's nice to have a product that is tuned for real-world users. There were some cosmetics that I think could do with improving, but overall these are minor things, in my view. I've seen a few sites now that are using RecoverNow for AIX and they have done some very successful – and very quick – DR tests. It's also got some impressive features for surgical data recovery (such as when you need to recover a single database table rather than an entire database or operating system).
If you want to find out more, check the Vision Solutions web site.