I'm providing some AIX support for Vision Solutions' Australian distributor, Availability Solutions, as I mentioned in an earlier blog post. This has given me access to the continuous data protection (CDP) product, DoubleTake RecoverNow for AIX. I've given it a test run and here are my first impressions.
(Note that this is not intended as a comprehensive product review or a sales pitch. It's a techie's take.)
RecoverNow for AIX is simple to use, very flexible and suitable for Disaster Recovery / Business Continuity. If it's set up correctly, it allows for very rapid recovery with little or no data loss or downtime.
A huge hidden plus is its usefulness for migrations.
Here's a little more detail of the Pros, the Cons and the Spinoffs. First, let's look at some of the positives.
DoubleTake gets a big tick for simplicity of use. The install would be no challenge for anyone who has access to a GUI. It can be managed by someone who has little experience on AIX, which is kind of nice when the poor AIX sys admin has skipped the country or left in mysterious circumstances around the same time as the system disaster has taken place. In version 6.0 of DoubleTake there is a single Unified Console, which lets you manage physical, virtual, cloud and clustered environments, so you can protect a large number of enterprise servers using a single pane of glass. You can organise servers by data centre, or do operations on many servers at the same time. You can have protection groups, which I take to mean a policy-based recovery environment.
Integration via Unified Console
The second feature which caught my eye was the integration with other platforms. I'm seeing more and more that the technical people are getting scarcer, and even those who have the ability to learn a product generally don't have the time. So it's good not to have a steep learning curve to recover your systems. From the one Unified Console you can protect and migrate your workloads between disparate environments.
Back In Business
I love the fact that the protection is platform agnostic, and the protection offered is not just at the disk or data level. Let's face it, recovering a single production database isn't usually enough to recover all you need for an application to be up and running. Even recovering an OS, or a server or a SAN subsystem, is not enough to get users working again. You really want to recover an IT ecosystem. I like DoubleTake's whole-of-environment approach. It's similar to the way that large migration projects go in waves, moving the interdependent components together. For example, there may be little point moving a database across the ocean if the app server which connects to it is going to have devastatingly slow response times. So you may try to move them together. DoubleTake lends itself to this approach when it's doing system recovery.
You can still recover just the AIX production environment, or even just a single file system. It really depends on what your business recovery needs are. That comes down to two questions: "How much time will it take to recover?" (the recovery time objective) and "How much data will I have lost once we're back up again?" (the recovery point objective). The answers to both of those questions need to be as close to zero as possible.
I'm also impressed that the data replication gets right down to the byte level. Snapshots aren't sufficient if you can't afford to lose any data between snapshots, so byte-level replication is valuable, especially in industries where 15 minutes of lost data could put you out of business.
Any to any replication
Perhaps "Any to any" replication (my expression) is a bit of an exaggeration. Even so, the DoubleTake suite of products (it goes beyond RecoverNow for AIX) allows you to recover a physical, standalone server to a target server that is virtualised, or even one that is in the cloud. That's pretty impressive, and adds a lot of value in my eyes. Let me indulge in a plug for my favourite OS. AIX itself has a great history here, with the ability to restore a mksysb from a very old source server to a target server that is virtualised and built on the latest architecture. Alright, it's not absolutely true that any old AIX system can be restored onto any new Power System, but AIX has great strengths with backward compatibility, and RecoverNow seems to complement that great strength of AIX.
I've even seen RecoverNow used to replicate from multiple standalone Power 5 servers to a set of Workload Partitions (WPARs) on a target Power 7 AIX 7.1 LPAR. It's a great tool for migrations from older systems onto newer ones.
Those are all pluses for me, as the suite of products releases you from the obligation of having identical source and target platforms or storage.
So, was there anything I didn't like? I'd say RecoverNow is a great product provided your AIX system is configured well (particularly the file systems).
Preparing the Target System
First, RecoverNow for AIX requires the target system to have its OS built, and also to have volume groups created, ready to receive the logical volumes from the production (source) server. If you're used to working on warm DRs where you do a restore from a mksysb backup, then create data volume groups and file systems, then the RecoverNow procedure is simpler. Still, it does require a certain expertise in setting up the target system, as RecoverNow replicates from a working OS to a working OS. So you have to have two working operating systems to get started. In a certain sense, I'd say this is unavoidable. You can't just replicate at the Storage level, for example, and then hope the target system will start with correct IP addresses etc.
It does take a certain amount of expertise to recover a mksysb onto a target system, and then create volume groups for data. Technically, this is not a fault or a feature of RecoverNow. As I said, you have to set up the target system before installing RecoverNow.
RecoverNow then creates the logical volumes and file systems in that volume group. This may not give you sufficient control over the target VG's logical volume layout. For example, if you're using software mirroring or striping, you may not be able to control that on the target server. I'd expect this would mainly be a problem in small environments that use internal disk, and even then it may not be much of an issue.
If the source OS were to change significantly, for example because of an upgrade to the AIX release level, then ideally those changes would be made at the same time on the target system. That may involve reinitialising the replication from scratch. A big problem? Perhaps not, but you need to be aware of it.
The Whole File System is Replicated
One factor which struck me about RecoverNow is that, quite like the AIX Logical Volume Manager (LVM), it works at the logical volume level. As I understand it, you specify a whole file system (or a set of file systems) to recover. That's usually no problem at all, since you generally would organise your file systems according to function. For example, you'd have database data file systems separate from redo logs, and application file systems separate again. With RecoverNow you can group a few file systems together (even from different logical partitions), so that you can restore the entirety to a point in time.
However, if you have to implement the product in an AIX environment that is poorly laid out, you'd have some work to do to fix the environment first. It would be unfair to blame the RecoverNow product for a poorly designed source system, but if your AIX source system is in a bit of a mess, RecoverNow won't be able to dig you out of that hole. For example, if all your data is sitting in subdirectories of /usr, and not in separate data file systems, then you'd be in for a bit of remediation work before you could make good use of RecoverNow. As you would be aware, it makes sense to keep data file systems separate and out of rootvg, which ought to be used primarily - if not exclusively - for the operating system.
As RecoverNow works at the Logical Volume / File System level, it would be a drawback if you only wanted to replicate a single directory that is part of a much larger File System. Ordinarily this would be easily prevented by separating data into file systems that you do want to replicate, and ones you don't.
When I say the whole file system is replicated, that's true with regard to the initial replication, but from that point on, replication happens at the byte level. This may significantly reduce the network load in comparison with, for example, replicating at the block level, replicating whole disks, or replicating entire files when only a small part of each has changed.
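To put a rough number on that difference, here is a minimal Python sketch. It is not how RecoverNow works internally (I haven't seen its internals); it simply compares how many bytes of payload you'd need to ship if you sent only the changed bytes versus every block that contains a change. The 4096-byte block size, the function names and the sample data are all my own assumptions for illustration, and protocol overhead is ignored.

```python
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed block size, purely for illustration


def changed_runs(old: bytes, new: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) runs where `new` differs from `old`."""
    if len(old) != len(new):
        raise ValueError("this sketch only compares images of equal size")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            if start is None:
                start = i  # a new run of changed bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # a run that reaches the end of the image
        runs.append((start, len(new) - start))
    return runs


def byte_level_payload(old: bytes, new: bytes) -> int:
    """Payload if only the changed bytes are sent."""
    return sum(length for _, length in changed_runs(old, new))


def block_level_payload(old: bytes, new: bytes,
                        block_size: int = BLOCK_SIZE) -> int:
    """Payload if every block containing a change is sent in full."""
    dirty = set()
    for offset, length in changed_runs(old, new):
        first = offset // block_size
        last = (offset + length - 1) // block_size
        dirty.update(range(first, last + 1))
    # The final block of the image may be shorter than block_size.
    return sum(min(block_size, len(new) - b * block_size) for b in dirty)


if __name__ == "__main__":
    old = bytes(64 * 1024)               # a 64 KiB "volume" of zeros
    new = bytearray(old)
    new[5000:5010] = b"\xff" * 10        # a 10-byte update
    new[60000:60003] = b"\xff" * 3       # a 3-byte update elsewhere
    new = bytes(new)
    print(byte_level_payload(old, new))   # 13
    print(block_level_payload(old, new))  # 8192
```

In this made-up case two small updates cost 13 bytes at the byte level against 8192 bytes at the block level. Real workloads will vary, but it shows why lots of small scattered writes favour byte-level replication.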
Planning for JFS logs
Another point to be aware of is that, as the entire logical volume gets replicated, you also need to replicate the JFS (or JFS2) log for JFS/JFS2 file systems. That means either creating a separate JFS/JFS2 log for each file system on the source system or, better still, creating the file system with INLINE logs. On AIX, however, you can't convert a file system from using an external JFS2 log to an INLINE one. You have to create an entirely new file system and move the data from the original file system into the new INLINE-logged file system.
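One way to scope that work ahead of time is to check which file systems currently share a log device. The Python sketch below is only an illustration: it parses text in the stanza layout of /etc/filesystems (as I understand that format) and reports any JFS/JFS2 log device used by more than one file system. The function names and the sample stanzas are made up, and it's not part of RecoverNow or AIX.

```python
from collections import defaultdict
from typing import Dict, List


def parse_filesystems(text: str) -> Dict[str, Dict[str, str]]:
    """Parse /etc/filesystems-style stanzas into {mount_point: {attr: value}}."""
    stanzas: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):  # blank line or comment
            continue
        if line.endswith(":") and "=" not in line:
            current = line[:-1]               # stanza header, e.g. "/data1:"
            stanzas[current] = {}
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            stanzas[current][key.strip()] = value.strip()
    return stanzas


def shared_logs(stanzas: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return {log_device: [mount points]} for logs used by 2+ file systems."""
    by_log = defaultdict(list)
    for mount_point, attrs in stanzas.items():
        if attrs.get("vfs") not in ("jfs", "jfs2"):
            continue
        log = attrs.get("log")
        if log and log != "INLINE":           # INLINE logs live in the FS itself
            by_log[log].append(mount_point)
    return {log: sorted(fs) for log, fs in by_log.items() if len(fs) > 1}


# Hypothetical sample stanzas, not taken from a real system.
SAMPLE = """
* sample only
/data1:
        dev             = /dev/datalv01
        vfs             = jfs2
        log             = /dev/loglv00

/data2:
        dev             = /dev/datalv02
        vfs             = jfs2
        log             = /dev/loglv00

/data3:
        dev             = /dev/datalv03
        vfs             = jfs2
        log             = INLINE
"""

if __name__ == "__main__":
    for log, mounts in shared_logs(parse_filesystems(SAMPLE)).items():
        print(log, "is shared by", ", ".join(mounts))
```

On a real system you'd feed it the contents of /etc/filesystems. Anything it reports is a candidate for its own log, or for being rebuilt with an INLINE log, before you set up replication.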
Most people think of Disaster Recovery / Business Continuity in terms of how to minimise business impact when your data centre is replaced by a hole in the ground. One of the benefits that gets less airtime is the use of recovery tools for data or system migration. In the last couple of months, I've been made aware of a project to move an entire data centre (or, more correctly, an entire set of data), from Australia to the US, with minimal downtime. I've also heard of a project for a different company who want to move their data from the US to Australia. Yes, I know what you're thinking, but no, the two companies can't just get everyone to stop logging in across the ocean and start using the local data centre.
Now if you have to migrate data quickly enough to minimise downtime, you can't rely on tapes or shipping physical servers across to a different continent. You need to be able to switch over to the alternate system quickly, and have the data as up to date as it can be. Here is where RecoverNow for AIX is in its element. Sure, you still have to do the initial replication of data, and that may take a long time, but once you're there, you can arrange a short swap to the failover system (the target system, perhaps in a different hemisphere), and then you're (literally) in business.
I've still got a lot to learn about this product. As Disaster Recovery is such a hot topic these days, I'm sure RecoverNow for AIX will generate a lot of interest, especially for system migrations.
As you can see, RecoverNow for AIX certainly has a lot going for it, and if your AIX (or other OS!) recovery procedures take a long time (or are practically non-existent), then it's worth getting in touch with Vision Solutions to see if they can offer some help before the disaster happens. It does depend on a well-laid-out file system structure, and requires a certain amount of expertise in preparing the target OS before you can install RecoverNow for AIX. Once it's installed (quick) and the data has completed its first replication (potentially days), it looks to be a very reliable system recovery solution.