I wonder how many businesses have a serious Disaster Recovery plan. I have been thinking about this as I have recently been invited to do a test Disaster Recovery run using Vision Solutions' software, DoubleTake RecoverNow for AIX. I'll post some blogs about how it looks. I'm keen to find DR solutions that are simple and comprehensive. Sadly, many existing plans are neither. For plenty of companies, the DR plan consists of little more than a blind hope that nothing's gonna go wrong.
What does your DR plan look like?
How many DR plans have a Recovery Point Objective (RPO) that would genuinely meet their business needs today? How many managers even know what an RPO is? Basically, it's how much data your business is willing to lose in the event of a system failure. If you're happy to recover to last night's (hopefully successful) backup, then good luck to you. If the failure happened before that backup went off site, and you no longer have access to the production site, you will just have to go back to the previous good backup. Try telling the business owners that all of today's (and maybe yesterday's) transactions, reports and emails are gone forever.
What about the time it takes to recover? The Recovery Time Objective (RTO), sometimes described as the return to operation, is essentially how long you're willing to wait to get your systems back into operation. So, if the restore time is (let's say) three days, then a system that goes down on Tuesday afternoon may not be back until Friday afternoon. And when it does get fully recovered, it's back to Monday night's data (assuming that you have a good backup).
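To put numbers on that example, here is a minimal Python sketch of the arithmetic. The specific dates and times are my own assumptions (the example above only says Monday night, Tuesday afternoon and three days), and the function name `recovery_outcome` is purely illustrative.

```python
from datetime import datetime, timedelta


def recovery_outcome(last_good_backup, failure, restore_duration):
    """Return (time back in service, window of lost data) for an outage."""
    back_in_service = failure + restore_duration
    # Everything written between the last good backup and the failure is lost.
    data_loss_window = failure - last_good_backup
    return back_in_service, data_loss_window


# Assumed timestamps: 1 January 2024 was a Monday.
last_good_backup = datetime(2024, 1, 1, 23, 0)  # Monday night's backup
failure = datetime(2024, 1, 2, 15, 0)           # Tuesday afternoon outage
restore_duration = timedelta(days=3)            # the three-day restore

back_in_service, data_loss_window = recovery_outcome(
    last_good_backup, failure, restore_duration
)

print("Back in service:", back_in_service.strftime("%A %H:%M"))
print("Data lost:", data_loss_window)
```

With these assumed times, the business is down for three full days and also loses the 16 hours of work done since Monday night's backup. Both figures have to be compared against what the business says it can tolerate.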
DR Plan: Current, Complete, Comprehensive
With these thoughts in mind, I was pleased to read an interview with Richard Dolewski, vice president of business continuity services for Velocity Technology Solutions. The interview was published on Power IT Pro, and you can find it here. As you probably know, I am a regular contributor to Power IT Pro.
Richard Dolewski says that every company should have a disaster recovery plan and should be able to ask the following questions:
- Is our plan current, complete, and comprehensive? Has the plan been tested within the past 12 months?
Very good questions. I'd go further. I have been involved in many disaster recovery test runs, and I have found they often fail, either because critical systems could not be recovered, or because the recovery point objective was not met within the allowed recovery time. Better to find that out now than to wait for the real disaster.
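The failure modes above can be written down as a simple pass/fail check. This is only a sketch under my own assumptions: the `DrTestResult` class, the `evaluate` function and the system names are hypothetical and do not come from any DR product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrTestResult:
    data_loss: timedelta      # how far back the recovered data goes
    downtime: timedelta       # how long the recovery took
    critical_systems: set     # systems the business cannot run without
    recovered_systems: set    # systems actually brought back in the test


def evaluate(result, rpo, rto):
    """Return a list of reasons the test failed (an empty list means a pass)."""
    problems = []
    missing = result.critical_systems - result.recovered_systems
    if missing:
        problems.append("not recovered: " + ", ".join(sorted(missing)))
    if result.data_loss > rpo:
        problems.append("RPO missed")
    if result.downtime > rto:
        problems.append("RTO missed")
    return problems


# Hypothetical test run: email never came back, and both objectives were missed.
result = DrTestResult(
    data_loss=timedelta(hours=16),
    downtime=timedelta(days=3),
    critical_systems={"erp", "email"},
    recovered_systems={"erp"},
)
problems = evaluate(result, rpo=timedelta(hours=1), rto=timedelta(hours=6))
print(problems or "test passed")
```

A test only counts as a pass when the list comes back empty. Anything else means a rerun once the gaps are fixed.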
A New Approach to Burning DVDs
As well as several disaster recovery tests, I can also claim to have been involved in some real-life system disasters. I can't claim credit for causing the disasters themselves, but I did have a hand in putting the pieces back together again. One time, a large commercial bakery's oven caught fire, taking with it the factory and the office, which was part of the same building. We were able to recover the system, although there was a lot of toast that day.
It is vital to plan and test your Disaster Recovery, or else you will be toast.
I would add that if the test has failed, it really needs to be rerun. Similarly, logging into production systems during the test to get files or information you should already have is cheating. You don't want to create false expectations that your disaster recovery plan is more comprehensive or effective than it really is.
In that Power IT Pro interview, Richard Dolewski pointed out that there are many different aspects to company operations today, and that organisations need to be able to recover much more quickly than ever before. They may have an RTO (Recovery Time Objective) of six hours instead of several days.
You learn a lot from a disaster recovery test run. You learn even more from a true disaster. You realise that the company infrastructure goes far beyond a single system. Your DR plan shouldn't just be there to restore your ERP system. You need email, perhaps some file shares, transaction servers, web servers, and maybe even a telephone system. There is so much that contributes to the ecosystem of the company, and if a single essential link is missing in that chain, your RTO is blown and maybe your career (and the company) with it.
Plan, Plan and Simplify
Planning is everything. Well, perhaps not everything: you need a lot of money as well. You need a clear view of the infrastructure you will rely on to recover after a major system disaster and keep the business running.
I've recently been asked to work on some of the Vision Solutions software such as DoubleTake RecoverNow for AIX. (Full disclosure: Vision Solutions' Australian distributor, Availability Solutions, have asked me to assist with contract work for their Power Systems-based clients). From what I have seen of the Vision Solutions software, it is storage agnostic, OS agnostic (no need to have the same OS versions on source and target) and it uses Real-Time Replication. In other words, it ought to be able to recover up to the last byte before the production system went AWOL.
DoubleTake RecoverNow for AIX has a number of features which look good. We'll see how they work in the wild.
In the coming weeks I will look at the Vision Solutions offerings and compare them with other DR / Business Recovery approaches I have been involved with.
You may already have experience with other products such as Storix, Tivoli Storage Manager, or a combination of scripts, documents and good old-fashioned experience.
Whatever strategy you use for keeping your business alive, follow the advice of Richard Dolewski: your DR plan needs to be complete, comprehensive, current and tested recently.
It should be clear, easy to implement and sufficiently robust to save your company when its vital infrastructure decides to take an unexpected holiday.