Improve your ability to respond to and recover from disruptive events
What is a disaster recovery plan?
A disaster recovery plan is a formal document created by an organization that contains detailed instructions on how to respond to unplanned incidents such as natural disasters, power outages, cyber attacks and other disruptive events. The plan contains strategies for minimizing the effects of a disaster so that the organization can continue to operate or quickly resume key operations. It is narrower in scope than a business continuity plan and does not necessarily cover all contingencies for business processes, assets, human resources and business partners.
A successful disaster recovery plan typically addresses all types of operational disruption, not just the major natural or man-made disasters that make a location unavailable. Disruptions can include power outages, telephone system outages, temporary loss of access to a facility due to a bomb threat, a "possible fire", or a low-impact, non-destructive fire, flood or other event. A plan should be organized by type of disaster and by location, and it must contain scripts (step-by-step instructions) that anyone can carry out.
Before the 1970s, most organizations only had to concern themselves with making copies of their paper-based records. Disaster recovery planning gained prominence during the 1970s as businesses began to rely more heavily on computer-based operations. At that time, most systems were batch-oriented mainframes, so a second, offsite mainframe could be loaded from backup tapes while the primary site was being recovered.
In 1983, the U.S. government mandated that national banks have a testable backup plan. Many other industries followed as they recognized the significant financial losses associated with long-term outages.
By the 2000s, businesses had become even more dependent on digital online services. With the introduction of big data, cloud, mobile and social media, companies had to capture and store massive, exponentially growing amounts of data. Disaster recovery plans became much more complex because they had to account for far larger volumes of data stored across a myriad of devices. The advent of cloud computing in the 2010s helped alleviate this complexity by allowing organizations to outsource disaster recovery, an approach known as disaster recovery as a service (DRaaS).
Another current trend that underscores the importance of a detailed disaster recovery plan is the increasing sophistication of cyber attacks. Industry statistics show that many attacks stay undetected for well over 200 days. With so much time to hide in a network, attackers can plant malware that finds its way into backup sets, infecting even the data meant for recovery. Attacks may stay dormant for weeks or months, allowing malware to propagate throughout the system. Even after an attack is detected, it can be extremely difficult to remove malware that has spread so widely throughout an organization.
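Because malware can sit undetected inside backups, choosing a restore point has to account for how long an attacker may have been present. A minimal sketch, assuming backups are identified only by their completion time and that the start of the compromise can be estimated as the detection date minus an assumed dwell time; the function name `pick_clean_restore_point` and the 200-day default are illustrative assumptions, not part of any product.

```python
from datetime import datetime, timedelta


def pick_clean_restore_point(backups, detected_at, assumed_dwell_days=200):
    """Return the newest backup completed before the estimated start of a compromise.

    backups: iterable of datetime objects (backup completion times).
    detected_at: datetime when the attack was detected.
    assumed_dwell_days: hypothetical estimate of how long the attacker was present.
    Returns None if no backup predates the estimated compromise.
    """
    # Estimate the earliest point at which backups may already contain malware.
    compromise_start = detected_at - timedelta(days=assumed_dwell_days)
    # Only backups completed before that point are treated as restore candidates.
    candidates = [b for b in backups if b < compromise_start]
    return max(candidates) if candidates else None


# Hypothetical monthly backups from January 2023 through February 2024.
backups = [datetime(2023, m, 1) for m in range(1, 13)] + [datetime(2024, 1, 1), datetime(2024, 2, 1)]
print(pick_clean_restore_point(backups, detected_at=datetime(2024, 3, 1)))
```

With detection on 1 March 2024 and a 200-day assumed dwell time, every backup taken after mid-August 2023 is treated as suspect, so the sketch falls back to the 1 August 2023 backup. In practice, a candidate restore point would still need to be scanned before use.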
Why is a disaster recovery plan important?
On average, an infrastructure failure can cost $100,000 an hour, and a critical application failure can cost $500,000 to $1 million per hour ⁽¹⁾. Today, digital business channels represent a larger share of the market and a growing source of revenue. Apart from lost revenue and productivity, downtime drives customers away: they will quickly abandon a business and turn to a competitor to meet their needs.
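The per-hour figures above turn into a rough outage estimate by multiplying by the outage length. A minimal sketch using the cited ranges; actual costs vary widely by organization, and the function name `outage_cost_range` is hypothetical.

```python
def outage_cost_range(hours, low_rate_per_hour, high_rate_per_hour=None):
    """Return a (low, high) estimated cost for an outage lasting `hours` hours."""
    if high_rate_per_hour is None:
        # A single rate gives a point estimate.
        high_rate_per_hour = low_rate_per_hour
    return hours * low_rate_per_hour, hours * high_rate_per_hour


# Infrastructure failure: about $100,000 per hour (IDC figure cited above).
infra = outage_cost_range(4, 100_000)
# Critical application failure: $500,000 to $1 million per hour.
app = outage_cost_range(4, 500_000, 1_000_000)

print(f"4-hour infrastructure outage: ${infra[0]:,}")
print(f"4-hour critical application outage: ${app[0]:,} to ${app[1]:,}")
```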
Other key reasons why a business would want a detailed and tested disaster recovery plan include:
- To minimize interruptions to normal operations.
- To limit the extent of disruption and damage.
- To minimize the economic impact of the interruption.
- To establish alternative means of operation in advance.
- To train personnel in emergency procedures.
- To provide for smooth and rapid restoration of service.
1. “DevOps and the Cost of Downtime: Fortune 1000 Best Practice Metrics Quantified,” Stephen Elliot, IDC, December 2014, IDC #253155
Using consulting, software and cloud-based solutions for a business continuity plan
Many organizations struggle to evolve their disaster recovery strategies quickly enough to address today's hybrid-IT environments and complex business operations. In an always-on, 24/7 world, an organization can gain a competitive advantage, or lose market share, depending on how quickly it can recover from a disaster and restore core business services.
Some organizations use external disaster recovery and business continuity consulting services to address their needs for assessment, planning and design, implementation, testing and full resiliency program management.
Proactive services, such as IBM IT Infrastructure Recovery Services, help businesses identify risks and ensure they are prepared to detect, react to and recover from a disruption.
With the growth of cyber attacks, companies are moving from a traditional, manual recovery approach to an automated, software-defined resiliency approach. The IBM Cyber Resilience Services approach uses advanced technologies and best practices to help assess risks and to prioritize and protect business-critical applications and data. These services can also help businesses rapidly recover IT systems during and after a cyber attack.
Other companies turn to cloud-based backup services, such as IBM Disaster Recovery as a Service (DRaaS), to provide continuous replication of critical applications, infrastructure, data and systems for rapid recovery after an IT outage. There are also virtual server options, such as IBM Cloud Virtualized Server Recovery, that protect critical servers in real time. This enables rapid recovery of applications at an IBM Resiliency Center to keep businesses operational during periods of maintenance or unexpected downtime.
For a growing number of organizations, the answer is resiliency orchestration, a cloud-based approach that combines disaster recovery automation with a suite of continuity-management tools designed specifically for hybrid-IT environments. For instance, IBM Resiliency Orchestration helps protect business process dependencies across applications, data and infrastructure components. It increases the availability of business applications and gives companies high-level or in-depth insight into recovery point objectives (RPO), recovery time objectives (RTO) and the overall health of IT continuity from a centralized dashboard.
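The recovery point objective (RPO) is the maximum acceptable data loss, measured as time since the last good copy; the recovery time objective (RTO) is the maximum acceptable time to restore service. A minimal sketch of the kind of check a continuity dashboard might run, assuming you know when data was last replicated and have an estimate of how long recovery would take. The function name, fields and thresholds are hypothetical and are not taken from IBM Resiliency Orchestration.

```python
from datetime import datetime, timedelta


def check_continuity(last_replication, now, estimated_recovery, rpo, rto):
    """Report whether current protection meets the RPO and RTO.

    last_replication: datetime of the most recent successful replication or backup.
    now: datetime of the check; a failure now would lose data written since last_replication.
    estimated_recovery: timedelta, estimated time to restore service.
    rpo, rto: timedelta objectives.
    """
    data_at_risk = now - last_replication  # window of data that would be lost
    return {
        "data_at_risk": data_at_risk,
        "rpo_met": data_at_risk <= rpo,
        "rto_met": estimated_recovery <= rto,
    }


status = check_continuity(
    last_replication=datetime(2024, 5, 1, 11, 50),
    now=datetime(2024, 5, 1, 12, 0),
    estimated_recovery=timedelta(hours=3),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=2),
)
print(status)
```

Here the last replication was 10 minutes ago, which is within a 15-minute RPO, but a 3-hour estimated recovery misses a 2-hour RTO, so recovery capacity rather than replication frequency would be the gap to address.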
Key features of an effective disaster recovery plan
The objective of a disaster recovery plan is to ensure that an organization can respond to a disaster or other emergency that affects information systems and minimize the effect on business operations. IBM has created a template for producing a basic disaster recovery plan; the suggested steps from that template follow. Once you have prepared the information, store the document in a safe, accessible location off site.
Step 1: Major goals
The first step is to broadly outline the major goals of a disaster recovery plan.
Step 2: Personnel
Record your data processing personnel, and include a copy of the organization chart with your plan.
Step 3: Application profile
List your applications, noting whether each is critical and whether it is a fixed asset.
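An application profile can be kept as structured data so that a recovery order falls out of it directly. A minimal sketch, assuming each application is tagged as critical or not and as a fixed asset or not; the `Application` fields, the sample names and the rule of restoring critical applications first are illustrative assumptions, not part of the IBM template.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    critical: bool
    fixed_asset: bool


def recovery_order(apps):
    """Return application names with critical applications first, each group sorted by name."""
    # `not a.critical` is False for critical apps, and False sorts before True.
    return [a.name for a in sorted(apps, key=lambda a: (not a.critical, a.name))]


apps = [
    Application("Payroll", critical=True, fixed_asset=False),
    Application("Intranet wiki", critical=False, fixed_asset=False),
    Application("Order entry", critical=True, fixed_asset=True),
]
print(recovery_order(apps))
```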
Step 4: Inventory profile
List the manufacturer, model, serial number and cost of each item, and whether it is owned or leased.
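The inventory profile maps naturally onto one record per item. A minimal sketch using the fields the template lists; the sample items are invented, and totaling cost by ownership is only one example of a question the inventory can answer, such as how much replacement cost falls on owned equipment.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    manufacturer: str
    model: str
    serial_number: str
    cost: float
    owned: bool  # False means the item is leased


def cost_by_ownership(items):
    """Return the total cost of owned and of leased items."""
    totals = {"owned": 0.0, "leased": 0.0}
    for item in items:
        totals["owned" if item.owned else "leased"] += item.cost
    return totals


# Hypothetical inventory entries for illustration only.
inventory = [
    InventoryItem("ExampleCorp", "Server X1", "SN-0001", 12000.0, owned=True),
    InventoryItem("ExampleCorp", "Storage S2", "SN-0002", 30000.0, owned=False),
    InventoryItem("ExampleCorp", "Switch N3", "SN-0003", 2500.0, owned=True),
]
print(cost_by_ownership(inventory))
```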
Step 5: Information services backup procedures
Include information such as: "Journal receivers are changed at ________ and at ________." And: "Changed objects in the following libraries and directories are saved at ____."
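Once the blanks in the template are filled in with times of day, the schedule can be checked programmatically. A minimal sketch, assuming a fixed list of daily times in a single local time zone (daylight saving changes are ignored). The 06:00 and 18:00 values are placeholders, not recommendations, and `next_scheduled` is a hypothetical helper that computes when the next journal receiver change or save is due.

```python
from datetime import datetime, time, timedelta


def next_scheduled(now, daily_times):
    """Return the next datetime at or after `now` that matches one of the daily times."""
    for t in sorted(daily_times):
        candidate = datetime.combine(now.date(), t)
        if candidate >= now:
            return candidate
    # All of today's times have passed, so use the earliest time tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), min(daily_times))


# Placeholder values for "Journal receivers are changed at ____ and at ____."
receiver_changes = [time(6, 0), time(18, 0)]
print(next_scheduled(datetime(2024, 5, 1, 9, 30), receiver_changes))
print(next_scheduled(datetime(2024, 5, 1, 20, 0), receiver_changes))
```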
Step 6: Disaster recovery procedures
For any disaster recovery plan, these three elements should be addressed:
- Emergency response procedures that document the appropriate response to a fire, natural disaster or other emergency in order to protect lives and limit damage.
- Backup operations procedures to ensure that essential data processing operational tasks can be conducted after the disruption.
- Recovery actions procedures to facilitate the rapid restoration of a data processing system following a disaster.
Step 7: Recovery plan for mobile site
The plan should include a mobile site setup plan, a communication disaster plan (including the wiring diagrams) and an electrical service diagram.
Step 8: Recovery plan for hot site
An alternate hot site plan should provide for an alternative (backup) site. The alternate site has a backup system for temporary use while the home site is being reestablished.
Step 9: Restoring the entire system
To get your system back to the way it was before the disaster, use the procedures on recovering after a complete system loss in Systems management: Backup and recovery.
Step 10: Rebuilding process
The management team must assess the damage and begin the reconstruction of a new data center.
Step 11: Testing the disaster recovery plan
In successful contingency planning, it is important to test and evaluate the plan regularly. Data processing operations are volatile by nature, resulting in frequent changes to equipment, programs and documentation. These changes make it critical to treat the plan as a living document.
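Because the plan changes along with the environment, it helps to flag when a test is overdue or when the plan has changed since it was last tested. A minimal sketch, assuming you record the dates of the last test and the last plan change; the 180-day interval is an arbitrary placeholder, not a recommended testing frequency, and `plan_needs_test` is a hypothetical helper.

```python
from datetime import date, timedelta


def plan_needs_test(last_test, last_change, today, interval_days=180):
    """Return a list of reasons the plan should be retested (empty if none)."""
    reasons = []
    if today - last_test > timedelta(days=interval_days):
        reasons.append("test interval exceeded")
    if last_change > last_test:
        reasons.append("plan changed since last test")
    return reasons


print(plan_needs_test(last_test=date(2024, 1, 15), last_change=date(2024, 3, 2), today=date(2024, 5, 1)))
```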
Step 12: Disaster site rebuilding
This step should include a floor plan of the data center, the current hardware needs and possible alternatives, as well as the data center square footage, power requirements and security requirements.
Step 13: Record of plan changes
Keep your plan current. Keep records of changes to your configuration, your applications and your backup schedules and procedures.
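A record of plan changes can be as simple as an append-only log. A minimal sketch, assuming each entry records a date, the area changed (configuration, application or backup, following the categories in the step above) and a free-text description; the `ChangeLog` class and the sample entries are illustrative.

```python
from dataclasses import dataclass
from datetime import date

AREAS = {"configuration", "application", "backup"}


@dataclass(frozen=True)
class PlanChange:
    changed_on: date
    area: str
    description: str


class ChangeLog:
    """Append-only record of changes to the disaster recovery plan."""

    def __init__(self):
        self._entries = []

    def record(self, changed_on, area, description):
        # Reject areas outside the categories the plan tracks.
        if area not in AREAS:
            raise ValueError(f"unknown area: {area}")
        self._entries.append(PlanChange(changed_on, area, description))

    def since(self, cutoff):
        """Return changes made on or after `cutoff`, oldest first."""
        return sorted((e for e in self._entries if e.changed_on >= cutoff), key=lambda e: e.changed_on)


log = ChangeLog()
log.record(date(2024, 2, 10), "backup", "Moved nightly save from 22:00 to 23:00")
log.record(date(2024, 4, 3), "application", "Added order entry system to critical list")
for change in log.since(date(2024, 3, 1)):
    print(change.changed_on, change.area, change.description)
```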