Six tiers of solutions for off-site recovery

One blueprint (see note 11) for recovery planning describes a scheme of six tiers of off-site recoverability (tiers 1--6), plus a seventh tier (tier 0) that relies on local recovery only, with no off-site backup. The tiers cover the full range of recovery options, from no data moved off-site to full off-site copies with no loss of data. The following figures and text describe them from a CICS® perspective.

Tier 0--no off-site data

Figure 40 summarizes the tier 0 solution.

Figure 40. Disaster recovery tier 0: no off-site backup
 This diagram illustrates the main components of a single data center, with no backup shown. Data is not sent off-site, so recovery can be performed only by using on-site local records (if they survive).

Tier 0 is defined as having no requirements to save information off-site, establish a backup hardware platform, or develop a disaster recovery plan. Tier 0 is the no-cost disaster recovery solution.

Any disaster recovery capability would depend on recovering on-site local records. For most true disasters, such as fire or earthquake, you would not be able to recover your data or systems if you implemented a tier 0 solution.

Tier 1--physical removal

Figure 41 summarizes the tier 1 solution.

Figure 41. Disaster recovery tier 1: physical removal
 This diagram illustrates the main components of a single data center, with no backup data center, but data is sent offsite (shown here by road transport). The other points made by this figure are discussed in the following text.

Tier 1 is defined as having:

Your disaster recovery plan has to include information to guide the staff responsible for recovering your system, from hardware requirements to day-to-day operations.

The backups required for off-site storage must be created periodically. After a disaster, your data can only be as up-to-date as the last backup--daily, weekly, monthly, or whatever period you chose--because your recovery action is to restore the backups at the recovery site (when you have one).
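The relationship between backup frequency and data currency can be made concrete with a little date arithmetic. The following is a minimal sketch, not part of the original scheme: the function names and the example dates are hypothetical, and it assumes that the most recent backup has already reached the off-site store when the disaster occurs.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Return how much work is lost when you restore from the last off-site backup."""
    if disaster < last_backup:
        raise ValueError("disaster time precedes the last backup")
    return disaster - last_backup


def worst_case_loss(backup_interval: timedelta) -> timedelta:
    """The worst case is a disaster that strikes just before the next backup is taken."""
    return backup_interval


# Hypothetical example: weekly backups, disaster six days after the last one.
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 7, 2, 0)
lost = data_loss_window(last_backup, disaster)
```

In this example six days of updates are lost, and with a weekly cycle the loss can approach a full week. Shortening the backup period reduces the window but increases the number of backups to create and transport.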

This method may not meet your requirements if you need your online systems to be continuously available.

The major benefit of tier 1 is the low cost. The major costs are the storage site and transportation.

The drawbacks are:

Tier 1

Tier 1 provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 1 allows you to recover and provide some form of service at low cost. You must assess whether the loss of data and the time taken to restore a service will prevent your company from continuing in business.

Tier 2--physical removal with hot site

Figure 42 summarizes the tier 2 solution.

Figure 42. Disaster recovery tier 2: physical removal to a ‘hot’ standby site
 This diagram illustrates the main components of a data center, plus a remote backup data center. As in the tier 1 illustration, data is sent off-site by road transport. The standby site is some distance away: if disaster recovery is necessary, data is sent by air from the warehouse where it is stored. The figure summarizes the main approach of this solution as: backup data kept off-site; procedures and inventory kept off-site. The other points made by this figure are discussed in the following text.

Tier 2 is similar to tier 1. The difference in tier 2 is that a secondary site already has the necessary hardware installed, which can be made available to support the vital applications of the primary site. The same process is used to back up and store the vital data; therefore the same availability issues exist at the primary site as for tier 1.

The benefits of tier 2 are the elimination of the time it takes to obtain and set up the hardware at the secondary site, and the ability to test your disaster recovery plan.

The drawback is the expense of providing, or contracting for, a ‘hot’ standby site.

Tier 2

Tier 2, like tier 1, provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 2 allows you to recover and provide some form of service at low cost and more rapidly than tier 1. You must assess whether the loss of data and the time taken to restore a service will prevent your company from continuing in business.

Tier 3--electronic vaulting

Figure 43 summarizes the tier 3 solution.

Figure 43. Disaster recovery tier 3: electronic vaulting
 This illustration shows two data centers--the primary and a remote standby site, with a telecommunications link between the two. The other points made by this figure are discussed in the following text.

Tier 3 is similar to tier 2. The difference is that data is electronically transmitted to the hot site. This eliminates physical transportation of data and the off-site storage warehouse. The same process is used to back up the data, so the same primary site availability issues exist in tier 3 as in tiers 1 and 2.

The benefits of tier 3 are:

The drawbacks are the cost of reserving the DASD at the hot standby site, and that you must have a link to the hot site, and the required software, to transfer the data.

Procedures and documentation still have to be available at the hot site, but this can be achieved electronically.

Tier 3

Tier 3, like tiers 1 and 2, provides a basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount of data. The advantage of tier 3 is that you should be able to provide a service to your users quite rapidly. You must assess whether the loss of data will prevent your company from continuing in business.

Tier 0-3 solutions

Figure 44 summarizes the solutions for tiers 0 through 3, and shows the approximate time required for a recovery with each tier of solution.

Figure 44. Disaster recovery tier 0-3: summary of solutions
 This diagram summarizes the tier 0 through 3 solutions, indicating the main points in terms of the type of data backup and the time to perform recovery. Tier 0 offers no recovery, and uses DFSMSdss only. Tier 1, using DFSMSdss and DFSMShsm with aggregate backup and recovery support (ABARS), involves transporting data; because there is no standby site, recovery can take a week or more. Tier 2, using DFSMSdss and DFSMShsm with ABARS, also involves transporting data, but because there is a hot standby site, recovery takes one day or more. Tier 3, using Enterprise Systems Connection (ESCON) architecture and concurrent copy to the hot standby site, offers recovery in less than a day.

Tiers 0 to 3 cover the disaster recovery plans of many CICS users. With the exception of tier 0, they employ the same basic design using a point-in-time copy of the necessary data. That data is then moved off-site to be used when required after a disaster.

The advantage of these methods is their low cost.

The disadvantages of these methods are:

Tier 4--active secondary site

Figure 45 summarizes the tier 4 solution.

Figure 45. Disaster recovery tier 4: active secondary site
 This diagram illustrates two fully operational data centers, each a complete replica of the other. Each site backs up the other, and recovery relies on the continuous transmission of data over ESCON and VTAM links. Recovery is indicated as taking from minutes up to hours. The other points made by this diagram are discussed in the text.

Tier 4 closes the gap between the point-in-time backups and current online processing recovery. Under a tier 4 recovery plan, site one acts as a backup to site two, and site two acts as a backup to site one.

Tier 4 duplicates the vital data of each system at the other's site. You must transmit image copies of data to the alternate site on a regular basis. You must also transmit CICS system logs and forward recovery logs, after they have been archived. Similarly, you must transmit logs for IMS™ and DB2 subsystems. Your recovery action is to perform a forward recovery of the data at the alternate site. This allows recovery up to the point of the latest closed log for each subsystem.
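Because forward recovery can proceed only as far as the latest closed log that has reached the alternate site, each subsystem has its own recovery point. The following minimal sketch illustrates that idea. The function name, the data layout, and the timestamps are hypothetical and are not taken from CICS, IMS, or DB2 utilities.

```python
from datetime import datetime
from typing import Dict, List, Optional


def recovery_points(
    closed_logs: Dict[str, List[datetime]]
) -> Dict[str, Optional[datetime]]:
    """Map each subsystem to the end time of its latest closed log at the alternate site.

    A subsystem with no transmitted logs gets None: only its image copy can be restored.
    """
    points: Dict[str, Optional[datetime]] = {}
    for subsystem, end_times in closed_logs.items():
        points[subsystem] = max(end_times) if end_times else None
    return points


# Hypothetical end times of archived logs received at the alternate site.
received = {
    "CICS": [datetime(2024, 1, 7, 9, 0), datetime(2024, 1, 7, 10, 30)],
    "IMS": [datetime(2024, 1, 7, 9, 45)],
    "DB2": [],
}
points = recovery_points(received)
```

In this example the CICS data can be recovered forward to 10:30 and the IMS data to 09:45, while the DB2 data can only be restored to its last image copy. Updates logged after those points are lost.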

You must also copy to the alternate site other vital data that is necessary to run your system. For example, you must copy your load libraries and JCL. You can do this on a regular basis, or when the libraries and JCL change.

The benefits of tier 4 are:

The drawbacks are:

Tier 4

Tier 4 provides a more advanced level of disaster recovery. You will lose data in the disaster, but only a few minutes' or hours' worth. You must assess whether the loss of data will prevent your company from continuing in business, and what the cost of lost data will be.

Tier 5--two-site, two-phase commit

Figure 46 summarizes the tier 5 solution.

Figure 46. Disaster recovery tier 5: two site, two-phase commit
As in the tier 4 diagram, this also illustrates two fully operational data centers, each a complete replica of the other. Each site backs up the other, and recovery relies on the continuous transmission of data over ESCON and VTAM links, with data being synchronized by remote two-phase commit. Recovery takes minutes only, with only the data in-flight being lost. The other points made by this diagram are discussed in the following text.

Tier 5, remote two-phase commit, is an application-based solution to provide high currency of data at a remote site. This requires partially or fully dedicated hardware at the remote site to keep the vital data in image format and to perform the two-phase commit. The vital data at the remote site and the primary site is updated or backed out as a single unit of work (UOW). This ensures that the only vital data lost would be from transactions that are in process when the disaster occurs.
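The way both sites are updated or backed out as a single unit of work can be sketched as follows. This is a simplified illustration of two-phase commit in general, not the CICS implementation: the `Site` class and the `run_unit_of_work` function are hypothetical, and the sketch omits logging and the handling of failures that occur between the two phases.

```python
class Site:
    """A hypothetical participant that holds vital data as a dictionary."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}
        self._pending = None

    def prepare(self, updates):
        """Phase 1: stage the updates and vote on whether they can be committed."""
        if not self.available:
            return False
        self._pending = dict(updates)
        return True

    def commit(self):
        """Phase 2: make the staged updates permanent."""
        self.data.update(self._pending)
        self._pending = None

    def back_out(self):
        """Discard the staged updates, leaving the data unchanged."""
        self._pending = None


def run_unit_of_work(sites, updates):
    """Apply the updates at every site, or at none of them."""
    prepared = []
    for site in sites:
        if site.prepare(updates):
            prepared.append(site)
        else:
            # One site voted no: back out the sites that had already prepared.
            for done in prepared:
                done.back_out()
            return False
    for site in sites:
        site.commit()
    return True


primary = Site("primary")
remote = Site("remote")
ok = run_unit_of_work([primary, remote], {"account-1": 100})

# If the remote site cannot prepare, the primary is not updated either.
remote.available = False
failed = run_unit_of_work([primary, remote], {"account-1": 50})
```

After the second call, both sites still hold the value from the first unit of work, which is the property that limits data loss to transactions in process at the time of the disaster.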

Other data required to run your vital application has to be sent to the secondary site as well. For example, current load libraries and documentation have to be kept up-to-date on the secondary site.

The benefits of tier 5 are fast recovery using vital data that is current. The drawbacks are:

Tier 5

A tier 5 solution is appropriate for a custom-designed recovery plan with special applications. Because these applications must be designed to use this solution, it cannot be implemented at most CICS sites.

Tier 6--minimal to zero data loss

Figure 47 summarizes the tier 6 solution.

Figure 47. Disaster recovery tier 6: minimal to zero data loss
 This diagram illustrates the same two data centers as in tiers 4 and 5, with the same physical connectivity. It summarizes the approach as: updating local and remote copies of the data; using dual online storage; and having network switching capability. It summarizes the recovery as being instantaneous but the most expensive, with non-disruptive terminal switching. Other points regarding this scenario are explained in the following text.

Tier 6, minimal to zero data loss, is the ultimate level of disaster recovery.

There are two tier 6 solutions, one hardware-based and the other software-based. For details of the hardware and software available for these solutions, see Peer-to-peer remote copy (PPRC) and extended remote copy (XRC) (hardware) and Remote Recovery Data Facility (software).

The hardware solution involves the use of IBM 3990-6 DASD controllers with remote and local copies of vital data. There are two flavors of the hardware solution: (1) peer-to-peer remote copy (PPRC), and (2) extended remote copy (XRC).

The software solution involves the use of Remote Recovery Data Facility (RRDF). RRDF applies to data sets managed by CICS file control and to the DB2, IMS, IDMS, CPCS, ADABAS, and SuperMICR database management systems, collecting real-time log and journal data from them. RRDF is supplied by E-Net Corporation and is available from IBM as part of the IBM Cooperative Software Program.

The benefits of tier 6 are:

The drawbacks are the cost of running two sites and the communication overhead. If you are using the hardware solution based on 3990-6 controllers, you are limited in how far away your recovery site can be. If you use PPRC, updates are sent from the primary 3990-6 directly to the 3990-6 at your recovery site using enterprise systems connection (ESCON®) links between the two 3990-6 devices. The 3990-6 devices can be up to 43 km (26.7 miles) apart subject to quotation.

If you use XRC, the 3990-6 devices at the primary and recovery sites can be attached to the XRC DFSMS/MVS host at up to 43 km (26.7 miles) using ESCON links (subject to quotation). If you use three sites, one for the primary 3990, one to support the XRC DFSMS/MVS host, and one for the recovery 3990, this allows a total of 86 km (53.4 miles) between the 3990s. If you use channel extenders with XRC, there is no limit on the distance between your primary and remote site.

For RRDF there is no limit to the distance between the primary and secondary sites.
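The distance figures quoted above for the 3990-6 solutions follow from simple arithmetic on the 43 km ESCON link limit. The following sketch reproduces them. The constant and function names are hypothetical; the conversion factor is the standard one of about 0.621371 miles per kilometer.

```python
ESCON_LINK_KM = 43          # maximum ESCON link length quoted in the text
KM_TO_MILES = 0.621371      # standard conversion factor


def km_to_miles(km: float) -> float:
    """Convert kilometers to miles."""
    return km * KM_TO_MILES


def three_site_xrc_distance_km(link_km: float = ESCON_LINK_KM) -> float:
    """Total distance between the two 3990s when the XRC host sits at a third site.

    Each 3990 is attached to the XRC host by its own link, so the two
    link lengths add up.
    """
    return 2 * link_km


single_link_miles = round(km_to_miles(ESCON_LINK_KM), 1)
total_km = three_site_xrc_distance_km()
total_miles = round(km_to_miles(total_km), 1)
```

The results match the figures in the text: 26.7 miles for a single link, and 86 km (53.4 miles) for the three-site XRC configuration.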

Tier 6

Tier 6 provides a very complete level of disaster recovery. You must assess whether the cost of achieving this level of disaster recovery is justified for your company.

Tier 4-6 solutions

Figure 48 summarizes the solutions for tiers 4 through 6, and shows the approximate time required for a recovery with each tier of solution.

Figure 48. Disaster recovery tier 4-6: summary of solutions
 This diagram summarizes tiers 4, 5, and 6. Tier 4 is based on an active secondary site with host-to-host file transfer, with recovery taking less than one day. Tier 5 is application-based, involving products such as CICS and DB2 and using two-phase commit, with recovery taking less than an hour. Tier 6, based on ESCON with peer-to-peer remote copy, provides instant recovery.

This summary shows the three tiers and the various tools for each that can help you reach your required level of disaster recovery.

Tier 4 relies on automation to send backups to the remote site. NetView® provides the ability to schedule work in order to maintain recoverability at the remote site.

Tier 5 relies on the two-phase commit processing supported by various database products and your application program’s use of these features. Tier 5 requires additional backup processing to ensure that vital data, other than databases, is copied to the remote system.

Tier 6 is divided into two sections: software solutions for specific access methods and database management systems, and hardware solutions for any data.

RRDF can provide very high currency and recoverability for a wide range of data. However, it does not cover all the data in which you may be interested. For example, RRDF does not support load module libraries.

The 3990-6 hardware solution is independent of the data being stored on the DASD. PPRC and XRC can be used for databases, CICS files, logs, and any other data sets that you need to ensure complete recovery on the remote system.
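For planning purposes, the seven tiers can be collected into a single lookup structure. The following sketch is only an illustration: the `Tier` class, its field names, and the helper function are hypothetical, and the recovery times are the approximate values given in the descriptions of Figures 44 and 48.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    number: int
    name: str
    remote_site: str      # "none", "hot standby", or "operational"
    recovery_time: str    # approximate, as described for Figures 44 and 48


TIERS = [
    Tier(0, "no off-site data", "none", "no recovery"),
    Tier(1, "physical removal", "none", "a week or more"),
    Tier(2, "physical removal with hot site", "hot standby", "one day or more"),
    Tier(3, "electronic vaulting", "hot standby", "less than a day"),
    Tier(4, "active secondary site", "operational", "less than a day"),
    Tier(5, "two-site, two-phase commit", "operational", "less than an hour"),
    Tier(6, "minimal to zero data loss", "operational", "instant"),
]


def tiers_with_remote_site():
    """Return the numbers of the tiers that have a second site ready to take over."""
    return [tier.number for tier in TIERS if tier.remote_site != "none"]


remote_capable = tiers_with_remote_site()
```

A table like this makes it easy to compare a required recovery time against what each tier offers before weighing the cost of moving to a higher tier.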


11. In a paper presented to IBM® SHARE, prepared by the SHARE automated remote site recovery task force.
