Preparations for disaster recovery

If a Db2 computing center is totally lost, you can recover on another Db2 subsystem at a recovery site. To do this, you must regularly back up the data sets and the log for the recovery subsystem. As with all data recovery operations, the objectives of disaster recovery are to minimize the loss of data, workload processing (updates), and time.

You can shorten restart times after system failures by using the installation options LIMIT BACKOUT and BACKOUT DURATION. These options postpone the backout processing of long-running units of recovery (URs) during Db2 restart.

For data sharing environments, you can use the LIGHT(YES) or LIGHT(NOINDOUBTS) parameter to quickly bring up a Db2 member to recover retained locks. This option is not recommended for refreshing a single subsystem and is intended only for a cross-system restart for a system that has inadequate capacity to sustain the Db2 IRLM pair. Restart light can be used for normal restart and recovery.

For data sharing, you need to consider whether you want the Db2 group to use light mode at the recovery site. A light start might be desirable if you have configured only minimal resources at the remote site. If this is the case, you might run a subset of the members permanently at the remote site. The other members are restarted and then immediately shut down.

It is important that the disaster recovery process does not convert any objects to or from the 10-byte extended RBA or LRSN format during the recovery and rebuild process. If any objects are still in the 6-byte format, contact IBM Support before you begin a disaster recovery for guidance on how to temporarily disable object conversion, and do not specify the RBALRSN_CONVERSION keyword in the control statements. After the disaster recovery is complete, you can re-enable object conversion based on the guidance provided by IBM Support.

To perform a light start at the remote site:

  1. Start the members that run permanently with the LIGHT(NO) option. This is the default.
  2. Start other members in light mode. The members started in light mode use a smaller storage footprint. After their restart processing completes, they automatically shut down. If ARM is in use, ARM does not automatically restart the members in light mode again.
  3. Members started with LIGHT(NO) remain active and are available to run new work.
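
The ordering in the steps above can be sketched as a small planning helper. This is only an illustration, not an IBM-supplied tool: the member names and command prefixes are hypothetical, the choice of LIGHT(YES) rather than LIGHT(NOINDOUBTS) for the light members is an assumption, and the code only builds command strings for an operator to review. It does not issue them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str        # hypothetical member name
    prefix: str      # hypothetical command prefix for this member
    permanent: bool  # True if the member keeps running at the remote site


def light_start_plan(members):
    """Build START DB2 command strings: permanent members first, then light ones."""
    permanent = [m for m in members if m.permanent]
    light = [m for m in members if not m.permanent]
    # Step 1: members that stay up are started with LIGHT(NO), the default.
    plan = [f"{m.prefix} START DB2 LIGHT(NO)" for m in permanent]
    # Step 2: the remaining members are started in light mode (assumed LIGHT(YES)).
    plan += [f"{m.prefix} START DB2 LIGHT(YES)" for m in light]
    return plan


# Hypothetical three-member group with one permanent member.
members = [
    Member("DB1A", "-DB1A", permanent=True),
    Member("DB2A", "-DB2A", permanent=False),
    Member("DB3A", "-DB3A", permanent=False),
]

for command in light_start_plan(members):
    print(command)
```

Listing the permanent members first mirrors step 1; the light members need no explicit stop command in the plan because they shut down on their own after restart processing completes.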

Several levels of preparation for disaster recovery exist:

  • Prepare the recovery site to recover to a fixed point in time.

    For example, you could copy everything weekly with a DFSMSdss volume dump (logical), manually send it to the recovery site, and then restore the data there.

  • For recovery through the last archive, copy and send the following objects to the recovery site as you produce them:
    • Image copies of all catalog, directory, and user page sets
    • Archive logs
    • Integrated catalog facility catalog EXPORT and list
    • BSDS lists

      With this approach, you can determine how often you want to make copies of essential recovery elements and send them to the recovery site.

    After you establish your copy procedure and have it operating, you must prepare to recover your data at the recovery site.

  • Use the log capture exit routine to capture log data in real time and send it to the recovery site.
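
For the second level above (recovery through the last archive), a site might track whether every listed object type was sent in a given copy cycle. The following is a minimal sketch under that assumption: the item labels simply restate the list, and the function name and the way shipments are recorded are hypothetical.

```python
# Object types that the "recovery through the last archive" level sends off site.
REQUIRED_ITEMS = (
    "image copies",
    "archive logs",
    "ICF catalog EXPORT and list",
    "BSDS lists",
)


def missing_items(shipped):
    """Return the required items absent from the shipped set, in list order."""
    return [item for item in REQUIRED_ITEMS if item not in shipped]


# Hypothetical record of what was sent to the recovery site in one cycle.
shipped_this_cycle = {"image copies", "archive logs"}

print(missing_items(shipped_this_cycle))
```

An empty result means that the cycle sent every object type in the list. It says nothing about whether the copies themselves are usable for recovery.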