Backup, recovery, and restart

Although high availability of data is a goal for all Db2 subsystems, unplanned outages are difficult to avoid entirely. A good backup, recovery, and restart strategy, however, can reduce the elapsed time of an unplanned outage.

To reduce both the probability and the duration of unplanned outages, periodically back up and reorganize your data. Doing so maximizes the availability of data to users and programs.

Many factors affect the availability of the databases. Here are some key points to be aware of:

  • You should understand the options of utilities such as COPY and REORG.
    • You can recover the following structures online: table spaces, partitions, data sets, a range of pages, a single page, and indexes.
    • You can recover table spaces and indexes at the same time to reduce recovery time.
    • With some options on the COPY utility, you can read and update a table space while copying it.
  • I/O errors have the following effects:
    • I/O errors on a range of data do not affect availability to the rest of the data.
    • If an I/O error occurs when Db2 is writing to the log, Db2 continues to operate.
    • If an I/O error is on the active log, Db2 moves to the next data set. If the error is on the archive log, Db2 dynamically allocates another data set.
  • Documented disaster recovery methods are crucial for events that might cause a complete shutdown of your local Db2 subsystem.
  • If Db2 is forced to a single mode of operation for the bootstrap data set or logs, you can usually restore dual operation while Db2 continues to run.
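
As a minimal sketch of the COPY option mentioned in the list above, the following utility control statement takes a full image copy while applications continue to read and update the table space. The object name MYDB.MYTS is a hypothetical placeholder, and the statement assumes that the utility job defines an output data set under the ddname SYSCOPY.

```
-- Hypothetical object name; SYSCOPY is assumed to be defined in the job's JCL.
COPY TABLESPACE MYDB.MYTS
     COPYDDN(SYSCOPY)
     FULL YES
     SHRLEVEL CHANGE
```

A copy that is taken with SHRLEVEL CHANGE can include uncommitted changes, so by itself it does not represent a point of consistency. SHRLEVEL REFERENCE allows only read access to the table space while the copy runs.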

Db2 provides extensive methods for recovering data after errors, failures, or even disasters. You can recover data to its current state or to an earlier state. The units of data that can be recovered are table spaces, indexes, index spaces, partitions, and data sets. You can also use recovery functions to back up an entire Db2 subsystem or data sharing group.
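
The following RECOVER utility control statements sketch these choices. Each statement is a separate alternative, not a sequence. The object names and the log point value are hypothetical placeholders, and the second statement assumes that the index was defined with COPY YES and has an image copy.

```
-- Recover to the current state from image copies and the log.
RECOVER TABLESPACE MYDB.MYTS

-- Recover a table space and one of its indexes in the same statement.
RECOVER TABLESPACE MYDB.MYTS
        INDEX MYSCHEMA.MYIX

-- Recover to an earlier state, identified by a log point (placeholder value).
RECOVER TABLESPACE MYDB.MYTS
        TOLOGPOINT X'00000000000012345678'
```

After a recovery to an earlier state, related objects such as indexes and referentially connected table spaces generally need to be brought to the same point so that the data remains consistent.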

Development of backup and recovery procedures is critical in preventing costly and time-consuming data losses. In general, ensure that procedures are in place to do the following:

  • Create a point of consistency.
  • Restore system and data objects to a point of consistency.
  • Back up and recover the Db2 catalog and your data.
  • Recover from out-of-space conditions.
  • Recover from a hardware or power failure.
  • Recover from a z/OS component failure.

In addition, your site should have a procedure for recovery at a remote site in case of disaster.
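
As one possible illustration of the first procedure in the list above, the QUIESCE utility can establish a point of consistency for a set of related table spaces. The table space names here are hypothetical placeholders.

```
-- Establish a point of consistency for two related table spaces (hypothetical names).
QUIESCE TABLESPACE MYDB.MYTS1
        TABLESPACE MYDB.MYTS2
        WRITE YES
```

QUIESCE records the resulting log point in the SYSIBM.SYSCOPY catalog table, and that log point can later serve as the target of a point-in-time recovery.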

Specific problems that require recovery might be anything from an unexpected user error to the failure of an entire subsystem. A problem can occur with hardware or software; damage can be physical or logical. Here are a few examples:

  • If a system failure occurs, a restart of Db2 restores data integrity. For example, a Db2 subsystem or an attached subsystem might fail. In either case, Db2 automatically restarts, backs out uncommitted changes, and completes the processing of committed changes.
  • If a media failure (such as physical damage to a data storage device) occurs, you can recover data to the current point.
  • If data is logically damaged, the goal is to recover the data to a point in time before the logical damage occurred. For example, if Db2 cannot write a page to disk because of a connectivity problem, the page is logically in error.
  • If an application program ends abnormally, you can use utilities, logs, and image copies to recover data to a prior point in time.

Recovery of Db2 objects requires adequate image copies and reliable log data sets. You can use a number of utilities and some system structures for backup and recovery. For example, the REPORT utility can provide some of the information that is needed during recovery. You can also obtain information from the bootstrap data set (BSDS) inventory of log data sets.
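
For example, a REPORT utility control statement along the following lines lists recovery information for a single table space. The object name is a hypothetical placeholder.

```
-- List recovery information for one table space (hypothetical name).
REPORT RECOVERY TABLESPACE MYDB.MYTS
```

The output draws on the SYSIBM.SYSCOPY and SYSIBM.SYSLGRNX tables and on the BSDS. To list the BSDS log inventory on its own, you can run the print log map utility (DSNJU004).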