DB2 10.5 for Linux, UNIX, and Windows

Backup and restore operations in a DB2 pureScale environment

In a DB2® pureScale® environment, issuing a single BACKUP DATABASE or RESTORE DATABASE command on any member initiates a backup or restore operation on behalf of all members.

Because a DB2 pureScale environment can have only one database partition, a backup operation has only one set of data to process and produces only one backup image for the entire group. In the case of the other members, only the database metadata and transaction logs must be processed, and those are included in the single backup image.

A backup image includes data from the specified table spaces and any required metadata and configuration information for all currently defined members. You do not have to perform additional backup operations on any other member in the DB2 pureScale instance. Moreover, you require only a single RESTORE DATABASE command to restore the database and the member-specific metadata for all members. You do not have to perform additional restore operations on any other member to restore the cluster. The time stamps of consecutive backup images are unique, increasing values, regardless of which member produced them.

All members must be consistent before an offline backup operation can be attempted. Only one offline backup operation can run at one time, because the backup utility acquires super-exclusive access to the database across all members. Although concurrent online backup operations are supported, different backup operations cannot copy the same table spaces simultaneously, and must wait their turn.

All of the reading of data and metadata from the database and all of the writing to a backup image takes place on a single member. Interactions between the backup or restore operation and other members are limited to copying or updating database metadata (such as table space definitions, the log file header, and the database configuration).
Note: Before taking a backup, you need to ensure that the log archiving path is set to a shared directory so that all the members are able to access the logs for subsequent rollforward operations. If the archive path is not accessible from the member on which the rollforward is being executed, SQL1273N is returned. The following command is an example of how to set the log path to the shared directory:
db2 update db cfg using logarchmeth1 
      DISK:/db2fs/gpfs1/svtdbm5/svtdbm5/ArchiveLOGS 
(where gpfs1 is the shared directory for the members and ArchiveLOGS is the actual directory that archives the logs.
Online backup operations can proceed successfully if another member is offline, goes offline, or comes back online while the operation is executing (Table 1). Although database restore operations are not affected by the state of other members, backup operations might have to wait for a short duration while member crash recovery is completed on an offline and inconsistent member.
Table 1. Effect of the state of other members in a DB2 pureScale instance on database backup and restore operations
Operation State of other members
Offline and consistent Offline and inconsistent
Online backup The backup operation succeeds. The other member cannot become active while the backup utility is accessing the log file header (LFH) near the beginning of the backup operation or while the backup utility is accessing the log stream near the end of the backup operation. The backup operation succeeds, but it must wait for member crash recovery to be completed and for the other member to become either active or consistent. The other member cannot become active while the backup utility is accessing the LFH near the beginning of the backup operation or while the backup utility is accessing the log stream near the end of the backup operation.
Restore The restore operation is completed normally. The restore operation is completed normally.

Image and archive naming

File names for backup images that you create on disk consist of a concatenation of several elements, separated by periods:
  DB_alias.Type.Inst_name.DBPARTnnn.Timestamp.Seq_num
DB_alias
The database alias name that you specified when you invoked the backup utility.
Type
The type of backup operation, where 0 represents a full database backup, 3 represents a table space backup, and 4 represents a backup image generated by the LOAD command with the COPY NO option.
Inst_name
The name of the current instance, which is the value of the DB2INSTANCE environment variable.
nnn
The database partition number. In a DB2 pureScale environment, the number is always 000.
Timestamp
A 14-character representation of the date and time when you performed the backup operation. The time stamp is in the form yyyymmddhhnnss, where:
  • yyyy represents the year.
  • mm represents the month (01 to 12).
  • dd represents the day of the month (01 to 31).
  • hh represents the hour (00 to 23).
  • nn represents the minutes (00 to 59).
  • ss represents the seconds (00 to 59).
Seq_num
A 3-digit number used as a file extension.
For example:
   SAMPLE.0.krodger.DBPART000.200802241234.001

Online backup with INCLUDE LOGS

An online backup operation with the INCLUDE LOGS option (the default) produces a backup image that includes the range of log files required to restore and roll the database forward to its minimum recovery time. If this backup image is then used to restore to a new database (perhaps during disaster recovery), and only the logs from the backup image are available during a subsequent rollforward operation, a ROLLFORWARD DATABASE command with the TO END OF LOGS parameter often returns an error message about a missing log file (SQL1273N). This is expected in some situations, because the database manager might have detected that additional logs were written after the backup operation, but that those logs are not available for the current rollforward operation. It might also be the case that one or more of the logs that are necessary to roll the database forward to a consistent point in time are missing. In either case, verify that the end point of the rollforward operation is acceptable and then issue a ROLLFORWARD DATABASE with the AND STOP parameter. If the rollforward operation has reached its minimum recovery time despite the missing log file, the ROLLFORWARD DATABASE with the AND STOP parameter should complete successfully; otherwise, it returns SQL1276N (the rollforward operation did not reach its minimum recovery time using this backup image).

Disaster recovery and high availability through log shipping in a DB2 pureScale environment

Log shipping is the process of copying whole log files to a standby machine, either from an archive device, or through a user exit program running against the primary database. You can choose to keep a standby database up-to-date by applying the logs to it as they are archived, or you can keep the database or table space backup images and log archives on the standby site, and perform restore and rollforward operations only after a disaster has occurred. In either case, the rollforward operation on the standby site might detect that one or more log files are missing and return SQL1273N. Verify that the rollforward operation reached an acceptable time stamp, or take appropriate action to correct the problem.

If, during a log stream merge operation, the DB2 database manager determines that there is a missing log file in one of the log streams, an error is returned. The rollforward utility returns SQL1273N; the db2ReadLog API returns SQL2657N. If you choose to keep a standby database up-to-date by applying logs to it as they are archived, rollforward operations might frequently detect that some logs are missing.

Figure 1 shows an example of how two members could write log records to the log files in their active log stream. Each log file is represented by a box. Consider a scenario where both a primary and standby site have been set up for high availability. A ROLLFORWARD DATABASE command with the END OF LOGS option is attempted on the standby site at time points A, B and C. For any particular point in time, any log files that have been closed before that time have been archived and are accessible on the standby. Otherwise, the log file is still active on the primary and is not available to the standby yet (as shown for log file 4 on log stream 1 at time B).
Figure 1. Log files in a DB2 pureScale environment
Diagram showing log files in a DB2 pureScale environment

At time A, the ROLLFORWARD DATABASE command will complete successfully as log file 1 from log stream 0 was closed and archived at the same time as log file 3 from log stream 1. At time B however, the ROLLFORWARD DATABASE command will return v. This happens because at the time that the command is issued on the standby site, the standby site has access to log files 2 and 3 from log stream 0, but not to log file 4 from log stream 1 because the log file is still open and active on the primary site. Furthermore, since the log records in files 2 and 3 on log stream 0 were written during the same time period as the beginning of log file 4 on log stream 1, the rollforward operation cannot process log files 2 and 3 until log file 4 from log stream 1 is made available. At time C, when log file 4 is finally closed and archived on log stream 1, a ROLLFORWARD DATABASE command will complete successfully. It is possible to force the truncation and archiving of files across all the log streams using the ARCHIVE LOG command, or by deactivating the database across all members. In the case of the ARCHIVE LOG command, the current log file on each log stream is truncated independently and there is no guarantee that it will happen at the exact same point in time across all members. Therefore, even if the ARCHIVE LOG command is issued, it is still possible to get an SQL1273N error when executing the ROLLFORWARD DATABASE command.

While missing log conditions are common and expected when using log shipping in a DB2 pureScale environment, in most cases, each rollforward operation on the standby will make additional progress over the last ROLLFORWARD DATABASE command (even when SQL1273N is returned) and therefore the error itself should often be expected. It is possible, however, for the primary site to have trouble archiving a file for one log stream while successfully archiving logs for the other log streams. This could be the result of a temporary problem accessing the archive storage for one log stream. Such problems can cause the log merge and replay on the standby to be held up, increasing the number of transactions that could be lost in the event of a disaster. To ensure that your standby system is up-to-date, issue a ROLLFORWARD DATABASE command with the QUERY STATUS parameter after each rollforward operation that returns SQL1273N and verify that progress is being made over time. If a rollforward operation on the standby is not making progress over an extended period of time, determine why the log file reported as missing is not available on the standby system and correct the problem. The ARCHIVE LOG command can be used to truncate the log files that are currently being updated on each member, making them eligible for archiving and subsequent replay on the standby system.

In the event of a disaster (for example, fire, earthquake, vandalism, or other catastrophic events) your plan for recovery might be to execute a rollforward operation through all remaining logs, or a restore and rollforward operation through all available logs. As mentioned previously, the rollforward operation might detect that one or more log file is missing, because log files were written on the primary but not yet archived at the time of the disaster (SQL1273N). It is also possible that a log that was archived cannot be found by the rollforward utility for some unexpected reason; this can also cause the rollforward utility to return SQL1273N. It is important to validate the end point of a rollforward operation by using the ROLLFORWARD DATABASE command with the QUERY STATUS parameter, and to decide whether or not the missing log condition is expected. If the missing log condition is expected, or the end point is acceptable, you can issue a ROLLFORWARD DATABASE command with the STOP parameter to complete the rollforward recovery process.

Restrictions

After a change in topology that involves adding or dropping a member, you cannot perform rollforward recovery operations through the point where the topology change occurred. If you add or drop a member, the database is placed in backup pending state, and you must perform either an incremental or a full database backup operation before a connection to the database can be made. To recover, restore this backup image and roll forward to the end of the logs. If you must restore a backup image from before the topology change, you will only be able to roll forward to the point at which the topology change occurred. This can be accomplished by issuing a ROLLFORWARD DATABASE command with the TO END OF LOGS parameter (which returns SQL1546N) followed by a ROLLFORWARD DATABASE command with the STOP parameter. This operation will not recover any transactions that changed the database after the topology change.

In a DB2 pureScale environment, the ON ALL DBPARTITIONNUMS parameter and the ON DBPARTITION (0) parameter of the BACKUP DATABASE command are valid. If you specify a database partition number other than 0, however, an error (SQL0270N) is returned because no other database partitions exist.

Examples