Level: Intermediate
Mark McConaughy (mmcconau@us.ibm.com), Senior Software Engineer, IBM
23 Apr 2008
This article discusses practically everything a directory administrator needs to know about the archival logging method required to enable online backups. If you want to do online backups of your directory, you must configure archival logging of the database transaction logs. But archival logging is not practical unless you put a process in place to manage the log files, deleting inactive logs when they are no longer needed. This article describes the recommended approach to configuring the logging options and managing the logs.
Introduction
There are several different ways to back up data from an LDAP directory. Data can be exported to LDAP Data Interchange Format (LDIF). Data stored in LDIF can be used as a backup, but is more typically used when the data is to be copied from one directory to another. LDIF data is portable, so given comparable schema defined in the source and target directories, it may be copied to another LDAP directory on a different hardware platform or from a different vendor.
IBM® Tivoli® Directory Server (TDS) uses a DB2™ database to store the directory data. DB2 provides the capability to back up a database. This approach is much faster than exporting all the directory data to LDIF. It also has the advantage of including the database configuration settings, which can include settings critical to the performance of a particular directory, in the backup image.
Database backups can be done offline or online. An offline backup requires that no applications are accessing (connected to) the database, so no directory access is available while the backup is running. In an environment requiring high availability, it is typically unacceptable to stop the directory server and DB2 instance in order to take a backup of the directory. DB2 (and TDS) supports online backups of a database. When the online mode is used, directory access can continue during the backup process, and directory updates are also allowed.
However, in order to do online backups, DB2 requires that the database be configured for archival logging, and archival logging requires that a procedure be in place to delete log files after they are no longer needed. This article describes how DB2 transaction logging works and recommends a procedure for log file management.
Choosing an appropriate backup method
What are the factors to consider in choosing a backup method? As alluded to in the introduction, the first choice in backup methods is between exporting data to LDIF and using a database backup. There are some advantages to exporting to LDIF:
- For directories with no more than a few hundred entries, it might be more time and space efficient to export the entries to LDIF as a backup, because there is extra overhead for a full database backup.
- The directory server can be running and accepting updates during the export to LDIF.
- LDIF is a portable format, so if you need to move data from an LDAP directory on Windows® to one on AIX®, using LDIF is the best approach.
- LDIF is also vendor neutral, so if you decide to move your data from some other vendor’s directory server to the Tivoli Directory Server, then exporting to LDIF is a good way to go.
Now, if you are not moving data between directories and you have more than a few hundred entries, you would be better off doing a database backup. For large directories, it is much faster to do a database backup than to export all the data to LDIF. A database backup has these advantages:
- It is faster and more efficient for directories with thousands to millions of entries.
- It saves the database configuration settings, including those for the underlying tablespaces.
- DB2 provides the options to do either offline or online backups.
- It includes an option to do full backups or delta backups, which include only the changes since the last backup.
- Your saved backup images and the transaction logs allow for recovery right up to the last transaction completed prior to a disk crash.
So, database backups are usually the best approach, but you still need to decide if online or offline backups will work best for you. Doing offline backups is the simplest approach. It requires less administrative activity, and if you can afford to stop a directory server long enough to do a periodic backup, then it is probably the best solution for you. Even if you need to provide LDAP access 24x7, you might be able to use offline backups if you have a replica server that could be stopped long enough to run the backup. This can work for you if you have at least three LDAP servers sharing the application load, and two of them are capable of handling the work for up to several hours while the third is stopped to do the backup. I suggest a minimum of three, because in that case you still have two active servers at all times, and in case of a system crash, you are not caught with no LDAP access. The three servers might be made up of a master and two replicas, or three peer master servers, depending on your requirements.
On the other hand, if you cannot afford to stop a directory server to take a backup, then online backups are the way to go. The extra administrative overhead of online backups comes from the accumulation of transaction logs, which must be managed. You must periodically delete old log files that you do not need, or the available disk space will eventually be consumed. If you want to do online backups, then read this article to learn about transaction log files and a simple procedure to follow to prevent them from getting out of control.
Whenever an update is made to the database, DB2 adds a record of the update to the transaction log. The transaction log is a persistent record of all updates to the database. If the database crashes before the actual database files have been updated, the changes need not be lost. The transaction log can be rolled forward (replayed) to cause any committed updates missing from the database to be applied at a later time. Keeping the transaction logs on a separate physical disk from the database files improves performance and reduces the risk of data loss due to a disk crash.
The transaction log files are numbered in sequence, like this:
- S0000000.LOG
- S0000001.LOG
- S0000002.LOG
- S0000003.LOG
- S0000004.LOG
- ...
Online backup
With the DB2 default database configuration, the database must be offline in order to create a backup image. In order to stop the DB2 instance, the directory server must also be stopped. Offline backups are simpler because there are no updates occurring in the database during the time the backup is created. DB2 does support doing online backups. In this case, the database (and directory server) can remain active during the entire backup process. Since updates to the data are allowed during an online backup, the transaction log files representing those updates must be saved along with the backup image. The backup file cannot contain all the updates that occurred during the time the backup process was running, but the log files make up for that. When a restore is done from an online backup, first the database image is restored, and then the saved transaction logs are rolled forward to replay all the updates that occurred during the backup. At the end of the roll forward step, the database is consistent with the point at which the backup completed. The requirement to save the transaction logs associated with the backup creates an additional administrative burden, as you will see if you continue reading.
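The two restore steps described above (restore the image, then roll the logs forward) can be sketched as a pair of DB2 commands. This is a minimal sketch under stated assumptions, not a tested recovery procedure: the function name `print_restore_commands` is hypothetical, the function only prints the commands so they can be reviewed before being run, and the exact RESTORE and ROLLFORWARD options should be checked against the command reference for your DB2 release.

```shell
#!/bin/sh
# Print (do not execute) the commands for restoring from an online backup.
# Arguments: database name, directory holding the backup image, and the
# timestamp of the backup image to restore.
print_restore_commands() {
    db=$1
    backup_dir=$2
    timestamp=$3
    # Step 1: restore the database image taken by the online backup.
    echo "db2 restore db $db from $backup_dir taken at $timestamp"
    # Step 2: replay the saved transaction logs and leave roll-forward state.
    echo "db2 rollforward db $db to end of logs and stop"
}
```

For example, `print_restore_commands ldapdb2 /backups 20080423120000` prints the two commands for a database named ldapdb2; the timestamp here is only a placeholder. Rolling forward to the end of the logs replays every log file that is available, which can carry the database beyond the point at which the backup completed.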
DB2 transaction log configuration options
DB2 supports two modes of transaction logging: circular and archival. The database configuration specifies the number of primary and secondary log files to be created and used. Secondary log files are only created if the specified number of primary logs exist and all are active. A log file is considered active until all of the transactions it contains have been committed and written to the database files. With circular logging (the default setting), when the last available log file fills up, then DB2 reuses the first inactive log file. The log space used is a fairly constant amount. With archival logging, DB2 does not reuse or delete any log files. Inactive log files remain on the file system, and new files are created as needed.
Figure 1. Circular logging
Figure 2. Archival logging
With archival logging, a process must be put in place to identify and remove inactive log files that are no longer needed as part of your data recovery strategy. Without such a process, the file system will eventually fill up, and database updates will not be able to proceed.
Due to the added complexity of archival logging, Tivoli Directory Server defaults to circular logging. However, in order to do online backups, you must configure the database for archival logging. This is because the log files are used to capture any updates that occur during the backup. It is important to have plenty of log space for all updates during the entire backup. Saving the transaction logs during and after the backup occurs also allows for point-in-time recovery. For example, if the logs are kept on a separate disk from the database (this is a good idea) and the disk holding the database crashes, the latest backup image together with the log files can be used to restore the database right up to the point when the crash occurred. (Note that the current location of the active logs can be read from the “Path to log files” field in the database configuration.)
Example 1. Sample output from the DB2 9.1 get db configuration command
Database configuration for database ldapdb2
Database configuration release level = 0x0b00
Database release level = 0x0b00
.
.
.
Log file size (4KB) (LOGFILSIZ) = 2000
Number of primary log files (LOGPRIMARY) = 8
Number of secondary log files (LOGSECOND) = 3
Changed path to log files (NEWLOGPATH) =
Path to log files = /home/dbowner/dbinst/NODE0000/SQL00001/SQLOGDIR/
Overflow log path (OVERFLOWLOGPATH) =
Mirror log path (MIRRORLOGPATH) =
First active log file = S0000017.LOG
Block log on disk full (BLK_LOG_DSK_FUL) = NO
Percent max primary log space by transaction (MAX_LOG) = 0
Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
Group commit count (MINCOMMIT) = 1
Percent log file reclaimed before soft chckpt (SOFTMAX) = 320
Log retain for recovery enabled (LOGRETAIN) = OFF
User exit for logging enabled (USEREXIT) = OFF
HADR database role = STANDARD
HADR local host name (HADR_LOCAL_HOST) =
HADR local service name (HADR_LOCAL_SVC) =
HADR remote host name (HADR_REMOTE_HOST) =
HADR remote service name (HADR_REMOTE_SVC) =
HADR instance name of remote server (HADR_REMOTE_INST) =
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
First log archive method (LOGARCHMETH1) = DISK:/backups/logs
Options for logarchmeth1 (LOGARCHOPT1) =
Second log archive method (LOGARCHMETH2) = OFF
Options for logarchmeth2 (LOGARCHOPT2) =
Failover log archive path (FAILARCHPATH) =
Number of log archive retries on error (NUMARCHRETRY) = 5
Log archive retry Delay (secs) (ARCHRETRYDELAY) = 20
Vendor options (VENDOROPT) =
Auto restart enabled (AUTORESTART) = ON
Index re-creation time and redo index build (INDEXREC) = SYSTEM (RESTART)
Log pages during index build (LOGINDEXBUILD) = OFF
Default number of loadrec sessions (DFT_LOADREC_SES) = 1
Number of database backups to retain (NUM_DB_BACKUPS) = 12
Recovery history retention (days) (REC_HIS_RETENTN) = 366
.
.
.
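The LOGFILSIZ, LOGPRIMARY and LOGSECOND values in Example 1 together bound how much disk space the active logs can occupy. The following is a rough sizing sketch: the function name `log_space_kb` is hypothetical, and the arithmetic ignores any small per-file overhead that DB2 may add to each log file.

```shell
#!/bin/sh
# Estimate the maximum active log space in KB.
# LOGFILSIZ is expressed in 4 KB pages, so one log file holds
# LOGFILSIZ * 4 KB, and at most LOGPRIMARY + LOGSECOND files can be active.
log_space_kb() {
    logfilsiz=$1
    logprimary=$2
    logsecond=$3
    echo $(( logfilsiz * 4 * (logprimary + logsecond) ))
}
```

With the values from Example 1, `log_space_kb 2000 8 3` prints 88000, that is, about 86 MB for 11 log files of 8000 KB each. With archival logging, the archived (inactive) logs accumulate in addition to this amount until they are deleted.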
Prior to DB2 8.2 (8.2 = 8.1 with fixpack 7 or later), archival logging was configured through the logretain parameter. When the logretain parameter was set to ON, archival logging was in effect, and log files were no longer reused in place. New log files were created when needed, and the inactive log files were simply left in their original location. It was up to the administrator to archive or delete them as necessary to manage space.
Beginning with DB2 8.2, you configure log archiving by changing the logarchmeth1 database parameter from OFF to an appropriate value selecting the archiving method desired. The possible values are:
- LOGRETAIN – With this method, which is similar to the previously available logretain parameter, inactive log files are never overwritten. This means that inactive logs must be deleted or moved to some archive location to avoid running out of disk space for primary logs. The database configuration specifies the number of active primary and secondary log files that can be created. With LOGRETAIN set, DB2 will first fill up the primary logs, and then if the first primary log is still active, it will create secondary logs. If the maximum number of primary and secondary logs have been created and filled before the first primary log becomes inactive, a log full condition will occur. As primary logs become inactive, DB2 will create additional primary logs as needed. In LOGRETAIN mode, it is very important to monitor the disk space available for log files, because if the disk fills up, directory updates will not be possible until the condition is corrected.
- USEREXIT – In this mode, archiving and retrieval of logs is performed by a user-supplied user exit program called db2uext2. The user exit program is called to copy a log file to an archive location as soon as that log file is full. This allows DB2 to rename and reuse the file when it becomes inactive. When inactive log files are required during recovery operations (after restoring a database from a backup), DB2 will call the user exit to retrieve the necessary logs from the archive location.
- DISK:directory – With this setting, log management is performed using an algorithm similar to the USEREXIT mode. The difference is that instead of calling the user exit program, DB2 automatically archives the logs from the active log directory to the specified directory. During recovery, DB2 knows how to retrieve these logs from that location.
- TSM:[management class name] – This method is again similar to USEREXIT, except that logs will be automatically archived on the local Tivoli Storage Manager (TSM) server. The management class name parameter is optional. If not specified, the default management class is used.
- VENDOR:library – With this setting, logging operates in a mode similar to USEREXIT, except that the specified vendor library is invoked to archive or retrieve the logs.
If you are running DB2 8.2 or later, there are advantages to using a setting like "DISK:directory" that automatically moves inactive logs out of the primary log location. This reduces the risk that the file system containing the active logs will run out of space, which would block database updates until space is freed.
Note that when you first configure archival logging, an offline backup must be done first, before any online backups can be done. If you attempt an online backup before doing this one-time offline backup, you will get an error indicating that the "database is not recoverable or a backup pending condition is in effect".
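The one-time setup described above can be sketched as two DB2 commands: one to switch the database to archival logging, and one to take the required offline backup. This is a sketch under assumptions: the function name `print_archival_setup` is hypothetical, the DISK: method is used only as an example, and the function prints the commands rather than running them. The directory server must be stopped, with no other connections to the database, before the offline backup is run.

```shell
#!/bin/sh
# Print (do not execute) the commands that enable archival logging and take
# the first offline backup.
# Arguments: database name, log archive directory, backup directory.
print_archival_setup() {
    db=$1
    archive_dir=$2
    backup_dir=$3
    # Switch from circular to archival logging using the DISK: method.
    echo "db2 update db cfg for $db using LOGARCHMETH1 DISK:$archive_dir"
    # One-time offline backup; note there is no "online" keyword here.
    echo "db2 backup db $db to $backup_dir"
}
```

For example, `print_archival_setup ldapdb2 /backups/logs /backups` prints the commands matching the archive location shown in Example 1. After the offline backup completes, online backups become possible.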
Log management procedures
Now for the specifics of the log management process. The key to this log management strategy is that whenever a full backup is taken, log files that were no longer active at the time the backup was initiated are no longer needed and can be deleted. This process takes just a few simple steps.
The procedure documented here makes the following assumptions:
- Only full backups are done. (Consideration of incremental backups is not included.)
- No transaction logs created prior to the latest backup image will be saved. This means that if a database needs to be restored, the very latest backup image must be used. This also means there is no need to save more than the one latest backup image.
- Full backups will be done on a regular basis. The frequency of backups will affect how much space must be set aside for transaction logs. The longer the interval between backups, the more space is required. The other major factor affecting the amount of log space required is the rate of updates made to the database.
Based on these assumptions, the following four-step process prevents the accumulation of excess log and backup files:
- Find out which log files are no longer active.
There is a DB2 database configuration parameter that can be read to find the name of the current first active log file. This parameter is blank when the default "circular" logging is in effect, but after archival logging has been configured it will contain a log file name.
> db2 get db cfg for dbname | grep "First active log file"
- Use the DB2 BACKUP command to do the online backup of the database.
> db2 backup db dbname online to /backups
- Now after the new backup has been completed, delete all of the log files that came before the one identified in step one. That is, if the "first active log file" in step one was S0000003.LOG, then after the backup has completed, you can delete files S0000000.LOG, S0000001.LOG and S0000002.LOG. The location of the inactive logs to be deleted will vary depending on the log archival method chosen. If using LOGRETAIN, then the inactive logs will be in the same location as the active logs – see the “Path to log files” field in the database configuration. If using DISK:directory, then the inactive log files will be in the location specified by directory.
- Delete any earlier backup images that you no longer want to retain.
That's it. The file system free space should still be checked periodically to make sure that it is not getting full for any reason. This check should be applied to both the primary transaction log directory and the archive location (if logarchmeth1 was specified as “DISK:directory”). This approach removes the files when they are no longer needed, but keeps all the files that would be required to use the latest backup image to restore the database, and the subsequent log files to roll forward all transactions up to the end of the most recent active transaction log file.
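Steps one and three of the procedure can be sketched as two small shell helpers. The function names `first_active_log` and `inactive_logs` are hypothetical, and the second helper only lists candidate files so that the list can be reviewed before anything is deleted. It assumes the log file naming shown earlier (S, seven digits, .LOG).

```shell
#!/bin/sh
# Step 1 helper: read "db2 get db cfg" output on stdin and print the value of
# "First active log file". Prints nothing if the value is blank, as it is
# with circular logging.
first_active_log() {
    sed -n 's/^ *First active log file *= *\(S[0-9]*\.LOG\).*$/\1/p'
}

# Print the numeric part of a log file name, e.g. S0000017.LOG -> 0000017.
log_seq() {
    seq=${1#S}
    echo "${seq%.LOG}"
}

# Step 3 helper: list the log files in directory $2 whose sequence number is
# lower than that of the first active log file $1.
inactive_logs() {
    first_seq=$(log_seq "$1")
    dir=$2
    for f in "$dir"/S[0-9][0-9][0-9][0-9][0-9][0-9][0-9].LOG; do
        [ -e "$f" ] || continue            # no matching files in the directory
        name=${f##*/}
        # test(1) compares these as decimal integers, so leading zeros are fine.
        if [ "$(log_seq "$name")" -lt "$first_seq" ]; then
            echo "$f"
        fi
    done
}
```

A possible usage, assuming a database named ldapdb2 and an archive directory of /backups/logs: capture `first=$(db2 get db cfg for ldapdb2 | first_active_log)` before the backup, run the online backup, and then review the output of `inactive_logs "$first" /backups/logs` before removing those files. Capturing the name before the backup starts matters, because the first active log file can advance while the backup is running.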
Resources
- "DB2 8.2 Library": All DB2 v8.2 product publications in PDF format. (The book Data Recovery and High Availability Guide and Reference contains a section called “Data recovery” that covers all the concepts related to backups and logging.)
- DB2 InfoCenter: DB2 v8 infocenter, documentation in searchable form.
- DB2 Doc: DB2 documentation on understanding recovery logs.
- DB2 Backup Command: DB2 documentation on the backup command.
- DB2 Get DB Config: DB2 documentation on the Get Database Configuration command.
- DB2 Update DB Config: DB2 documentation on the Update Database Configuration command.
- DB2 InfoCenter: DB2 documentation on configuration parameters for database logging.
About the author
Mark McConaughy is a senior software engineer with the IBM Tivoli Directory Server development team based in Austin, TX. He is the technical lead, and has been with the team since its inception. In a past assignment, Mark worked on the development of RDBM software that was a precursor to today's DB2. He holds a Bachelor's degree in Engineering from the University of New Mexico.