An Overview of Transactional Logging in DB2 Universal Database

The following article applies to IBM DB2® Universal Database® on UNIX®, Linux®, and Windows®.

Any database management system must have mechanisms for securing data consistency and recoverability. One of the many mechanisms used by relational database systems to ensure those all-important qualities is transactional logging. In this article, we will define and discuss the types of transactional logging. We'll go into detail on how log files are allocated, how they are stored, and what errors you might encounter. Finally, we will discuss the new features available in Version 8 that make logging more scalable and versatile than ever.

What is transactional logging?

Databases store data to be accessed and processed by applications. Those applications insert, read, update, or delete data. Each of these activities is performed within a transaction, defined as "a recoverable sequence of operations within an application process". A transaction, also referred to as a "unit of work," does not affect the database unless that transaction is committed.

Combining database operations into transactions is half of the solution to ensure data consistency. The other half is a database manager implementation called write-ahead logging. Transactions are logged while they occur, regardless of whether or not they commit. Log records go from the log buffer to the log files (transactional logging) before any data changes are written from the buffer pools to the database structures. The files used to log the transactions are called the transaction logs.
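
As a minimal illustration of a unit of work, the following command line sketch assumes the standard SAMPLE database and its STAFF table (the inserted values are placeholders). With auto-commit turned off via the +c option, the insert does not become permanent until the explicit COMMIT; a ROLLBACK instead would undo it. Either way, the statement is written to the transaction log before the changed pages reach disk.

 db2 connect to sample
 db2 +c "insert into staff (id, name, dept, job) values (999, 'Smith', 20, 'Clerk')"
 db2 commit
 db2 connect reset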

Is there more than one type of transactional logging?

DB2 UDB has two types of logging available - circular and archive logging.

Circular logging

Circular logging is the default logging strategy for a database. In this strategy, once the last primary log file in the log directory is filled, new transactions are written to the first log file, overwriting existing log data. These new transactions continue to overwrite each old log file in sequence. This method of logging ensures data consistency for all committed transactions so that crash recovery is possible.

Circular logging is typically used in data warehouse environments where recovering the database is simply a matter of restoring a backup image. This strategy should not be chosen in an online transaction processing (OLTP) environment, since roll-forward recovery is not possible. Figure 1 below illustrates circular logging:

Figure 1. Circular logging

Archival logging

In contrast to circular logging, the archival logging process creates a new log file when the last log file is filled so that future transactions will not overwrite existing log files. When the database is initialized, the system allocates a certain number of primary log files of a specified size in the active log directory. This number is controlled by a database configuration parameter (discussed in the next section). When the primary log files are all full, secondary log files are created on an "as-needed" basis until the maximum number of secondary log files has been created. Once this number is reached, if additional log space is needed, an error is issued indicating no more log files are available and all database activity is stopped.

With archival logging, it is possible to take an online database backup during which database activity continues to be logged. In the event of a database crash or failure, the database may be restored using a full backup image followed by a roll-forward operation using the archived logs to bring the database to a point-in-time state or to the most recent consistent state by rolling forward to the end of the logs.
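
To make this concrete, here is a minimal sketch of that sequence from a UNIX command line; the database name SAMPLE and the path /backups are placeholders, and the database must already be configured for archival logging:

 # back up the database while applications stay connected
 db2 backup database sample online to /backups

 # after a failure: restore the image, then replay the archived logs
 db2 restore database sample from /backups
 db2 "rollforward database sample to end of logs and complete"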

Archived logs are of two types:

  • Online archived logs: These are the log files that reside in the database log directory ('online') and are no longer needed for normal database activity.
  • Off-line archived logs: These are the log files that have been moved from the database log directory to an off-line storage location (such as a backup server) and are not needed for normal database activity.

Figure 2 illustrates archival logging:

Figure 2. Archive logging

What parameters are available to control logging?

Transactional logging is controlled at the database level through database configuration parameters. The following are parameters that affect transactional logging:

LOGRETAIN

This parameter causes archived logs to be kept in the database log path directory. Setting it to "RECOVERY" allows the database manager to use the roll-forward recovery method. You do not need userexit to be enabled when the logretain configuration parameter is enabled; either of the two parameters is sufficient to allow roll-forward recovery.

Using this parameter means that circular logging (the default) is being overridden. Here are the valid values of logretain:

  • No (default) - to indicate that logs are not retained.
  • Recovery - to indicate that the logs are retained, and can be used for forward recovery. In addition, if you are using data replication, the CAPTURE program can write the updates recorded in the logs to the change table.
  • Capture - to indicate that the logs are only retained so that the Capture program can write the updates to the change table. These logs may be used for forward recovery if they have not been pruned. NOTE: The Capture setting is normally used only if you are setting up your database for data replication.

If logretain is set to "Recovery" or userexit is set to "Yes" (see below), the active log files will be retained and become online archive log files for use in roll-forward recovery. This is called log retention logging.

After logretain is set to "Recovery" or userexit is set to "Yes" (or both), you must make a full backup of the database. This state is indicated by the backup_pending flag parameter.

If logretain is set to "No" and userexit is set to "No", roll-forward recovery is not available for the database, and recoverability is limited to the last database backup.

When logretain is set to "Capture", the Capture program calls the PRUNE LOGFILE command to delete log files when the Capture program completes. Although the logs can be used for forward recovery if they have not been pruned, you should not set logretain to "Capture" if you want to be assured of being able to perform roll-forward recovery on the database.

If logretain is set to "No" and userexit is set to "No", logs are not retained. In this situation, the database manager deletes all log files in the logpath directory (including online archive log files), allocates new active log files, and reverts to circular logging.

When the logretain configuration parameter is set to "RECOVERY", the log files will remain in the active log path. The active log path is determined either by the Path to Log Files (logpath) or Changed Path to Log Files (newlogpath) value in the database configuration file.
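
For example, a minimal sketch of enabling log retention for a hypothetical database named SAMPLE might look like this (the backup path is a placeholder); the database stays in backup pending state until the full backup completes:

 db2 update db cfg for sample using LOGRETAIN RECOVERY
 # all applications must disconnect before the change takes effect
 db2 backup database sample to /backups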

USEREXIT

This parameter causes the database manager to call a user exit program for archiving and retrieving logs. With the user exit enabled, roll-forward recovery is allowed. You do not need logretain to be enabled when the userexit configuration parameter is enabled; either of the two parameters is sufficient to allow the roll-forward recovery method.

Using this parameter means that circular logging (the default) is being overridden. Userexit implies logretain, but the reverse is not true.

The active log path is important when using either the userexit configuration parameter or the logretain configuration parameter to allow roll-forward recovery. When the userexit configuration parameter is enabled, the user exit is called to archive log files and move them to a location away from the active log path.

The following are the valid values for this parameter:

  • No (default)
  • Yes

If this parameter is enabled, log retention logging is performed regardless of how the logretain parameter is set. This parameter also indicates that a user exit program should be used to archive and retrieve the log files. Log files are archived when the database manager closes the log file. They are retrieved when the ROLLFORWARD utility needs to use them to restore a database.

After logretain, or userexit, or both of these parameters are enabled, you must make a full backup of the database. This state is indicated by the backup_pending flag parameter.

If both of these parameters are disabled, roll-forward recovery becomes unavailable for the database because logs will no longer be retained. In this case, the database manager deletes all log files in the logpath directory (including online archive log files), allocates new active log files, and reverts to circular logging.
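
As a sketch, enabling the user exit for a hypothetical SAMPLE database and then checking the resulting state from a UNIX command line might look like the following; the "Backup pending" flag should read YES until a full backup is taken:

 db2 update db cfg for sample using USEREXIT ON
 db2 get db cfg for sample | grep -i "backup pending"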

LOGPRIMARY

This parameter specifies the number of primary log files that will be created.

A primary log, whether empty or full, requires the same amount of disk space. Thus, if you configure more logs than you need, you use disk space unnecessarily. If you configure too few logs, you can encounter a log-full condition. As you select the number of logs to configure, you must consider the size you make each log and whether your application can handle a log-full condition.

If you are enabling an existing database for roll-forward recovery, change the number of primary logs to the sum of the number of primary and secondary logs you are currently using, plus 1. Additional information is logged for long varchar and LOB fields in a database enabled for roll-forward recovery.

The total log file size limit is 32 GB for version 7.2 and 256 GB for version 8.1. That is, the number of log files (LOGPRIMARY + LOGSECOND) multiplied by the size of each log file in bytes (LOGFILSIZ * 4096) must be less than 32 GB or 256 GB respectively.

LOGSECOND

This parameter specifies the maximum number of secondary log files that can be created and used for recovery (allocated only as needed). Note that the total number of log files is limited by the following equation:

logprimary + logsecond <= 128 (DB2 UDB Version 7.2), 256 (DB2 UDB Version 8.1)

When the primary log files become full, the secondary log files (of size logfilsiz) are allocated one at a time as needed, up to a maximum number as controlled by this parameter. An error code will be returned to the application, and activity against the database will be stopped, if more secondary log files are required than are allowed by this parameter.

LOGFILSIZ

This parameter determines the number of pages for each of the configured logs. A page is 4 KB in size. The size (number of pages) of each primary log has a direct bearing on database performance. When the database is configured to retain logs, each time a log is filled, a request is issued for allocation and initialization of a new log. Increasing the size of the log reduces the number of requests required to allocate and initialize new logs. However, note that with a larger log size it takes more time to format each new log. The formatting of new logs is transparent to applications connected to the database and does not impact database performance.
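
The three parameters are typically set together. A hedged sketch for a hypothetical SAMPLE database (the values are placeholders, not recommendations) would be:

 # 10 primary logs, up to 20 secondary logs, each 2000 4 KB pages (about 8 MB)
 db2 update db cfg for sample using LOGPRIMARY 10 LOGSECOND 20 LOGFILSIZ 2000

The new values for logprimary and logfilsiz take effect only after all applications have disconnected from the database.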

LOGBUFSZ

This parameter allows you to specify the amount of database shared memory to use as a buffer for log records before writing these records to disk. The log records are written to disk when one of the following occurs:

  • A transaction commits.
  • The log buffer is full.
  • Some other internal database manager event causes the write.

Buffering the log records results in more efficient log file I/O, because the log records are written to disk less frequently and more log records are written each time.

MINCOMMIT

This parameter allows you to delay the writing of log records to disk until a minimum number of commits have been performed. This delay can help reduce the database manager overhead associated with writing log records and, as a result, improve performance when you have multiple applications running against a database and many commits are requested by the applications within a very short time frame.

This grouping of commits will only occur when the value of this parameter is greater than 1, and when the number of applications connected to the database is greater than the value of this parameter. When commit grouping is being performed, application commit requests are held until either one second elapses or the number of commit requests equals the value of this parameter, whichever comes first.
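
A hedged sketch of tuning both parameters for a hypothetical SAMPLE database (the values are illustrative only):

 # 256-page (1 MB) log buffer; group commits when at least 3 are pending
 db2 update db cfg for sample using LOGBUFSZ 256 MINCOMMIT 3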

NEWLOGPATH

The database logs are initially created in a directory called SQLOGDIR, a sub-directory of the database directory. You can change the location where active logs and future archive logs are placed by changing the value for this configuration parameter to point to either a different directory, or to a device. Archive logs that are currently stored in the database log path directory are not moved to the new location if the database is configured for roll-forward recovery.

Because you can change the log path location, the logs needed for roll-forward recovery may exist in different directories or on different devices. You can change this configuration parameter during the roll-forward process to allow you to access logs in multiple locations.

The change to the value of newlogpath will not be applied until the database is in a consistent state. An informational database configuration parameter, database_consistent, indicates the status of the database.

Note: The database manager writes to transaction logs one at a time. The total size of transactions that can be active is limited by the database configuration parameters:

 LOGFILSIZ * LOGPRIMARY * 4096 bytes <= log space <= LOGFILSIZ * (LOGPRIMARY + LOGSECOND) * 4096 bytes

 LOGFILSIZ * (LOGPRIMARY + LOGSECOND) * 4096 bytes <= 32 GB (v7.2) or 256 GB (v8.1)
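
As a worked example with hypothetical values of LOGFILSIZ = 10000, LOGPRIMARY = 20, and LOGSECOND = 40, each log file occupies 10000 * 4096 bytes (about 39 MB), so:

 minimum log space = 10000 * 20 * 4096 bytes (about 0.76 GB)
 maximum log space = 10000 * (20 + 40) * 4096 bytes (about 2.29 GB)

which is comfortably within both the 32 GB (v7.2) and 256 GB (v8.1) limits.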

How are log files allocated and where are they stored?

How are log files allocated?

Once all of the database configuration parameters that we discussed in the previous section are updated, you will need to make sure all applications disconnect from the database. When an application reconnects, the new settings take effect and log files are pre-allocated on your disk. The number of log files allocated is based on the logprimary parameter value and the total number of log files in the active log directory.

With archive logging, if you don't move old log files from the active log directory, they will accumulate and share the directory with the active log files. For instance, let's say you have five log files in the active log directory. Two of the log files are completely full and committed, leaving three log files free. Upon first connection to the database after updating the logprimary parameter from five to eight, five more log files will be allocated to the active log directory, ensuring a total of eight free log files (3 free + 5 new = 8 free).

With circular logging (that is, when both the logretain and userexit parameters are set to No), log files are reused. This means that upon first connection to the database, the total number of primary logs is equal to the total number of log files residing in the active log directory. If we take our previous example and update the logprimary parameter from five to eight, only three more log files will be allocated to the active log directory, even if some of the existing log files are completely full (5 existing + 3 new = 8).

As we discussed in the previous section, each log file created will have the same allocation of space based upon the logfilsiz parameter. If the logfilsiz parameter value is changed, the existing log files with transactional data will not be affected while new log files are created and empty log files are increased to the new size.

Note that the size reflected for each log file is mostly pre-allocated space, except for some log header control information. By pre-allocating this space, the database manager doesn't have to allocate space at the moment it is needed. Allocating space on the hard drive is time consuming and resource intensive, so it is best to have that space readily available when it is needed.

The logsecond parameter is the only one of these parameters whose behavior differs between version 7.2 and version 8. In version 7.2, this parameter only takes effect after all applications disconnect and a reconnection is made. Version 8 allows this parameter to become effective immediately after it is updated. This does not mean you will suddenly see new log files in your log directory after reconnecting to the database. This parameter only causes an allocation of log files when the logprimary value is exceeded, which is most likely to happen during a series of long uncommitted transactions.

Here is an example of how logsecond influences log allocation. You just updated logprimary to five and logsecond to two (five pre-allocated and two allocated when required). A transaction runs and uses up all five primary log files but is still logging transactions. When the sixth log file is needed, the database manager will check the logsecond value and allocate another log file to the log directory. The seventh log file is now needed and gets allocated. Since only two log files are specified with the logsecond parameter, the transaction must commit before the seventh log file becomes full or an error will be returned and the transaction will roll back.

So what happens to the log files after a commit? In order to preserve the logprimary value, the database manager will allocate new log files based upon how many log files were used in the previous transaction. For instance, if three log files were filled, another three log files will be created to ensure a total number of free log files equal to the logprimary value.

What happens when all applications disconnect from the database? Upon reconnecting to the database, the last log file to have data placed in it from the previous connections will be truncated to the size of the data in the file. This eliminates free space and allows an exact allocation of space based on the database configuration parameters noted above.
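
If you want to see these pre-allocated files, you can simply list the active log directory. On UNIX, with a hypothetical instance owner db2inst1 and the default log path, this might look like the following; the file names follow the Snnnnnnn.LOG pattern, and each file occupies roughly logfilsiz 4 KB pages on disk:

 ls /home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGDIR
 S0000000.LOG  S0000001.LOG  S0000002.LOG  S0000003.LOG  S0000004.LOG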

Where are log files stored?

By default log files are stored in the following directory:

On Windows: c:\<instance name>\NODE0000\SQL00001\SQLOGDIR

On UNIX: <instance home directory>/<instance name>/NODE0000/SQL00001/SQLOGDIR

In the above directory path, there will be an SQLxxxxx directory ('xxxxx' is a five-digit number starting at 00001) for each database that is created. If you have more than one database in your DB2 instance, it can be difficult to know which SQLxxxxx directory belongs to which database. To resolve this question, simply issue the following db2 command:

 db2 "list database directory on c:"

where c: is the drive letter where your database resides.

On UNIX, you would specify:

 db2 "list database directory on /<instance home directory>"

Here is a sample of the output returned from the above command:

Listing 1. DB2 database directory
Database 1 entry:  

 Database alias                  = DWCTRLDB 
 Database name                   = DWCTRLDB 
 Database directory              = SQL00001 
 Database release level          = 9.00 
 Comment                         = 
 Directory entry type            = Home 
 Catalog node number             = 0 
 Node number                     = 0 

Database 2 entry:  

 Database alias                  = SAMPLE 
 Database name                   = SAMPLE 
 Database directory              = SQL00002 
 Database release level          = 9.00 
 Comment                         = 
 Directory entry type            = Home 
 Catalog node number             = 0 
 Node number                     = 0 

Database 3 entry:  

 Database alias                  = UTFDB 
 Database name                   = UTFDB 
 Database directory              = SQL00003 
 Database release level          = 9.00 
 Comment                         = 
 Directory entry type            = Home 
 Catalog node number             = 0 
 Node number                     = 0

As can be seen above, each SQLxxxxx directory is associated with a database name. Of course, if your database is not created in the default directory, or if you changed the path to the log files, you can check the 'Path to log files' (logpath) or 'Changed path to log files' (newlogpath) value in your database configuration file. Also make sure that the database you are looking for resides in the instance you are searching.

Here is a visual representation of where the log files are stored (by default) on Windows:

Figure 3. Windows Explorer view of log files

In the picture above, you can see a database directory called SQL00003. This directory is associated with the UTFDB database, as we determined from the output of the previous "list database..." command.

Common scenarios where transactional logging may cause errors

We have talked about the database configuration parameters that affect logging and how log files are allocated, but it is now time to discuss the problems you can run into if the parameters are not set up correctly. Improperly set up log parameters can be a big nuisance, not only for the user who issues a long-running transaction, but also for other users who are sharing the system. Here are some issues that you may come across:

  • During an IMPORT operation, a warning suddenly pops up that looks similar to this:

    Listing 2. Example warning from import operation

     C:\data>db2 "import from temp2.ixf of ixf create into temp2"
    
     SQL3150N The H record in the PC/IXF file has product "DB2    02.00", date
     "20010910", and time "171430". 
    
     SQL3153N The T record in the PC/IXF file has name "temp2.ixf", qualifier "", 
     and source "            ". 
    
     SQL3109N The utility is beginning to load data from file "temp2.ixf". 
    
     SQL3186W Data was not loaded into the database, because the log was full. 
     SQLCODE "-964" was returned.  A commit will be attempted and the operation 
     will continue if the commit is successful. 
    
     SQL0964C The transaction log for the database is full.  SQLSTATE=57011 
    
     SQL3221W  ...Begin COMMIT WORK. Input Record Count = "78". 
    
     SQL3222W  ...COMMIT of any database changes was successful.

    Since an IMPORT operation is essentially the movement of rows using SQL inserts with auto-commit turned off, you will see the above warning stating that the log file is full and that a commit will be attempted. If the database configuration parameter logretain is set to ON, log retention for the database is enabled and new log files will need to be created before the IMPORT operation can continue. This is important to know if you will need a combination of large and/or numerous log files, as these log files will take time to be allocated. If logretain is set to OFF, circular logging will take place, thereby eliminating the need for new log files to be created.

    If you will be importing a very large number of rows into a database, it might be wise to update the database configuration logging parameters during a period of low activity so that the allocation of log files will not put excessive stress on the system. DB2 UDB version 7.2 allows for 32 GB of total log space to be allocated, while version 8 allows a massive 256 GB. (A sketch of the usual remedies, committing more often during the import and enlarging the log configuration, follows this list.)

    An alternate consideration is to use the LOAD operation, which allows you to use the NOT LOGGED INITIALLY option on a table. As the name implies, the option does not cause logging to occur when the LOAD is executed. But it also prevents the table from being recoverable, so it is important to back up the database soon after executing the LOAD. Note that the NOT LOGGED INITIALLY option is only good for the current execution of the command, and the database manager will continue to log transactions that occur after the LOAD operation is complete.

  • Another common scenario occurs when an application is executing numerous INSERT, DELETE or UPDATE statements within a unit of work. Auto commit is turned off and the commit is executed explicitly only after the unit of work is completed. Unlike the IMPORT utility which can recognize the log file full condition and perform a commit, the logic in your application may not have taken this into account and will fail with the following error:

     SQL0964C The transaction log for the database is full.  SQLSTATE=57011

    So if your application runs a script file containing unusually long transactions, it may be wise to include frequent commits based upon your log settings, or to update your database logging parameters.

  • A third problem can occur when performing RUNSTATS or REORG on your table. Both the RUNSTATS and REORG utilities use log files when moving table data around. If you encounter the SQL0964C error, you will need to increase the values of your database log file parameters.
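
As mentioned above, here is a minimal sketch of the two usual remedies, run from a UNIX command line; the table name, file name, database name, and values are all placeholders:

 # commit every 1000 rows so a single unit of work cannot fill the logs
 db2 "import from temp2.ixf of ixf commitcount 1000 insert into temp2"

 # or give the database more log space (logfilsiz and logprimary take effect
 # after all applications disconnect)
 db2 update db cfg for sample using LOGFILSIZ 4000 LOGPRIMARY 10 LOGSECOND 30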

New features for Version 8

DB2 logging has taken a huge leap forward in version 8 with some great new features. Here are some of them:

Logging scalability - For those who have a penchant for extremely long transactions, DB2 has increased the maximum active log space from 32 GB (version 7.2) to 256 GB. This is the total number of bytes that can be allocated through the logprimary, logsecond, and logfilsiz parameters.

In-flight transactions can now exceed the limit specified by the database logging parameters, or even the available disk space in the active log path, if the userexit database configuration parameter is turned on. This new feature is enabled by updating logsecond to -1 (infinity). The idea is that since log files will be constantly archived by the userexit program during a long transaction, there will always be room for additional log files in the active log directory. There is no limit on the size or the number of in-flight transactions running on the database. The drawback, though, can be a performance impact if a rollback occurs, since the userexit program will need to retrieve the archived log files containing the uncommitted work from the archive log path. In addition, if logsecond is set to -1, the allocation of secondary logs is not limited by the (logprimary + logsecond) <= 256 equation.
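
A hedged sketch of turning on this "infinite active logging" behavior for a hypothetical SAMPLE database (a working user exit program must already be in place):

 db2 update db cfg for sample using USEREXIT ON LOGSECOND -1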

The DB2_BLOCK_ON_DISK_FULL registry variable is replaced by the BLK_LOG_DSK_FUL database configuration parameter. When this option is enabled, applications running when a disk-full condition occurs on the active log path will not be disconnected. The DBA then has time to delete files or enlarge the file system, allowing transactions to complete. While the disk-full condition persists, read access to the database is still allowed.

Logfile Mirroring - The newlogpath2 registry variable in DB2 UDB v7 is replaced by the mirrorlogpath database configuration parameter in DB2 UDB v8. This change will allow granularity at the database level, rather than at the instance level. The newlogpath2 variable and the mirrorlogpath parameter both serve the same purpose - they are used to implement dual or mirror logging. mirrorlogpath must point to a fully qualified path name, that is, an absolute path name.
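
A sketch of enabling both of these Version 8 features for a hypothetical SAMPLE database; the mirror path is a placeholder and should sit on a separate physical disk from the active log path:

 db2 update db cfg for sample using BLK_LOG_DSK_FUL YES
 db2 update db cfg for sample using MIRRORLOGPATH /mirror/db2logs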

Logging is faster - Performance with the logger is improved by asynchronous log writes. This is especially beneficial in SMP environments because they are able to take advantage of asynchronous processing.

Summary

In this article, we discussed many facets of transactional logging: what transactional logging is, how to control it, how log files are allocated and where they are stored, some common errors that can be encountered, and some of the new features that are available in version 8.1. If you know how logging activities affect your database and operating system, you will be able to successfully and efficiently troubleshoot the problems that can arise from logging errors.

