Diagnosing corruption when using IBM DB2

Understand and troubleshoot

Learn how to identify and categorize the most common corruption issues while using IBM DB2®. In this article, learn corrective and preventive techniques you can implement to combat corruption issues.

Share:

Amitkumar Bamane (amitbamane@in.ibm.com), Software Engineer, IBM

Amitkumar BamaneAmitkumar Bamane is an IBM Certified Advanced Technical Expert for DB2. He is a DB2 advanced technical support analyst at IBM India Software Lab, Pune, where his main focus is to assist IBM DB2 customers worldwide.



16 August 2012

Also available in Chinese Russian Portuguese

Introduction

Perhaps regarded as one of the most troublesome business issues, database corruption often takes place subtly and affects businesses adversely. In simple words, corruption can be defined as any unintentional entry in database. Corruption issues may result in the severe performance hit of the system. In some cases, it may result in frequent system crashes and may become a business-critical downtime situation. Database corruption can occur at any layer, from DB2 to the operating system and right down to the hardware layer. Hence, it is important to understand and troubleshoot it by involving all possible affected layers and collecting any diagnostic data that might be required as soon as possible, while the data is available.

In this article, you will understand why a database can go offline when it encounters corruption. You will also learn to analyze symptoms for corruption and differentiate between easy-to-fix and catastrophic failures. This article will shed light on corruption issues when using IBM DB2 and help DB2 users understand and choose the best approach in dealing with such critical and high-impact issues.

This article begins with discussion about possible sources of corruption. It then explains the following tasks:

  1. Identifying and troubleshooting corruption — Identify and categorize corruption in the database when using DB2, with the help of sample symptom messages that appear in db2diag.log. Corruption issues can be broadly classified in five categories: data page corruption (or table corruption), index corruption, CBIT corruption, log corruption, and packed descriptor corruption.
  2. Using db2dart and INSPECT to identify corruption — Get insight into useful DB2 commands, db2dart and INSPECT, to check database corruption.
  3. Approaches to recover from corruption — Once a corruption issue is identified, how to approach these cases, what data to collect, and how to recover from the situation is crucial. Learn possible recovery approaches and how to choose between available options.
  4. Preventive strategies to avoid possible corruption — Best practices are discussed.

Sources

Database corruption can occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Some of the common reasons of corruption:

  1. Corrupt file system is one the most common reasons for corruption in a database. Abrupt system shutdowns, power surges, file system double-mount, migrating disks, file system-level activities like checking and repairing file system (using utilities like fsck on Linux®) when the database is up and running, and using Ctrl+Alt+Delete while a file is open, viruses can all introduce unintentional changes in database.
  2. Hardware failure.
  3. Memory corruption.
  4. DB2 defect.
  5. I/O and network issues (issues in fiber adapter, switches, etc.).
  6. Incorrect application coding.
  7. Inconsistency in the value of the page in the buffer pool (sqldPage) and the one stored in the file system.
  8. Overwriting of disk data causes corruption.
  9. User interference with critical configuration files of database, log files, log control files, etc. can put database in inconsistent state

Having said that corruption can be due to various reasons, finding out exactly what caused data corruption is challenging. In most cases, it is caused by file system issues and hardware problems.


Identify and troubleshoot

For a DBMS, page is the smallest unit of data for memory allocation performed by the operating system for a program and transfer between main memory and any other auxiliary store, such as a hard drive. So when you say something in a database is corrupt, you actually mean some pages in a database are corrupt.

Panic is a method DB2 uses to cause itself to crash if there is an error condition it cannot handle gracefully. When a page corruption is detected by DB2, it halts all processing by means of a controlled crash (panic), since it can't determine database integrity. This is also to prevent further damage or loss of data.

A number of error messages are dumped in db2diag.log when DB2 encounters corruption in database. When an outage occurs and automatic first occurrence data capture (FODC) is enabled, data is collected based on symptoms. FODC data will be collected automatically on DB2 9.5 when one of the following conditions is met:

  1. FOCD_Trap when an instance wide trap has occurred.
  2. FODC_Panic when a DB2 engine detects an incoherence and decides not to continue.
  3. FODC_BadPage when a bad page has been detected.
  4. FODC_DBMarkedBad when a database has been marked bad due to an error.

It is crucial to involve OS and hardware support in order to gather required information like OS diagnostics (for example, errpt –a, snap, and fileplace outputs on AIX®), as well as any and all hardware diagnostics (state saves, error logs, etc). It is important to ensure that there is adequate disk space for critical file systems like dump space and log directories to ensure that critical events are captured in entirety.

Looking at the db2diag.log, you can confirm if panic is due to corruption or another reason. Below, you will see how to identify and categorize corruption in DB2. Following are some of the most common db2diag.log error messages that identify corruption.

Data page corruption

Data page corruption indicates corruption in actual data in a table. A number of messages are logged to the db2diag.log when data corruption is detected. These messages are necessary to identify the affected object. Listing 1 shows a sample error message in db2diag.log.

Listing 1. Sample error message for data page corruption
2012-02-04-03.13.05.261082-360 I3442A358          LEVEL: Error 
PID     : 393470               TID  : 1           PROC : db2pfchr 0 
INSTANCE: inst1                NODE : 000 
FUNCTION: DB2 UDB, buffer pool services, sqlbReadAndReleaseBuffers, 
probe:13 
RETCODE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad" 
DIA8400C A bad page was encountered. 
. 
. 
. 
2012-02-04-03.13.05.264301-360 I3801A437          LEVEL: Error 
PID     : 393470               TID  : 1           PROC : db2pfchr 0 
INSTANCE: inst1                NODE : 000 
FUNCTION: DB2 UDB, buffer pool services, sqlbReadAndReleaseBuffers, 
probe:13 
DATA #1 : String, 158 bytes 
Obj={pool:9;obj:20;type:0} State=x27 Parent={9;20}, EM=1120, PP0=1152 
Page=51235 Cont=17 Offset=2848 BlkSize=15 
sqlbReadAndReleaseBuffers error: num-pages=16

As the above error messages indicate, DB2 encountered a bad page for tablespace ID 9 and table ID 20. The object type filed is marked 0, which indicates data page corruption.

You can query the catalog table to determine which table has corrupt pages:

db2 "select char(tabname,20), char(tabschema,20) from
                    syscat.tables where tableid=20 and tbspaceid=9"

Note that DB2 dumps bad page errors only for those pages it has tried to access. It does not necessarily mean that only those pages are corrupt. You need to explicitly check all pages in your database to find the extent of corruption, with the help of the db2dart or INSPECT commands.

Similar error messages are dumped in db2diag.log for corruption in temporary tables. If the data type field in db2diag.log error message holds a value greater than 128, it indicates corruption in the temporary table. If the object type is 3, it indicates LOB data corruption in a table.

Index corruption

An index is a database object that contains an ordered set of pointers that refer to rows in a base table. There are many flavors of index-related corruption problems, including:

  1. Unique index duplicates with different RID(s) or the same RID(s)
  2. Multiple index entries pointing to the same RID
  3. Out-of-place index key (wrong index key order)
  4. Row exists, but index keys do not exist in any or some of the indices
  5. Index entry pointing to an empty data slot or unused data slot or RID is invalid
  6. Incorrect previous or next index page pointers, incorrect high key or other corruptions on index pages

Following are sample error message for index corruption.

Listing 2. Sample error message for index corruption
2012-01-30-01.35.50.952434+000 I29308542A2532     LEVEL: Severe 
PID     : 1175792              TID  : 33926       PROC : db2sysc 0 
INSTANCE: inst1                NODE : 000         DB   : SAMPLE 
APPHDL  : 0-7                  APPID: *LOCAL.inst1.120130013528 
AUTHID  : TP0ADM 
EDUID   : 33926                EDUNAME: db2redow (TP0) 0 
FUNCTION: DB2 UDB, buffer pool services, sqlb_verify_page, probe:3 
MESSAGE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad" 
DIA8400C A bad page was encountered. 
DATA #1 : String, 64 bytes 
Error encountered trying to read a page - information follows : 
DATA #2 : String, 23 bytes 
Page verification error 
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes 
23046981 
DATA #4 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 72 bytes 
Obj: {pool:9;obj:11076;type:1} Parent={8;11076} 

As you can see, error messages have object type:1, which indicates index page corruption. The index resides in tablespace ID 9, and the index ID is 11076. This index is on a table with tablespace ID 8, table ID 11076. You can retrieve base table name and index name by querying catalog tables.

The above snippet from db2diag.log indicates a bad page for index. Other common errors in db2diag.log that point to index corruption are SQLI_NOKEY and row not found from index function.

CBIT corruption

CBITs are a method used by DB2 to verify that a page being read into the buffer pool from disk is not a partial page or has not been changed from some form of corruption.

The basic idea behind CBITs is that a bit from each sector (512 bytes) on a page is set to the same value before writing the page. Before DB2 flushes a page to disk, the checksum is calculated and recorded on the page. When a page is read back in the buffer pool, this checksum is recalculated and checked against the stored value. If some of the bits are different, it indicates a partial page write or disk corruption.

Listing 3. Sample error message for CBIT corruption
2012-03-12-04.45.17.559235-240 I1104A2616 LEVEL: Severe
PID : 2551866 		       TID : 1 	  PROC : db2pfchr
INSTANCE: inst1   	       NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbVerifyCBITS, probe:1110
MESSAGE : ZRC=0x86020019=-2046689255=SQLB_CSUM "Bad Page, Checksum Error"
DIA8426C A invalid page checksum was found for page "".
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 95 bytes
CBIT verification error
bitExpected is 0, userByte is 33, sector 7 (from head of page, 0 based)
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes

Note that changes made to the page outside of DB2 (file system failures, disk failures, etc.) will not be noticed if the corruption does not touch the bytes holding the CBITs. CBIT errors ( checksum errors ) are most of the time due to hardware or OS error, either during the initial write or during the read.

Log corruption

The transaction log in DB2 is simply a record of all changes that have taken place in the database. To keep track of changes made by transactions, a method is needed to timestamp changes to data as well as to timestamp log records. In DB2, this timestamp mechanism is performed using a Log Sequence Number (LSN). In case you come across a corrupt log, you may have an error message in db2diag.log similar to.

Listing 4. Sample error message for log corruption
2010-06-07-12.37.21.143998+120 I8673583A553       LEVEL: Severe 
PID     : 2498668              TID  : 27358       PROC : db2sysc 56 
INSTANCE: inst1              NODE : 056         DB   : SAMPLE 
APPHDL  : 998-22947            APPID: *N998.inst1.100607192315 
AUTHID  : LOADDSF 
EDUID   : 27358                EDUNAME: db2agntp (SAMPLE) 56 
FUNCTION: DB2 UDB, data protection services, sqlpgrlg, probe:291 
DATA #1 : < reformatted >
Error -2028994519 when reading LSN 00000B1C261B4FD3 from log file 
S0119292.LOG tellMe 1 dpsAcbFlags 1000 setSkipOutputBuf 0 
                
2010-06-07-12.37.21.144202+120 I8674137A487       LEVEL: Severe 
PID     : 2498668              TID  : 27358       PROC : db2sysc 56 
INSTANCE: inst1              NODE : 056         DB   : SAMPLE
APPHDL  : 998-22947            APPID: *N998.inst1.100607192315 
AUTHID  : LOADDSF 
EDUID   : 27358                EDUNAME: db2agntp (SAMPLE) 56 
FUNCTION: DB2 UDB, data protection services, sqlpgrlg, probe:291 
DATA #1 : < preformatted >
HeadLsn 00000B1B8B996EB3, copyLookForLsn 00000B1C261B4FD3                               
                
2010-06-07-12.37.21.158065+120 I8675153A549       LEVEL: Error 
PID     : 2498668              TID  : 27358       PROC : db2sysc 56 
INSTANCE: inst1              NODE : 056         DB   : SAMPLE
APPHDL  : 998-22947            APPID: *N998.inst1.100607192315 
AUTHID  : LOADDSF 
EDUID   : 27358                EDUNAME: db2agntp (SAMPLE) 56 
FUNCTION: DB2 UDB, data protection services, sqlptudo, probe:1010 
RETCODE : ZRC=0x87100029=-2028994519=SQLP_BADLSN "Invalid LSN value." 
DIA8538C An invalid log sequence number (LSN), the value was "".

If logs are corrupt, it can result into a severe problem during roll-forward of the database, crash recovery, HADR logs reply, etc., where you actually reply to logs. Database roll-forward enables you to maintain consistency in the database. It recovers a database by applying transactions recorded in the database log files. Roll-forward is a process invoked after a database or a tablespace backup image has been restored.

Packed descriptor corruption

A packed descriptor is a column within the system catalog tables that DB2 uses to identify the details of a database object. If it is corrupt for some reason, you see errors in db2diag.log as shown below.

Listing 5. Sample error message for packed descriptor corruption
2011-08-22-20.50.00.922275-300 I154161182E497      LEVEL: Severe 
PID     : 14152                TID  : 184633256288 PROC : db2sysc 0 
INSTANCE: inst1             NODE : 000          DB   : SAMPLE
APPHDL  : 0-64465              APPID: 161.166.44.48.27625.11082301341 
AUTHID  : DTADM1 
EDUID   : 606                  EDUNAME: db2agent (SAMPLE) 0 
FUNCTION: DB2 UDB, catcache support, sqlrlc_systables_fetch_from_disk, 
probe:200 
MESSAGE : Corrupt PD->length in table:DTWF    .JBPM_TASKINSTANCE 
                
2011-08-22-20.50.00.922476-300 I154161680E1588     LEVEL: Severe 
PID     : 14152                TID  : 184633256288 PROC : db2sysc 0 
INSTANCE: inst1             NODE : 000          DB   : SAMPLE
APPHDL  : 0-64465              APPID: 161.166.44.48.27625.11082301341 
AUTHID  : DTADM1 
EDUID   : 606                  EDUNAME: db2agent (SAMPLE) 0 
FUNCTION: DB2 UDB, catcache support, sqlrlc_systables_fetch_from_disk, 
probe:201 
MESSAGE : Length in PD=3792, LFD length=10104, DMS length=10104 
DATA #1 : String, 11 bytes 
Corrupt PD: 
DATA #2 : Dumped object of size 3792 bytes at offset 1386248, 53 bytes 
/volumes/work/db2dump/xnawmtp5/14152.606.000.dump.bin 
DATA #3 : LOB Descriptor, PD_TYPE_LOB_DESCRIPTOR, 60 bytes 
SQLDX_LD: Size:60 
    x0000        lfd_check                        0x49 
    x0001        lfd_version                      6 
    x0002        lfd_numsegs                      1 
    x0003        lfd_flags                        0x00 
    x0004        lfd_size                         10104 
    x000C        lfd_life_lsn                     0000009860A6FF2D 
    x0014        lfd_mini_numsegs                 0 
    x0015        lfd_first                        4 
    x0016        lfd_descsize                     60 
    x0018        lfd_last_pages                   16 
    x001C        lfd_last_bytes                   10104 
    x0038        lfd_dir                          Regular Directory 
Offsets 
        lfd_dir[0]: 33888 (16K) 
    Hexdump of LOB descriptor follows: 
        4906 0100 0000 0000 7827 0000 0000 0098 
        60A6 FF2D 0004 3C00 1000 0000 7827 0000 
        0000 0000 0000 0000 0000 0000 0000 0000 
        0000 0000 0000 0000 6084 0000

If you have corrupt PD, the table becomes inaccessible.


Using db2dart and INSPECT to identify corruption

db2dart is a command that can be used to verify the architectural correctness of databases and the objects within them. It can also be used to extract data from tables that might otherwise be inaccessible due to corruption.

db2dart should never be run against a database that still has active connections. db2dart accesses the data and metadata in a database by reading them directly from disk. So, if there are connections, db2dart will be unaware about pages in the buffer pool, control structures in memory, etc. and may report false errors as a result. Similarly, if you run db2dart against a database that requires crash recovery or that has not completed roll-forward recovery, you may get inconsistent results.

To display all possible options, execute the db2dart utility without any parameters. Some options of db2dart require user input, such as tablespace ID and table ID, which are prompted for if not explicitly specified in the command line.

Inspecting databases, table spaces, and tables using db2dart

The default behavior for db2dart is to inspect the entire database. Only the database name must be provided in this case. By default, db2dart will create a report file with the name databaseName.RPT. For single-partition database environments, the file is created in the current directory. For multiple-partition database environments, the file is created under a subdirectory in the diagnostic directory. The subdirectory is called DARTnnnn, where nnnn is the partition number.

If a database is large and you are only interested in one tablespace, you can use the /TS option. When using this, you must provide the tablespace ID on the command line (by specifying the /TSI parameter) or you can let db2dart prompt you for it. If you do not know the tablespace ID, you can obtain it via DB2 LIST TABLESPACES.

Similarly, a single table and its associated objects (LOBs, indices, etc.) can be inspected using the /T option. When using this, you must provide the table name or object ID and the ID of the tablespace in which the table resides. To determine the object ID and tablespace ID for a table, you can query the SYSIBM.SYSTABLES catalog table.

Dumping formatted table data using db2dart

If a tablespace or table becomes corrupt for any reason (for example, due to a bad disk or disk controller), attempts to access the table through SQL may not work. The SQL statement may fail with an error or the database may be marked bad and all connections will be dropped.

If this happens, it may be necessary to extract all the data possible so the tablespace and table can be rebuilt. In such a situation, the /DDEL option of db2dart can be used to extract the table data and place it into a delimited ASCII. Following sessions discuss these options in details.

INSPECT command

The INSPECT command is similar to the db2dart command. It allows you to check databases, tablespaces, and tables for architectural integrity by checking the pages of the database for page consistency. A significant difference between the two commands is that the database needs to be deactivated before you run db2dart, whereas INSPECT requires a database connection and can be run while there are simultaneous active connections to the database.

Unlike db2dart, the INSPECT command cannot be used to format and dump data pages, format and dump index pages, format data rows to delimited ASCII, and mark an index invalid. So, it can only inspect database or its objects online for any corruption.


Approaches to recover from corruption

Database corruption issues are sometimes not very straightforward and need IBM's support expertise to choose the best possible way out of the situation. Here, you will see how to tackle the most common types of corruption issues and look for the best possible recovery options. When corruption is detected, determining the problem source is usually of secondary importance. You would want to correct current situation first.

Recovering from data page corruption

As discussed, you need to identify corrupt pages using tools like db2dart. Looking at the db2dart report file generated, you can gauge the extent of corruption. Depending on the amount of data corrupt and complexity involved, you need to decide on the best possible recovery plan. Here are some options to recover:

  1. If there is a backup of the database, restore the database and roll forward to end of logs. This is the cleanest approach if feasible, preferably if database size is small.
  2. You could also restore the tablespace and roll forward to end of logs. This can be the best option if the corruption is localized.
  3. If you have other ways of recreating the data for the corrupt table or have a copy of the table data, drop and recreate the table. You would need the DDL of the table, and if you have the data for the table, you should be able to drop the table, recreate the table using the DDL, and recreate the data through whatever means you may have.
  4. If you do not have a valid backup image and any way to recreate the table, you may use db2dart with /ddel to salvage data from a corrupt table. Before that, you need to have the DDL of the table using db2look. You can fetch the DDL of a corrupt table using:
    db2look -d <dbname> -e
                            -z <schema_name> -t <table_name> -o <output_file_name>

    Here is an example of using db2dart with /ddel to salvage data from a corrupt table: db2dart <dbname> /ddel. This command requires four input values: a table object ID or table name, tablespace ID, page number to start with, and number of pages. The number of pages can be a specific value or a large enough value (for example, 999999999) to extract all pages in the table. Also, if a particular page in the table contains too much damage, the db2dart /DDEL may need to be executed more than once for the range of pages and omit the damaged page.

    The dumped delimited ASCII file is encoded in the database code page. The db2dart command does not perform code page conversions. Once you have salvages all the data from corrupt table, you can check the output file *.DEL to make sure all the data exists. Once that is done, you can drop the corrupt table and recreate it later with data extracted using db2dart. Note that db2dart /DDEL does not work against lob data.

  5. If you have a way to recreate the corrupt tablespace, you can mark the corrupt tablespace in drop pending state using restart database. You can recreate corrupt tablespace later.
  6. If none of the above options are feasible or give some error during extracting data, dropping the table while table recreate etc., approach IBM support for help. IBM support may help you to drop the corrupt table, initialize corrupt pages to NULL, etc., depending on the situation.

Recovering from index corruption

If there are indications of index corruption in db2diag.log and/or db2dart reports, you can mark an index invalid using db2dart and get rid of bad indices. You can recreate indices later.

db2dart has an option to mark an index invalid and make it drop pending. You can mark a corrupt index invalid with db2dart /MI. For example: db2dart <dbname> /MI /TSI 9 /OI 11076.

You can decide when to recreate an index by setting the INDEXREC parameter to restart, access, etc. To recreate indices when an application tries to access the index, you can update INDEXREC to access: db2 update db cfg for <dbname> INDEXREC ACCESS.

Once you have invalidated the bad index and updated INDEXREC, you can connect to the database.

If the problem is an index NOKEY error, then a number of db2dart commands are printed in the db2diag.log file. You can run these db2dart <dbname> /di commands to dump formatted index data for root-cause analysis of index corruption if required. You need to save these commands using grep on UNIX® or Find on Windows® and save them in a file. Edit the file and replace DBNAME with the database name. If the problem has been hit multiple times, there could be duplicate entries. You only need to keep the latest set of db2dart commands. To confirm the bad index using db2dart, you can issue db2dart <dbname> /t /tsi <tablespace_id> /oi <table_id> where tablespace_id and table_id are the tablespace ID and object ID of the base table on which the index is defined.

Recovering from CBIT errors

To fix CBIT corruption errors, examine the extent of the problem by running a db2dart on at least the corrupt table (better to run it against the whole database). You can decide on approaches discussed above depending on whether CBIT errors are on the data page or index. The most feasible option to recover from CBIT errors is to restore the database or tablespace (if errors are localized).

Recovering from log corruption

Log corruption is a matter of concern during log reply. You need to replay logs during database or tablespace roll-forward, log replay on HADR standby, and crash recovery. In the event of log corruption, the database may be fine and only the log may be damaged.

If you have errors due to bad logs during roll-forward, the very first thing that can be done is to check which log file DB2 is reporting error for. db2flsn can be used to return the name of the file that contains the log record identified by a specified log sequence number (LSN). So if you have 'bad_lsn' messages in db2diag.log, you can use db2flsn to find the corresponding log file.

If it is for a missing log file or a log file from an incorrect log chain, you can look for the correct log file. If roll-forward fails due to a corrupt log, you can go for point-in-time roll-forward. The specified point in time for the roll-forward operation must be equal to or later than minimum recovery time. The minimum recovery time (MRT) is the earliest point in time during a roll-forward when a database is consistent. If you cannot roll forward logs to at least MRT, you need to contact IBM support for assistance. Another option would be to restore from an offline database backup and not roll forward the logs. Transactions in logs would not be applied to the database in this case.

If you have issues in crash recovery due to a bad log file, you need to restore recent backup or contact IBM support for assistance.

In HADR terminology, the server that processes transactions is known as primary and the partner database that receives logs and replays them is referred to as standby. On HADR standby, during log reply, standby may crash due to a bad log. You can check db2diag.log on standby to figure out the bad log file and try shipping a good copy of that log file from primary. Once a good log file is in place, you can start HADR:

  1. Run start on HADR of standby node: db2 start hadr on db <dbname> as standby.
  2. Run start of HADRon primary node: db2 start hadr on db <dbname> as primary.

If the above attempt fails, you may need to reconfigure HADR with fresh backup from primary and restore it on standby. If you pass a copy of the bad log file to IBM DB2 support, they could examine its contents to see what was in it that might give some indication of what went wrong.

You can use db2cklog to check the validity of archive log files to determine whether the log files can be used during roll-forward recovery of a database or tablespace. Either a single archive log file or a range of archive log files can be checked. A log file that returns an error during validation by db2cklog will cause the recovery operation to fail.

Recovering from packed descriptor corruption

You can use the db2cat tool to fix the corrupt the packed descriptor. You need to consult IBM support before modifying a packed descriptor. Perform the following steps to modify the packed descriptor:

  1. export DB2SVCPW=<service_password from IBM support>
  2. db2cat -d <dbname> -s <schema> -n <tablename> -f <raw pd output file> -o <message file>
  3. db2cat -d <dbname> -s <schema> -n <tablename> -g <generated pd output file> -o <message file>
  4. db2cat -d <dbname> -s <schema> -n <tablename> -r <generated pd output file> -o <message file>

    (use the file generated with -g as the input for the replace option "-r").

  5. Export the data from the table (if required)
  6. Drop the table
  7. Recreate the table (if required)
  8. Import the user data (if required)

You can now run the verify option of db2cat on the database again: db2cat -d <dbname> -s % -n % -v .

If you do not need the tables, you can drop them. Otherwise, you should replace the packed descriptor, extract the data, then drop and recreate the table.


Preventive strategies to avoid possible corruption

Database corruption can be subtle and difficult to detect. So there could never be a tool developed that would be able to detect every single corruption that could possibly happen. Corruptions and procedures that risk the possibility of corruption should always be avoided.

It is essential to understand various preventive measures to avoid possible downtime and database crash. Here are some of the best practices to help to identify corruption in a database well before a system crash:

  1. Keep track of all changes.
  2. Be on the latest fixpack and, if possible, the latest version of DB2 and operating system (if applicable).
  3. Regular check on file system health, network issues.
  4. As much as possible, shut down DB2 gracefully.
  5. Run db2dart against your database when offline to check for corruption. If you don't have the luxury of downtime required to run db2dart on the production database, restore recent production backups on to test machines and run db2dart. Alternately, you can also use INSPECT when database is online. It can work as early detection or proactive health-checking for corruption.
  6. Have a good backup policy. Backup does not detect corruption in a page, so it is recommended to have a strong backup policy and enough backup generations.
  7. Disk configurations like RAID help minimize data corruption by using redundant disks to back up data.
  8. Good power backup to combat corruption due to power surges.
  9. Track latest defects in IBM DB2 and operating system.

Conclusion

The article discussed the most common corruption issues when using IBM DB2. It has demonstrated various symptom messages of corruption issues in db2diag.log, how to identify type of corruption based on these messages, and how to troubleshoot these issues. It also described the db2dart and INSPECT commands, which are helpful in dealing with corruption issues.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=830413
ArticleTitle=Diagnosing corruption when using IBM DB2
publish-date=08162012