Perhaps regarded as one of the most troublesome business issues, database corruption often takes place subtly and affects businesses adversely. In simple words, corruption can be defined as any unintentional entry in database. Corruption issues may result in the severe performance hit of the system. In some cases, it may result in frequent system crashes and may become a business-critical downtime situation. Database corruption can occur at any layer, from DB2 to the operating system and right down to the hardware layer. Hence, it is important to understand and troubleshoot it by involving all possible affected layers and collecting any diagnostic data that might be required as soon as possible, while the data is available.
In this article, you will understand why a database can go offline when it encounters corruption. You will also learn to analyze symptoms for corruption and differentiate between easy-to-fix and catastrophic failures. This article will shed light on corruption issues when using IBM DB2 and help DB2 users understand and choose the best approach in dealing with such critical and high-impact issues.
This article begins with discussion about possible sources of corruption. It then explains the following tasks:
- Identifying and troubleshooting corruption — Identify and categorize corruption in the database when using DB2, with the help of sample symptom messages that appear in db2diag.log. Corruption issues can be broadly classified in five categories: data page corruption (or table corruption), index corruption, CBIT corruption, log corruption, and packed descriptor corruption.
- Using
db2dartandINSPECTto identify corruption — Get insight into useful DB2 commands,db2dartandINSPECT, to check database corruption. - Approaches to recover from corruption — Once a corruption issue is identified, how to approach these cases, what data to collect, and how to recover from the situation is crucial. Learn possible recovery approaches and how to choose between available options.
- Preventive strategies to avoid possible corruption — Best practices are discussed.
Database corruption can occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Some of the common reasons of corruption:
- Corrupt file system is one the most common reasons for corruption in a database. Abrupt system shutdowns, power surges, file system double-mount, migrating disks, file system-level activities like checking and repairing file system (using utilities like fsck on Linux®) when the database is up and running, and using Ctrl+Alt+Delete while a file is open, viruses can all introduce unintentional changes in database.
- Hardware failure.
- Memory corruption.
- DB2 defect.
- I/O and network issues (issues in fiber adapter, switches, etc.).
- Incorrect application coding.
- Inconsistency in the value of the page in the buffer pool (sqldPage) and the one stored in the file system.
- Overwriting of disk data causes corruption.
- User interference with critical configuration files of database, log files, log control files, etc. can put database in inconsistent state
Having said that corruption can be due to various reasons, finding out exactly what caused data corruption is challenging. In most cases, it is caused by file system issues and hardware problems.
For a DBMS, page is the smallest unit of data for memory allocation performed by the operating system for a program and transfer between main memory and any other auxiliary store, such as a hard drive. So when you say something in a database is corrupt, you actually mean some pages in a database are corrupt.
Panic is a method DB2 uses to cause itself to crash if there is an error condition it cannot handle gracefully. When a page corruption is detected by DB2, it halts all processing by means of a controlled crash (panic), since it can't determine database integrity. This is also to prevent further damage or loss of data.
A number of error messages are dumped in db2diag.log when DB2 encounters corruption in database. When an outage occurs and automatic first occurrence data capture (FODC) is enabled, data is collected based on symptoms. FODC data will be collected automatically on DB2 9.5 when one of the following conditions is met:
- FOCD_Trap when an instance wide trap has occurred.
- FODC_Panic when a DB2 engine detects an incoherence and decides not to continue.
- FODC_BadPage when a bad page has been detected.
- FODC_DBMarkedBad when a database has been marked bad due to an error.
It is crucial to involve OS and hardware support
in order to gather required information like OS diagnostics (for example, errpt –a, snap, and fileplace outputs on AIX®), as well as any and all hardware
diagnostics (state saves, error logs, etc). It is important to ensure that there is
adequate disk space for critical file systems like dump space and log directories to
ensure that critical events are captured in entirety.
Looking at the db2diag.log, you can confirm if panic is due to corruption or another reason. Below, you will see how to identify and categorize corruption in DB2. Following are some of the most common db2diag.log error messages that identify corruption.
Data page corruption indicates corruption in actual data in a table. A number of messages are logged to the db2diag.log when data corruption is detected. These messages are necessary to identify the affected object. Listing 1 shows a sample error message in db2diag.log.
Listing 1. Sample error message for data page corruption
2012-02-04-03.13.05.261082-360 I3442A358 LEVEL: Error
PID : 393470 TID : 1 PROC : db2pfchr 0
INSTANCE: inst1 NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbReadAndReleaseBuffers,
probe:13
RETCODE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
DIA8400C A bad page was encountered.
.
.
.
2012-02-04-03.13.05.264301-360 I3801A437 LEVEL: Error
PID : 393470 TID : 1 PROC : db2pfchr 0
INSTANCE: inst1 NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbReadAndReleaseBuffers,
probe:13
DATA #1 : String, 158 bytes
Obj={pool:9;obj:20;type:0} State=x27 Parent={9;20}, EM=1120, PP0=1152
Page=51235 Cont=17 Offset=2848 BlkSize=15
sqlbReadAndReleaseBuffers error: num-pages=16
|
As the above error messages indicate, DB2 encountered a bad page for tablespace ID 9 and table ID 20. The object type filed is marked 0, which indicates data page corruption.
You can query the catalog table to determine which table has corrupt pages:
db2 "select char(tabname,20), char(tabschema,20) from
syscat.tables where tableid=20 and tbspaceid=9" |
Note that DB2 dumps bad page errors only for those pages
it has tried to access. It does not necessarily mean that only those pages are
corrupt. You need to explicitly check all pages in your database to find
the extent of corruption, with the help of the db2dart or INSPECT commands.
Similar error messages are dumped in db2diag.log for corruption in temporary tables. If the data type field in db2diag.log error message holds a value greater than 128, it indicates corruption in the temporary table. If the object type is 3, it indicates LOB data corruption in a table.
An index is a database object that contains an ordered set of pointers that refer to rows in a base table. There are many flavors of index-related corruption problems, including:
- Unique index duplicates with different RID(s) or the same RID(s)
- Multiple index entries pointing to the same RID
- Out-of-place index key (wrong index key order)
- Row exists, but index keys do not exist in any or some of the indices
- Index entry pointing to an empty data slot or unused data slot or RID is invalid
- Incorrect previous or next index page pointers, incorrect high key or other corruptions on index pages
Following are sample error message for index corruption.
Listing 2. Sample error message for index corruption
2012-01-30-01.35.50.952434+000 I29308542A2532 LEVEL: Severe
PID : 1175792 TID : 33926 PROC : db2sysc 0
INSTANCE: inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-7 APPID: *LOCAL.inst1.120130013528
AUTHID : TP0ADM
EDUID : 33926 EDUNAME: db2redow (TP0) 0
FUNCTION: DB2 UDB, buffer pool services, sqlb_verify_page, probe:3
MESSAGE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
DIA8400C A bad page was encountered.
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 23 bytes
Page verification error
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes
23046981
DATA #4 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 72 bytes
Obj: {pool:9;obj:11076;type:1} Parent={8;11076}
|
As you can see, error messages have object type:1, which indicates index page corruption. The index resides in tablespace ID 9, and the index ID is 11076. This index is on a table with tablespace ID 8, table ID 11076. You can retrieve base table name and index name by querying catalog tables.
The above snippet from db2diag.log indicates a bad page for index. Other common errors in db2diag.log that point to index corruption are SQLI_NOKEY and row not found from index function.
CBITs are a method used by DB2 to verify that a page being read into the buffer pool from disk is not a partial page or has not been changed from some form of corruption.
The basic idea behind CBITs is that a bit from each sector (512 bytes) on a page is set to the same value before writing the page. Before DB2 flushes a page to disk, the checksum is calculated and recorded on the page. When a page is read back in the buffer pool, this checksum is recalculated and checked against the stored value. If some of the bits are different, it indicates a partial page write or disk corruption.
Listing 3. Sample error message for CBIT corruption
2012-03-12-04.45.17.559235-240 I1104A2616 LEVEL: Severe
PID : 2551866 TID : 1 PROC : db2pfchr
INSTANCE: inst1 NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbVerifyCBITS, probe:1110
MESSAGE : ZRC=0x86020019=-2046689255=SQLB_CSUM "Bad Page, Checksum Error"
DIA8426C A invalid page checksum was found for page "".
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 95 bytes
CBIT verification error
bitExpected is 0, userByte is 33, sector 7 (from head of page, 0 based)
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes
|
Note that changes made to the page outside of DB2 (file system failures, disk failures, etc.) will not be noticed if the corruption does not touch the bytes holding the CBITs. CBIT errors ( checksum errors ) are most of the time due to hardware or OS error, either during the initial write or during the read.
The transaction log in DB2 is simply a record of all changes that have taken place in the database. To keep track of changes made by transactions, a method is needed to timestamp changes to data as well as to timestamp log records. In DB2, this timestamp mechanism is performed using a Log Sequence Number (LSN). In case you come across a corrupt log, you may have an error message in db2diag.log similar to.
Listing 4. Sample error message for log corruption
2010-06-07-12.37.21.143998+120 I8673583A553 LEVEL: Severe
PID : 2498668 TID : 27358 PROC : db2sysc 56
INSTANCE: inst1 NODE : 056 DB : SAMPLE
APPHDL : 998-22947 APPID: *N998.inst1.100607192315
AUTHID : LOADDSF
EDUID : 27358 EDUNAME: db2agntp (SAMPLE) 56
FUNCTION: DB2 UDB, data protection services, sqlpgrlg, probe:291
DATA #1 : < reformatted >
Error -2028994519 when reading LSN 00000B1C261B4FD3 from log file
S0119292.LOG tellMe 1 dpsAcbFlags 1000 setSkipOutputBuf 0
2010-06-07-12.37.21.144202+120 I8674137A487 LEVEL: Severe
PID : 2498668 TID : 27358 PROC : db2sysc 56
INSTANCE: inst1 NODE : 056 DB : SAMPLE
APPHDL : 998-22947 APPID: *N998.inst1.100607192315
AUTHID : LOADDSF
EDUID : 27358 EDUNAME: db2agntp (SAMPLE) 56
FUNCTION: DB2 UDB, data protection services, sqlpgrlg, probe:291
DATA #1 : < preformatted >
HeadLsn 00000B1B8B996EB3, copyLookForLsn 00000B1C261B4FD3
2010-06-07-12.37.21.158065+120 I8675153A549 LEVEL: Error
PID : 2498668 TID : 27358 PROC : db2sysc 56
INSTANCE: inst1 NODE : 056 DB : SAMPLE
APPHDL : 998-22947 APPID: *N998.inst1.100607192315
AUTHID : LOADDSF
EDUID : 27358 EDUNAME: db2agntp (SAMPLE) 56
FUNCTION: DB2 UDB, data protection services, sqlptudo, probe:1010
RETCODE : ZRC=0x87100029=-2028994519=SQLP_BADLSN "Invalid LSN value."
DIA8538C An invalid log sequence number (LSN), the value was "".
|
If logs are corrupt, it can result into a severe problem during roll-forward of the database, crash recovery, HADR logs reply, etc., where you actually reply to logs. Database roll-forward enables you to maintain consistency in the database. It recovers a database by applying transactions recorded in the database log files. Roll-forward is a process invoked after a database or a tablespace backup image has been restored.
A packed descriptor is a column within the system catalog tables that DB2 uses to identify the details of a database object. If it is corrupt for some reason, you see errors in db2diag.log as shown below.
Listing 5. Sample error message for packed descriptor corruption
2011-08-22-20.50.00.922275-300 I154161182E497 LEVEL: Severe
PID : 14152 TID : 184633256288 PROC : db2sysc 0
INSTANCE: inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-64465 APPID: 161.166.44.48.27625.11082301341
AUTHID : DTADM1
EDUID : 606 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, catcache support, sqlrlc_systables_fetch_from_disk,
probe:200
MESSAGE : Corrupt PD->length in table:DTWF .JBPM_TASKINSTANCE
2011-08-22-20.50.00.922476-300 I154161680E1588 LEVEL: Severe
PID : 14152 TID : 184633256288 PROC : db2sysc 0
INSTANCE: inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-64465 APPID: 161.166.44.48.27625.11082301341
AUTHID : DTADM1
EDUID : 606 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, catcache support, sqlrlc_systables_fetch_from_disk,
probe:201
MESSAGE : Length in PD=3792, LFD length=10104, DMS length=10104
DATA #1 : String, 11 bytes
Corrupt PD:
DATA #2 : Dumped object of size 3792 bytes at offset 1386248, 53 bytes
/volumes/work/db2dump/xnawmtp5/14152.606.000.dump.bin
DATA #3 : LOB Descriptor, PD_TYPE_LOB_DESCRIPTOR, 60 bytes
SQLDX_LD: Size:60
x0000 lfd_check 0x49
x0001 lfd_version 6
x0002 lfd_numsegs 1
x0003 lfd_flags 0x00
x0004 lfd_size 10104
x000C lfd_life_lsn 0000009860A6FF2D
x0014 lfd_mini_numsegs 0
x0015 lfd_first 4
x0016 lfd_descsize 60
x0018 lfd_last_pages 16
x001C lfd_last_bytes 10104
x0038 lfd_dir Regular Directory
Offsets
lfd_dir[0]: 33888 (16K)
Hexdump of LOB descriptor follows:
4906 0100 0000 0000 7827 0000 0000 0098
60A6 FF2D 0004 3C00 1000 0000 7827 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 6084 0000
|
If you have corrupt PD, the table becomes inaccessible.
Using db2dart and INSPECT to identify corruption
db2dart is a command that can be used to verify the
architectural correctness of databases and the objects within them. It can also be used
to extract data from tables that might otherwise be inaccessible due to
corruption.
db2dart should never be run against a database that still
has active connections. db2dart accesses the data and
metadata in a database by reading them directly from disk. So, if there are
connections, db2dart will be unaware about pages in the
buffer pool, control structures in memory, etc. and may report false errors as a
result. Similarly, if you run db2dart against a database
that requires crash recovery or that has not completed roll-forward recovery, you
may get inconsistent results.
To display all possible options, execute the db2dart utility without any parameters. Some options of db2dart require user input, such as tablespace ID
and table ID, which are prompted for if not explicitly specified in the command
line.
Inspecting databases, table spaces, and tables using
db2dart
The default behavior for db2dart is to inspect the entire
database. Only the database name must be provided in this case. By default, db2dart will create a report file with the name
databaseName.RPT. For single-partition database environments, the file is created in
the current directory. For multiple-partition database environments, the file is
created under a subdirectory in the diagnostic directory. The subdirectory is called
DARTnnnn, where nnnn is the partition number.
If a database is large and you are only interested in one tablespace, you can
use the /TS option. When using this, you must provide the tablespace
ID on the command line (by specifying the /TSI parameter) or you can let db2dart prompt you for it. If you do not know the
tablespace ID, you can obtain it via DB2 LIST
TABLESPACES.
Similarly, a single table and its associated objects (LOBs, indices, etc.) can be
inspected using the /T option. When using this, you must provide the
table name or object ID and the ID of the tablespace in which the table resides. To
determine the object ID and tablespace ID for a table, you can query the
SYSIBM.SYSTABLES catalog table.
Dumping formatted table data using db2dart
If a tablespace or table becomes corrupt for any reason (for example, due to a bad disk or disk controller), attempts to access the table through SQL may not work. The SQL statement may fail with an error or the database may be marked bad and all connections will be dropped.
If this happens, it may be necessary to extract all the data possible so the
tablespace and table can be rebuilt. In such a situation, the /DDEL option of db2dart can be used to
extract the table data and place it into a delimited ASCII. Following sessions
discuss these options in details.
The INSPECT command is similar to the db2dart command. It allows you to check databases, tablespaces, and
tables for architectural integrity by checking the pages of the database for page
consistency. A significant difference between the two commands is that the database
needs to be deactivated before you run db2dart, whereas
INSPECT requires a database connection and can be run
while there are simultaneous active connections to the database.
Unlike db2dart, the INSPECT
command cannot be used to format and dump data pages, format and dump index pages,
format data rows to delimited ASCII, and mark an index invalid. So, it can only
inspect database or its objects online for any corruption.
Approaches to recover from corruption
Database corruption issues are sometimes not very straightforward and need IBM's support expertise to choose the best possible way out of the situation. Here, you will see how to tackle the most common types of corruption issues and look for the best possible recovery options. When corruption is detected, determining the problem source is usually of secondary importance. You would want to correct current situation first.
Recovering from data page corruption
As discussed, you need to identify corrupt pages using tools
like db2dart. Looking at the db2dart report file
generated, you can gauge the extent of corruption. Depending on the amount of data
corrupt and complexity involved, you need to decide on the best possible
recovery plan. Here are some options to recover:
- If there is a backup of the database, restore the database and roll forward to end of logs. This is the cleanest approach if feasible, preferably if database size is small.
- You could also restore the tablespace and roll forward to end of logs. This can be the best option if the corruption is localized.
- If you have other ways of recreating the data for the corrupt table or have a copy of the table data, drop and recreate the table. You would need the DDL of the table, and if you have the data for the table, you should be able to drop the table, recreate the table using the DDL, and recreate the data through whatever means you may have.
- If you do not have a valid backup image and any way to recreate the table, you may
use
db2dartwith/ddelto salvage data from a corrupt table. Before that, you need to have the DDL of the table usingdb2look. You can fetch the DDL of a corrupt table using:db2look -d <dbname> -e -z <schema_name> -t <table_name> -o <output_file_name>
Here is an example of using
db2dartwith/ddelto salvage data from a corrupt table:db2dart <dbname> /ddel. This command requires four input values: a table object ID or table name, tablespace ID, page number to start with, and number of pages. The number of pages can be a specific value or a large enough value (for example, 999999999) to extract all pages in the table. Also, if a particular page in the table contains too much damage, thedb2dart /DDELmay need to be executed more than once for the range of pages and omit the damaged page.The dumped delimited ASCII file is encoded in the database code page. The
db2dartcommand does not perform code page conversions. Once you have salvages all the data from corrupt table, you can check the output file *.DEL to make sure all the data exists. Once that is done, you can drop the corrupt table and recreate it later with data extracted usingdb2dart. Note thatdb2dart /DDELdoes not work against lob data. - If you have a way to recreate the corrupt tablespace, you can mark
the corrupt tablespace in drop pending state using
restart database. You can recreate corrupt tablespace later. - If none of the above options are feasible or give some error during extracting data, dropping the table while table recreate etc., approach IBM support for help. IBM support may help you to drop the corrupt table, initialize corrupt pages to NULL, etc., depending on the situation.
Recovering from index corruption
If there are indications of index corruption in db2diag.log and/or db2dart reports,
you can mark an index invalid using db2dart and get rid of
bad indices. You can recreate indices later.
db2dart has an option to mark an index invalid and
make it drop pending. You can mark a corrupt index invalid with db2dart /MI. For example: db2dart
<dbname> /MI /TSI 9 /OI 11076.
You can decide when to recreate an index by setting the INDEXREC parameter to restart,
access, etc. To recreate indices when an application tries to access
the index, you can update INDEXREC to access:
db2 update db cfg for <dbname> INDEXREC
ACCESS.
Once you have invalidated the bad index and updated INDEXREC, you can connect to the database.
If the problem is an index NOKEY error, then a number of db2dart commands are printed in the db2diag.log file. You can run these
db2dart <dbname> /di commands to dump formatted
index data for root-cause analysis of index corruption if required. You need to
save these commands using grep on
UNIX® or Find on Windows® and save them in
a file. Edit the file and replace DBNAME with the database name. If the problem has
been hit multiple times, there could be duplicate entries. You only need
to keep the latest set of db2dart commands. To
confirm the bad index using db2dart, you can
issue db2dart <dbname> /t /tsi <tablespace_id> /oi
<table_id> where tablespace_id
and table_id are the tablespace ID and object ID of the base
table on which the index is defined.
To fix CBIT corruption errors, examine the extent of the problem
by running a db2dart on at least the corrupt table
(better to run it against the whole database). You can decide on approaches
discussed above depending on whether CBIT errors are on the data page or index. The most
feasible option to recover from CBIT errors is to restore the database or tablespace
(if errors are localized).
Recovering from log corruption
Log corruption is a matter of concern during log reply. You need to replay logs during database or tablespace roll-forward, log replay on HADR standby, and crash recovery. In the event of log corruption, the database may be fine and only the log may be damaged.
If you have errors due to bad logs during roll-forward, the very first thing that can be
done is to check which log file DB2 is reporting error for. db2flsn can be used to return the name of the file that contains
the log record identified by a specified log sequence number (LSN). So if you have
'bad_lsn' messages in db2diag.log, you can use db2flsn to find the corresponding log file.
If it is for a missing log file or a log file from an incorrect log chain, you can look for the correct log file. If roll-forward fails due to a corrupt log, you can go for point-in-time roll-forward. The specified point in time for the roll-forward operation must be equal to or later than minimum recovery time. The minimum recovery time (MRT) is the earliest point in time during a roll-forward when a database is consistent. If you cannot roll forward logs to at least MRT, you need to contact IBM support for assistance. Another option would be to restore from an offline database backup and not roll forward the logs. Transactions in logs would not be applied to the database in this case.
If you have issues in crash recovery due to a bad log file, you need to restore recent backup or contact IBM support for assistance.
In HADR terminology, the server that processes transactions is known as primary and the partner database that receives logs and replays them is referred to as standby. On HADR standby, during log reply, standby may crash due to a bad log. You can check db2diag.log on standby to figure out the bad log file and try shipping a good copy of that log file from primary. Once a good log file is in place, you can start HADR:
- Run start on HADR of standby node:
db2 start hadr on db <dbname> as standby. - Run start of HADRon primary node:
db2 start hadr on db <dbname> as primary.
If the above attempt fails, you may need to reconfigure HADR with fresh backup from primary and restore it on standby. If you pass a copy of the bad log file to IBM DB2 support, they could examine its contents to see what was in it that might give some indication of what went wrong.
You can use db2cklog to check the validity of
archive log files to determine whether the log files can be used
during roll-forward recovery of a database or tablespace. Either a single archive
log file or a range of archive log files can be checked. A log file that returns an
error during validation by db2cklog will
cause the recovery operation to fail.
Recovering from packed descriptor corruption
You can use the db2cat tool to fix the corrupt the packed descriptor. You need to consult IBM support before modifying a packed descriptor. Perform the following steps to modify the packed descriptor:
export DB2SVCPW=<service_password from IBM support>db2cat -d <dbname> -s <schema> -n <tablename> -f <raw pd output file> -o <message file>db2cat -d <dbname> -s <schema> -n <tablename> -g <generated pd output file> -o <message file>db2cat -d <dbname> -s <schema> -n <tablename> -r <generated pd output file> -o <message file>(use the file generated with -g as the input for the replace option "-r").
- Export the data from the table (if required)
- Drop the table
- Recreate the table (if required)
- Import the user data (if required)
You can now run the verify option of db2cat on
the database again: db2cat -d <dbname> -s % -n %
-v .
If you do not need the tables, you can drop them. Otherwise, you should replace the packed descriptor, extract the data, then drop and recreate the table.
Preventive strategies to avoid possible corruption
Database corruption can be subtle and difficult to detect. So there could never be a tool developed that would be able to detect every single corruption that could possibly happen. Corruptions and procedures that risk the possibility of corruption should always be avoided.
It is essential to understand various preventive measures to avoid possible downtime and database crash. Here are some of the best practices to help to identify corruption in a database well before a system crash:
- Keep track of all changes.
- Be on the latest fixpack and, if possible, the latest version of DB2 and operating system (if applicable).
- Regular check on file system health, network issues.
- As much as possible, shut down DB2 gracefully.
- Run
db2dartagainst your database when offline to check for corruption. If you don't have the luxury of downtime required to rundb2darton the production database, restore recent production backups on to test machines and rundb2dart. Alternately, you can also useINSPECTwhen database is online. It can work as early detection or proactive health-checking for corruption. - Have a good backup policy. Backup does not detect corruption in a page, so it is recommended to have a strong backup policy and enough backup generations.
- Disk configurations like RAID help minimize data corruption by using redundant disks to back up data.
- Good power backup to combat corruption due to power surges.
- Track latest defects in IBM DB2 and operating system.
The article discussed the most common corruption issues when using IBM DB2. It
has demonstrated various symptom messages of corruption issues in db2diag.log, how to
identify type of corruption based on these messages, and how to troubleshoot these issues.
It also described the db2dart and INSPECT commands, which are helpful in dealing with corruption
issues.
Learn
- Learn more about DB2 in the DB2
for Linux, UNIX, and Windows page on developerWorks You'll find technical
documentation, how-to articles, education, downloads, product information, and more.
- Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation,
how-to articles, education, downloads, product information, and
more.
- Stay current with
developerWorks technical events and webcasts.
- Follow developerWorks on
Twitter.
Get products and technologies
- Build your next
development project with
IBM trial software,
available for download directly from developerWorks.
- Now you can use
DB2 for free. Download DB2 Express-C, a no-charge
version of DB2 Express Edition for the community that offers the same core
data features as DB2 Express Edition and provides a solid base to build
and deploy applications.
Discuss
- Participate in the discussion forum.
- Check out the
developerWorks
blogs and get involved in the
developerWorks community.





