SystemAdmin

Interesting issues with incremental backups...

2013-02-20T15:34:52Z
This is on DB2 9.5.10 WSE 32-bit on Windows Server 2003 Enterprise and DB2 9.7.6 WSE 64-bit on Windows Server 2008 R2.

Scenario:
We have been testing in preparation for moving the databases of several applications from 9.5.10 WSE 32-bit servers to 9.7.6 WSE 64-bit servers. The move requires offline, uncompressed backups of the source databases. Since my production backups are online, compressed, and incremental, and I did not want to wait for a maintenance window to take offline backups, I used an intermediary 9.5.10 WSE 32-bit server: I did incremental restores of the production backups there, took full offline backups of the restored databases, and then restored those on the destination 9.7.6 WSE 64-bit server.
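For reference, the intermediary-server route above might look like the following DB2 CLP command sequence. This is a minimal Python sketch that only builds and prints the commands; it is not the exact set of commands used here. The database alias (APPDB), the directories and the image timestamp are hypothetical placeholders, and the rollforward step is an assumption: the post does not mention it, but a restore from online images leaves the database in rollforward-pending state.

```python
from typing import List

# Hypothetical placeholders: the database alias, directories and image
# timestamp used below are illustrations, not values from the original setup.


def intermediary_workflow(db: str, prod_dir: str, offline_dir: str,
                          last_image_ts: str) -> List[str]:
    """Return the DB2 CLP commands for the intermediary-server approach."""
    return [
        # 9.5.10 intermediary server: rebuild the database from the chain of
        # online, compressed, incremental production images. AUTOMATIC makes
        # DB2 locate and apply the required images itself.
        f"db2 restore db {db} incremental automatic from {prod_dir} "
        f"taken at {last_image_ts} without prompting",
        # Assumed step (not described in the post): images taken online leave
        # the database rollforward-pending; the needed logs must be available.
        f"db2 rollforward db {db} to end of logs and stop",
        # Full offline backup: BACKUP is offline and uncompressed unless the
        # ONLINE or COMPRESS options are given.
        f"db2 backup db {db} to {offline_dir}",
        # 9.7.6 destination server: restore the offline image; DB2 upgrades
        # the database as part of this restore.
        f"db2 restore db {db} from {offline_dir} without rolling forward "
        f"without prompting",
    ]


if __name__ == "__main__":
    for cmd in intermediary_workflow("APPDB", r"E:\DB2Backups",
                                     r"E:\OfflineBackups", "20130121113409"):
        print(cmd)
```

Running it prints the four commands in order; the first three belong on the 9.5.10 intermediary server, the last one on the 9.7.6 destination.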

On the surface, everything worked beautifully: the databases restored fine, DB2 upgraded them without complaint, and the applications worked with them correctly.

…and then I tried to back up one of the databases, and I got:

SQL1655C The operation could not be completed due to an error accessing data on disk. SQLSTATE=58030

Accompanied by:

ADM14001C An unexpected and critical error has occurred: "BadPage". The
instance may have been shutdown as a result. "Automatic" FODC (First Occurrence
Data Capture) has been invoked and diagnostic information has been recorded in
directory
"C:\ProgramData\IBM\DB2\DB2COPY1\DB2\FODC_BadPage_2013-02-12-10.26.09.186000_0000\".
Please look in this directory for detailed evidence about what happened
and contact IBM support if necessary to diagnose the problem.

And:

ADM6006E DB2 encountered an error while reading page "11679" from table space
"4" for object "32" (located at offset "11679" of container "D:\xxx\xxx\8ff87f5f7a3d4220829555257e0897e4\SQL00032.DAT").

That made me a little (just a bit) apprehensive. I started checking the other databases and retracing the steps I took to restore them, and established the following:

1) Databases that were NOT spliced together from incremental backups (i.e., that came from full backups only) were fine.
2) That made me think I had bungled the incremental restores on the intermediary server, so I tried an automatic incremental restore instead, with the same result.
3) Neither db2dart nor INSPECT showed any errors on the affected databases. db2ckbkp, however, displayed something like this for every single one of the “spliced” backups:

db2ckbkp e:\DB2Backups\xxx.0.DB2.NODE0000.CATN0000.20130121113409.001
[1] Buffers processed: ####################ERROR - Tablespace page size inconsistent for tablespace ID 4.
#############ERROR - Tablespace page size inconsistent for tablespace ID 4.
ERROR - Tablespace page size inconsistent for tablespace ID 4.
ERROR - Tablespace page size inconsistent for tablespace ID 4.
ERROR - Tablespace page size inconsistent for tablespace ID 4.
ERROR - Tablespace page size inconsistent for tablespace ID 4.
ERROR - Tablespace page size inconsistent for tablespace ID 4.
#

Image Verification Complete - ERRORS DETECTED: 14

4) After some more tests, I found the solution. If I reorganized the “spliced” database on the intermediary server, db2ckbkp reported no problems with a backup of that database taken after the reorg. I then reorganized one of the affected databases on the destination 9.7.6 server and took a backup of it. This time, everything worked fine…
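When many “spliced” images have to be checked, the db2ckbkp output from step 3 can be scanned with a few lines of Python. A minimal sketch, assuming the messages are worded exactly as in the output quoted above (other DB2 levels may word them differently); `summarize_ckbkp` is a hypothetical helper that only parses text already captured from db2ckbkp, it does not run the tool.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Patterns based on the db2ckbkp output quoted in step 3 (assumed wording).
PAGE_SIZE_ERROR = re.compile(
    r"ERROR - Tablespace page size inconsistent for tablespace ID (\d+)")
SUMMARY = re.compile(r"ERRORS DETECTED:\s*(\d+)")


def summarize_ckbkp(output: str) -> Tuple[Optional[int], Counter]:
    """Return (error count from the summary line, page-size error lines per
    tablespace ID). The count is None when no "ERRORS DETECTED" line exists."""
    # findall also catches errors glued to the "#####" progress bar.
    per_tablespace = Counter(int(ts) for ts in PAGE_SIZE_ERROR.findall(output))
    match = SUMMARY.search(output)
    reported = int(match.group(1)) if match else None
    return reported, per_tablespace
```

For example, redirect the check to a file (`db2ckbkp <image> > ckbkp.txt`) and pass the file contents to `summarize_ckbkp`; a non-empty counter or a summary count flags an image worth a closer look.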

In retrospect, I should have attempted a reorg to begin with. In my defense, the errors seemed to indicate some physical problem with the disk or the data, and I assumed that if a backup failed, a reorg would most certainly fail as well.

The moral:
The procedure for restoring incremental backups (http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.ha.doc/doc/t0006070.html) does NOT suggest running a reorg afterwards; however, it may be prudent to do so.
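Given that moral, the post-restore reorg can be scripted. The sketch below is hypothetical, not the procedure used in this post: it assumes that “a reorg of the database” means a classic REORG TABLE of every user table, that the (schema, table) pairs come from a catalog query such as `LIST_TABLES_SQL`, and that the generated file is run with `db2 -tvf`. APPDB and the table names are placeholders.

```python
from typing import Iterable, List, Tuple

# Assumed catalog query for the tables to reorganize: TYPE 'T' marks regular
# tables; skipping SYS* schemas is an assumption, not something from the post.
LIST_TABLES_SQL = (
    "SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES "
    "WHERE TYPE = 'T' AND TABSCHEMA NOT LIKE 'SYS%'"
)


def quote_ident(name: str) -> str:
    """Delimit an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reorg_script(db: str, tables: Iterable[Tuple[str, str]]) -> str:
    """Build a CLP script (semicolon-terminated, for `db2 -tvf`) that runs a
    classic REORG TABLE on each (schema, table) pair."""
    lines: List[str] = [f"CONNECT TO {db};"]
    for schema, table in tables:
        lines.append(f"REORG TABLE {quote_ident(schema)}.{quote_ident(table)};")
    lines.append("CONNECT RESET;")
    return "\n".join(lines) + "\n"
```

After running the generated script, take a fresh backup and verify it with db2ckbkp, as in step 4.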