A fix is available
APAR status
Closed as program error.
Error description
With the OLR performance improvements made in V11, single-call backout does not work correctly because database change logging is no longer done in chronological sequence. This is a result of full-block logging and KSDS mass inserts. After an OLR deadlock, single-call backout is used before the current unit of recovery is retried, and this leads to database corruption.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: IMS V11 users who run HALDB Online           *
*                 Reorganization (OLR).                        *
****************************************************************
* PROBLEM DESCRIPTION: A HALDB partition can be corrupted if   *
*                      an OLR for that partition encounters    *
*                      a deadlock with applications that are   *
*                      using the same partition. Symptoms      *
*                      of the database corruption include      *
*                      incorrect segment data, ABENDU0832,     *
*                      ABENDU0852, ABENDU0853, STATUSLB, and   *
*                      others. These symptoms can occur        *
*                      during the OLR or in any applications   *
*                      using the partition.                    *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************

To minimize OLR's impact on other applications, OLR is chosen as the victim when it is involved in a deadlock with another application. DFSORP30 handles the ABENDU0777 (deadlock) by backing out its work to the end of the previous completely copied database record (PHIDAM) or to the end of the previous completely copied RAP (PHDAM). This approach allows all of the completely copied database records within the failing unit of reorganization to be kept in the output data sets and committed rather than being backed out to the last committed cursor. This improves OLR performance, especially when many deadlocks occur.

Now consider OLR's new full-block logging approach, in which log records are not written at the time the segments are copied into the output data set buffers. Instead, logging can be deferred until just before the buffer handler purges a buffer that can contain many roots and their dependent segments. The effect is that until a unit of reorganization is committed, not all applicable log records have been written.
Also, the log records themselves are not in the chronological sequence of OLR's copying process because each full-block log record can represent all of the segments copied to a single block or control interval. At the time of a deadlock, backing out to a previous record or RAP no longer works. This is because full-block log records can contain segments from several database records or RAPs, thus not providing a clear record or RAP boundary to which to back out. When the backout is attempted, either too few or too many segments are backed out, and the data is corrupted.

Also consider that OLR now defers the insertion of primary index records and ILDS records until just before a unit of reorganization is committed. (Prior to Version 11 these KSDS records were added individually when the database segments were copied.) The new code has no coordination between this new mechanism and the backout to some intermediate point within the unit of reorganization. This also causes inconsistencies in the data.
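The boundary problem described above can be illustrated with a small toy model. This is only a sketch: the Segment tuple, the two logging functions, and the backout routine are hypothetical names invented for illustration and do not reflect actual IMS log record formats or module logic. It assumes record A has been completely copied and record B is partly copied when the deadlock occurs.

```python
from collections import namedtuple

# Hypothetical model: each copied segment belongs to a database record
# and lands in an output block (or control interval).
Segment = namedtuple("Segment", ["record", "block"])

# Assumed copy sequence: record A is complete, record B is in progress.
copied = [Segment("A", 1), Segment("A", 1), Segment("B", 1), Segment("B", 2)]


def per_segment_log(segments):
    """One log entry per copied segment, in copy order."""
    return [[s] for s in segments]


def full_block_log(segments):
    """One log entry per block, holding every segment copied into it."""
    blocks = {}
    for s in segments:
        blocks.setdefault(s.block, []).append(s)
    return list(blocks.values())


def backout_incomplete(log, incomplete_record):
    """Undo whole log entries from the end of the log for as long as
    they touch the incomplete record; return the segments undone."""
    undone = []
    for entry in reversed(log):
        if not any(s.record == incomplete_record for s in entry):
            break
        undone.extend(entry)
    return undone


fine = backout_incomplete(per_segment_log(copied), "B")
coarse = backout_incomplete(full_block_log(copied), "B")

# Per-segment entries give a clean boundary: only record B is undone.
print(sorted({s.record for s in fine}))
# A full-block entry mixes A and B, so undoing it also removes A's segments.
print(sorted({s.record for s in coarse}))
```

In this model, stopping the backout before the mixed block entry would instead leave one of record B's segments in place, which corresponds to the "too few" case.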
Problem conclusion
AIDS: RIDS/DBS RIDS/DBCALL DBS DBCALL
GEN:
KEYWORDS:
*** END IMS KEYWORDS ***

Because of the conflict between deadlock handling for OLR and both full-block logging and deferral of KSDS inserts, the deadlock handling code to commit completely copied database records or RAPs is removed. For an OLR deadlock, uncommitted data will now be backed out completely to the beginning of the unit of reorganization.

DFSORP30
--------
There are four places where this OLR module does a single-call backout when terminating with ABENDU0777. (This is to back out any changes beyond the completely copied database records or RAPs, as noted above.) This code is completely removed, resulting in an exit from DFSORP30 with return code 8 for any abend. With this change, a full backout will be done by DFSORP20 to back out all changes after the previously committed cursor, that is, to the cursor at the beginning of the unit of reorganization. Locks are released, and another application involved in the deadlock should be able to run. For the deadlock case, DFSORP20 restarts the unit of reorganization from the original committed cursor position. Most likely the OLR will then wait for any locks it just released.

DFSDLA00
--------
Code similar to that described above for DFSORP30 is bypassed in the case of OLR.

DFSOLRW
-------
The flag OLRABTRM is removed because it was set only by the code removed from DFSORP30, above.
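The corrected behavior, a full backout to the committed cursor followed by a restart of the unit of reorganization, can be sketched as a toy model. All names here (Deadlock, run_unit, reorganize, deadlock_at) are hypothetical and invented for illustration; this is not IMS code. The sketch also assumes that the competing application finishes before the retry, so the deadlock does not recur.

```python
class Deadlock(Exception):
    """Stands in for the deadlock abend (ABENDU0777) in this toy model."""


def run_unit(records, cursor, unit_size, deadlock_at=None):
    """Copy up to unit_size records starting at the committed cursor.
    Returns (copied, new_cursor); raises Deadlock at index deadlock_at."""
    copied = []
    for i in range(cursor, min(cursor + unit_size, len(records))):
        if i == deadlock_at:
            raise Deadlock(i)
        copied.append(records[i])
    return copied, cursor + len(copied)


def reorganize(records, unit_size, deadlock_at=None):
    """Process records in units; on deadlock, discard the whole unit and
    restart it from the last committed cursor."""
    output, cursor, retries = [], 0, 0
    while cursor < len(records):
        try:
            copied, new_cursor = run_unit(records, cursor, unit_size, deadlock_at)
        except Deadlock:
            # Full backout: nothing copied in this unit is kept and the
            # committed cursor does not move.
            deadlock_at = None  # assumption: the other application completes
            retries += 1
            continue
        output.extend(copied)  # commit the unit
        cursor = new_cursor
    return output, retries


result, retries = reorganize(list("abcdefg"), unit_size=3, deadlock_at=4)
print(result, retries)
```

The design trade-off matches the text above: work already copied in the failing unit is redone on retry, but the backout point is always a committed cursor, so it never depends on where record or RAP boundaries fall inside a full-block log record.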
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PK79011
Reported component name
IMS V11
Reported component ID
5635A0200
Reported release
100
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2009-01-20
Closed date
2009-03-04
Last modified date
2009-10-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK44548
Modules/Macros
DFSDLA00 DFSOLRW DFSORP30
Fix information
Fixed component name
IMS V11
Fixed component ID
5635A0200
Applicable component levels
R100 PSY UK44548 UP09/03/05 P F903
Document Information
Modified date:
01 October 2009