IBM Support

PK79011: POSSIBLE DATABASE CORRUPTION AFTER OLR DEADLOCK

A fix is available

APAR status

  • Closed as program error.

Error description

  • With the OLR performance improvements made in V11, single-call
    backout does not work correctly because database changes are
    not logged in chronological sequence. This is a result of
    full-block logging and KSDS mass inserts. After an OLR
    deadlock, single-call backout is used before retrying the
    current unit of recovery, and this leads to database corruption.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: IMS V11 users who run HALDB Online           *
    *                 Reorganization (OLR).                        *
    ****************************************************************
    * PROBLEM DESCRIPTION: A HALDB partition can be corrupted if   *
    *                      an OLR for that partition encounters    *
    *                      a deadlock with applications that are   *
    *                      using the same partition.  Symptoms     *
    *                      of the database corruption include      *
    *                      incorrect segment data, ABENDU0832,     *
    *                      ABENDU0852, ABENDU0853, STATUSLB, and   *
    *                      others. These symptoms can occur        *
    *                      during the OLR or in any applications   *
    *                      using the partition.                    *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    To minimize OLR's impact on other applications, OLR is chosen
    as the victim when it is involved in a deadlock with another
    application.  DFSORP30 handles the ABENDU0777 (deadlock) by
    backing out its work to the end of the previous completely
    copied database record (PHIDAM) or to the end of the previous
    completely copied RAP (PHDAM).  This approach allows all of the
    completely copied database records within the failing unit of
    reorganization to be kept in the output data sets and committed
    rather than being backed out to the last committed cursor.
    This improves OLR performance, especially when many deadlocks
    occur.
    
    Now consider the OLR's new full-block logging approach in which
    log records are not written at the time the segments are copied
    into the output data set buffers.  Instead, logging can be
    deferred until just before buffer handler purges a buffer that
    can contain many roots and their dependent segments.  The
    effect is that until a unit of reorganization is committed,
    not all applicable log records have been written.  Also, the
    log records themselves are not in the chronological sequence of
    OLR's copying process because each full-block log record can
    represent all of the segments copied to a single block or
    control interval.
    
    At the time of a deadlock, backing out to a previous record or
    RAP no longer works.  This is because full-block log records
    can contain segments from several database records or RAPs,
    thus not providing a clear record or RAP boundary to which to
    back out.  When the backout is attempted, either too few or too
    many segments are backed out, and the data is corrupted.
    
    Also consider that OLR now defers the insertion of primary
    index records and ILDS records until just before a unit of
    reorganization is committed.  (Prior to Version 11 these KSDS
    records were added individually when the database segments were
    copied.)  The new code has no coordination between this new
    mechanism and the backout to some intermediate point within the
    unit of reorganization.  This also causes inconsistencies in
    the data.
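
The boundary problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not IMS code: the `Segment` class, the block numbers, and the helper functions are all hypothetical, and "backing out" is reduced to undoing whole per-block log records. It shows that when one log record covers a whole block, a block can hold segments from more than one database record, so undoing whole log records cannot stop cleanly at the end of a database record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    record_id: int  # database record (root) the segment belongs to (hypothetical)
    block_id: int   # output block or CI the segment was copied into (hypothetical)


# Hypothetical copy sequence: records 1..3 are copied in order, and
# output blocks fill up without regard to database record boundaries.
copied = [
    Segment(1, 100), Segment(1, 100),
    Segment(2, 100), Segment(2, 101),
    Segment(3, 101),
]


def full_block_log(segments):
    """Toy full-block logging: one log record per block, holding all its segments."""
    log = {}
    for seg in segments:
        log.setdefault(seg.block_id, []).append(seg)
    return log


def records_per_log_record(log):
    """List which database records each per-block log record touches."""
    return {blk: sorted({s.record_id for s in segs}) for blk, segs in log.items()}


def backout_whole_log_records(segments, log, keep_through_record):
    """Undo every log record that contains any segment past the boundary."""
    undone = {
        blk for blk, segs in log.items()
        if any(s.record_id > keep_through_record for s in segs)
    }
    return [s for s in segments if s.block_id not in undone]


log = full_block_log(copied)
mixed = records_per_log_record(log)

# Deadlock while copying record 3: the intent is to keep records 1 and 2.
intended = [s for s in copied if s.record_id <= 2]
remaining = backout_whole_log_records(copied, log, keep_through_record=2)

print(mixed)                          # {100: [1, 2], 101: [2, 3]}
print(len(intended), len(remaining))  # 4 3
```

In this model the log record for block 101 holds segments from records 2 and 3. Undoing it removes part of record 2 (too many segments backed out), while leaving it in place would keep the segment from record 3 (too few).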
    

Problem conclusion

  • AIDS: RIDS/DBS RIDS/DBCALL DBS DBCALL
     GEN:
    KEYWORDS:
    
    *** END IMS KEYWORDS ***
    Because of the conflict between deadlock handling for OLR and
    both full-block logging and deferral of KSDS inserts, the
    deadlock handling code to commit completely copied database
    records or RAPs is removed.  For an OLR deadlock, uncommitted
    data will now be backed out completely to the beginning of the
    unit of reorganization.
    
    DFSORP30
    --------
    There are four places where this OLR module does a single-call
    backout when terminating with ABENDU0777.  (This is to back out
    any changes beyond the completely copied database records or
    RAPs, as noted above.)  This code is completely removed, thus
    resulting in an exit from DFSORP30 with return code 8 for any
    abend.  With this change, a full backout will be done by
    DFSORP20 to back out all changes after the previously committed
    cursor, that is, to the cursor at the beginning of the unit of
    reorganization.  Locks are released, and another application
    involved in the deadlock should be able to run.  For the
    deadlock case DFSORP20 restarts the unit of reorganization from
    the original committed cursor position.  Most likely the OLR
    will then wait for any locks it just released.
    
    DFSDLA00
    --------
    Similar code to that described above for DFSORP30 is bypassed
    in the case of OLR.
    
    DFSOLRW
    -------
    The flag OLRABTRM is removed because it was set only by the
    code removed from DFSORP30, above.
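
The corrected deadlock handling described above can be sketched as a simple retry loop. This is a toy model under stated assumptions, not the actual DFSORP20 or DFSORP30 logic: the `Deadlock` exception stands in for ABENDU0777, and `reorganize`, `unit_size`, and `deadlocks` are hypothetical names. On a deadlock, all uncommitted work in the current unit of reorganization is discarded and the unit restarts from the last committed cursor.

```python
class Deadlock(Exception):
    """Stands in for ABENDU0777 in this toy model."""


def reorganize(records, unit_size, deadlocks):
    """Copy records in units of unit_size, with full backout on deadlock.

    deadlocks: indexes that hit a deadlock the first time they are copied.
    Returns (committed output, number of restarted units).
    """
    committed = []
    cursor = 0                      # last committed cursor position
    pending = set(deadlocks)
    restarts = 0
    while cursor < len(records):
        uncommitted = []            # work done in the current unit
        try:
            for i in range(cursor, min(cursor + unit_size, len(records))):
                if i in pending:
                    pending.discard(i)
                    raise Deadlock(i)
                uncommitted.append(records[i])
        except Deadlock:
            # Full backout: nothing from this unit is kept, and the
            # unit restarts from the unchanged committed cursor.
            restarts += 1
            continue
        committed.extend(uncommitted)   # commit the whole unit
        cursor += len(uncommitted)
    return committed, restarts


output, restarts = reorganize(list("abcdefg"), unit_size=3, deadlocks={4})
print(output, restarts)
```

The design trade-off matches the text: work copied before the deadlock within the same unit is redone, but the output never contains a partially backed-out unit.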
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PK79011

  • Reported component name

    IMS V11

  • Reported component ID

    5635A0200

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2009-01-20

  • Closed date

    2009-03-04

  • Last modified date

    2009-10-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK44548

Modules/Macros

  • DFSDLA00 DFSOLRW  DFSORP30
    

Fix information

  • Fixed component name

    IMS V11

  • Fixed component ID

    5635A0200

Applicable component levels

  • R100 PSY UK44548

       UP09/03/05 P F903

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
01 October 2009