IBM Support

PK20271: LOOP OR HANG WHEN PERFORMING DRF PITR OF 20 DEDB AREAS. POSSIBLE DATA INTEGRITY ISSUE IF NO LOOP OR HANG.


APAR status

  • Closed as program error.

Error description

  • When recovering 20 DEDB areas using DRF PITR, the processing
    CPU time exceeded 14 hours; the job was judged to be hung or
    looping and was canceled. The problem can also occur for
    Full Function DBs. It occurs when a large volume of log data
    is processed, causing data to be spilled to data spaces after
    all buffers have been filled. This is a data integrity issue.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of IMS Database Recovery Facility  *
    *                 Version 2 Release 1 running recovery to any  *
    *                 prior point in time PITR.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: While running a PITR recovery the DRF   *
    *                      master address space runs into a loop   *
    *                      or hang condition.                      *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    PITR results in a hang or endless loop.  This occurs when the
    amount of log data is a high enough percentage of private
    storage that it needs to be moved to data spaces.
    
    As part of the problem determination for the originally
    reported problem, it was discovered that under some high-load
    circumstances, log data is lost during PITR processing.
    In addition, it was discovered that SDEP log records were not
    processed correctly and SDEP information was lost, resulting in
    a data integrity problem.  The SDEP problem occurs when updates
    to the SDEPs cause them to wrap.
    
    While testing the fix, an ABENDS40D was encountered because of
    a shortage of available private storage.
    

Problem conclusion

  •  AIDS: RIDS/UTIL RIDS/DBS DBS/UTIL
      DEP: NONE
      GEN:
    
    *** END IMS KEYWORDS ***
    The customer originally reported a hang or endless loop during
    large recoveries.  While testing the fix, multiple additional
    problems were encountered; all are fixed as documented below.
    
    The hang and endless loop turned out to be several different
    problems.
    
    First, buffer contention caused a hang. This is fixed by
    separating the buffers into two pools: one for log read and
    one for buffer send to the subordinate address spaces.  A loop
    is fixed by using AWEs from the AWE pool instead of local
    module storage.
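
The buffer contention described above can be illustrated with a minimal sketch. It is not DRF code: the `BufferPool` class, the pool sizes, and the `reader_fills` helper are hypothetical names chosen for illustration, and the model only assumes that a reader can hold every free buffer while a sender waits for one.

```python
import queue


class BufferPool:
    """Fixed-size pool of reusable buffers (hypothetical model, not DRF code)."""

    def __init__(self, name, count, size=4096):
        self.name = name
        self._free = queue.Queue()
        for _ in range(count):
            self._free.put(bytearray(size))

    def get(self, timeout=0.01):
        """Return a free buffer, or None if none becomes free within timeout."""
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            return None

    def release(self, buf):
        """Hand a buffer back to the pool."""
        self._free.put(buf)

    def available(self):
        return self._free.qsize()


def reader_fills(pool, n):
    """Simulate the log reader taking up to n buffers; return those obtained."""
    held = []
    for _ in range(n):
        buf = pool.get()
        if buf is None:
            break
        held.append(buf)
    return held


# One shared pool: the reader holds every buffer, so the sender gets none.
shared = BufferPool("shared", 4)
reader_held_shared = reader_fills(shared, 4)
shared_send = shared.get()  # None: the sender is starved

# Two pools: the reader can exhaust only its own pool.
read_pool = BufferPool("read", 4)
send_pool = BufferPool("send", 2)
reader_held_split = reader_fills(read_pool, 4)
split_send = send_pool.get()  # the sender still obtains a buffer
```

In this model the starved sender simply times out; in a design where the sender waits without a timeout, the same starvation would show up as a hang.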
    
    The missing data is fixed in three major ways.  First, spill
    management is fixed to always return the data spilled, not
    extraneous data caused by residual data in the token on a
    spill.  If the token on a spill was non-zero, the spill manager
    interpreted the request as a retrieve, and the spilled data was
    lost because the caller does not expect to get data back from
    a spill request.  Second, buffers were lost when AWEs in local
    storage were enqueued and the thread did not wait.  When
    control returned from the module that enqueued the AWE, the AWE
    storage was reused.  The AWE field was cleared at times,
    causing the enqueued buffer to be lost.  Third, fast path SDEP
    processing is fixed.  PK11200 added support for the LOG5957
    record, but that support was incomplete: it did not use FRXRRR
    to copy the LOG5957 from the input buffer to the buffer sent to
    the subordinate address space.  Also, LOG5957 processing
    depended on a UOR token, but the LOG5957 does not contain a
    UOR token.  As a result, the LOG5957 was sent to random
    subordinate address spaces and not applied to the intended
    area.
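
The residual-token spill problem described above can be modeled in a few lines. This is a sketch under stated assumptions, not the FRXWSPL0 or FRXWSPM0 logic: the `SpillManager` class, the `spill` helper, and the dictionary used as a request block are all hypothetical. The only behavior taken from the text is that a non-zero token makes the manager treat the request as a retrieve.

```python
class SpillManager:
    """Hypothetical model: token 0 means 'spill', non-zero means 'retrieve'."""

    def __init__(self):
        self._store = {}
        self._next_token = 1

    def request(self, token, data=None):
        if token != 0:
            # Treated as a retrieve: the supplied data is ignored.
            return token, self._store.get(token)
        new_token = self._next_token
        self._next_token += 1
        self._store[new_token] = data
        return new_token, None

    def stored(self):
        """Return everything currently held in spill storage, in token order."""
        return [self._store[t] for t in sorted(self._store)]


def spill(manager, request_block, data, clear_token):
    """Issue a spill request; the caller never looks at returned data."""
    if clear_token:
        request_block["token"] = 0  # the fix: no residual token on a spill
    token, _ignored = manager.request(request_block["token"], data)
    request_block["token"] = token
    return token


# Without clearing: the second spill reuses the residual token and is lost.
buggy = SpillManager()
block = {"token": 0}
spill(buggy, block, b"rec-A", clear_token=False)
spill(buggy, block, b"rec-B", clear_token=False)
buggy_contents = buggy.stored()

# With clearing: both records reach spill storage.
fixed = SpillManager()
block = {"token": 0}
spill(fixed, block, b"rec-A", clear_token=True)
spill(fixed, block, b"rec-B", clear_token=True)
fixed_contents = fixed.stored()
```

In the uncleared case `rec-B` is never stored, and the caller does not notice because it discards whatever the request returns.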
    
    The ABENDS40D is fixed by separating the buffer pool into two
    pools: one for read and one for buffer send to subordinate
    address spaces.  It is also fixed by sending all log data
    through FRXQBUF0 and FRXLMRG0 so that FRXUORM0 processes the
    data in order.  This way, the UOR-related storage can be
    released when the end-of-UOR notification is encountered.  The
    hash table is reduced from a maximum of five levels to one
    level.  This significantly reduces private storage utilization
    for extremely large recoveries.
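
The storage effect of in-order processing can be sketched as follows. This is an illustration only, not the FRXUORM0 design: the `UorTracker` class, the record tuples, and the `replay` function are assumptions, and a Python dictionary stands in for the single-level hash table keyed by UOR token.

```python
class UorTracker:
    """Hypothetical single-level table: UOR token -> pending log records."""

    def __init__(self):
        self._by_uor = {}

    def add(self, uor_token, record):
        self._by_uor.setdefault(uor_token, []).append(record)

    def end_of_uor(self, uor_token):
        """Release and return everything held for a completed UOR."""
        return self._by_uor.pop(uor_token, [])

    def tracked(self):
        return len(self._by_uor)


def replay(stream):
    """Process records in log order; free UOR storage at each end-of-UOR."""
    tracker = UorTracker()
    applied = []
    peak = 0
    for uor_token, kind, payload in stream:
        if kind == "update":
            tracker.add(uor_token, payload)
        elif kind == "end":
            applied.extend(tracker.end_of_uor(uor_token))
        peak = max(peak, tracker.tracked())
    return applied, peak, tracker.tracked()


# Assumed sample stream of (UOR token, record kind, payload) in log order.
stream = [
    ("U1", "update", "a1"),
    ("U2", "update", "b1"),
    ("U1", "update", "a2"),
    ("U1", "end", None),
    ("U2", "update", "b2"),
    ("U2", "end", None),
]
applied, peak, remaining = replay(stream)
```

Because each end-of-UOR arrives after all of that UOR's updates, the table never holds more than the UORs that are open at the same time, and it is empty at the end of the stream.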
    
    The code is changed in the following parts:
    
    Parameter change for multiple buffer types
    FRXCAMG0, FRXCBDM0, FRXICLI0, FRXICTL0, FRXLMRG0, FRXMNP,
    FRXMSTR1, FRXPDIR0, FRXPDIS0, FRXPDSR0, FRXPDSS0, FRXPSDR0,
    FRXPSDS0, FRXQBUF0, FRXRBUF0, FRXRCTL0, FRXRDTH0, FRXUORM0,
    FRXHBUF0
    
    Dump formatter recompile
    FRXADF00, FRXADM10, FRXADM20
    
    FRXBDCB0  Add rvur as a fixed length control block for
              performance enhancement on get/release rvur storage
    FRXCON    define buffersend and bufferread
    FRXGFST   support added to release buffers from multiple pool
              types
    FRXHBUF0  Add support for multiple buffer pool types
    FRXLMRG0  Fix intermittent hang at end of read when end of log
              read is not propagated to unit of recovery manager due
              to timing window.  Add diagnostic count on buffer
              release.
    FRXMINI0 and FRXMSTR0
              move upper limit of buffer percentage of private
              storage to half instead of three quarters to avoid
              storage shortage and ABENDS40D
    FRXMSTR0  Clear storage before reusing to avoid endless loop
    FRXPDSS0  Process no-op notification from FRXUORM0 at end of
              data
    FRXQBUF0  send end of log data notification to FRXUORM0 if no
              log data sets are to be read for recovery to avoid
              hang.
              Separate the OLR buffer logic from the non-OLR logic.
    FRXRBUF0  Add diagnostic buffer counts
    FRXRCTL0  Simplify the buffer freed process
    FRXRDTH0  Use awe from awe storage pool instead of local
              storage. Add diagnostic count for buffers.
    FRXRLRA0  Add logic to track uor token for 5950, 5937, and
              5938.
    FRXRRR    Add diagnostic count for each log record
    FRXRVCS   Add olr indicator flag
    FRXRVDL   Add data space free space diagnostic information
    FRXRVGB   Add diagnostic count fields for buffer and record
              counting
    FRXRVQB   Add support for separate olr code path in FRXQBUF0
    FRXRVUR   Add support for spill and add diagnostic fields for
              uor tracking
    FRXURHS   Reduce number of hash table levels to 1
    FRXUORM0  Add support to spill log data buffers on input if
              recovery running low on private storage. Fix lost
              buffer problem when spilling data.  Fix support
              for SDEPs (LOG5957) and copy the LOG5957 to the
              output buffers via FRXRRR calls instead of MVCL.
              Fix hang and endless loop on buffer pool storage
              contention by supporting multiple buffer pool types.
    FRXWSPL0  If request is to spill data, clear the remote token
              to avoid ABENDU0385 - RSN 0015 in FRXWSPL0
    FRXWSPM0  Add free space diagnostic field and tracking
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

  • **** PE07/09/26 FIX IN ERROR. SEE APAR PK52261 FOR DESCRIPTION
    

APAR Information

  • APAR number

    PK20271

  • Reported component name

    IMS DB RECOVERY

  • Reported component ID

    5655I4400

  • Reported release

    210

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2006-02-22

  • Closed date

    2006-09-19

  • Last modified date

    2007-10-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK18149

Modules/Macros

  •    FRXADF00 FRXADM10 FRXADM20 FRXAWEX  FRXBDCB0
    FRXCAMG0 FRXCBDM0 FRXCON   FRXDCB   FRXDDRF  FRXGFST  FRXHBUF0
    FRXICLI0 FRXICTL0 FRXLMRG0 FRXMINI0 FRXMINI1 FRXMNP   FRXMSTR0
    FRXMSTR1 FRXPDIR0 FRXPDIS0 FRXPDSR0 FRXPDSS0 FRXPSDR0 FRXPSDS0
    FRXQBUF0 FRXRBUF0 FRXRCTL0 FRXRDTH0 FRXRLRA0 FRXRRR   FRXRVCS
    FRXRVDL  FRXRVGB  FRXRVQB  FRXRVUR  FRXUORM0 FRXURHS  FRXWSPL0
    FRXWSPM0
    

Fix information

  • Fixed component name

    IMS DB RECOVERY

  • Fixed component ID

    5655I4400

Applicable component levels

  • R210 PSY UK18149

       UP06/09/22 P F609


Document Information

Modified date:
09 November 2020