IBM Support

PK84761: DRF JOB HANGS WITH ERROR(STOP)

APAR status

  • Closed as program error.

Error description

  • The dump shows that both the master and the subordinate address
    spaces are waiting.  The master address space received an 'error
    stop' notification, requested that the subordinate address space
    shut down, and is waiting for that shutdown.  The subordinate
    address space is waiting for the master address space to respond
    to its request for permission to use the tape device (device
    request); this is where the hang occurred.  Because the master
    was shutting down, the response was never sent, and the
    subordinate waits indefinitely.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All IMS DRF V3R1 users with TAPECHK(Y) and   *
    *                 ERROR(STOP) might be affected.               *
    ****************************************************************
    * PROBLEM DESCRIPTION: DRF hangs with ERROR(STOP) when error   *
    *                      is detected.                            *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    When recovery was run with TAPECHK(Y) and ERROR(STOP) specified,
    any image copy (IC) allocation failure caused a hang instead of
    an early end.
    

Problem conclusion

  • The problem occurred due to a coordination problem between
    the master address space and subordinate address spaces when
    recovery is run with the ERROR(STOP) option specified.  When
    a subordinate address space experiences a failure resulting in
    early end of recovery and image copies, all subordinate address
    spaces are supposed to be notified and shut down.  In this case,
    one subordinate address space had sent a request to read an
    image copy from a tape device when early end processing was
    initiated.  The thread asking permission remains in a wait state
    until permission is granted or denied.  The early end
    notification was then sent to the subordinate address space.
    The early end process did not detect the tape device wait state
    and enqueued an early end notification to the already waiting
    thread.  The notification could not be processed until the tape
    device wait was posted, and because early end processing had
    begun, tape device permission would never be granted.  This
    resulted in a hang.
    
    The problem is fixed by including code in the early end
    processing in the subordinate address space that detects and
    posts a thread waiting for a tape device.  If early end
    processing is in progress when the thread wakes up, a nonzero
    return code is set causing the subordinate address space to
    terminate the current recovery process.  This allows early end
    processing to continue.
    
    Modules FRXIRTH0, FRXICTL0, and FRXIDYN0 have been changed to
    improve the communication between master and subordinate.
    
    The status reported for the subordinate address spaces in the
    recovery report is changed for this scenario.  The final status
    in the SUMMARY report changes from the general message
    'Recovery processing error' to the more specific 'Sub address
    space failure'.
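
The coordination fix described in this conclusion can be illustrated with a small sketch. This is not DRF code: it is a hypothetical Python model that assumes a single waiting thread, and the class name, method names, and return code values (0, 4, 8) are invented for illustration. It shows only the general pattern: early end processing posts the thread that is waiting for tape device permission, and the woken thread returns a nonzero code when it finds that early end is in progress.

```python
import threading


class TapeDeviceRequest:
    """Hypothetical model of a subordinate thread waiting for tape permission."""

    RC_GRANTED = 0     # illustrative values only, not actual DRF return codes
    RC_TIMEOUT = 4
    RC_EARLY_END = 8

    def __init__(self):
        self._cond = threading.Condition()
        self.granted = False
        self.early_end = False

    def wait_for_permission(self, timeout=None):
        """Wait until permission is granted or early end processing begins."""
        with self._cond:
            woke = self._cond.wait_for(
                lambda: self.granted or self.early_end, timeout
            )
            if self.early_end:
                # Early end is in progress: give up on the tape device so the
                # caller can terminate the current recovery process.
                return self.RC_EARLY_END
            return self.RC_GRANTED if woke else self.RC_TIMEOUT

    def grant(self):
        """Master side: grant permission and post the waiting thread."""
        with self._cond:
            self.granted = True
            self._cond.notify_all()

    def begin_early_end(self):
        """Early end processing: flag the state and post the waiting thread."""
        with self._cond:
            self.early_end = True
            self._cond.notify_all()


def run_demo():
    """Start a waiting subordinate thread, then trigger early end."""
    request = TapeDeviceRequest()
    result = {}

    def subordinate():
        result["rc"] = request.wait_for_permission(timeout=5.0)

    worker = threading.Thread(target=subordinate)
    worker.start()
    request.begin_early_end()   # without this post, the wait would never end
    worker.join(timeout=5.0)
    return result.get("rc"), worker.is_alive()


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the waiting thread checks the early end flag under the same lock that protects the wait, so a post that arrives before the thread starts waiting is not lost. The hang reported in this APAR corresponds to the case where nothing equivalent to begin_early_end() ever posts the waiter.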
    

Temporary fix

Comments

APAR Information

  • APAR number

    PK84761

  • Reported component name

    IMS DB RECOVERY

  • Reported component ID

    5655I4400

  • Reported release

    310

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2009-04-15

  • Closed date

    2009-06-19

  • Last modified date

    2009-07-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK47599

Modules/Macros

  • FRXCON   FRXICTL0 FRXIDYN0 FRXIRTH0 FRXMSTR1
    FRXPSDR0 FRXRVGB
    

Fix information

  • Fixed component name

    IMS DB RECOVERY

  • Fixed component ID

    5655I4400

Applicable component levels

  • R310 PSY UK47599

       UP09/06/23 P F906

Document Information

Modified date:
01 July 2009