IBM Support

OA43875: ADDRESS SPACE HANG, TYPICALLY DB2. STATUS STOP SRBS PENDING, ASCBSRBS NON-ZERO DUE TO SRB WITH PMC SET.

A fix is available


APAR status

  • Closed as program error.

Error description

  • An address space hangs.  Typically it is a DB2 region.  A dump
    of the hung address space shows that a request for Status Stop
    of SRBs is pending in the address space, but it is being
    prevented from completing due to a non-zero value in ASCBSRBS.
    The reason for this non-zero value is that the address space
    owns a pre-emptable SSRB that has requested Process Must
    Complete (PMC) protection.  The Status Stop function needs this
    SSRB to get dispatched and run to the point that it turns off
    Process Must Complete, which in turn would decrement the
    ASCBSRBS count.  However, the SSRB will never run because it is
    not on the system dispatching queue (the WUQ).
    
    The reason that the SSRB with PMC requested is not on the
    dispatching queue is due to a failure to properly handle the
    unit of work following a CML lock request.  The SRB with PMC set
    (SRBPMCS bit on) got suspended for a CML lock (a local lock for
    another address space).  When the lock was released, the
    dispatcher dequeued the SSRB's WEB from the local lock suspend
    queue and tried to give the CML lock to this SSRB, but the
    dispatcher backed out of this processing when it discovered that
    there was a Status Stop of SRBs going on in the SSRB's address
    space.  In backing out, it set up the SSRB to redrive the CML
    lock obtain request, but it did nothing with the WEB, expecting
    that the STATUS START SRBS function (which would reset the
    STATUS STOP SRBS) would find the WEB and queue it to the WUQ.
    Because of the deadlock described previously, the STATUS STOP
    SRBS never completed, the STATUS START was therefore never done,
    and the WEB was never queued to be dispatched.
    
    Verification Steps:
    1) Obtain an SVC dump of the hung address space.  Note that the
       dump may lack local storage for the address space due to the
       STATUS STOP of SRBs request that is pending.
    
    2) Issue: IPCS SUMM FORMAT ASID(X'yy') where yy is the ASID of
       the hung address space.  Verify that ASCBDSP1=X'18' or X'98',
       indicating that a STATUS STOP of SRBs is pending.  Verify
       that ASCBSRBS is non-zero, indicating that there is at least
       one SRB in a state that will hold up completion of the STATUS
       STOP of SRBs request.
    
    3) In the same report, do:  FIND SAWQ  .  Get the address
       zzzzzzzz from this field and use it in the following command:
    
        IP RUNC ADDR(zzzzzzzz) LI(X'1C') EX((CBF X+18? STR(SRB)))
    
       This will format out all SRBs associated with the address
       space.
    
       Max to the bottom of the output.
    
       Enter: REPORT VIEW on the command line.  (Do *not* preface
       with "IPCS".) Once in Report View mode, enter the following
       on the command line:
    
          X ALL
          FIND HLHI ALL
    
       Now check the FLGS1 field or the FLGS field for each line.
       If the X'04' bit (SRBPMCS) is on in any of these lines, then
       this means that the SRB/SSRB is in Process Must Complete
       mode.  Note that other bits may be on.   A common content for
       this byte when the PMC bit is on is X'16'.  Get the addresses
       of the SSRBs that have PMC set.
    
    4) PF3 out of Report View.  This will bring you back to the IPCS
       SUMM FORMAT report.  Locate the SSRB(s) that have the PMC bit
       on.  Locate the CPSW field in the SSRB.  Do a WHERE on the
       address in the last word of the CPSW to verify that it
       points to approximately offset +X'1200' into the suspend lock
       manager module IEAVESLK (entry point CMLUOBT).
    
    5) Find the SSRBWEB pointer in each SSRB with PMC on, and format
       the WEB:
    
        IPCS CBF wwwwwwww STR(WEB)  where wwwwwwww is the WEB addr.
    
       Verify that the second byte of the WEBFLAG1 field at
       offset +4 is X'00'.  In particular, this means that the
       WEBCMLND bit is off, indicating that the SSRB is no longer
       suspended for a CML lock even though its CPSW points into
       the CML lock obtain entry point of the lock manager.
    
    If all these checks are met, then this APAR will address your
    problem.
    
    Also see APAR OA42611 for another address space hang problem
    with similar symptoms involving Status Stop of SRBs and Process
    Must Complete.
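
The bit tests in verification steps 2, 3 and 5 can be sketched in a few lines of Python for values copied by hand from the IPCS output. This is only an illustrative aid, not an IBM tool: the function names and the input layout are hypothetical, the step 4 check (CPSW pointing into IEAVESLK) is not modeled, and the constants are taken directly from the text above.

```python
SRBPMCS = 0x04  # Process Must Complete bit in the SRB flag byte (step 3)


def stop_srbs_pending(ascbdsp1: int) -> bool:
    """Step 2: ASCBDSP1 of X'18' or X'98' means STATUS STOP of SRBs is pending."""
    return ascbdsp1 in (0x18, 0x98)


def pmc_set(flag_byte: int) -> bool:
    """Step 3: other bits may also be on (X'16' is a common value)."""
    return bool(flag_byte & SRBPMCS)


def symptoms_match(ascbdsp1: int, ascbsrbs: int, ssrbs) -> bool:
    """ssrbs: (flag_byte, webflag1_second_byte) pairs, one per SSRB.

    Hypothetical helper: True when a STATUS STOP of SRBs is pending,
    ASCBSRBS is non-zero, and at least one SSRB has PMC on while the
    second byte of WEBFLAG1 is X'00' (step 5).
    """
    if not stop_srbs_pending(ascbdsp1) or ascbsrbs == 0:
        return False
    return any(pmc_set(flags) and web_byte == 0x00
               for flags, web_byte in ssrbs)


if __name__ == "__main__":
    # Example values only; substitute the values from your own dump.
    print(symptoms_match(0x98, 1, [(0x16, 0x00)]))
```

Even if this returns a match, step 4 still has to be confirmed manually in IPCS.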
    

Local fix

  • None.
    ++APARs for OA43875 are available if you are interested in a
    fixtest.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: Users running z/OS HBB7770 and above.        *
    ****************************************************************
    * PROBLEM DESCRIPTION: Address space hang with                 *
    *                      ASCBDSP1 = either X'1x' or X'9x'        *
    *                      (similar to OA42611).                   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    When the Dispatcher is processing a CML lock request for a
    preemptable SRB which set STATUS MC,PROCESS, and there is a
    pending STATUS STOP,SRB request, Dispatcher fails to execute
    this preemptable SRB (so it can reset STATUS MC,PROCESS).
    Since the STATUS STOP was waiting for this SSRB to complete,
    it also will never finish - resulting in a deadlock and a
    hang of the address space where the STATUS STOP was issued.
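
The circular wait described above can be illustrated with a toy state model. This is a deliberately simplified sketch and not the actual Dispatcher logic: the class, its fields and the `fixed` switch are all hypothetical names chosen for illustration, and the only behavior modeled is the dependency chain stated in this APAR.

```python
from dataclasses import dataclass


@dataclass
class ToyAddressSpace:
    stop_pending: bool = True   # STATUS STOP,SRB has been requested
    ascbsrbs: int = 1           # held non-zero by the SSRB with PMC set
    web_on_wuq: bool = False    # the SSRB's WEB was not requeued


def step(space: ToyAddressSpace, fixed: bool) -> bool:
    """Make one unit of progress if any is possible; return False if stuck."""
    # Assumed model of the fix: a PMC SSRB overrides the STATUS STOP
    # condition, so its WEB is queued for dispatch.
    if fixed and not space.web_on_wuq and space.ascbsrbs > 0:
        space.web_on_wuq = True
        return True
    # A queued SSRB runs and turns off PMC, which decrements ASCBSRBS.
    if space.web_on_wuq and space.ascbsrbs > 0:
        space.ascbsrbs -= 1
        space.web_on_wuq = False
        return True
    # STATUS STOP can complete only once ASCBSRBS has reached zero.
    if space.stop_pending and space.ascbsrbs == 0:
        space.stop_pending = False
        return True
    return False


def runs_to_completion(fixed: bool, max_steps: int = 10) -> bool:
    space = ToyAddressSpace()
    for _ in range(max_steps):
        if not step(space, fixed):
            break
    return not space.stop_pending


if __name__ == "__main__":
    print("without fix:", runs_to_completion(fixed=False))
    print("with fix:", runs_to_completion(fixed=True))
```

Without the fix, no step can make progress from the initial state, which corresponds to the hang. With the fix, the SSRB is queued, runs, and the STATUS STOP then completes.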
    

Problem conclusion

  • Dispatcher CML lock request processing is fixed so that an SSRB
    with Process Must Complete overrides the STATUS STOP
    condition (ASCBDSP1=X'1x' or X'9x').
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA43875

  • Reported component name

    RTM

  • Reported component ID

    5752SCRTM

  • Reported release

    780

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2013-11-13

  • Closed date

    2014-07-21

  • Last modified date

    2015-06-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UA74248 UA74250 UA74249

Modules/Macros

  • IEAVEDS0
    

Fix information

  • Fixed component name

    SUPERVISOR CONT

  • Fixed component ID

    5752SC1C5

Applicable component levels

  • R770 PSY UA74248

       UP14/07/31 P F407

  • R780 PSY UA74249

       UP14/07/31 P F407

  • R790 PSY UA74250

       UP14/07/31 P F407

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
02 June 2015