IBM Support

PK65285: MULTIPLE CQSS IN A SYSPLEX ABENDED U0100-04


APAR status

  • Closed as program error.

Error description

  • This error involves multiple CQSs. The reported case
    was 4 CQSs in a Sysplex, but the error is not specific
    to that number.
    Problem:  All 4 CQSs in a 4-way sysplex abended U0100-04.
    Environment: All CQSs run with an overflow structure defined
                 with DUPLEX(ENABLED).
      Sequence of events:
      . All 4 CQSs are running.
      . The primary structure reaches the overflow threshold.
      . The IXLALTER to expand the primary structure fails
        (the structure has already reached its maximum size).
      . The overflow threshold process begins.
      . The overflow structure is allocated and initialized.
      . Because the overflow structure is defined as
        DUPLEX(ENABLED), the duplexing process starts as soon as
        the structure is allocated and the other CQSs connect to it.
      . All connectors to the overflow structure receive the
        Structure Temporarily Unavailable event.
      . All CQSs quiesce the structure.
      . Meanwhile, the STE1 thread receives overflow IXLUSYNC #3
        to complete overflow threshold phase 2.
      . At label OFTC1400 in CQSSTE10, the GETLATCH call fails
        because the structure quiesce latch is held by the STE2
        thread (which is processing the Structure Temporarily
        Unavailable event).
      . As a result, all CQSs abend at label OFTC1500 in CQSSTE10.
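    A minimal Python sketch of the collision in the last two steps,
    assuming the GETLATCH at OFTC1400 is a conditional request that
    fails, rather than waits, when the latch is already held. The names
    SimulatedAbend and overflow_phase2_step are hypothetical, a
    threading.Lock stands in for the structure quiesce latch, and a
    helper thread stands in for STE2; none of this is CQS code.

```python
import threading


class SimulatedAbend(Exception):
    """Stands in for the CQS U0100 abend, reason code 4 (illustration only)."""


def overflow_phase2_step(quiesce_latch):
    """Hypothetical model of the OFTC1400 GETLATCH in CQSSTE10.

    Assumption: the latch request is conditional, so it fails at once
    (instead of waiting) when another thread already holds the latch.
    """
    if not quiesce_latch.acquire(blocking=False):
        # Models the branch to OFTC1500, where CQS abends.
        raise SimulatedAbend("U0100-04: structure quiesce latch not available")
    try:
        return "overflow threshold phase 2 complete"
    finally:
        quiesce_latch.release()


latch = threading.Lock()
ste2_has_latch = threading.Event()
ste2_may_finish = threading.Event()


def ste2():
    # STE2 holds the latch while it quiesces the structure for the
    # Structure Temporarily Unavailable event.
    with latch:
        ste2_has_latch.set()
        ste2_may_finish.wait()


t = threading.Thread(target=ste2)
t.start()
ste2_has_latch.wait()
try:
    overflow_phase2_step(latch)  # STE1 arrives while STE2 holds the latch
    outcome = "completed"
except SimulatedAbend as exc:
    outcome = str(exc)
ste2_may_finish.set()
t.join()
print(outcome)  # U0100-04: structure quiesce latch not available
```

    Once STE2 has released the latch, the same call completes normally,
    which is why the failure depends on the timing of the event.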
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: IMS V9 CQS users of shared queues            *
    *                 overflow structure and duplexing.            *
    ****************************************************************
    * PROBLEM DESCRIPTION: When multiple CQSs go into overflow     *
    *                      mode and connect to an overflow         *
    *                      structure that is defined with          *
    *                      DUPLEX(ALLOWED) or DUPLEX(ENABLED)      *
    *                      and duplexing is being started          *
    *                      while overflow mode is being            *
    *                      established, CQS abends with            *
    *                      ABENDU0100-00000004.                    *
    *                                                              *
    *                      Multiple CQSs starting up at the        *
    *                      same time hang with MSGIXL040E          *
    *                      DUPLEXING CANNOT CONTINUE for the       *
    *                      overflow structure.                     *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
    Multiple CQSs exist in an IMSplex.  An overflow structure is
    defined to the CFRM policy as DUPLEX(ALLOWED) or
    DUPLEX(ENABLED).  A CQS attempts to put a message on the
    shared queues and detects that the overflow threshold has
    been reached.  The CQS overflow master initiates overflow
    threshold phase 1 by issuing IXLUSYNC #1 to tell all the
    CQSs to quiesce the structure.  Once all of the CQSs respond,
    the overflow master selects queues for overflow and then
    issues IXLUSYNC #2 to tell all the CQSs (including the
    master) to connect to the overflow structure.  After all the
    CQSs connect to the overflow structure, the master starts
    moving queues to the overflow structure.  For a
    DUPLEX(ENABLED) overflow structure, the Structure Temporarily
    Unavailable event can arrive at any time to notify CQS to
    quiesce the structure while duplexing is being established.
    For a DUPLEX(ALLOWED) overflow structure, a
    SETXCF START,REBUILD,DUPLEX command can be entered at any
    time to establish duplexing.  In the failing case, the
    Structure Temporarily Unavailable event arrives while overflow
    processing is between phases and has resumed the structure,
    so CQS is able to quiesce the structure to establish
    duplexing.  When the next overflow phase then attempts to
    quiesce the structure and cannot, CQS abends with
    ABENDU0100-00000004 in CQSSTE10 at offset X'418'.  This means
    overflow could not get the structure quiesce latch.
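    The phase ordering above can be sketched in Python. This is a
    hypothetical model, not CQS code: each CQS is a thread, CQS1 plays
    the overflow master, and each IXLUSYNC is approximated by a
    threading.Event (the master's request) plus a threading.Barrier that
    trips only after every CQS has done the requested work. The gaps
    between barrier trips correspond loosely to the "between phases"
    window in which the Structure Temporarily Unavailable event can
    arrive; the event itself is not modeled.

```python
import threading


def run_overflow_threshold(num_cqs=4):
    """Hypothetical model of overflow threshold phase ordering; returns the log."""
    log = []
    log_lock = threading.Lock()

    def record(entry):
        with log_lock:
            log.append(entry)

    sync1_issued, sync2_issued = threading.Event(), threading.Event()
    # Each barrier action runs once, after all CQSs arrive and before any is released.
    sync1_done = threading.Barrier(num_cqs, action=lambda: record("IXLUSYNC #1 complete"))
    sync2_done = threading.Barrier(num_cqs, action=lambda: record("IXLUSYNC #2 complete"))

    def cqs(n):
        is_master = n == 1
        if is_master:
            sync1_issued.set()  # phase 1: tell all CQSs to quiesce
        sync1_issued.wait()
        record(f"CQS{n} quiesced the structure")
        sync1_done.wait()
        if is_master:
            record("master selected queues for overflow")
            sync2_issued.set()  # tell all CQSs to connect to the overflow structure
        sync2_issued.wait()
        record(f"CQS{n} connected to the overflow structure")
        sync2_done.wait()
        if is_master:
            record("master started moving queues to the overflow structure")

    threads = [threading.Thread(target=cqs, args=(n,)) for n in range(1, num_cqs + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


for entry in run_overflow_threshold():
    print(entry)
```

    Thread scheduling varies, so the order of the per-CQS entries can
    differ between runs; only the ordering around the sync points is
    fixed.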
    
    Additional problem:
    Multiple CQSs start up at the same time.  Call the first
    CQS1 and the second CQS2.  CQS initialization attempts
    to connect to the overflow structure to ensure that the
    overflow structure is defined correctly.  The overflow structure
    is defined as DUPLEX(ENABLED).
    CQS1 locks the primary structure and proceeds with
    initialization.  The other CQSs wait for the primary structure
    lock.  Once CQS1 connects to the overflow structure, z/OS
    notifies CQS1 with the "structure temporarily unavailable"
    event so that CQS1 quiesces the structure to establish
    duplexing.  CQS1 attempts to quiesce the structure before
    responding to the event, but waits because CQS1 initialization
    holds the structure quiesce latch.  After CQS1 determines
    that the primary and overflow structures don't need to be
    rebuilt, CQS1 releases the primary structure lock.
    This allows CQS2 to get the primary structure lock and
    proceed with its initialization.  CQS2 attempts to connect to
    the overflow structure, but the connect fails with
    IXLRSNCODECONNPREVENTED.  CQS2 then waits for an ENF 35 event
    while holding the primary structure lock.  When CQS1
    initialization tries to lock the primary structure again to
    read and write control list entries, CQS1 waits because CQS2
    has the structure locked.
    The result is a circular wait:
    . CQS2 holds the primary structure lock while it waits for
      the ENF 35 event reporting that duplexing is established.
    . CQS1 event processing waits for the structure quiesce latch,
      held by CQS1 initialization, before it can respond to the
      event so that duplexing can be established.
    . CQS1 initialization waits for the primary structure lock
      held by CQS2.
    CQS1 and CQS2 are deadlocked and CQS initialization hangs.
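    The circular wait can be checked mechanically with a wait-for
    graph: a deadlock exists when following "waits on" edges leads back
    to where it started. The Python sketch below finds such a cycle with
    a depth-first search. The node names are hypothetical labels for the
    parties described above, and the edge from CQS2 to CQS1 event
    processing is an assumption: it models the ENF 35 event as unable to
    arrive until CQS1 responds to the structure temporarily unavailable
    event.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    wait_for maps each waiter to the parties it is waiting on.
    """
    on_path, finished, path = set(), set(), []

    def dfs(node):
        on_path.add(node)
        path.append(node)
        for nxt in wait_for.get(node, ()):
            if nxt in on_path:  # back edge: a circular wait
                return path[path.index(nxt):] + [nxt]
            if nxt not in finished:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        on_path.discard(node)
        path.pop()
        finished.add(node)
        return None

    for start in wait_for:
        if start not in finished:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


before_fix = {
    "CQS2": ["CQS1 event processing"],       # waits for ENF 35 (assumed edge)
    "CQS1 event processing": ["CQS1 init"],  # needs the structure quiesce latch
    "CQS1 init": ["CQS2"],                   # needs the primary structure lock
}
# With the fix, the event logic skips the latches during initialization,
# so CQS1 event processing no longer waits on anyone.
after_fix = {**before_fix, "CQS1 event processing": []}

print(find_cycle(before_fix))  # ['CQS2', 'CQS1 event processing', 'CQS1 init', 'CQS2']
print(find_cycle(after_fix))   # None
```

    Removing any single edge breaks the cycle; the fix below removes the
    edge from CQS1 event processing to the quiesce latch.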
    

Problem conclusion

  • AIDS: RIDS/SYS RIDS/CNTRL SYS CNTRL
     GEN:
    KEYWORDS:
    SYSPLEXSQ
    
    POSTREQ PM04096
    
    *** END IMS KEYWORDS ***
    CQSSTE20 is changed in the "structure temporarily unavailable"
    event logic.  If CQS is initializing or establishing overflow,
    it skips getting the terminate and structure quiesce latches.
    This allows duplexing to be established as fast as possible
    and prevents the deadlock.  Duplexing is established quickly
    because the overflow structure has just been allocated and
    is empty.
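    A minimal Python sketch of the decision described above. The
    StructureState fields, the action strings, and the order in which the
    terminate and structure quiesce latches are obtained are assumptions
    made for illustration; they are not taken from CQSSTE20.

```python
from dataclasses import dataclass


@dataclass
class StructureState:
    """Hypothetical flags; the real CQS control blocks are not shown here."""
    initializing: bool = False
    establishing_overflow: bool = False
    latches_held: bool = False
    quiesced: bool = False


def on_structure_temporarily_unavailable(state):
    """Sketch of the changed event decision; returns the actions, in order."""
    actions = []
    if state.initializing or state.establishing_overflow:
        # Fixed path: skip the terminate and structure quiesce latches (and
        # therefore the quiesce) and respond at once, so duplexing of the
        # new, empty overflow structure is established quickly.
        actions.append("respond to event")
        return actions
    # Unchanged path: get the latches and quiesce before responding.
    actions += ["get terminate latch", "get structure quiesce latch"]
    state.latches_held = True
    actions.append("quiesce structure")
    state.quiesced = True
    actions.append("respond to event")
    return actions


print(on_structure_temporarily_unavailable(StructureState(initializing=True)))
# ['respond to event']
print(on_structure_temporarily_unavailable(StructureState()))
# ['get terminate latch', 'get structure quiesce latch', 'quiesce structure', 'respond to event']
```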
    z/OS doesn't require CQS to quiesce the structure for this
    system-managed duplex rebuild process, because the system
    defers any incoming requests until the structure is available
    again.  IBM recommends that the structure be quiesced, because
    this minimizes the system resources required to quiesce
    operations and improves system performance.  Because duplexing
    is established very quickly when no latches are obtained, it
    is likely to be established before CQS initialization or
    overflow threshold processing needs to access the overflow
    structure.
    
    In the "structure available" event logic, CQS skips releasing
    the structure quiesce and terminate latches if they aren't
    held, and skips resuming the structure if it wasn't quiesced.
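    The matching "structure available" logic can be sketched the same
    way: it undoes only what the unavailable-event logic actually did.
    EventRecord and the action strings are hypothetical, and the order
    (resume first, then release the latches) is an assumption, since the
    text above does not specify it.

```python
from dataclasses import dataclass


@dataclass
class EventRecord:
    """Hypothetical record of what the unavailable-event logic actually did."""
    latches_held: bool
    quiesced: bool


def on_structure_available(record):
    """Undo only what was done; the order shown is an assumption."""
    actions = []
    if record.quiesced:
        actions.append("resume structure")
        record.quiesced = False
    if record.latches_held:
        actions += ["release structure quiesce latch", "release terminate latch"]
        record.latches_held = False
    return actions


# Fast path taken during initialization or overflow: nothing to undo.
print(on_structure_available(EventRecord(latches_held=False, quiesced=False)))  # []
# Normal path: resume the structure, then release both latches.
print(on_structure_available(EventRecord(latches_held=True, quiesced=True)))
```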
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

  • REPINNED RP10/02/16 (ATXT) TO ADD POSTREQ PM04096 INFO.
    **** PE10/02/16 PTF IN ERROR. SEE APAR PM04096 FOR DESCRIPTION
    **** PE10/02/16 FIX IN ERROR. SEE APAR PM04096 FOR DESCRIPTION
    

APAR Information

  • APAR number

    PK65285

  • Reported component name

    IMS V9

  • Reported component ID

    5655J3800

  • Reported release

    900

  • Status

    CLOSED PER

  • PE

    No

  • HIPER

    Yes

  • Special Attention

    No

  • Submitted date

    2008-04-29

  • Closed date

    2008-11-07

  • Last modified date

    2010-03-18

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PK73227 UK41442

Modules/Macros

  • CQSFM020 CQSSTE20 CQSSTRUC
    

Fix information

  • Fixed component name

    IMS V9

  • Fixed component ID

    5655J3800

Applicable component levels

  • R900 PSY UK41442

       UP08/11/13 P F811


Document Information

Modified date:
18 March 2010