IBM Support

PM98694: WMQ Z/OS: ABENDS026 REASON=08118001 DUE TO ENQ FOR SYSZCSQE CSQESTOP_RESTARTING_QMGR_&_PEERS_RECOVERING_AT_SAME TIME

A fix is available


APAR status

  • Closed as program error.

Error description

  • Peer level recovery for a CF structure can hang when multiple
    queue managers are restarted at the same time after having
    disconnected uncleanly, which causes EeplExistingConnection
    events to be triggered.  GRS ENQs are held for the CSQ_ADMIN
    structure and an application structure.
    
    Resulting symptoms include:
    
    - /D GRS,C,LATCHID shows a GRS ENQ wait for SYSZCSQE
      CSQESTOP_RESTARTING_QMGR_&_PEERS_RECOVERING_AT_SAME TIME
    
    - Title: ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME
      =name,JOBNAME=ssid9MSTR
    
    - Title: ABN=5C6-00C510A4,U=SYSOPR  ,C=R3600.710.CFM -
      CSQERWI2,M=CSQGFRCV,LOC=CSQELPLM.CSQERWI2+00001C02
    
      00C510A4 was preceded by IXLRSNCODERSPNOTREC EQU X'00000C27'
      "All surviving connections have not responded via IXLEERSP
       for the requested connection."
    
    - Title: QUEUE MANAGER TERMINATION REQUESTED, REASON=00C510AB
    
    - CSQE007I CSQESTE EEPLEXISTINGCONNECTION event received for
      structure <structure> connection name <name>
    
    - CSQE007I CSQESTE EEPLDISCFAILCONNECTION event received for
      structure <structure> connection name <name>
    
    - CSQE021I CSQECONN Structure <structure> connection
      as <name> warning, RC=00000004 reason=02010407
      codes=00000000 00000000 00000000
    
      CSQE008I CSQESTE Recovery event from <queue manager>
      received for structure <structure>
    
      IXL040E CONNECTOR NAME: <name>, JOBNAME: ssidMSTR, ASID: nnnn
      HAS NOT RESPONDED AFTER CONNECTING DURING A USER SYNC POINT.
      USER SYNC POINT PROCESSING FOR STRUCTURE MQSPARTSP CANNOT
      CONTINUE...
    
      IXL049E HANG RESOLUTION ACTION FOR CONNECTOR NAME: <name>
      TO STRUCTURE <structure> , JOBNAME: ssidMSTR, ASID: nnnn ...
    
      IXL050I CONNECTOR NAME: <name> TO STRUCTURE <structure> ,
      JOBNAME: ssidMSTR, ASID: nnnn
      HAS NOT PROVIDED A REQUIRED RESPONSE AFTER 1020 SECONDS.
      TERMINATING CONNECTOR TASK TO RELIEVE THE HANG.
    
    
    Additional Symptom(s) Search Keyword(s):
    QSG queue sharing group CFSTRUCT
    PLR peer level recovery  timing deadlock
    ABEND 5C6 S5C6 S05C6 ABEND5C6 ABENDS5C6 00C510A4 00C510AB
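
    The symptom list above can be checked against a saved job log or
    syslog extract. The following is a minimal sketch, not an IBM
    tool: the function name find_symptoms and the pattern table are
    hypothetical, and it assumes each message ID or code appears on a
    single line in the form shown above.

```python
import re

# Message IDs and codes copied from the symptom list in this APAR.
# The dictionary keys are labels chosen for this sketch only.
SYMPTOM_PATTERNS = {
    "CSQE007I": re.compile(r"\bCSQE007I\b.*\bEEPL(EXISTING|DISCFAIL)CONNECTION\b"),
    "CSQE021I": re.compile(r"\bCSQE021I\b"),
    "IXL040E": re.compile(r"\bIXL040E\b"),
    "IXL049E": re.compile(r"\bIXL049E\b"),
    "IXL050I": re.compile(r"\bIXL050I\b"),
    "ABEND S026": re.compile(r"ABEND=S026,REASON=08118001"),
    "00C510A4": re.compile(r"\b00C510A4\b"),
    "00C510AB": re.compile(r"\b00C510AB\b"),
}


def find_symptoms(log_lines):
    """Return the sorted labels of all symptoms seen in the log lines."""
    seen = set()
    for line in log_lines:
        for label, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                seen.add(label)
    return sorted(seen)
```

    A match on several of these labels in the same time window is
    only a hint that this APAR may apply; it does not replace dump
    analysis.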
    

Local fix

  • Stop and restart the queue manager that holds the
    "CSQESTOP_RESTARTING_QMGR_&_PEERS_RECOVERING_AT_SAME TIME"
    enqueue for the CSQ_ADMIN structure.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
    *                 Release 1 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: The QMGR hangs during peer level        *
    *                      recovery for an application structure   *
    *                      for an EEPLEXISTINGCONNECTION event.    *
    *                      Abend 026 is issued, followed by an     *
    *                      early termination of the queue manager  *
    *                      with reason code 00C510AB.              *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The queue manager is processing a USYNC initiated for an
    EEPLEXISTINGCONNECTION event. It checks whether the queue
    manager instance is the same as when the event occurred by
    looking at the instance numbers stored for the application
    structure and the admin structure. They match, indicating that
    the queue manager has not yet stopped.
    The queue manager processing the recovery requests an ENQ for
    the queue manager that is the subject of the recovery, expecting
    the ENQ to be released when that queue manager terminates.
    Because that queue manager is not terminating, the ENQ is never
    released and the hang occurs.
    When the CF timeout limit for connections is reached, abend 026
    is issued in the connected TCB to resolve the hang.
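
    The hang described above can be modelled in a few lines. This is
    an analogy only, not the CSQESTE logic: a Python threading.Lock
    stands in for the GRS ENQ, the function name and the timeout
    value are hypothetical, and the timeout represents the CF
    connector response limit rather than any property of the ENQ
    itself.

```python
import threading

# Stand-in for the ENQ held by the restarting queue manager.
restart_enq = threading.Lock()


def peer_recovery_unconditional(enq, cf_timeout_seconds):
    """Model the pre-fix behaviour: wait for the ENQ until the limit expires.

    Returns "recovered" if the ENQ was obtained, or "hang-timeout" if the
    wait outlasted the (hypothetical) CF response limit.
    """
    if enq.acquire(timeout=cf_timeout_seconds):
        try:
            return "recovered"
        finally:
            enq.release()
    return "hang-timeout"


# The restarting queue manager holds the ENQ and is not terminating,
# so the peer waits until the limit is reached.
restart_enq.acquire()
outcome = peer_recovery_unconditional(restart_enq, cf_timeout_seconds=0.1)
print(outcome)
```

    In this model the waiter gives up only because an outside limit
    expires, which corresponds to the abend 026 issued to relieve the
    hang.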
    

Problem conclusion

  • The code was changed to get the ENQ conditionally. If the ENQ
    is obtained, the connection will be recovered on behalf of the
    queue manager owning the connection. If the ENQ is not obtained,
    the queue manager will have recovered its own connection, thus
    no further processing is required.
    100Y
    CSQESTE
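
    The conditional ENQ described above follows the general
    "try-lock" pattern. The sketch below illustrates that pattern
    under stated assumptions: threading.Lock is a stand-in for the
    GRS ENQ, and recover_connection_conditionally and its callback
    are hypothetical names, not part of WebSphere MQ.

```python
import threading


def recover_connection_conditionally(enq, recover):
    """Try to take the ENQ without waiting.

    If the ENQ is obtained, run the recovery callback on behalf of the
    owning queue manager. If it is not obtained, assume the owner has
    recovered its own connection and do nothing further.
    """
    if not enq.acquire(blocking=False):
        return "skipped"
    try:
        recover()
        return "recovered"
    finally:
        enq.release()


enq = threading.Lock()
actions = []

# Case 1: the ENQ is free, so the peer performs the recovery.
first = recover_connection_conditionally(
    enq, lambda: actions.append("peer recovery")
)

# Case 2: the ENQ is held by the restarting queue manager,
# so the peer returns immediately instead of waiting.
enq.acquire()
second = recover_connection_conditionally(
    enq, lambda: actions.append("peer recovery")
)
enq.release()
```

    Because the second call returns at once, no waiter is left
    blocked on a holder that never terminates.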
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PM98694

  • Reported component name

    WMQ Z/OS V7

  • Reported component ID

    5655R3600

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2013-10-08

  • Closed date

    2013-11-20

  • Last modified date

    2014-01-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UI12722

Modules/Macros

  • CSQESTE
    

Fix information

  • Fixed component name

    WMQ Z/OS V7

  • Fixed component ID

    5655R3600

Applicable component levels

  • R100 PSY UI12722

       UP13/12/24 P F312

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
02 January 2014