IBM Support

PM69098: WMQ Z/OS V6 : WMQ HANGS AFTER THE OWNER OF A COUPLING FACILITY LOCK ABENDS IN CSQEUCAT

A fix is available

APAR status

  • Closed as program error.

Error description

  • In the first report of this problem, queue managers in a
    Queue Sharing Group (QSG) were hung and showed GRS contention.
    
    DISPLAY GRS STATUS shows QP23MSTR owns an exclusive lock and ALL
    other QMGRs in the QSG are waiting.
    
    D GRS,C showed:
    S=SYSTEMS SYSZCSQE PeerLevelRecoveryQPP30000000C00000039
    SYSNAME    JOBNAME      ASID   TCBADDR     EXC/SHR     STATUS
    MFOS       QP23MSTR     008C   00960140    EXCLUSIVE    OWN
    MBOS       QP11MSTR     0095   00962140    EXCLUSIVE    WAIT
    MBOS       QPG2MSTR     010E   0096C7F0    EXCLUSIVE    WAIT
    MAOS       QPS3MSTR     0093   00973A48    EXCLUSIVE    WAIT
    MAOS       QP10MSTR     017E   0096B988    EXCLUSIVE    WAIT
    etc.
    
    MQ error messages logged:
    CSQ3201E +QP22 ABNORMAL EOT IN PROGRESS FOR USER=WSSRWDP
    CONNECTION-ID=RRSBATCH THREAD-XREF=
    CSQE007I +QP20 EEPLDISCFAILCONNECTION event received for
    structure MQRBSDJ02 connection name CSQEQPP3QP220C
    CSQE008I +QP20 Recovery event from QP22 received for structure
    MQRBSDJ02
    
    Dumps taken included:
    Dump Title: QP22,ABN=5C6-00C51027,U=SYSOPR  ,C=L8200.600.CFM
    -CSQERAD1,M=CSQGFRCV,LOC=CSQELPLM.CSQERAD1+0912
    
    Dump Title: ABEND=S026,REASON=08118001,CONNECTOR HANG:
    CONNAME=CSQEQPP3QP230D,JOBNAME=QP23MSTR
    
    SYSLOG also showed IXL041E and IXL049E received for the
    structure.
    
    LOGREC shows abend S13E occurring in XES code.
    *
    ADDITIONAL SYMPTOMS:
    Subsequent reports of this problem did not necessarily have the
    symptom of GRS contention.  Their symptoms included:
    - An MQ dump shows that MQ system threads and application
      threads were waiting for latch SSSCONN / DMCSEGAL (the
      csObjectLatch).  The latch owner was in a wait in CSQEUCAT
      for a lock in the Coupling Facility ( CF ).  The lock owner
      had abended after CSQEUCAT obtained the lock and before it
      updated its flag to indicate the lock is held. Therefore,
      when the abend occurred, the lock was not released.
      LOGREC will have an abend entry, possibly for S13E or S222,
      with SUBFUNCTION: CFM  CSQEUCATCHG LHQC
    - The command processor ( thread RTSSRV01 ) was not working
      because it was waiting for the latch.
    - Application or channel threads may be hung in CSQE* modules,
      e.g. waiting in CSQEMPUT with the TCB waiting in IXLRQRSU.
      An attempt to restart the channel fails with
        CSQX514E channel is active.
    - CSQE020E CSQ1 Structure <strucid> connection as <connection>
      failed, RC=0000000C reason=02010C27 codes=00000002 00000008
      00000C27
    - CSQE007I CSQ1 EEPLDISCFAILCONNECTION event received
      for structure APPLICATION1 connection name CSQECSQSCSQ101
    - ABEND=S026 REASON=08118001
      ABENDS026 ABEND026 ABEND S026 026
    - ABEND878-10 due to a build-up of DXWB control blocks
      ABENDS878 ABEND878 ABEND S878 878
    .
    L2 Verification Steps:  See details in the internal Level 2
    forum
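
Because the local fix depends on knowing which queue manager owns the lock, the D GRS,C output quoted above can be scanned for the OWN row. The following is a minimal Python sketch, assuming the column layout matches the sample in this APAR; the regular expression and the function name split_owners_and_waiters are illustrative and are not part of any IBM tool.

```python
import re

# Rows copied from the D GRS,C output quoted in this APAR.
SAMPLE = """\
S=SYSTEMS SYSZCSQE PeerLevelRecoveryQPP30000000C00000039
SYSNAME    JOBNAME      ASID   TCBADDR     EXC/SHR     STATUS
MFOS       QP23MSTR     008C   00960140    EXCLUSIVE    OWN
MBOS       QP11MSTR     0095   00962140    EXCLUSIVE    WAIT
MBOS       QPG2MSTR     010E   0096C7F0    EXCLUSIVE    WAIT
MAOS       QPS3MSTR     0093   00973A48    EXCLUSIVE    WAIT
MAOS       QP10MSTR     017E   0096B988    EXCLUSIVE    WAIT
"""

# Assumed row layout: SYSNAME JOBNAME ASID(4 hex) TCBADDR(8 hex) MODE STATUS
ROW = re.compile(
    r"^\s*(?P<sysname>\S+)\s+(?P<jobname>\S+)\s+(?P<asid>[0-9A-F]{4})\s+"
    r"(?P<tcb>[0-9A-F]{8})\s+(?P<mode>\S+)\s+(?P<status>OWN|WAIT)\s*$"
)


def split_owners_and_waiters(text):
    """Return (owners, waiters) as lists of (sysname, jobname) tuples."""
    owners, waiters = [], []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # resource line, column header, or unrelated text
        entry = (match["sysname"], match["jobname"])
        (owners if match["status"] == "OWN" else waiters).append(entry)
    return owners, waiters


owners, waiters = split_owners_and_waiters(SAMPLE)
print("owner(s):", owners)
print("waiters :", len(waiters))
```

The job name on the OWN row identifies the queue manager to recycle under the local fix below. As noted under ADDITIONAL SYMPTOMS, later reports did not always show GRS contention, so an empty owner list does not rule out this problem.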
    

Local fix

  • Recycle the queue manager that owns the lock
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 6 *
    ****************************************************************
    * PROBLEM DESCRIPTION: Multiple queue managers in a QSG hang   *
    *                      while waiting for a lock on a list      *
    *                      header. Symptoms include:               *
    *                      - Queue Manager hangs during startup    *
    *                        performing peer level recovery        *
    *                      - Tasks accessing shared queues hang    *
    *                      - Abend 5C6-00C51027 in CSQERAD1        *
    *                      - Abend S026 due to CONNECTOR HANG      *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    A task opening a shared queue calls CSQEUCA1 to update the list
    header for that queue. CSQEUCA1 calls IXLLSTC to lock the list
    header.
    If the task abends after the lock is granted but before CSQEUCA1
    flags that it holds the lock, CSQEUCA1's recovery routine is
    invoked. Because the successful granting of the lock was not yet
    recorded, the recovery routine makes no attempt to release it.
    Subsequently, any task on any queue manager in the QSG that
    attempts to get the lock hangs until the queue manager where the
    abend occurred is recycled.
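
The timing window described above can be illustrated with a toy model. This is a minimal Python sketch, not the CSQEUCA1 code: ListHeaderLock, SimulatedAbend, and update_list_header are hypothetical names, the lock is an in-process object rather than a Coupling Facility list header lock, and an exception stands in for the abend.

```python
class SimulatedAbend(Exception):
    """Stands in for a task abend (for example S13E or S222)."""


class ListHeaderLock:
    """Toy stand-in for the lock on a list header."""

    def __init__(self):
        self.owner = None

    def acquire(self, task):
        if self.owner is not None:
            return False  # a real requester would wait here
        self.owner = task
        return True

    def release(self, task):
        if self.owner == task:
            self.owner = None


def update_list_header(lock, task, abend_in_window=False):
    lock_held_flag = False  # what the recovery routine relies on
    try:
        lock.acquire(task)          # the lock is granted here ...
        if abend_in_window:
            raise SimulatedAbend()  # ... but the task abends before flagging it
        lock_held_flag = True
        # (the list header update would happen here)
        lock.release(task)
        lock_held_flag = False
    except SimulatedAbend:
        # Recovery routine: releases the lock only if the flag says it is held.
        if lock_held_flag:
            lock.release(task)


lock = ListHeaderLock()
update_list_header(lock, "QP23", abend_in_window=True)
print("owner after abend:", lock.owner)
print("QP11 can lock:", lock.acquire("QP11"))
```

In this model the lock stays owned by the abended task, so every later requester fails to obtain it, which corresponds to the hang seen across the QSG until the owning queue manager is recycled.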
    

Problem conclusion

  • CSQEUCA1 is changed to close the timing window where a lock has
    been granted but this has not been flagged for the recovery
    routine.
    
    Additionally, CSQGFRCV is changed to save the time of an abend
    in the FRE to aid in diagnosing similar problems in the future.
    000Y
    CSQEUCA1
    CSQGFRCV
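
The APAR does not describe how CSQEUCA1 closes the window. One common way to close this kind of window, shown here purely as an assumption and not as IBM's actual change, is to record the intent before requesting the lock and to have the recovery routine check real ownership instead of trusting a flag that is set afterwards. All names in this Python sketch are hypothetical.

```python
class SimulatedAbend(Exception):
    """Stands in for a task abend."""


class ListHeaderLock:
    """Toy stand-in for the lock on a list header."""

    def __init__(self):
        self.owner = None

    def acquire(self, task):
        if self.owner is not None:
            return False
        self.owner = task
        return True

    def release(self, task):
        if self.owner == task:
            self.owner = None


def update_list_header_fixed(lock, task, abend_in_window=False):
    lock_requested = False
    try:
        lock_requested = True       # intent recorded BEFORE the lock request
        lock.acquire(task)
        if abend_in_window:
            raise SimulatedAbend()  # abend at the point that used to leak the lock
        # (the list header update would happen here)
        lock.release(task)
        lock_requested = False
    except SimulatedAbend:
        # Recovery routine: a request may have been granted, so check who
        # actually owns the lock and release it if this task does.
        if lock_requested and lock.owner == task:
            lock.release(task)


lock = ListHeaderLock()
update_list_header_fixed(lock, "QP23", abend_in_window=True)
print("owner after abend:", lock.owner)
print("QP11 can lock:", lock.acquire("QP11"))
```

With the intent recorded first, there is no point between the grant and the flag update at which an abend leaves the recovery routine unaware of the lock.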
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PM69098

  • Reported component name

    WMQ Z/OS V6

  • Reported component ID

    5655L8200

  • Reported release

    000

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2012-07-18

  • Closed date

    2012-08-30

  • Last modified date

    2013-04-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PM69584 UK81431

Modules/Macros

  • CSQEUCA1 CSQGFRCV
    

Fix information

  • Fixed component name

    WMQ Z/OS V6

  • Fixed component ID

    5655L8200

Applicable component levels

  • R000 PSY UK81431

       UP12/10/10 P F210

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
03 April 2013