IBM Support

PM76986: CICS TASKS APPEAR TO HANG IN WEBSPHERE MQ AND/OR COMMAND SERVER SEEMS TO HANG

A fix is available


APAR status

  • Closed as program error.

Error description

  • CLEAR QLOCAL was issued via the command server against the
    SYSTEM.ADMIN.QMGR.EVENT queue.
    This resulted in an exclusive (X) lock being obtained on the queue.
    The clear processing (CSQICCLR) then detected that the
    non-persistent scavenger needed to process this queue, so it
    suspended, waiting for the scavenger to complete. The
    non-persistent scavenger was busy processing another queue
    and was generating expiry report messages for it. However, a
    reply-to queue had been deleted, so the scavenger tried to put an
    event message to SYSTEM.ADMIN.QMGR.EVENT, and in doing so
    needed to get an IX lock on the queue.
    Because the lock was not available, the scavenger suspended.
    This left the command server and the scavenger in a deadly
    embrace (deadlock).
    Transactions were caught up in this because they were also
    trying to put messages to the SYSTEM.ADMIN.QMGR.EVENT queue,
    and so they were suspended indefinitely.
    .
    Additional keywords/symptoms:
    2033 (MQRC_NO_MESSAGE_AVAILABLE) due to a timeout on an
    MQGET of the reply data from the command processor.
     For a CSQUTIL job:
      CSQU051E Command responses not received after <n> seconds
     For CSQOREXX, the error is:
      CSQO015E Command issued but no reply received
    
    SYSTEM.COMMAND.INPUT has a non-zero current depth (CURDEPTH)
    
    RTSSRV01 CSQISYNS SCAVNGxx SCAVNG
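
The lock conflict described above can be illustrated with a small compatibility check. This is only a sketch: it assumes the X and IX modes behave like the conventional exclusive and intention-exclusive modes of multi-granularity locking, and the table and the `can_grant` helper are hypothetical names for illustration, not queue manager internals.

```python
# Conventional compatibility of the two lock modes named in this APAR.
# Assumption: X conflicts with every other mode, IX is compatible with IX.
COMPATIBLE = {
    ("IX", "IX"): True,
    ("IX", "X"): False,
    ("X", "IX"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(COMPATIBLE[(requested, held)] for held in held_modes)


# The command server holds X on the event queue, so the scavenger's
# IX request (and any application's IX request) cannot be granted.
scavenger_can_put = can_grant("IX", ["X"])
# Without the CLEAR in progress, several putters can hold IX together.
putters_can_share = can_grant("IX", ["IX", "IX"])
```

Under these assumptions, every task that needs IX on the event queue queues up behind the single X holder, which matches the suspended transactions reported here.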
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
    *                 Release 0 Modification 1 and Release 1       *
    *                 Modification 0.                              *
    ****************************************************************
    * PROBLEM DESCRIPTION: Following a                             *
    *                      CLEAR QL(SYSTEM.ADMIN.QMGR.EVENT)       *
    *                      command, the following symptoms occur:  *
    *                      - Applications generating event         *
    *                        messages hang                         *
    *                      - Performance degradation on MQGET      *
    *                        calls which do not use a queue index  *
    *                      - Queue Manager commands are not        *
    *                        processed                             *
    *                                                              *
    *                      The queue manager must be canceled to   *
    *                      resolve the problem.                    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    While processing a CLEAR QL or DELETE QL command on an event
    queue (for example SYSTEM.ADMIN.QMGR.EVENT), the command
    processor obtained an X lock on the queue name with the fpgsync
    qualifier. The command processor then detected that the queue
    was scheduled to be processed by the page scavenger, and so it
    was suspended until the scavenger had completed processing.
    At the same time the page scavenger was processing an expired
    message on another queue,  and attempted to put an expiry report
    message. This MQPUT failed, and caused the scavenger to generate
    an event message.
    While putting the event message to the event queue, the
    scavenger requested an IX lock on the queue name with the
    fpgsync qualifier. This was not available due to the lock held
    by the command processor, and so the scavenger suspended until
    the lock was released.
    This led to the command processor and scavenger tasks each
    waiting for the other task in a deadlock situation.
    
    Any other applications attempting to put or get from the event
    queue will also hang due to the lock held by the command
    processor.
    
    While the scavenger is hung, empty pages will not be removed
    from page chains, leading to decreased performance for
    applications getting from queues and having to run the page
    chains.
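
The circular wait described in this summary can be pictured as a cycle in a wait-for graph. The following is a minimal sketch, assuming each task waits on at most one other task; the task names and the `find_cycle` helper are hypothetical and are not part of WebSphere MQ.

```python
def find_cycle(wait_for):
    """Return the tasks forming a wait-for cycle, or None if there is none.

    `wait_for` maps each waiting task to the single task it is waiting on.
    """
    for start in wait_for:
        path = [start]
        seen = {start}
        node = wait_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to the starting task
            if node in seen:
                break                # joined a cycle that excludes `start`
            seen.add(node)
            path.append(node)
            node = wait_for.get(node)
    return None


wait_for = {
    # Holds the X lock and waits for the scavenger to finish the queue.
    "command_processor": "scavenger",
    # Needs an IX lock that the command processor is holding.
    "scavenger": "command_processor",
    # An application putting to the event queue waits on the same lock.
    "application": "command_processor",
}
cycle = find_cycle(wait_for)
```

In this model the application task is not part of the cycle, but it can never proceed because the task it waits on is itself deadlocked.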
    

Problem conclusion

  • CSQICCLR, CSQIDDEL and CSQIRSQD are changed to release the
    fpgsync qualified lock prior to synchronising with the
    scavenger, to prevent the deadlock situation from occurring.
    010Y
    100Y
    CSQICCLR
    CSQIDDEL
    CSQIRSQD
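
The effect of releasing the lock before synchronising with the scavenger can be sketched with two threads. This is an illustrative model only, not the actual module logic: the thread roles, the `run_clear` function and the timeouts are assumptions, and the timeouts exist solely so that the demonstration ends instead of hanging. The sketch does not model what the command processor does after the synchronisation completes.

```python
import threading


def run_clear(release_before_wait, timeout=0.2):
    """Simulate a CLEAR racing with the scavenger.

    Returns (clear_completed, scavenger_completed).
    """
    queue_lock = threading.Lock()        # stand-in for the fpgsync qualified lock
    lock_held = threading.Event()        # command processor has taken the lock
    scavenger_done = threading.Event()   # scavenger finished its event put
    outcome = {}

    def command_processor():
        queue_lock.acquire()
        lock_held.set()
        if release_before_wait:
            # Changed ordering: drop the lock, then wait for the scavenger.
            queue_lock.release()
            outcome["clear"] = scavenger_done.wait(timeout * 4)
        else:
            # Original ordering: wait for the scavenger while holding the lock.
            outcome["clear"] = scavenger_done.wait(timeout * 4)
            queue_lock.release()

    def scavenger():
        lock_held.wait()
        # The event message put needs the same lock. Give up after `timeout`
        # so the demonstration terminates.
        if queue_lock.acquire(timeout=timeout):
            queue_lock.release()
            scavenger_done.set()
            outcome["scavenger"] = True
        else:
            outcome["scavenger"] = False

    threads = [threading.Thread(target=command_processor),
               threading.Thread(target=scavenger)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["clear"], outcome["scavenger"]


old_ordering = run_clear(release_before_wait=False)
new_ordering = run_clear(release_before_wait=True)
```

With the original ordering neither side makes progress until a timeout fires; with the lock released first, the scavenger obtains the lock, completes, and the clear processing can continue.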
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PM76986

  • Reported component name

    WMQ Z/OS V7

  • Reported component ID

    5655R3600

  • Reported release

    010

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2012-11-13

  • Closed date

    2012-11-30

  • Last modified date

    2013-07-05

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK83905 UK83906

Modules/Macros

  • CSQICCLR CSQIDDEL CSQIRSQD
    

Fix information

  • Fixed component name

    WMQ Z/OS V7

  • Fixed component ID

    5655R3600

Applicable component levels

  • R010 PSY UK83905

       UP13/01/16 P F301

  • R100 PSY UK83906

       UP13/01/16 P F301

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
05 July 2013