IBM Support

IT29434: An MQ classes for JMS application hangs when using the automatic client reconnect function

APAR status

  • Closed as program error.

Error description

  • An MQ classes for JMS application using the automatic client
    reconnect function may hang during reconnect processing after a
    multi-instance queue manager is failed over to the standby
    instance.  Application threads will remain in a receive() call
    on a JMS MessageConsumer object and not return.  A Javacore
    (thread dump) of the application JVM will show that application
    and internal MQ classes for JMS threads become stuck in
    conditional wait states until the JVM is killed and restarted.
    
    The following example threads, with their associated Java call
    stacks, show the state of the JVM when this problem occurs:
    
    An application thread attempting to consume a message:
    
    "Application-Thread-1" J9VMThread:0x0000000031D6F600,
    j9thread_t:0x0000000042D9CE30,
    java/lang/Thread:0x000000000964B6D0, state:CW, prio=5
        at java/lang/Object.wait
        at com/ibm/mq/jmqi/remote/impl/RemoteSession.exchangeTSH
           (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteSession$RemoteRequestEntry@0x000000000970E028, entry count: 1)
        at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.requestMessagesReconnectable
        at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.requestMessages
        at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.flushQueue
        at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.proxyMQGET
        at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiGetInternalWithRecon
        at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiGetInternal
        at com/ibm/mq/jmqi/internal/JmqiTools.getMessage
        at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiGet
        at com/ibm/mq/ese/jmqi/InterceptedJmqiImpl.jmqiGet
        at com/ibm/mq/ese/jmqi/ESEJMQI.jmqiGet
        at com/ibm/msg/client/wmq/internal/WMQConsumerShadow.getMsg
           (entered lock: java/lang/Object@0x00000000093C19B8, entry count: 1)
        at com/ibm/msg/client/wmq/internal/WMQSyncConsumerShadow.receiveInternal
        at com/ibm/msg/client/wmq/internal/WMQConsumerShadow.receive
        at com/ibm/msg/client/wmq/internal/WMQMessageConsumer.receive
        at com/ibm/msg/client/jms/internal/JmsMessageConsumerImpl.receiveInboundMessage
        at com/ibm/msg/client/jms/internal/JmsMessageConsumerImpl.receive
        at com/ibm/mq/jms/MQMessageConsumer.receive
    
    
    An MQ classes for JMS "Remote Receive Thread" which is
    responsible for reading data sent by the queue manager over a
    TCP/IP connection:
    
    "RcvThread: com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection@1641630639[qmid=QM1_2019-07-01_10.31.17,fap=13,channel=JMS.SVRCONN,ccsid=850,sharecnv=10,hbint=300,peer=xxx.xxx.xxx.MIQM01/xxx.xxx.xxx.MIQM01(1414),localport=50832,ssl=no]" J9VMThread:0x0000000031D99B00,
    j9thread_t:0x0000000040681550,
    java/lang/Thread:0x00000000092C3A40, state:CW, prio=5
    Waiting on: com/ibm/mq/jmqi/remote/api/RemoteHconn$ReconnectMutex@0x000000000931B348 Owned by: <unowned>
          Java callstack:
              at java/lang/Object.wait
              at com/ibm/mq/jmqi/remote/api/RemoteHconn.checkForReconnect
                 (entered lock: com/ibm/mq/jmqi/remote/api/RemoteHconn$ReconnectMutex@0x000000000931B348, entry count: 1)
              at com/ibm/mq/jmqi/remote/api/RemoteHconn.getSession
              at com/ibm/mq/jmqi/remote/api/RemoteHconn.getSession
              at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueueManager.receiveNotification
              at com/ibm/mq/jmqi/remote/impl/RemoteRcvThread.run
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
              at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
              at com/ibm/msg/client/commonservices/j2se/workqueue/WorkQueueManagerImplementation$ThreadPoolWorker.run
    
    
    The MQ classes for JMS "Remote Reconnect Thread" which is
    responsible for creating connection and object handles for JMS
    resources used by the application:
    
    "JMSCCThreadPoolWorker-4" J9VMThread:0x0000000031C08C00,
    j9thread_t:0x0000000045AE7B48,
    java/lang/Thread:0x0000000028303C50, state:CW, prio=5
    Waiting on: com/ibm/mq/jmqi/remote/api/RemoteHconn$CallLock@0x000000000931B2A8 Owned by: <unowned>
          Java callstack:
              at java/lang/Object.wait
              at com/ibm/mq/jmqi/remote/util/ReentrantMutex.acquire
                 (entered lock: com/ibm/mq/jmqi/remote/api/RemoteHconn$CallLock@0x000000000931B2A8, entry count: 2)
              at com/ibm/mq/jmqi/remote/util/ReentrantMutex.acquire
                 (entered lock: com/ibm/mq/jmqi/remote/api/RemoteHconn$CallLock@0x000000000931B2A8, entry count: 1)
              at com/ibm/mq/jmqi/remote/api/RemoteHconn.enterCall
              at com/ibm/mq/jmqi/remote/api/RemoteHconn.enterCall
              at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.reconnect
              at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.run
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
              at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
              at com/ibm/msg/client/commonservices/j2se/workqueue/WorkQueueManagerImplementation$ThreadPoolWorker.run
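
The Javacore entries above report state:CW and "Owned by: <unowned>" for the contended objects, which indicates threads parked in Object.wait() rather than threads blocked while entering a monitor. For that reason, the JVM's built-in detector, ThreadMXBean.findDeadlockedThreads(), would not be expected to flag this hang. The following minimal sketch uses only the JDK and is not part of MQ: it lists threads in a WAITING or TIMED_WAITING state whose stacks contain frames from a given class-name prefix, such as "com.ibm.mq.jmqi". The names WaitingThreadScan, parkForever and startParkedThread are hypothetical, and the parked thread only stands in for a stuck MQ thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class WaitingThreadScan {

    // Names of threads in WAITING/TIMED_WAITING whose stack has a frame from a
    // class whose name starts with classPrefix (e.g. "com.ibm.mq.jmqi").
    static List<String> waitingThreadsIn(String classPrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    names.add(info.getThreadName());
                    break;
                }
            }
        }
        return names;
    }

    // True only for cycles of threads blocked on monitor entry or ownable
    // synchronizers; threads sitting in Object.wait() are not reported.
    static boolean jvmReportsDeadlock() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids != null && ids.length > 0;
    }

    // Hypothetical stand-in for an MQ thread stuck in a conditional wait.
    static void parkForever(Object lock) {
        synchronized (lock) {
            while (true) {
                try {
                    lock.wait(); // nothing ever calls notify()
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }

    // Starts a daemon thread that parks forever and returns once it is WAITING.
    static Thread startParkedThread(String name) {
        Object lock = new Object();
        Thread t = new Thread(() -> parkForever(lock), name);
        t.setDaemon(true);
        t.start();
        while (t.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        return t;
    }

    public static void main(String[] args) {
        startParkedThread("Application-Thread-1");
        System.out.println("Waiting threads: " + waitingThreadsIn("WaitingThreadScan"));
        System.out.println("JVM deadlock detector: " + jvmReportsDeadlock());
    }
}
```

A match from such a scan only shows that threads are waiting; it does not prove a deadlock, so a Javacore is still needed to confirm the pattern shown above.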
    

Local fix

  • Use a sharing conversations value of 1 on the server-connection
    channel used by the MQ classes for JMS application.
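
For reference, the local fix could be applied with an MQSC command such as ALTER CHANNEL(JMS.SVRCONN) CHLTYPE(SVRCONN) SHARECNV(1), where JMS.SVRCONN is the channel name that appears in the Javacore above (which also shows sharecnv=10); substitute your own channel name. The sketch below is a hypothetical, simplified model, not the MQ implementation: a channel instance carries up to SHARECNV conversations, and in this model one Remote Receive Thread serves every conversation on its channel instance. With SHARECNV(1), each hConn gets its own channel instance, so no channel instance is shared between connection handles. SHARECNV(0) has different semantics and is outside this model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of conversation sharing; not the MQ implementation.
class ShareCnvModel {

    // Channel-instance index assigned to each of `conversations` hConns when each
    // channel instance may carry at most `shareCnv` conversations.
    static List<Integer> assign(int conversations, int shareCnv) {
        requirePositive(shareCnv);
        List<Integer> channelOf = new ArrayList<>();
        for (int i = 0; i < conversations; i++) {
            channelOf.add(i / shareCnv); // fill one channel instance before opening the next
        }
        return channelOf;
    }

    // Channel instances (and, in this model, Remote Receive Threads) required.
    static int channelInstances(int conversations, int shareCnv) {
        requirePositive(shareCnv);
        return (conversations + shareCnv - 1) / shareCnv; // ceiling division
    }

    private static void requirePositive(int shareCnv) {
        if (shareCnv < 1) {
            throw new IllegalArgumentException("this model only covers SHARECNV >= 1");
        }
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 10)); // sharecnv=10: prints [0, 0, 0]
        System.out.println(assign(3, 1));  // SHARECNV(1): prints [0, 1, 2]
    }
}
```

The trade-off is visible in channelInstances: with SHARECNV(1), the number of channel instances grows one-for-one with the number of connection handles.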
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects MQ classes for JMS applications that use the
    automatic client reconnect function.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    This APAR addresses two related, but subtly different, deadlocks
    that could result from the same scenario, depending on
    non-deterministic timing windows.
    
    In both scenarios, the MQ classes for JMS received two
    notification flows from a queue manager indicating that a
    waiting get had failed.  This could be due to a multi-instance
    queue manager failing over to the standby instance, for example.
    The notification flows contained MQ reason code 2009
    (MQRC_CONNECTION_BROKEN).  An internal thread known as the
    "Remote Receive Thread" (named "RcvThread: ..." in Javacores)
    received the notification flows sent by the queue manager over
    the channel instance.
    
    When the first notification flow was processed, the automatic
    client reconnection logic was invoked.  However, the connection
    object associated with the channel instance was not marked as
    disconnected, because the TCP/IP socket was still valid.  The
    Remote Receive Thread then began processing the second
    notification and became blocked, waiting for the reconnect
    processing to complete.
    
    For the threads to deadlock with the Java callstacks shown in
    the Error description section of this APAR, the application
    thread performing the waiting get needed to hold a lock on the
    connection handle (hConn) used to issue the message get API
    call.  It held the lock and was waiting for data to be made
    available by the Remote Receive Thread.  However, the Remote
    Receive Thread was blocked by the reconnect processing.
    
    The internal "Remote Reconnect Thread" (shown in the Error
    description as the "JMSCCThreadPoolWorker-4" thread) could not
    complete the reconnect processing because it was blocked waiting
    for the lock held by the application thread.

    Therefore, there was a three-way deadlock between an application
    thread, a Remote Receive Thread and the Remote Reconnect Thread.
    
    
    A second deadlock could have occurred if the Remote Receive
    Thread was in the same state as described above but the Remote
    Reconnect Thread had the following callstack:
    
    Waiting on: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x0000000017A9A060 Owned by: <unowned>
          Java callstack:
              at java/lang/Object.wait
              at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveAsyncTsh
                 (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x0000000017A9A060, entry count: 1)
              at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveTSH
              at com/ibm/mq/jmqi/remote/impl/RemoteSession.startConversation
              at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.sessionFromEligible
              at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSessionFromEligibleConnection
                 (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification$ConnectionsLock@0x000000000868CF50, entry count: 1)
              at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSession
              at com/ibm/mq/jmqi/remote/impl/RemoteConnectionPool.getSession
              at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiConnect
              at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.reconnect
              at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.run
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
              at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
              at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
              at com/ibm/msg/client/commonservices/j2se/workqueue/WorkQueueManagerImplementation$ThreadPoolWorker.run
    
    In this scenario, the Remote Reconnect Thread attempted to
    reconnect a connection handle over an existing channel instance.
    It was waiting for a response to be made available by the
    Remote Receive Thread.  However, the Remote Receive Thread was
    blocked waiting for the reconnection to complete.
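
Both deadlocks can be expressed as cycles in a wait-for graph, where an edge from A to B means that thread A cannot proceed until thread B acts. The following Java sketch is an illustrative model only: the WaitForGraph class and the thread labels are hypothetical, and the edges are taken from the descriptions above rather than from MQ internals. It finds a cycle with a depth-first search, and also shows that, in this model, removing the Remote Receive Thread's wait on reconnect processing (the second change listed under Problem conclusion) breaks the first cycle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical wait-for graph: an edge A -> B means "A cannot proceed until B acts".
class WaitForGraph {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    WaitForGraph waitsFor(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new ArrayList<>()).add(holder);
        return this;
    }

    // Returns the threads on the first cycle found by depth-first search, or an empty list.
    List<String> findCycle() {
        Set<String> finished = new HashSet<>(); // fully explored, no cycle reachable
        for (String start : edges.keySet()) {
            List<String> cycle = dfs(start, new ArrayList<>(), new HashSet<>(), finished);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private List<String> dfs(String node, List<String> path, Set<String> onPath, Set<String> finished) {
        if (onPath.contains(node)) {
            // Back edge: the cycle is the part of the current path starting at `node`.
            return new ArrayList<>(path.subList(path.indexOf(node), path.size()));
        }
        if (finished.contains(node)) {
            return Collections.emptyList();
        }
        onPath.add(node);
        path.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptyList())) {
            List<String> cycle = dfs(next, path, onPath, finished);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        onPath.remove(node);
        finished.add(node);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // First deadlock: three-way.
        WaitForGraph first = new WaitForGraph()
                .waitsFor("Application-Thread-1", "RcvThread")              // waiting get needs data
                .waitsFor("RcvThread", "RemoteReconnectThread")              // second notification waits for reconnect
                .waitsFor("RemoteReconnectThread", "Application-Thread-1"); // needs the hConn lock
        System.out.println(first.findCycle());
        // prints [Application-Thread-1, RcvThread, RemoteReconnectThread]

        // Second deadlock: two-way.
        WaitForGraph second = new WaitForGraph()
                .waitsFor("RemoteReconnectThread", "RcvThread")   // waits for a response on the old channel
                .waitsFor("RcvThread", "RemoteReconnectThread");  // waits for reconnect to finish
        System.out.println(second.findCycle()); // prints [RemoteReconnectThread, RcvThread]

        // Model of the second change: the receive thread no longer waits for reconnect.
        WaitForGraph afterFix = new WaitForGraph()
                .waitsFor("Application-Thread-1", "RcvThread")
                .waitsFor("RemoteReconnectThread", "Application-Thread-1");
        System.out.println(afterFix.findCycle()); // prints []
    }
}
```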
    

Problem conclusion

  • Three changes have been made to the MQ classes for JMS to
    address the deadlocks described by this APAR:
    
    1) On receipt of a notification from the queue manager
    containing a connection broken type reason code, mark the
    channel instance as broken even if no exception has been thrown
    when using the java.net.Socket object associated with the
    channel instance.
    
    2) Allow Remote Receive Threads to continue processing and
    reading data over an existing channel instance when automatic
    client reconnect is invoked.
    
    3) Version channel instances so that the Remote Reconnect
    Thread does not attempt to reuse existing channel instances
    created before its current reconnect cycle.  Instead, create a
    new channel instance and then attempt to use it to reconnect
    any broken connection handles (hConns).  This may result in a
    small increase in the number of channel instances, because
    channel instances created before the reconnect cycle, which
    might still be connected and valid, will not be considered for
    multiplexing reconnected hConns.
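
Changes 1 and 3 can be illustrated with a minimal Java sketch, under stated assumptions: ReconnectModel and ChannelInstance are hypothetical names, the model tracks only a generation number and a broken flag per channel instance, it applies no SHARECNV limit, and it treats only reason code 2009 (MQRC_CONNECTION_BROKEN), the code named in this APAR, as a "connection broken type" reason code. It is not the MQ classes for JMS implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of changes 1 and 3; not the MQ classes for JMS source.
class ReconnectModel {
    // The only "connection broken type" reason code modeled here.
    static final int MQRC_CONNECTION_BROKEN = 2009;

    static final class ChannelInstance {
        final int generation; // reconnect cycle in which this instance was created
        boolean broken;       // change 1: can be set by a notification, not only a socket error

        ChannelInstance(int generation) {
            this.generation = generation;
        }

        // Called when a notification flow arrives from the queue manager.
        void onNotification(int reason) {
            if (reason == MQRC_CONNECTION_BROKEN) {
                broken = true; // mark broken even though the socket still looks valid
            }
        }
    }

    private final List<ChannelInstance> instances = new ArrayList<>();
    private int cycle = 0;

    // Opens a new channel instance stamped with the current reconnect cycle.
    ChannelInstance open() {
        ChannelInstance ci = new ChannelInstance(cycle);
        instances.add(ci);
        return ci;
    }

    void startReconnectCycle() {
        cycle++;
    }

    // Change 3: only instances created in the current cycle are reused;
    // older instances are skipped even if they still appear healthy.
    ChannelInstance channelForReconnect() {
        for (ChannelInstance ci : instances) {
            if (!ci.broken && ci.generation == cycle) {
                return ci;
            }
        }
        return open();
    }

    int instanceCount() {
        return instances.size();
    }

    public static void main(String[] args) {
        ReconnectModel model = new ReconnectModel();
        ChannelInstance old = model.open();              // created before the failover (cycle 0)
        old.onNotification(MQRC_CONNECTION_BROKEN);      // 2009 notification marks it broken
        model.startReconnectCycle();                     // reconnect cycle 1 begins
        ChannelInstance a = model.channelForReconnect(); // new instance for cycle 1
        ChannelInstance b = model.channelForReconnect(); // reuses the cycle-1 instance
        System.out.println(old.broken + " " + (a == b) + " " + model.instanceCount());
        // prints "true true 2"
    }
}
```

The generation check is what produces the small increase in channel instances mentioned in change 3: a healthy instance from an earlier cycle is never picked by channelForReconnect.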
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v8.0       8.0.0.14
    v9.0 LTS   9.0.0.9
    v9.1 CD    9.1.4
    v9.1 LTS   9.1.0.4
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
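
To check whether an installed client level includes the fix, the installed version can be compared against the PTF table above. The following Java sketch is hypothetical (FixLevelCheck is not an IBM utility). It assumes that 9.1.0.x levels belong to the 9.1 LTS stream and that 9.1.x levels with x of 1 or more belong to the 9.1 CD stream, and it reports UNKNOWN for any level the table does not cover, such as 9.0 CD releases or later versions.

```java
// Hypothetical helper based only on the PTF table above; not an IBM tool.
class FixLevelCheck {
    enum Status { FIXED, NOT_FIXED, UNKNOWN }

    // Parses "9.1.0.4" or "9.1.4" into four numeric parts, padding missing parts with 0.
    static int[] parse(String level) {
        String[] parts = level.trim().split("\\.");
        int[] out = new int[4];
        for (int i = 0; i < parts.length && i < out.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Lexicographic comparison of two parsed levels.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 4; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    static Status includesFix(String installed) {
        int[] v = parse(installed);
        String required;
        if (v[0] == 8 && v[1] == 0 && v[2] == 0) {
            required = "8.0.0.14";  // v8.0
        } else if (v[0] == 9 && v[1] == 0 && v[2] == 0) {
            required = "9.0.0.9";   // v9.0 LTS
        } else if (v[0] == 9 && v[1] == 1 && v[2] == 0) {
            required = "9.1.0.4";   // v9.1 LTS
        } else if (v[0] == 9 && v[1] == 1) {
            required = "9.1.4";     // v9.1 CD (assumed: 9.1.1, 9.1.2, ...)
        } else {
            return Status.UNKNOWN;  // not covered by the table
        }
        return compare(v, parse(required)) >= 0 ? Status.FIXED : Status.NOT_FIXED;
    }

    public static void main(String[] args) {
        for (String level : new String[] {"8.0.0.13", "9.1.0.4", "9.1.3", "9.2.0.0"}) {
            System.out.println(level + " -> " + includesFix(level));
        }
    }
}
```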
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT29434

  • Reported component name

    IBM MQ BASE MP

  • Reported component ID

    5724H7251

  • Reported release

    800

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-06-12

  • Closed date

    2019-10-04

  • Last modified date

    2019-10-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ BASE MP

  • Fixed component ID

    5724H7251

Applicable component levels

  • Product: IBM MQ (SSYHRD)
  • Version: 8.0.0.0
  • Platform: Platform Independent
  • Business unit: Cloud & Data Platform
  • Line of business: IBM Automation

Document Information

Modified date:
04 October 2019