IBM Support

IT29729: An MQ classes for JMS application may hang when using automatic client reconnect after network interruptions


APAR status

  • Closed as program error.

Error description

  • An MQ classes for JMS application using the automatic client
    reconnect function may hang if there are frequent network
    interruptions or packet loss between the client and queue
    manager systems.  When this occurs, messages are not delivered
    to the application and the depth of the MQ queue increases.  A
    Javacore (thread dump) of the application JVM will show that
    application and internal MQ classes for JMS threads are stuck
    with the following callstacks until the JVM is killed and
    restarted:
    
    Java callstack of an application thread attempting to consume a
    message:
    
    "Application-Thread-1" prio=5 os_prio=0 tid=0x0000000019051000 nid=0x1e8c in Object.wait()
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.checkForReconnect
            - locked <0x00000000c8fc53b0> (a com.ibm.mq.jmqi.remote.api.RemoteHconn$ReconnectMutex)
            at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.requestMutex
            at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.requestMessagesReconnectable
            at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.requestMessages
            at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.flushQueue
            at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.proxyMQGET
            at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGetInternalWithRecon
            at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGetInternal
            at com.ibm.mq.jmqi.internal.JmqiTools.getMessage
            at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGet
            at com.ibm.mq.ese.jmqi.InterceptedJmqiImpl.jmqiGet
            at com.ibm.mq.ese.jmqi.ESEJMQI.jmqiGet
            at com.ibm.msg.client.wmq.internal.WMQConsumerShadow.getMsg
            - locked <0x00000000c8fc5430> (a java.lang.Object)
            at com.ibm.msg.client.wmq.internal.WMQSyncConsumerShadow.receiveInternal
            at com.ibm.msg.client.wmq.internal.WMQConsumerShadow.receive
            at com.ibm.msg.client.wmq.internal.WMQMessageConsumer.receive
            at com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.receiveInboundMessage
            at com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.receive
            at com.ibm.mq.jms.MQMessageConsumer.receive
    
    
    An MQ classes for JMS "Remote Receive Thread" which is
    responsible for reading data sent by the queue manager over a
    TCP/IP connection:
    
    "RcvThread: com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection@1303583763[qmid=QM1_2019-07-10_15.32.22,fap=13,channel=JMS.SVRCONN,ccsid=819,sharecnv=10,hbint=300,peer=localhost/127.0.0.1(1414),localport=50251,ssl=no]" #334 daemon prio=5 os_prio=0 tid=0x0000000015b93000 nid=0x1ef0 in Object.wait()
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.checkForReconnect
            - locked <0x00000000c8f6e618> (a com.ibm.mq.jmqi.remote.api.RemoteHconn$ReconnectMutex)
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.getSession
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.getSession
            at com.ibm.mq.jmqi.remote.api.RemoteFAP.spiOpen
            at com.ibm.mq.jmqi.remote.api.RemoteFAP.spiOpen
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.dummyJmqiCall
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.eligibleForReconnect
            at com.ibm.mq.jmqi.remote.api.RemoteHconn.deliverException
            at com.ibm.mq.jmqi.remote.impl.RemoteSession.deliverException
            at com.ibm.mq.jmqi.remote.impl.RemoteConnection.asyncConnectionBroken
            - locked <0x00000000c8f293e8> (a com.ibm.mq.jmqi.remote.impl.RemoteConnection$SessionsMutex)
            at com.ibm.mq.jmqi.remote.impl.RemoteRcvThread.run
            at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask
            at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem
            at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run
            at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem
            at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run
    
    
    The MQ classes for JMS "Remote Reconnect Thread" which is
    responsible for creating connection and object handles for JMS
    resources used by the application:
    
    "JMSCCThreadPoolWorker-2" #92 daemon prio=5 os_prio=0 tid=0x00000000190e7000 nid=0x1e18 waiting for monitor entry [0x000000002be7e000]
       java.lang.Thread.State: BLOCKED (on object monitor)
            at com.ibm.mq.jmqi.remote.impl.RemoteConnection.removeSession
            - waiting to lock <0x00000000c8f293e8> (a com.ibm.mq.jmqi.remote.impl.RemoteConnection$SessionsMutex)
            at com.ibm.mq.jmqi.remote.impl.RemoteSession.disconnect
            at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect
            at com.ibm.mq.jmqi.remote.impl.RemoteReconnectThread.reconnect
            at com.ibm.mq.jmqi.remote.impl.RemoteReconnectThread.run
            at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask
            at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem
            at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run
            at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem
            at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run
    
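In the callstacks above, the application thread and the "RcvThread" are WAITING in Object.wait() rather than BLOCKED on a monitor, so the JVM's built-in monitor deadlock detection (ThreadMXBean.findMonitorDeadlockedThreads) would not be expected to report this cycle. A minimal sketch of an in-process check, assuming the application can run extra diagnostic code, is to scan a thread dump for the RemoteHconn.checkForReconnect frame that appears in two of the callstacks. The class name ReconnectHangDetector is hypothetical, and a thread passing briefly through that frame during normal reconnect processing is not by itself evidence of the hang.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of IBM MQ.
public class ReconnectHangDetector {

    // Frame present in both the application thread and RcvThread stacks in this APAR.
    static final String SUSPECT_FRAME =
            "com.ibm.mq.jmqi.remote.api.RemoteHconn.checkForReconnect";

    /** Returns "name [STATE]" for every live thread whose stack contains the given method. */
    static List<String> threadsInFrame(String qualifiedMethod) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> hits = new ArrayList<>();
        // Stacks only: locked monitor and synchronizer details are not needed here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip any missing entry
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                String method = frame.getClassName() + "." + frame.getMethodName();
                if (qualifiedMethod.equals(method)) {
                    hits.add(info.getThreadName() + " [" + info.getThreadState() + "]");
                    break; // one hit per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> suspects = threadsInFrame(SUSPECT_FRAME);
        if (suspects.isEmpty()) {
            System.out.println("No threads in " + SUSPECT_FRAME);
        } else {
            for (String t : suspects) {
                System.out.println("Possible reconnect hang (IT29729): " + t);
            }
        }
    }
}
```

Sampling several times, a few seconds apart, and reporting only threads that stay in the frame gives a more reliable signal than a single snapshot.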

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects users of the IBM MQ classes for JMS automatic
    client reconnect function.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    A JMS Connection and a JMS Session created by an MQ classes for
    JMS application each have a connection to a queue manager.
    These connections are referred to as "conversations" or
    "connection handles" (hConns), and multiple hConns can be
    multiplexed over a single server-connection channel instance, up
    to the limit set by the channel's sharing conversations (SHARECNV)
    property.
    
    A deadlock occurred within the MQ classes for JMS automatic
    client reconnect function when a connection error was detected
    by the hConn associated with a JMS Session and reconnect
    processing was invoked.  This prevented the reconnection
    processing from completing.
    
    The "RcvThread" that was associated with the channel instance
    (TCP/IP connection) used by the JMS Session's hConn attempted to
    verify that its "parent" hConn (the one associated with the JMS
    Connection) was either still valid, already reconnected or in
    need of reconnection.  This is because the JMS Session must
    always connect to the same queue manager as the JMS Connection
    from which it was created and uses connection information from
    this parent hConn.
    
    In MQ V9.1, it did this by attempting to issue a lightweight MQ
    API call to the queue manager because, for the most part, the
    hConn associated with a JMS Connection is used as a controlling
    hConn for asynchronous consume operations and so few MQ API
    calls are issued using it.  Before issuing the MQ API call, the
    "RcvThread" took a lock on a list of hConns multiplexed over a
    channel instance and checked to see if the hConn was in the
    process of being reconnected.  It was, so the "RcvThread"
    blocked while still holding that lock, waiting for the reconnect
    to complete.
    
    An internal "RemoteReconnectThread" is responsible for
    reconnecting hConns.  It was in the process of attempting to
    reconnect a particular hConn and required the lock on the list
    of hConns for the channel instance in order to perform some
    clean-up.  This was because it initially tried to reconnect the
    hConn using an existing channel instance but failed because that
    channel instance was in the process of being disconnected due to
    the original connection error.
    
    The "RemoteReconnectThread" could not obtain the lock which was
    held by the "RcvThread", which would not release it until the
    reconnect processing was completed by the
    "RemoteReconnectThread".
    
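The lock cycle described above can be reproduced with two plain Java monitors. This is a simplified model, not MQ code: sessionsMutex and reconnectMutex are hypothetical stand-ins for RemoteConnection$SessionsMutex and RemoteHconn$ReconnectMutex, and the two methods only mimic the locking order visible in the callstacks (asyncConnectionBroken then checkForReconnect on one side, removeSession on the other).

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the IT29729 lock cycle; not the real MQ classes.
public class ReconnectDeadlockModel {
    private final Object sessionsMutex = new Object();   // stand-in for SessionsMutex
    private final Object reconnectMutex = new Object();  // stand-in for ReconnectMutex
    private boolean reconnected = false;                  // guarded by reconnectMutex
    private final CountDownLatch rcvHoldsSessions = new CountDownLatch(1);

    /** Models the RcvThread: takes the sessions lock, then waits for reconnect. */
    void rcvThread() {
        synchronized (sessionsMutex) {                    // like asyncConnectionBroken
            rcvHoldsSessions.countDown();
            synchronized (reconnectMutex) {               // like checkForReconnect
                while (!reconnected) {
                    try {
                        reconnectMutex.wait();            // releases reconnectMutex only
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }
    }

    /** Models the reconnect thread: needs the sessions lock for clean-up first. */
    void reconnectThread() {
        synchronized (sessionsMutex) {                    // like removeSession: never acquired
            // clean-up of the old channel instance would happen here
        }
        synchronized (reconnectMutex) {
            reconnected = true;
            reconnectMutex.notifyAll();
        }
    }

    /** Starts both threads and returns {rcvState, reconnectState} after waitMillis. */
    Thread.State[] run(long waitMillis) throws InterruptedException {
        Thread rcv = new Thread(this::rcvThread, "RcvThread");
        Thread rec = new Thread(this::reconnectThread, "RemoteReconnectThread");
        rcv.setDaemon(true);   // daemon threads so the stuck model does not keep the JVM alive
        rec.setDaemon(true);
        rcv.start();
        rcvHoldsSessions.await();   // RcvThread now owns sessionsMutex
        rec.start();
        rec.join(waitMillis);
        return new Thread.State[] { rcv.getState(), rec.getState() };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] s = new ReconnectDeadlockModel().run(500);
        System.out.println("RcvThread=" + s[0] + " RemoteReconnectThread=" + s[1]);
    }
}
```

On a typical JVM the modelled threads end up in the same states as in the Javacore extracts: the RcvThread WAITING and the reconnect thread BLOCKED, and neither ever completes.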

Problem conclusion

  • Two changes have been made to the MQ classes for JMS.
    
    The first is to ensure that the "RemoteReconnectThread" does not
    attempt to reuse old channel instances to reconnect broken
    connection handles (hConns).  At the start of each reconnect
    cycle, a new connection is created, which can then be used to
    reconnect disconnected hConns.
    
    The second change is to remove the need for the "RcvThread" to
    make an MQ API call on a "parent" hConn when a connection error
    is detected on a child hConn.  The channel heartbeating function
    is sufficient to detect errors on the channel instance
    associated with a parent hConn even if it is not being used to
    issue regular MQ API calls.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.1 CD    9.1.4
    v9.1 LTS   9.1.0.4
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    
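The effect of the first change can be illustrated with a simplified, self-contained model built from plain Java monitors (not MQ code). Here the reconnect thread creates a fresh channel object with its own sessions lock instead of cleaning up the broken one, so it never contends for the lock held by the RcvThread. The RcvThread is deliberately left waiting while holding its lock, to show that this change alone breaks the cycle; in the actual fix, the second change also removes the RcvThread's MQ API call on the parent hConn. All class, field, and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the first IT29729 change; not the real MQ classes.
public class ReconnectFixModel {

    /** Stand-in for a channel instance with its own list-of-hConns lock. */
    static final class Channel {
        final Object sessionsMutex = new Object();
    }

    private final Channel brokenChannel = new Channel();
    private final Object reconnectMutex = new Object();
    private boolean reconnected = false;                  // guarded by reconnectMutex
    private final CountDownLatch rcvHoldsSessions = new CountDownLatch(1);

    /** RcvThread of the broken channel: holds its lock while waiting (unchanged). */
    void rcvThread() {
        synchronized (brokenChannel.sessionsMutex) {
            rcvHoldsSessions.countDown();
            synchronized (reconnectMutex) {
                while (!reconnected) {
                    try {
                        reconnectMutex.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }
    }

    /** Reconnect thread after the change: uses a new channel, never the broken one. */
    void reconnectThread() {
        Channel fresh = new Channel();                    // new connection per reconnect cycle
        synchronized (fresh.sessionsMutex) {
            // register the reconnected hConn on the new channel instance
        }
        synchronized (reconnectMutex) {
            reconnected = true;
            reconnectMutex.notifyAll();                   // wakes the waiting RcvThread
        }
    }

    /** Returns true if both threads finish within waitMillis each. */
    boolean run(long waitMillis) throws InterruptedException {
        Thread rcv = new Thread(this::rcvThread, "RcvThread");
        Thread rec = new Thread(this::reconnectThread, "RemoteReconnectThread");
        rcv.setDaemon(true);
        rec.setDaemon(true);
        rcv.start();
        rcvHoldsSessions.await();   // same starting point as the failing scenario
        rec.start();
        rec.join(waitMillis);
        rcv.join(waitMillis);
        return !rcv.isAlive() && !rec.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Both threads completed: " + new ReconnectFixModel().run(2000));
    }
}
```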

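The PTF table reduces to a simple version comparison. A minimal sketch, assuming levels are written as dotted strings such as 9.1.0.4 or 9.1.4, and treating 9.1.0.x as the LTS stream and 9.1.1 onward as Continuous Delivery releases. It encodes only the two V9.1 rows of the table and rejects other releases, since the APAR does not list them. The class and method names are hypothetical.

```java
// Hypothetical helper that encodes the IT29729 PTF table for IBM MQ V9.1.
public class It29729LevelCheck {

    /** True if the V9.1 level is at or above 9.1.0.4 (LTS) or 9.1.4 (CD). */
    static boolean includesFix(String level) {
        String[] parts = level.trim().split("\\.");
        if (parts.length < 3 || !parts[0].equals("9") || !parts[1].equals("1")) {
            throw new IllegalArgumentException("Not a V9.1 level: " + level);
        }
        int release = Integer.parseInt(parts[2]);  // 0 = LTS stream, >0 = CD release
        int fixPack = parts.length > 3 ? Integer.parseInt(parts[3]) : 0;
        if (release == 0) {
            return fixPack >= 4;                   // v9.1 LTS: 9.1.0.4 or later
        }
        return release >= 4;                       // v9.1 CD: 9.1.4 or later
    }

    public static void main(String[] args) {
        for (String level : new String[] {"9.1.0.3", "9.1.0.4", "9.1.3", "9.1.4"}) {
            System.out.println(level + " includes IT29729 fix: " + includesFix(level));
        }
    }
}
```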
Temporary fix

Comments

APAR Information

  • APAR number

    IT29729

  • Reported component name

    IBM MQ BASE MP

  • Reported component ID

    5724H7271

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-07-15

  • Closed date

    2019-10-04

  • Last modified date

    2019-10-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ BASE MP

  • Fixed component ID

    5724H7271

Applicable component levels

  • Product

    IBM MQ (SSYHRD)

  • Version

    910

  • Platform

    Platform Independent

  • Business unit

    Cloud & Data Platform

  • Line of business

    Automation

Document Information

Modified date:
04 October 2019