IBM Support

IT15408: JMS client hangs during reconnect to a queue manager after HA failover

APAR status

  • Closed as program error.

Error description

  • When using the MQ classes for JMS with the MQ Automatic Client
    Reconnect functionality, a hang can occur when a multi-instance
    queue manager is failed over to the standby instance.  A
    Javacore of the JVM process would show a RemoteRcvThread
    blocked, waiting for an internal "RemoteConnectionSpecification"
    lock in:
    
    
    com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection(RemoteConnection).asyncFailure(RemoteTls, Throwable, boolean)
    
    Example output from a Javacore:
    
    RcvThread: com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection@288628329[qmid=...]
    Blocked on: com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification$ConnectionsLock@0x000000000239BE18 Owned by: "JMSCCThreadPoolWorker-2" (J9VMThread:0x000000002255A100, java/lang/Thread:0x000000002204C030)
    Java callstack:
        at com/ibm/mq/jmqi/remote/impl/RemoteConnection.asyncFailureNotify
        at com/ibm/mq/jmqi/remote/impl/RemoteConnection.notifyReconnect
        at com/ibm/mq/jmqi/remote/impl/RemoteRcvThread.run
        at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
        at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
        at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
        at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
    
    
    A RemoteReconnectThread (started to reconnect JMS Connections
    and JMS Sessions as part of the MQ Automatic Client Reconnect
    function) would typically be seen in a conditional wait state,
    waking up every five seconds to check that the TCP/IP
    connection it was using was still marked as connected.
    Example thread state from a Javacore when the problem arises:
    
    "JMSCCThreadPoolWorker-2" J9VMThread:0x000000002255A100, j9thread_t:0x00007FA72000B1E0, java/lang/Thread:0x000000002204C030, state:CW, prio=5
    (java/lang/Thread getId:0x12, isDaemon:true)
    Waiting on: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x000000002204D788 Owned by: <unowned>
    Java callstack:
        at java/lang/Object.wait(Native Method)
        at java/lang/Object.wait
        at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveAsyncTsh
           (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x000000002204D788, entry count: 1)
        at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveTSH
        at com/ibm/mq/jmqi/remote/impl/RemoteSession.startConversation
        at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.sessionFromEligible
        at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSessionFromEligibleConnection
           (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification$ConnectionsLock@0x000000000239BE18, entry count: 1)
        at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSession
        at com/ibm/mq/jmqi/remote/impl/RemoteConnectionPool.getSession
        at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiConnect
        at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.reconnect
        at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.run
    
    
    The classes for JMS application would hang and would not
    reconnect to the standby queue manager instance.
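
To check a Javacore for the symptom described above, the "Blocked on ... Owned by" record for the connections lock can be searched for mechanically. The following is a minimal sketch, not an IBM-supplied tool: the class name ConnectionsLockScan and the method findOwners are hypothetical, and it assumes the Javacore text has not been hard-wrapped in the middle of the lock name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (hypothetical, not part of IBM MQ). */
final class ConnectionsLockScan {

    // Matches: Blocked on: <...>RemoteConnectionSpecification$ConnectionsLock@<addr> Owned by: "<thread>"
    private static final Pattern BLOCKED = Pattern.compile(
            "Blocked on:\\s*(\\S*RemoteConnectionSpecification\\$ConnectionsLock@\\S+)"
            + "\\s+Owned by:\\s*\"([^\"]+)\"");

    /** Returns one "lock <- owning thread" entry per matching record in the text. */
    static List<String> findOwners(String javacoreText) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCKED.matcher(javacoreText);
        while (m.find()) {
            result.add(m.group(1) + " <- " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Excerpt taken from the example Javacore output in this APAR.
        String excerpt = "Blocked on: "
                + "com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification"
                + "$ConnectionsLock@0x000000000239BE18"
                + " Owned by: \"JMSCCThreadPoolWorker-2\"";
        findOwners(excerpt).forEach(System.out::println);
    }
}
```

If the owning thread reported by this scan is itself in a conditional wait inside RemoteSession.receiveAsyncTsh, as in the second example trace, the Javacore is consistent with this APAR.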
    

Local fix

  • Set the server-connection channel attribute "SHARECNV" to the
    value 1.
    
    This ensures that only one conversation (hConn) to the queue
    manager can occur over a single TCP/IP socket.  New connection
    requests, and connections being re-established by the
    RemoteReconnectThread, then each have a dedicated TCP/IP socket,
    and no attempt is made to multiplex conversations over a single
    socket.
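
The cost of this workaround is a larger number of TCP/IP sockets (channel instances). The sketch below gives a rough estimate only: it assumes every socket is filled up to its SHARECNV limit, which simplifies how the client actually allocates conversations. The class ShareCnvModel, the SHARECNV value of 10 used for comparison, and the figure of 25 conversations are all illustrative assumptions.

```java
/** Illustrative arithmetic only (hypothetical class, not an IBM MQ API). */
final class ShareCnvModel {

    /**
     * Estimates the sockets needed for a number of conversations when each
     * socket carries at most shareCnv conversations (simplified model).
     */
    static int socketsNeeded(int conversations, int shareCnv) {
        if (conversations < 0 || shareCnv < 1) {
            throw new IllegalArgumentException("conversations >= 0 and shareCnv >= 1 required");
        }
        return (conversations + shareCnv - 1) / shareCnv; // ceiling division
    }

    public static void main(String[] args) {
        int conversations = 25; // assumed workload, for illustration
        System.out.println("SHARECNV(10): " + socketsNeeded(conversations, 10) + " sockets");
        System.out.println("SHARECNV(1):  " + socketsNeeded(conversations, 1) + " sockets");
    }
}
```

With SHARECNV set to 1 the estimate equals the number of conversations, so channel instance limits on the queue manager may need reviewing before the workaround is applied.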
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    This issue affects users of:
    
    - The WebSphere MQ classes for JMS v7.1.0.7
    - The WebSphere MQ classes for Java v7.1.0.7
    - The WebSphere MQ Resource Adapter v7.1.0.7
    
    - The WebSphere MQ classes for JMS v7.5.0.3, v7.5.0.4, v7.5.0.5
    and v7.5.0.6
    - The WebSphere MQ classes for Java v7.5.0.3, v7.5.0.4, v7.5.0.5
    and v7.5.0.6
    - The WebSphere MQ Resource Adapter v7.5.0.3, v7.5.0.4, v7.5.0.5
    and v7.5.0.6
    
    after APAR IC93973
    
    http://www-01.ibm.com/support/docview.wss?uid=swg1IC93973
    
    and all versions of:
    
    - The IBM MQ classes for JMS v8
    - The IBM MQ classes for Java v8
    - The IBM MQ Resource Adapter v8
    
    - The IBM MQ classes for JMS v9
    - The IBM MQ classes for Java v9
    - The IBM MQ Resource Adapter v9
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    After failing over a multi-instance queue manager from an active
    to a standby instance, a hang could have occurred within the
    classes for JMS automatic client reconnection feature.  The
    classes for JMS application would not reconnect and the JVM
    required terminating and restarting in order to recover.
    
    The issue occurred when the internal "RemoteReconnectThread"
    (responsible for reconnecting JMS Connections and JMS Sessions
    as part of automatic client reconnection) attempted to establish
    a new conversation on an existing TCP/IP connection (also known
    as a channel instance) that was in the process of being closed.
    In this scenario, there was a race condition between this
    RemoteReconnectThread and the internal "RemoteRcvThread"
    (responsible for reading the data from the TCP/IP connection)
    for this connection, whereby the RemoteRcvThread would not
    notify the RemoteReconnectThread that the TCP/IP connection was
    no longer valid.
    
    As such, the RemoteReconnectThread would wait for a response
    from the queue manager to its MQCONNX request that would never
    be received.  Furthermore, the RemoteReconnectThread held an
    internal "connections lock".  The RemoteRcvThread required that
    lock to complete the closure of the failed connection, and so
    blocked indefinitely.
    
    The same hang issue could also occur between a RemoteRcvThread
    and a standard application thread creating a new conversation on
    an existing connection.  For classes for JMS applications, this
    occurred when recreating a JMS Connection, JMS Session or JMS
    Context object.  For classes for Java applications, this
    occurred when instantiating a new MQQueueManager object.
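
The shape of the hang described above can be reproduced with plain JDK locks. This is a simplified model, not the MQ client code: ReconnectHangModel and its method are hypothetical names, a ReentrantLock stands in for the internal connections lock, and a latch that is never released stands in for the MQCONNX reply. A timeout is used so that the demonstration terminates, whereas the real threads blocked indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Simplified model of the hang (hypothetical, not IBM MQ code). */
final class ReconnectHangModel {

    /** Returns true if the receive side obtained the lock within the timeout. */
    static boolean receiverCanCloseConnection(long timeoutMillis) {
        ReentrantLock connectionsLock = new ReentrantLock(); // stands in for the connections lock
        CountDownLatch reply = new CountDownLatch(1);        // the reply that never arrives
        CountDownLatch lockHeld = new CountDownLatch(1);

        // Reconnect side: waits for the reply while still holding the lock.
        Thread reconnect = new Thread(() -> {
            connectionsLock.lock();
            try {
                lockHeld.countDown();
                reply.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                connectionsLock.unlock();
            }
        }, "reconnect-model");
        reconnect.setDaemon(true);
        reconnect.start();

        try {
            lockHeld.await();
            // Receive side: needs the same lock to finish closing the failed connection.
            boolean acquired = connectionsLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                connectionsLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            reconnect.interrupt(); // cleanup for the demo only; the real hang had no such escape
        }
    }

    public static void main(String[] args) {
        System.out.println("receiver acquired lock: " + receiverCanCloseConnection(200));
    }
}
```

In this model the receive side never acquires the lock, because the only event that would release it (the reply, or a failure notification) can never reach the waiting thread.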
    

Problem conclusion

  • The MQ classes for Java and MQ classes for JMS have been updated
    such that the RemoteRcvThread now notifies the
    RemoteReconnectThread or application thread that a TCP/IP
    connection is no longer valid if that thread attempts to
    allocate a new conversation on the connection at the same time
    as a failure is detected on it.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v7.1       7.1.0.8
    v7.5       7.5.0.8
    v8.0       8.0.0.6
    v9.0 CD    9.0.1
    v9.0 LTS   9.0.0.1
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
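
The general idea behind the fix, that the waiting thread is told about the failure so it can give up and release the lock, can be sketched with plain JDK primitives. This is an illustration under stated assumptions, not the actual change made to the MQ classes: ReconnectFixModel and its methods are hypothetical, and the ordering shown (flag the failure and notify first, then take the lock) is one way to achieve the behaviour described in the conclusion.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Simplified model of the corrected behaviour (hypothetical, not IBM MQ code). */
final class ReconnectFixModel {

    private final ReentrantLock connectionsLock = new ReentrantLock();
    private final Object replyMonitor = new Object();
    private boolean connectionFailed; // guarded by replyMonitor

    /** Reconnect side: waits for a reply, but gives up once the failure is flagged. */
    String startConversation() {
        connectionsLock.lock();
        try {
            synchronized (replyMonitor) {
                while (!connectionFailed) { // in this model the reply never arrives
                    replyMonitor.wait();
                }
            }
            return "connection failed, retry on a new connection";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            connectionsLock.unlock();
        }
    }

    /** Receive side: notify waiters first, then take the lock to finish the close. */
    boolean onConnectionFailure(long timeoutMillis) {
        synchronized (replyMonitor) {
            connectionFailed = true;
            replyMonitor.notifyAll();
        }
        try {
            if (connectionsLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                connectionsLock.unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Runs both sides once; returns true if the receive side could complete. */
    static boolean demo() {
        ReconnectFixModel model = new ReconnectFixModel();
        Thread reconnect = new Thread(model::startConversation, "reconnect-model");
        reconnect.setDaemon(true);
        reconnect.start();
        return model.onConnectionFailure(2000);
    }

    public static void main(String[] args) {
        System.out.println("receiver completed close: " + demo());
    }
}
```

Because the failure flag is set before the receive side asks for the lock, the waiting thread always wakes, leaves its wait loop and releases the lock, whichever thread happens to run first.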
    
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT15408

  • Reported component name

    WMQ BASE MULTIP

  • Reported component ID

    5724H7241

  • Reported release

    750

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-05-23

  • Closed date

    2016-07-26

  • Last modified date

    2017-06-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WMQ BASE MULTIP

  • Fixed component ID

    5724H7241

Applicable component levels


Document Information

Modified date:
31 March 2023