IT15408: JMS client hangs during reconnect to a queue manager after HA failover

APAR status

Closed as program error.

Error description

When using the MQ classes for JMS with the MQ Automatic Client
Reconnect functionality, a hang can occur when a multi-instance
queue manager is failed over to the standby instance.  A
Javacore of the JVM process would show a RemoteRcvThread
blocked, waiting for an internal "RemoteConnectionSpecification"
lock in:


com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection(RemoteConnection).asyncFailure(RemoteTls, Throwable, boolean)

Example output from a Javacore:

RcvThread: com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection@288628329[qmid=...]
Blocked on: com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification$ConnectionsLock@0x000000000239BE18 Owned by: "JMSCCThreadPoolWorker-2" (J9VMThread:0x000000002255A100, java/lang/Thread:0x000000002204C030)
Java callstack:
    at com/ibm/mq/jmqi/remote/impl/RemoteConnection.asyncFailureNotify
    at com/ibm/mq/jmqi/remote/impl/RemoteConnection.notifyReconnect
    at com/ibm/mq/jmqi/remote/impl/RemoteRcvThread.run
    at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
    at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
    at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
    at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem


A RemoteReconnectThread (started to reconnect JMS Connections
and JMS Sessions as part of the MQ Automatic Client Reconnect
function) would typically be seen in a conditional wait state,
waking every five seconds to check that the TCP/IP connection it
was using was still marked as connected.
Example thread state from a Javacore when the problem arises:

"JMSCCThreadPoolWorker-2" J9VMThread:0x000000002255A100, j9thread_t:0x00007FA72000B1E0, java/lang/Thread:0x000000002204C030, state:CW, prio=5
(java/lang/Thread getId:0x12, isDaemon:true)
Waiting on: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x000000002204D788 Owned by: <unowned>
Java callstack:
    at java/lang/Object.wait(Native Method)
    at java/lang/Object.wait
    at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveAsyncTsh
       (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x000000002204D788, entry count: 1)
    at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveTSH
    at com/ibm/mq/jmqi/remote/impl/RemoteSession.startConversation
    at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.sessionFromEligible
    at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSessionFromEligibleConnection
       (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification$ConnectionsLock@0x000000000239BE18, entry count: 1)
    at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSession
    at com/ibm/mq/jmqi/remote/impl/RemoteConnectionPool.getSession
    at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiConnect
    at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.reconnect
    at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.run


The classes for JMS application would hang and would not
reconnect to the standby queue manager instance.

Local fix

Set the server-connection channel attribute "SHARECNV" to the
value 1.

This ensures that only one conversation (hConn) to the queue
manager can occur over a single TCP/IP socket.  As such, new
connection requests and those being reconnected by the
RemoteReconnectThread each have a dedicated TCP/IP socket, and
no attempt is made to multiplex conversations over a single
socket.

Problem summary

****************************************************************
USERS AFFECTED:
This issue affects users of:

- The WebSphere MQ classes for JMS v7.1.0.7
- The WebSphere MQ classes for Java v7.1.0.7
- The WebSphere MQ Resource Adapter v7.1.0.7

- The WebSphere MQ classes for JMS v7.5.0.3, v7.5.0.4, v7.5.0.5
and v7.5.0.6
- The WebSphere MQ classes for Java v7.5.0.3, v7.5.0.4, v7.5.0.5
and v7.5.0.6
- The WebSphere MQ Resource Adapter v7.5.0.3, v7.5.0.4, v7.5.0.5
and v7.5.0.6

after APAR IC93973

http://www-01.ibm.com/support/docview.wss?uid=swg1IC93973

and all versions of:

- The IBM MQ classes for JMS v8
- The IBM MQ classes for Java v8
- The IBM MQ Resource Adapter v8

- The IBM MQ classes for JMS v9
- The IBM MQ classes for Java v9
- The IBM MQ Resource Adapter v9


Platforms affected:
MultiPlatform

****************************************************************
PROBLEM DESCRIPTION:
After failing over a multi-instance queue manager from an active
to a standby instance, a hang could have occurred within the
classes for JMS automatic client reconnection feature.  The
classes for JMS application would not reconnect and the JVM
required terminating and restarting in order to recover.

The issue occurred when the internal "RemoteReconnectThread"
(responsible for reconnecting JMS Connections and JMS Sessions
as part of automatic client reconnection) attempted to establish
a new conversation on an existing TCP/IP connection (also known
as a channel instance) that was in the process of being closed.
 In this scenario, there was a race condition between this
RemoteReconnectThread and an internal "RemoteRcvThread"
(responsible for reading the data from the TCP/IP connection)
for this connection whereby the RemoteRcvThread would not notify
the RemoteReconnectThread that the TCP/IP connection was no
longer valid.

As such, the RemoteReconnectThread would wait for a response
from the queue manager to its MQCONNX request that would never
be received.  Furthermore, the RemoteReconnectThread held an
internal "connections lock".  The RemoteRcvThread required that
lock to complete the closure of the failed connection, and so
blocked indefinitely.
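
The lock interaction described above can be reproduced in
miniature.  The following is a minimal sketch only, not the MQ
client code: the class name, the method name and the use of a
ReentrantLock and CountDownLatch as stand-ins for the internal
"connections lock" and the missing MQCONNX reply are all
assumptions made for illustration.  A bounded tryLock is used so
that the demonstration ends instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ReconnectHangSketch {

    /**
     * Returns true if the simulated receive thread could not obtain
     * the connections lock within the timeout.
     */
    static boolean receiveThreadIsBlocked() {
        // Hypothetical stand-in for the internal "connections lock".
        ReentrantLock connectionsLock = new ReentrantLock();
        // Stand-in for the MQCONNX reply; it is never counted down.
        CountDownLatch reply = new CountDownLatch(1);
        CountDownLatch lockHeld = new CountDownLatch(1);

        // Simulated reconnect thread: takes the lock, then waits for
        // a reply that never arrives, so the lock is never released.
        Thread reconnect = new Thread(() -> {
            connectionsLock.lock();
            try {
                lockHeld.countDown();
                reply.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                connectionsLock.unlock();
            }
        }, "reconnect-sketch");
        reconnect.setDaemon(true);
        reconnect.start();

        try {
            lockHeld.await();
            // Simulated receive thread: needs the same lock to finish
            // closing the failed connection.
            boolean acquired =
                connectionsLock.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                connectionsLock.unlock();
            }
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            reconnect.interrupt(); // end the demonstration
        }
    }

    public static void main(String[] args) {
        System.out.println("receive thread blocked: "
            + receiveThreadIsBlocked());
    }
}
```

In the real client neither wait was bounded, which is why the
JVM had to be terminated and restarted to recover.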

The same hang issue could also occur between a RemoteRcvThread
and a standard application thread creating a new conversation on
an existing connection.  For classes for JMS applications, this
occurred when recreating a JMS Connection, JMS Session or JMS
Context object.  For classes for Java applications, this
occurred when instantiating a new MQQueueManager object.

Problem conclusion

The MQ classes for Java and MQ classes for JMS have been updated
such that the RemoteRcvThread now notifies the
RemoteReconnectThread or application thread that a TCP/IP
connection is no longer valid if that thread attempts to
allocate a new conversation on the connection at the same time
as a failure is detected on it.
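
The general shape of such a notification can be sketched as
follows.  This is an illustrative assumption about the pattern,
not the code delivered by the fix: the Connection class, its
methods and the closureCompletes helper are hypothetical names.
The receiving side flags the failure and wakes the waiter
without first needing the connections lock, so the waiter stops
waiting, releases the lock and closure can proceed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class FailureNotifySketch {

    /** Hypothetical stand-in for one shared TCP/IP connection. */
    static final class Connection {
        private final Object replyMonitor = new Object();
        private boolean replied = false; // never set in this sketch
        private boolean failed = false;

        /** Waits for a reply; returns false if the connection fails. */
        boolean awaitReply() throws InterruptedException {
            synchronized (replyMonitor) {
                while (!replied && !failed) {
                    replyMonitor.wait();
                }
                return replied;
            }
        }

        /** Called by the receive thread; needs no other lock. */
        void markFailed() {
            synchronized (replyMonitor) {
                failed = true;
                replyMonitor.notifyAll();
            }
        }
    }

    /**
     * Returns true if the waiter observed the failure and the
     * simulated receive thread then obtained the connections lock.
     */
    static boolean closureCompletes() {
        ReentrantLock connectionsLock = new ReentrantLock();
        Connection connection = new Connection();
        CountDownLatch lockHeld = new CountDownLatch(1);
        AtomicBoolean waiterSawFailure = new AtomicBoolean(false);

        // Simulated reconnect thread: holds the lock while waiting.
        Thread reconnect = new Thread(() -> {
            connectionsLock.lock();
            try {
                lockHeld.countDown();
                waiterSawFailure.set(!connection.awaitReply());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                connectionsLock.unlock();
            }
        }, "reconnect-sketch");
        reconnect.setDaemon(true);
        reconnect.start();

        try {
            lockHeld.await();
            // Simulated receive thread: notify first, then lock.
            connection.markFailed();
            boolean acquired =
                connectionsLock.tryLock(2, TimeUnit.SECONDS);
            if (acquired) {
                connectionsLock.unlock();
            }
            reconnect.join();
            return acquired && waiterSawFailure.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("closure completes: " + closureCompletes());
    }
}
```

The failed flag is checked inside the wait loop, so the waiter
sees the failure even if it is flagged before the wait begins.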

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v7.1       7.1.0.8
v7.5       7.5.0.8
v8.0       8.0.0.6
v9.0 CD    9.0.1
v9.0 LTS   9.0.0.1

The latest available maintenance can be obtained from
'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on
its planned availability can be found in 'WebSphere MQ
Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
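
A client level can be compared against the table above with a
small helper.  This is a sketch under stated assumptions: the
class and method names are hypothetical, the fix levels are
copied from the table, and the version string is supplied by the
caller.  Releases not listed in the table are rejected, since
this APAR does not state a fix level for them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FixLevelCheck {

    // Fix levels from the table in this APAR, keyed by release.
    private static final Map<String, int[]> FIX_LEVELS =
        new LinkedHashMap<>();
    static {
        FIX_LEVELS.put("7.1", new int[] {7, 1, 0, 8});
        FIX_LEVELS.put("7.5", new int[] {7, 5, 0, 8});
        FIX_LEVELS.put("8.0", new int[] {8, 0, 0, 6});
        // v9.0 LTS level; the v9.0 CD level 9.0.1 compares higher.
        FIX_LEVELS.put("9.0", new int[] {9, 0, 0, 1});
    }

    /** Parses "v.r.m.f" into four integers, padding with zeros. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) {
            out[i] = i < parts.length ? Integer.parseInt(parts[i]) : 0;
        }
        return out;
    }

    /** True if the given level is at or above the listed fix level. */
    static boolean includesFix(String version) {
        int[] v = parse(version);
        int[] fix = FIX_LEVELS.get(v[0] + "." + v[1]);
        if (fix == null) {
            throw new IllegalArgumentException(
                "Release not listed in IT15408: " + version);
        }
        for (int i = 0; i < 4; i++) {
            if (v[i] != fix[i]) {
                return v[i] > fix[i];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String level : new String[] {"7.5.0.6", "8.0.0.6", "9.0.1"}) {
            System.out.println(level + " includes fix: "
                + includesFix(level));
        }
    }
}
```

The helper only reports whether a level contains the fix.  It
does not decide whether an earlier level is affected, which also
depends on APAR IC93973 as described under "USERS AFFECTED".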

Temporary fix

Comments

APAR Information

APAR number
IT15408
Reported component name
WMQ BASE MULTIP
Reported component ID
5724H7241
Reported release
750
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-05-23
Closed date
2016-07-26
Last modified date
2017-06-01

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WMQ BASE MULTIP
Fixed component ID
5724H7241

Applicable component levels


Document Information

Modified date:
31 March 2023
