I've had a number of conversations recently with customers looking at how to use client connections to connect to multiple MQ queue managers.
In the MQ client workload-management (WLM) section of this article for Java EE, I discuss why I chose to provide a code-stub sample for WLM of outbound message sending, rather than describing how to use a CCDT. However, I realize that customers looking to increase the availability characteristics of existing applications might be willing to accept the limitations of CCDT-based approaches to minimize code changes. I also didn't mention options such as using load-balancing hardware to create a Virtual IP address (VIP) in front of multiple queue managers.
So here is an attempt at a balanced comparison of the pros and cons of the various approaches.
I've used 😐 negative / 😶 equal / 😊 positive colour coding to help you compare.
Note: these choices only relate to applications sending messages or initiating synchronous request/reply messaging. The considerations for applications servicing those messages/requests (e.g. the listeners) are completely separate, and are discussed in detail in the "Connecting a message listener to a queue" section of this article.
| | CONNAME list | CCDT (multi-QMGR)¹ | Load balancer² | Code stub |
| --- | --- | --- | --- | --- |
| Scale of code change required for existing apps that connect to a single QM | 😊 MQCONN("QMNAME") becomes MQCONN("*QMNAME"). The QM name might be in JNDI config for Java EE apps; otherwise it requires a one-character code change. | 😊 As CONNAME list. | 😊 As CONNAME list. | 😐 Replace existing JMS/MQI connection logic with the code stub. |
| Support for different WLM strategies | 😐 Prioritized only. | 😶 Prioritized or random¹. | 😊 Any, including per-connect round-robin. | 😊 Any, including per-message round-robin. |
| Performance overhead while primary QM is down | 😐 Always tries the first entry in the list. | 😊 Remembers the last good connection. | 😊 Port monitoring avoids bad QMs. | 😊 Can remember the last good connection, and retry intelligently. |
| XA transaction support | 😐 The transaction manager needs to store recovery information that reconnects to the same QM resource, and an MQCONN that resolves to different QMs generally invalidates this. For example, in Java EE a single Connection Factory should resolve to a single QM when using XA. | 😐 As CONNAME list. | 😐 As CONNAME list. | 😊 A code stub can meet the XA transaction manager's requirements, e.g. with multiple Connection Factories. |
| Connection rebalancing on failback, e.g. when a QM restarts after a failure or planned outage, how long until apps use it again | 😐 Connection pooling in Java EE will hold onto connections indefinitely unless they are configured with an aged timeout. An aged timeout might drive exceptions in some cases, and also introduces a (small/occasional) performance overhead during normal operation. Conversation sharing might need to be disabled (SHARECNV(1)) with an aged timeout to ensure reconnects always establish a new socket. | 😐 As CONNAME list; the 'remembers last good' CCDT behaviour might also delay failback. | 😐 As CONNAME list. | 😊 A code stub can handle failback flexibly, with little/no performance overhead. |
| Admin flexibility to hide infrastructure changes from apps | 😐 DNS only. | 😶 DNS and/or shared file system / CCDT file push. | 😊 Dynamic Virtual IP address (VIP). | 😶 DNS or single-QMGR CCDT entries. |
¹ CCDT (multi-QMGR):
I mean a CCDT file that contains multiple CLNTCONN channels with the same group (QMNAME CLNTCONN attribute), where different CLNTCONN entries resolve to different queue managers. This is distinct from a CCDT file that contains multiple CLNTCONN entries that are simply different IP addresses / hostnames for the same Multi-Instance queue manager, which is an approach you might choose to combine with a code stub.
If you do choose a CCDT (multi-QMGR) approach, you need to choose whether to prioritize the entries or have randomized WLM.
Prioritized: Use multiple alphabetically ordered entries with CLNTWGHT(1) and AFFINITY(PREFERRED) to remember the last good connection.
Randomized WLM: Use CLNTWGHT(1) and AFFINITY(NONE). You can adjust the WLM weighting across differently scaled MQ servers by adjusting CLNTWGHT, but I suggest you generally avoid large differences in CLNTWGHT between channels. A 99/1 split, for example, would give undesirable failover characteristics - think of throwing stones into 99 red buckets and 1 blue bucket: it's going to take you a long time to hit the blue bucket if all the red buckets are broken.
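For the application side of this footnote, here is a minimal JMS sketch of connecting through a multi-QMGR CCDT. It assumes the IBM MQ classes for JMS are on the classpath; the queue manager group name (GATEWAY) and the CCDT path are hypothetical. Note that the WLM behaviour (prioritized vs. randomized) comes entirely from the CLNTWGHT/AFFINITY attributes described above, not from the code.

```java
import javax.jms.Connection;
import javax.jms.JMSException;

import com.ibm.msg.client.jms.JmsConnectionFactory;
import com.ibm.msg.client.jms.JmsFactoryFactory;
import com.ibm.msg.client.wmq.WMQConstants;

public class CcdtConnect {
    public static void main(String[] args) throws JMSException {
        JmsFactoryFactory ff = JmsFactoryFactory.getInstance(WMQConstants.WMQ_PROVIDER);
        JmsConnectionFactory cf = ff.createConnectionFactory();

        cf.setIntProperty(WMQConstants.WMQ_CONNECTION_MODE, WMQConstants.WMQ_CM_CLIENT);
        // Point the client at the CCDT file (hypothetical path).
        cf.setStringProperty(WMQConstants.WMQ_CCDTURL, "file:///var/mqm/ccdt/AMQCLCHL.TAB");
        // The leading '*' accepts a connection to any QM in the GATEWAY
        // group -- the one-character change from the comparison table.
        cf.setStringProperty(WMQConstants.WMQ_QUEUE_MANAGER, "*GATEWAY");

        Connection conn = cf.createConnection();
        System.out.println("Connected via CCDT");
        conn.close();
    }
}
```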
² Load balancer:
I mean a network appliance with a Virtual IP address (VIP) configured with port monitoring of the TCP/IP listeners of multiple MQ queue managers. How the VIP is configured will depend on the network appliance you are using.
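To make the 'code stub' column more concrete, here is a minimal sketch (not the actual sample from the articles) of per-message round-robin across a list of connection factories, one per queue manager, using JMS 2.0. The class name and structure are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.JMSRuntimeException;

/**
 * Minimal per-message round-robin sender. Each send picks the next
 * queue manager in the list; on failure it tries the remaining QMs.
 */
public class RoundRobinSender {
    private final List<ConnectionFactory> factories; // one per QM
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinSender(List<ConnectionFactory> factories) {
        this.factories = factories;
    }

    public void send(String queueName, String body) {
        JMSRuntimeException lastFailure = null;
        for (int attempt = 0; attempt < factories.size(); attempt++) {
            int i = Math.floorMod(next.getAndIncrement(), factories.size());
            try (JMSContext ctx = factories.get(i).createContext()) {
                ctx.createProducer().send(ctx.createQueue(queueName), body);
                return; // sent successfully
            } catch (JMSRuntimeException e) {
                lastFailure = e; // this QM looks unavailable; try the next one
            }
        }
        throw (lastFailure != null) ? lastFailure
                : new IllegalStateException("No connection factories configured");
    }
}
```

Keeping one connection factory per queue manager is also what satisfies the XA requirement in the table, and in Java EE the container's connection pooling keeps the per-send createContext call cheap.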
Avoiding disruption during planned maintenance
There is another consideration not yet discussed, which is how to avoid disruption to applications (errors/timeouts visible to the end users) during planned maintenance of a queue manager. The general approach here is to drain all work from a queue manager before it is stopped.
Think about a request/reply scenario. You want all in-flight requests to complete, and the replies to be processed by the application, but you don't want any additional work to be submitted into the system. Simply quiescing the queue manager doesn't fulfill this need, as well-coded applications will receive a 2161/MQRC_Q_MGR_QUIESCING exception before they receive their reply messages for in-flight requests.
There is an approach built in to MQ that helps: set PUT(DISABLED) on the request queues used to submit work, while leaving the reply queues both PUT(ENABLED) and GET(ENABLED). Then you can monitor the depth of the request, transmission and reply queues, and once they all stabilize (in-flight requests complete or time out) you can stop the queue manager, as in the sketch below.
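Here is a minimal sketch of that depth monitoring, using the IBM MQ classes for Java. The queue manager and queue names are hypothetical, and it simply waits for the depths to reach zero; a real script might instead check that the depths have stopped changing.

```java
import com.ibm.mq.MQException;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

/** Polls queue depths until they drain, before a planned QM stop. */
public class DrainMonitor {
    public static void main(String[] args) throws MQException, InterruptedException {
        // Connects in bindings mode by default; configure MQEnvironment
        // for a client connection. Names here are hypothetical.
        MQQueueManager qmgr = new MQQueueManager("QM1");
        String[] queues = { "REQUEST.Q", "XMIT.Q", "REPLY.Q" };
        boolean drained = false;
        while (!drained) {
            drained = true;
            for (String name : queues) {
                MQQueue q = qmgr.accessQueue(name, CMQC.MQOO_INQUIRE);
                int depth = q.getCurrentDepth();
                q.close();
                System.out.println(name + " depth=" + depth);
                if (depth > 0) drained = false;
            }
            if (!drained) Thread.sleep(5000);
        }
        qmgr.disconnect();
        System.out.println("Queues drained - safe to stop the queue manager.");
    }
}
```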
However, this relies on good coding in the requesting applications to handle a PUT(DISABLED) request queue, which will result in 2051/MQRC_PUT_INHIBITED errors when they try to send a message. The exception won't occur when creating the connection to MQ or opening the request queue, only when an attempt is made to actually send (MQPUT) a message.
Building a code stub that includes this error handling logic for request/reply scenarios, and asking your app teams to use such a code stub going forwards, can help you develop applications with consistent behavior.
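As a hedged sketch of what that error handling might detect, the helper below walks a JMSException's linked/cause chain looking for MQ reason code 2051. The class name is hypothetical, and it assumes the IBM MQ classes for JMS, where the reason code is carried on a linked com.ibm.mq.MQException.

```java
import javax.jms.JMSException;

import com.ibm.mq.MQException;
import com.ibm.mq.constants.CMQC;

/** Helper to recognise MQRC_PUT_INHIBITED (2051) behind a JMS exception. */
public final class PutInhibitedCheck {

    /** Walks the linked/cause chain looking for MQ reason code 2051. */
    public static boolean isPutInhibited(Throwable t) {
        while (t != null) {
            if (t instanceof MQException
                    && ((MQException) t).reasonCode == CMQC.MQRC_PUT_INHIBITED) {
                return true;
            }
            // The MQ JMS classes attach the MQException as a linked
            // exception rather than (or as well as) a cause.
            Throwable next = (t instanceof JMSException)
                    ? ((JMSException) t).getLinkedException()
                    : null;
            t = (next != null) ? next : t.getCause();
        }
        return false;
    }
}
```

When this returns true for an exception thrown on send, a request/reply code stub could route the request to another queue manager (for example via the round-robin sender sketched earlier) rather than surfacing an error to the end user.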
As of today I haven't included MQRC_PUT_INHIBITED handling in the 'simple' synchronous request/reply code samples I provided with the articles. It is covered in the 'advanced' synchronous request/reply sample, but that is more complex than most projects require.
So it's on my to-do list to enhance the 'simple' synchronous request/reply sample to demonstrate how you might handle MQRC_PUT_INHIBITED, enabling planned queue manager restarts without any end-user-visible errors/timeouts.