I have an application that uses WebSphere MQ 7 as a queue manager. The application runs inside Mule 3.2.0, and until recently used the WebSphere MQ JMS Java libraries version 18.104.22.168. There are two instances of Mule, running on different machines. Both instances use Mule to listen for incoming messages on the same MQ queue. With the MQ 22.214.171.124 libraries, the load was split 50/50 between the two servers, i.e. each server processed about half of the messages. This was all fine, and as I expected.
Unfortunately, we found a memory leak in the MQ 126.96.36.199 Java client libraries, so we upgraded to 188.8.131.52. The memory leak went away, but now one server is processing around 90% of the messages while the other processes only 10%. The change from 50/50 to 90/10 is very distinct and coincides exactly with the deployment of the new version of the application. There were some other minor changes to the application, but none that would affect how messages are pulled off the queue. We cannot see any other explanation for the change. CPU, disk, and network all look normal (corresponding to the number of messages being processed), and log files show that individual messages are being processed at the same speed. I have been told that no changes were made to the queue manager at this time.
It appears to me that the algorithm by which messages are allocated to clients changed with the introduction of the new MQ 184.108.40.206 libraries. My theory is that if both servers are waiting for a message to arrive, one of them is always favoured over the other (has a higher priority), where before the allocation was random. The server with the apparently lower priority is only allocated a message if the higher-priority server is not currently waiting for one (because it is busy processing another request). This theory is supported by our QA environment, where the rate of message creation is much lower and we see a split of 1250 messages for one server and 3 for the other.
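To make the theory above concrete, here is a minimal toy simulation. It is only a sketch of the behaviour I am hypothesising, not WebSphere MQ's actual dispatch algorithm: the function name `simulate`, the policy names `"favoured"` and `"random"`, and the arrival and service parameters are all made up for illustration. Messages arrive at random intervals, each takes a fixed time to process, and an arriving message goes either to the favoured idle consumer or to a randomly chosen idle one.

```python
import random


def simulate(num_messages, mean_interarrival, service_time, policy, seed=0):
    """Return [count_a, count_b] for two consumers reading one queue.

    policy="favoured": if both consumers are idle, consumer 0 always wins.
    policy="random":   if both consumers are idle, pick one at random.
    This is a hypothetical model, not the real queue manager logic.
    """
    if policy not in ("favoured", "random"):
        raise ValueError("unknown policy: %r" % policy)

    rng = random.Random(seed)
    busy_until = [0.0, 0.0]  # time at which each consumer becomes idle
    counts = [0, 0]
    now = 0.0

    for _ in range(num_messages):
        # Exponentially distributed gap between message arrivals.
        now += rng.expovariate(1.0 / mean_interarrival)
        idle = [i for i in (0, 1) if busy_until[i] <= now]

        if idle:
            start = now
            chosen = idle[0] if policy == "favoured" else rng.choice(idle)
        else:
            # Both busy: the message waits for whichever consumer frees first.
            chosen = 0 if busy_until[0] <= busy_until[1] else 1
            start = busy_until[chosen]

        busy_until[chosen] = start + service_time
        counts[chosen] += 1

    return counts


if __name__ == "__main__":
    n = 10000
    print("light load, favoured:", simulate(n, 100.0, 1.0, "favoured"))
    print("light load, random:  ", simulate(n, 100.0, 1.0, "random"))
    print("heavy load, favoured:", simulate(n, 0.6, 1.0, "favoured"))
```

In this model the favoured policy gives almost everything to one consumer when messages are sparse, and the split evens out as the arrival rate approaches the combined capacity of the two consumers, because the favoured consumer is then busy more often when a message arrives.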
Can anybody confirm whether my theory is correct, and whether we can move back to random allocation between clients that are waiting for messages to arrive? If my theory is wrong, we may have a different issue that causes one server to receive very slowly from the queue, which would require further investigation.
Unequal load across two Java MQ Clients in Mule
Updated on 2012-03-25T20:48:38Z by Mark_Sim-Smith
neekrish 0600011NDE 6 Posts
Re: Unequal load across two Java MQ Clients in Mule 2012-03-20T08:37:29Z
Maybe it's due to this APAR: http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg1IZ97460 - IZ97460: MULTIPLE JMS APPLICATIONS ATTACHED TO SINGLE QUEUE SHOW POOR PERFORMANCE AND UNEVEN MESSAGE DISTRIBUTION.
Have you tried the latest fix pack?
T.Rob 100000R8QH 30 Posts
Re: Unequal load across two Java MQ Clients in Mule 2012-03-20T18:50:57Z
Do you see this uneven distribution under heavy load? If the message rate is low, WMQ normally distributes load unevenly. It's actually surprising that you ever saw 50/50 distribution unless it was measured under heavy load. For more on this, see my recent post on the IBM IMPACT Blog.
Re: Unequal load across two Java MQ Clients in Mule 2012-03-20T23:09:45Z
- neekrish 0600011NDE
No, we haven't tried the latest fix pack. 220.127.116.11 and 18.104.22.168 have a bug that causes the MQ client to hang (I haven't gotten around to submitting a bug report yet), so we are currently operating with 22.214.171.124, I think. I didn't want to try out every version as I have plenty of other stuff to get on with ;).
My main concern was to understand whether this was a bug or a feature... As long as it means that if the apparently higher-processing node fails for some reason, the other node will be able to process at full capacity, then I'm happy enough.
Re: Unequal load across two Java MQ Clients in Mule 2012-03-20T23:14:39Z
- T.Rob 100000R8QH
Your description in your blog post describes exactly what I am seeing now.
The 50/50 processing we saw was when we had an MQ client version 126.96.36.199 (although nothing has changed on the server as far as I am aware). Now, we definitely see a greater disparity in the distribution of messages across the nodes under light load, and under heavier load this disparity appears to decrease.
If we can move to a later version of the MQ client (188.8.131.52 or beyond), I'll be interested to see if the fix highlighted by Neekrish causes an equal distribution of load again.
T.Rob 100000R8QH 30 Posts
Re: Unequal load across two Java MQ Clients in Mule 2012-03-21T19:16:26Z
- Mark_Sim-Smith 2700048CEJ
fjb_saper 110000H916 237 Posts
Re: Unequal load across two Java MQ Clients in Mule 2012-03-25T03:42:52Z
- Mark_Sim-Smith 2700048CEJ
and I believe the memory leak was fixed in a later 6.0.2 release...
As for V 7.0.1, I would not go below 184.108.40.206 (look at all the JMS fixes). By the way, the GA version is 220.127.116.11, so you are bound to run into a lot of problems in 18.104.22.168.
Re: Unequal load across two Java MQ Clients in Mule 2012-03-25T20:48:38Z
- fjb_saper 110000H916
Fair call about 22.214.171.124. It was early days in the development cycle and a developer just grabbed the JARs off an installation CD that was lying around.
It takes a reasonable amount of testing to work out whether we have problems with a particular revision of the MQ client JARs, and as I've seen, later versions can introduce bugs too. Moving to a newer version therefore cannot be a drop-in exercise: it requires a fair amount of testing and, now that we're live, carries a production risk. Many bugs are quite specific and may not affect us. We seem to be happy with 126.96.36.199 at this stage, and I think it's a matter of finding something and sticking to it, rather than continually attempting to upgrade for often unknown benefits. However, I'll definitely raise the other bug in 188.8.131.52 and 184.108.40.206 with IBM when I have the time to describe it fully (the bug looks fairly simple and easy to fix).