Within 24 hours we had two similar questions about where CICS triggered transactions run in a shared queue environment, and why all the transactions were running in one CICS region.
The easy answer is that MQ does not do workload balancing. When an application asks for a message, MQ says 'take the next available message'. MQ does not say 'you have processed more than your fair share of messages, so I am not giving you one'.
The next question was: why is CICS1 on LPAR1 processing most of the messages, and overloaded, even though CICS2 on LPAR2 has spare capacity?
Let me give a real-world example.
When people arrive at London airport and want to stay at the Paice Hotel, there is a special red carpet, and a sign saying 'press this button if you want the hotel to send the shuttle bus for you'. There are two Paice Hotels close to the airport: one is a 5 minute drive away, the other a 10 minute drive away. You press the button and the light goes on in both hotels. The shuttle bus drivers drive to the airport. Usually the driver from the closer hotel gets there first - and collects the passenger. The other driver arrives to find no one there - so he goes back to his hotel and waits.
The driver takes you to the hotel and you go into the lobby - and there are lots of people queuing to check in! As a result, all of the arrivals are being taken to the busiest hotel, and there is no one at the other hotel.
As the arrival rate increases, there is a slight change in behavior.
The driver from the closer hotel takes the people who are waiting, and goes back to the hotel. Meanwhile other people are arriving and have pressed the button. The other shuttle bus arrives and can take these people to the hotel which is further away. So now people are being taken to both hotels - but most are going to the closer hotel.
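The hotel analogy can be sketched as a small simulation. This is a toy model I made up, not MQ itself: two consumers take work from one shared queue, each message goes to whichever consumer gets back first, and the consumer with the shorter round trip (the closer hotel's shuttle bus, or the queue manager with the faster CF access) ends up with most of the work.

```python
import heapq

def simulate(messages, trip_times):
    """Toy model of first-come-first-served gets from a shared queue.
    trip_times maps a consumer name to its round-trip time; each message
    is taken by whichever consumer is back at the queue earliest."""
    counts = {name: 0 for name in trip_times}
    # heap of (time the consumer is next free, consumer name)
    ready = [(0, name) for name in sorted(trip_times)]
    heapq.heapify(ready)
    for _ in range(messages):
        t, name = heapq.heappop(ready)     # first consumer back gets the message
        counts[name] += 1
        heapq.heappush(ready, (t + trip_times[name], name))
    return counts

# Hypothetical round-trip times, e.g. CF response times in microseconds.
counts = simulate(300, {"CICS1": 5, "CICS2": 50})
```

With a 5 vs 50 round trip, the faster consumer takes roughly ten messages for every one the slower consumer gets - neither is ever refused work, it is just a race.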
With shared queues the key factor is the time to get from the queue manager to the Coupling Facility (CF). RMF reports these times.
What can affect the time to the Coupling Facility?
There are several factors which can affect the time to access the CF structure as seen by MQ:
- Physical distance. If the CF is co-located with one LPAR within one physical machine, and the other LPAR is in a different machine, the co-located LPAR may have a response time of 5 microseconds, compared to 50 microseconds for the other.
- Even if the CF is in its own machine, the amount of traffic to the CF can affect the response times - like any I/O. DB2 on one LPAR may be using its structure heavily, so the MQ requests may be sharing the same channels and the response time is affected. A different LPAR may be using different channels and so not be impacted.
- The configuration and the load on the LPAR.
What can I do to get the MQ work to run on the optimum system?
Trigger every (TRIGTYPE(EVERY)) should be used when the message rates are under a few messages a second. If the message rate is high, then the costs increase (e.g. creating the trigger message, getting the trigger message, starting the CICS transaction, etc.). It is often better to have a long-running transaction: it does MQGET, processes the message, commits, and loops. This can have its own problems: the transaction was started at 09:00 on a lightly loaded system - it is now 12:00 and this system is overloaded.
A better solution is for the transaction to process, say, 1000 messages and then end. When more messages arrive, a new trigger message is produced, and the EXEC CICS START is issued. If this transaction is managed by CICSPlex SM (CPSM), then it can run in the best place. As the load on the systems changes, CPSM can route the transactions to other systems.
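The 'process a batch then end' pattern can be sketched as follows. This is illustrative Python, not CICS code: the list standing in for the queue, the name run_transaction, and the batch limit are all my own assumptions; in real life the get would be an MQGET with wait, the processing would end with a commit, and the re-start would come from the next trigger message.

```python
def run_transaction(queue, batch_limit=1000):
    """Process up to batch_limit messages, then end.
    Ending lets the queue depth produce a fresh trigger, so a workload
    manager (e.g. CPSM) can start the next instance on the best region."""
    processed = 0
    while processed < batch_limit:
        if not queue:
            break                 # queue drained: end early
        queue.pop(0)              # stands in for MQGET
        # ... process the message, then commit the unit of work ...
        processed += 1
    return processed              # caller starts a new instance if work remains

# 2500 messages arrive; three successive transactions drain them.
work = ["msg %d" % i for i in range(2500)]
batches = []
while work:
    batches.append(run_transaction(work))
# batches == [1000, 1000, 500]
```

The point of the design is the periodic end: a transaction that never ends is pinned to the region where it started, while one that ends every 1000 messages gives the router a fresh chance to place the work.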
What catches are there?
If you have a trigger monitor in each of two CICS regions, each connected to a different queue manager, then two trigger messages can be produced. If you have a shared initiation queue, both trigger messages can be got by the same CICS region. This may not be what you want.
Some customers use a private initiation queue for each queue manager, so you can be sure that each CICS will get a trigger message.
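The shared initiation queue catch can be shown with another toy model of my own (the names and the polling order are assumptions, chosen to show the worst case): both queue managers put a trigger message on the same initiation queue, and nothing ties a trigger message to the region that should consume it, so one region can take both.

```python
from collections import Counter

# Two queue managers each put a trigger message on the shared initiation queue.
shared_init_queue = ["trigger from QM1", "trigger from QM2"]

# Trigger monitors simply take the next message; suppose CICS1's monitor
# happens to poll twice before CICS2's monitor gets a chance.
starts = Counter()
for monitor in ["CICS1", "CICS1", "CICS2"]:
    if shared_init_queue:
        shared_init_queue.pop(0)
        starts[monitor] += 1

# starts == Counter({"CICS1": 2}); CICS2 started nothing
```

With a private initiation queue per queue manager, each trigger message can only be got by the trigger monitor reading that queue, so each CICS region is guaranteed its start.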