I was at a customer in Asia who was stress testing a huge new application, and they asked me what they needed to check to make sure MQ was OK.
I started preparing a presentation and found it was too complex - they wanted practical instructions.
So here are some practical instructions for getting started with MQ performance.
Most systems should be able to log at 30 MB/second. Use MP1B to look at the log report for "Log write rate xMB/s per copy".
If you have a few transactions running concurrently processing 2KB messages you should be able to log at 30 MB/second or higher.
Using larger messages, such as 1MB, you should be able to log at 100 MB/second.
People with mirrored DASD over a large distance may find they cannot achieve 100 MB/second.
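As a back-of-the-envelope check you can turn these logging rates into a message rate. This is a sketch, not MQ code - the rates are the illustrative figures above, and it ignores log record overhead; substitute the "Log write rate MB/s per copy" that MP1B reports for your system.

```python
def max_msgs_per_second(log_mb_per_sec, message_size_bytes):
    """Upper bound on persistent messages logged per second,
    ignoring per-record log overhead."""
    return (log_mb_per_sec * 1024 * 1024) / message_size_bytes

# 2KB messages at 30 MB/second
print(round(max_msgs_per_second(30, 2 * 1024)))       # 15360
# 1MB messages at 100 MB/second
print(round(max_msgs_per_second(100, 1024 * 1024)))   # 100
```

In practice the achievable rate is lower, because each message also logs headers and other data, but the sketch shows why message size matters so much to log bandwidth.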
The disk response time of MQ logs should typically be under 1 ms - many people are down at 250 microseconds. Use tools like RMF to display the disk response time. Collect this at a good time, so you can compare with when you have problems.
The time for a commit or log force request (Out of syncpoint, or buffer pool filling up) is typically 2 * average log IO write time.
If your log IO time is 1 ms - one server/transaction/channel will be unlikely to do more than 500 commits a second. If your log response time doubles - you will be unlikely to do more than 250 commits a second.
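The commit rate rule of thumb above (a commit costs roughly 2 * the average log IO write time) can be sketched as:

```python
def max_commits_per_second(log_io_ms):
    """One thread doing serial commits: each commit
    waits roughly two log IOs."""
    commit_time_ms = 2 * log_io_ms
    return 1000 / commit_time_ms

print(max_commits_per_second(1))  # 500.0 - 1 ms log IO
print(max_commits_per_second(2))  # 250.0 - log IO time doubled
```

This is per thread, which is why the next points about running applications in parallel and processing more messages per commit improve throughput.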
As you log more data the response time increases - so check the log response time at peak time for the LPAR and under maximum load for MQ.
Using multiple applications in parallel can improve throughput
Processing more messages before a commit can improve throughput
If your IO response time changes throughout the day - the maximum rate an application can process will vary during the day.
Monitor queue depths
An empty queue is a good queue. If the queue depth increases, it is usually the application getting from the queue which has the problem - but it may be the putting program putting many messages in one unit of work.
Keep buffer pools below 75% usage
If the buffer pool fills up then there will be page set IO, and it will be very slow to process messages, as each 4KB page will require at least two IOs: one for the log and one for the page set.
Use MQ V8 and larger buffer pools.
With QREP ensure the buffer pool is sufficient for at least batchsize * message size * 2 * channels. We found in QREP with batches of >= 200MB, lots of uncommitted messages (bear in mind the apply also used large UoWs) meant buffer pools needed to be larger even though queue depths did not appear high.
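The sizing rule above can be turned into a page count, since buffer pools are defined in 4KB pages. A minimal sketch - the 200MB batch and two channels are just example figures:

```python
import math

PAGE_SIZE = 4096  # MQ buffer pool pages are 4KB

def min_bufferpool_pages(batch_bytes, channels):
    """batchsize * message size (= bytes per batch) * 2 * channels,
    expressed in 4KB buffer pool pages."""
    return math.ceil(batch_bytes * 2 * channels / PAGE_SIZE)

# Two channels, each with >= 200MB batches in flight
print(min_bufferpool_pages(200 * 2**20, 2))  # 204800 pages
```

That is a lower bound before any headroom - and you still want to stay below the 75% threshold above, so size the pool accordingly.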
Have the queue manager enabled for monitoring data.
For example ALTER QMGR MONQ(HIGH) MONCHL(HIGH). This allows you to collect additional information about channels and queues. Set the MONQ attribute for queues and MONCHL for channels.
Check your MQ NETTIME against the TCP PING time.
The DIS CHSTATUS NETTIME value gives an indication of the time on the network. This should be comparable with a TSO PING command. If it is much higher you may need to tune the TCP buffers. See here
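When comparing the two, bear in mind that NETTIME in DIS CHSTATUS is reported in microseconds while PING reports milliseconds. A small sketch of the comparison - the factor of 2 is an assumed threshold, pick whatever suits your network:

```python
def nettime_looks_high(nettime_us, ping_ms, factor=2.0):
    """Flag a channel whose NETTIME is much higher than the TCP ping
    time. NETTIME (DIS CHSTATUS) is in microseconds; PING is in ms."""
    return nettime_us / 1000.0 > factor * ping_ms

print(nettime_looks_high(1500, 1))  # False - 1.5 ms vs 1 ms ping
print(nettime_looks_high(5000, 1))  # True - 5 ms vs 1 ms ping
```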
Monitor channel BATCHSZ and XBATCHSZ.
XBATCHSZ is the batch size your channels actually achieved. If this is smaller than BATCHSZ, and the messages are small, there were not enough messages to fill a batch. You may also get a small XBATCHSZ if the channel is limited by BATCHLIM.
Can you avoid channels disconnecting and reconnecting soon afterwards?
Do you have a business need for channels to stop soon after they are idle?
You can use DIS CHL(*) where(DISCINT,NE,0) to see which channels have DISCINT specified - this applies to both ends of the channel.
You may be able to see channel start and stop messages in the job log (CSQX500I and CSQX501I), but you may have configured MQ to suppress these.
Channels with a large batch size (over 50) are more efficient than those with a small batch size. You may want to use BATCHLIM to limit how much data is sent in a batch - useful when processing very large messages.
If there is a large distance between the queue managers - increasing the batch size up to 1000 may help.
Use DIS CHL(*) where(BATCHSZ,LT,50) to find channels defined with a small batch size.
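Why does batch size matter more over a large distance? Each batch ends with a confirmation flow, costing roughly one network round trip, so the round trip time caps how many batches a channel can complete per second. A sketch under that simplifying assumption (the 20 ms round trip is an example figure, and bandwidth limits are ignored):

```python
def batches_per_second(rtt_ms):
    """Each batch ends with a confirm flow costing roughly one
    round trip, so batches/second is capped by the round trip time."""
    return 1000 / rtt_ms

def msgs_per_second(batch_size, rtt_ms):
    """Upper bound on message rate set by end-of-batch round trips
    alone (ignores bandwidth and logging)."""
    return batch_size * batches_per_second(rtt_ms)

# 20 ms round trip between queue managers (an assumed figure):
print(msgs_per_second(50, 20))    # 2500.0
print(msgs_per_second(1000, 20))  # 50000.0
```

With a long round trip, raising the batch size towards 1000 raises the ceiling by the same factor - which is why distance makes large batches worthwhile.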
Have enough active logs to avoid any delays due to archiving.
Make sure the logs are of a reasonable size - why would you not use 4GB logs these days, once you are on V8 or later?
Monitor your systems so you know what is normal behavior
- Turn on all statistics and keep the SMF records for at least a week
- Know typical queue depth of your application queues.
- Know the average depth of your transmission queues
- Know the nettime of your key channels at peak time
- Turn on Class(3) accounting sometimes for a short period (5 minutes)
Channels with SSLCIPH set
Is the channel negotiating the secret key at a 'sensible' frequency? Negotiations are expensive and slow the flow of data over the channel.
Is cryptographic offload available to negotiate the secret key?
Are your channel resources constrained?
Having too few adapters can cause delays. Having too many adapters should have no impact. Channels with lots of large persistent messages may need more adapters.
If you find a channel seems to be slow, try stopping and restarting it. If it then seems to go faster, the dispatcher it was on may have been constrained.
Check the Chinit SMF data to see if a dispatcher is constrained.