This document discusses some of the key performance indicators (KPIs) for using MQ on z/OS.
For details about the fields and metrics, see SupportPac MP1B.
KPIs can apply to throughput and to how much work is being done.
What do I need to look at to make sure my work is not delayed?
Rate at which data can be logged
For persistent messages the most important resource is the rate at which you can write to the active log datasets. The maximum rate at which you can log data depends on your DASD and your workload profile.
1. DASD dependent. The rate at which you can log to disk depends on your DASD. If your DASD is mirrored synchronously, logging will be slower than if it is not mirrored. If your I/O subsystem is slow, this will impact performance.
2. Workload dependent. If the workload profile has large persistent messages, a lot of data can be written in each I/O. If the workload profile has only short messages (a few KB), there may be only a small amount of data per I/O.
The profile of logging activity has different characteristics depending on the load and I/O rate.
1. If the log task is less than 80% busy, logging is lightly loaded and can handle more work.
2. As the queue manager logs more data, the logger task becomes busier and the number of pages written per I/O increases.
3. When the pages per I/O is over about 50, the queue manager is approaching maximum capacity. If you only have a few application instances processing short persistent messages, the maximum pages per I/O may be low, perhaps less than 10.
4. If there is more data to be logged, and the I/O rate cannot keep up, then the buffers containing data to be written to the log will fill up, and applications will have to wait for a free log buffer.
The KPIs are:
Log WTB (wait for buffers). This should always be zero. If it is greater than zero, the queue manager was trying to log more than the DASD could handle. This could be caused by a temporary slow I/O rather than an increase in MQ workload. You can look at the I/O response time in the MQ log statistics, or use products like RMF to review the DASD performance. If the log data sets fill up, this will show as many requests waiting for a buffer.
Pages per I/O. Monitoring this over time will show what profile you have. You need to know if your typical profile is 10 pages per I/O or over 50 pages per I/O. If you get a significant change in the pages per I/O either up or down, then this may indicate a problem.
Log task busy. If this task is under 80% busy, the queue manager can easily handle the amount of persistent data being processed.
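The three logging checks above can be combined into one small routine. The following Python sketch is illustrative only: the function and parameter names are hypothetical and are not field names from the MQ SMF records, so you need to map them to whatever your own reporting tool provides. The thresholds are the ones quoted above (zero waits for buffer, about 50 pages per I/O, 80% log task busy).

```python
def assess_logging(wait_for_buffer, pages_written, write_ios, log_task_busy_pct):
    """Check the logging KPIs for one statistics interval.

    All parameter names are hypothetical; supply the values reported
    by your own SMF reporting tool for the interval.
    Returns (pages_per_io, list_of_warnings).
    """
    warnings = []

    # Any wait for a log buffer means logging outran the DASD.
    if wait_for_buffer > 0:
        warnings.append("waits for log buffer: logging exceeded what the DASD could handle")

    # Average pages per write I/O; guard against an idle interval.
    pages_per_io = pages_written / write_ios if write_ios else 0.0
    if pages_per_io > 50:
        warnings.append("pages per I/O over 50: approaching maximum logging capacity")

    # At 80% busy or more the log task has little headroom left.
    if log_task_busy_pct >= 80:
        warnings.append("log task 80% busy or more: little logging headroom")

    return pages_per_io, warnings
```

Tracking the returned pages per I/O over time also gives the workload profile, so a sudden move up or down stands out.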
Buffer pool usage
Keeping your buffer pools under 85% full is key to good performance for short-lived messages. This eliminates application I/O to the page set.
If all your messages are in the buffer pool (the optimum for performance), there should be no reads from the page sets. There may be writes to page sets during checkpoint activity.
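As a minimal sketch of the buffer pool check, assuming you can obtain the total number of buffers, the lowest number of free buffers and the number of page set reads for an interval (the parameter names here are hypothetical, not MQ field names):

```python
def buffer_pool_usage_pct(total_buffers, lowest_free_buffers):
    """Peak percentage of the buffer pool in use during the interval."""
    if total_buffers <= 0:
        raise ValueError("total_buffers must be positive")
    return 100.0 * (total_buffers - lowest_free_buffers) / total_buffers


def buffer_pool_warnings(total_buffers, lowest_free_buffers, page_set_reads,
                         limit_pct=85.0):
    """Return warnings based on the 85% guideline and on page set reads."""
    warnings = []
    usage = buffer_pool_usage_pct(total_buffers, lowest_free_buffers)
    if usage >= limit_pct:
        warnings.append(f"buffer pool {usage:.1f}% full (limit {limit_pct:.0f}%)")
    # Reads from the page set mean messages were not found in the buffer pool.
    if page_set_reads > 0:
        warnings.append("messages were read from the page set")
    return warnings
```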
For performance you should monitor the response time of the coupling facility (CF) structures. The response time will depend on the configuration of the hardware. For example, you can get contention on the channels to the CF. If the CF is on the same physical processor as an LPAR, the response time will be much better than from a remote processor.
Use z/OS facilities, such as RMF, to monitor these response times.
Monitor the number of messages offloaded to SMDS
Has there been a change to the number of messages offloaded to DB2 or SMDS? For example, it has been zero, but is now nonzero. With SMDS you may have reached a % full which causes messages to be offloaded.
Look in the MQ SMF data, or use the DIS USAGE TYPE(SMDS) command and the Offloaded column in the message CSQE280I MQ7A SMDS usage.
If you have configured your structures to use Storage Class Memory (SCM), the performance of your structure may vary when the structure gets to 90% full. You can use the MVS command D XCF,STR,STRNAME=.... or RMF (or equivalent) to display the SCM usage.
SMDS is faster and uses less CPU than storing large shared messages in DB2.
The WMQ command DIS CHSTATUS gives lots of information about the channels. You can use monitoring tools (or the MQCMD in MP1B) to periodically display this information.
In WMQ V8 this information is available in SMF records.
The time to send messages over a channel has two parts:
- The time a message is waiting to be sent.
- The time to send the message over the network, plus the end of batch processing.
How long did messages have to wait before being sent?
To display the time messages are waiting, use XQTIME. This value may change over a day, as more MQ work is processed, or as the network gets busier.
If the BATCHINT value is zero, the achieved batch size should be less than the negotiated batch size for short messages, that is, XBATCHSZ < BATCHSZ in DIS CHSTATUS. If XBATCHSZ is close to BATCHSZ, then most of the time there were messages waiting to be sent.
If BATCHINT is a large value, XBATCHSZ can be the same as BATCHSZ, because there is a get with wait.
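The batch size rule above can be expressed as a simple check. This Python sketch assumes the three values have already been extracted from DIS CHSTATUS output by your monitoring tool. The function name and the definition of "close" (90% of BATCHSZ by default) are assumptions made for illustration, so pick a ratio that suits your channels.

```python
def batch_backlog_suspected(batchsz, xbatchsz, batchint, close_ratio=0.9):
    """Return True if the achieved batch size suggests messages were queuing.

    batchsz   - negotiated batch size (BATCHSZ)
    xbatchsz  - achieved batch size (XBATCHSZ)
    batchint  - batch interval (BATCHINT)
    close_ratio is a hypothetical threshold for "close to BATCHSZ".
    """
    if batchint > 0:
        # With a batch interval the channel waits for more messages,
        # so full batches are expected and tell us nothing.
        return False
    return xbatchsz >= close_ratio * batchsz
```

For example, `batch_backlog_suspected(50, 48, 0)` reports a likely backlog, while the same values with a nonzero BATCHINT do not.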
The NETTIME is the time between sending an end of batch request, and getting the response back, excluding the time in the remote queue manager. This value has two components:
- The time the request is on the network
- The delay before the remote queue manager processes the request. For example, if the channel has put to a queue and the queue is full, the channel can wait and retry the put. Once the message has been put successfully, the next request can be processed and end of batch processing can be done. In this case the NETTIME includes the wait and retry of the put.
Your NETTIME values should be within a range specific to you over the day. If you get values longer than normal, this can indicate a network problem, or a processing problem at the remote queue manager.
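One way to decide what "longer than normal" means is to compare each new NETTIME value with a baseline built from your own history. The sketch below uses a mean plus k standard deviations rule, which is a generic statistical technique chosen for illustration and not something MQ provides. The function name and the default k=3 are assumptions.

```python
import statistics


def nettime_is_unusual(history, current, k=3.0):
    """Return True if `current` is well above the historical NETTIME values.

    history - earlier NETTIME samples for this channel, all in the same unit
    current - the latest NETTIME sample, in that same unit
    k       - how many standard deviations above the mean counts as unusual
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    # If the history is perfectly flat, spread is 0 and any higher value is flagged.
    return current > mean + k * spread
```

Because the normal range varies over the day, it may be worth keeping a separate history for busy and quiet periods.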
How much work is my queue manager doing?
You should monitor your queue manager to see if there are trends in the work being done in the queue manager.
- Peak number of puts and gets per hour. This tells you if there is an increase in workload, or a change in the application workload
- How many log CIs are created per hour. This tells you how much persistent data you are processing
- Queue manager and Chinit virtual storage usage. This tells you how much storage you are using - and how much free storage is available
- Peak number of channels in use.
- Highest buffer pool usage for each buffer pool.
- Peak number of pages in use in a page set.
- For a structure, the % usage of the entries and elements from the D XCF,STR,STRNAME=.... command.
- Display the SMDS usage (or use SMF) to see your SMDS activity. An increase in SMDS activity can be caused by more shared queue activity, or by larger messages.
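To spot trends in any of the measurements listed above, it is usually enough to record the peak value per day and compare it over time. The following Python sketch shows the idea for hourly put and get counts. The input layout (tuples of day, hour and count) and the function names are assumptions for illustration, so adapt them to however you collect the data.

```python
from collections import defaultdict


def daily_peaks(hourly_counts):
    """Return {day: highest hourly count} from (day, hour, count) tuples."""
    peaks = defaultdict(int)
    for day, _hour, count in hourly_counts:
        peaks[day] = max(peaks[day], count)
    return dict(peaks)


def growth_pct(old_peak, new_peak):
    """Percentage change from old_peak to new_peak."""
    if old_peak <= 0:
        raise ValueError("old_peak must be positive")
    return 100.0 * (new_peak - old_peak) / old_peak


# Example with made-up numbers: puts plus gets per hour on two days.
samples = [
    ("mon", 9, 1000), ("mon", 10, 1200), ("mon", 11, 900),
    ("tue", 9, 1100), ("tue", 10, 1500), ("tue", 11, 1300),
]
peaks = daily_peaks(samples)
change = growth_pct(peaks["mon"], peaks["tue"])
```

The same two functions can be reused for log CIs per hour, channels in use, or pages in use in a page set.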