IBM Support

Interpreting the IBM MQ Performance Check Report

General Page

IBM MQ 9.4.2 includes a new command-line tool named mqperfck, which gives you insight into the workload being processed by a queue manager. mqperfck produces an HTML report that shows statistics on the throughput and performance of the queue manager's transaction log, queues and channels.
* mqperfck was also added to releases 9.2.0.35, 9.3.0.30, and 9.4.0.10 by Known Issue: DT429852
 

The mqperfck tool is primarily intended to provide a snapshot of IBM MQ behaviour over a short period of time, rather than as a continuous monitoring solution.

For longer term monitoring of IBM MQ see the Monitoring and Performance topics in IBM Documentation, and https://github.com/ibm-messaging/mq-metric-samples

Use mqperfck to take a snapshot of IBM MQ queues and channels when the system is performing well, and another if the system is not performing as you expect. Compare the two reports to gain insight into differences in the way the workload is being handled. 
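A minimal sketch of that comparison, assuming you have already extracted a few key figures from each report by hand; the metric names and values below are invented examples, not real mqperfck output.

```python
# Compare a handful of metrics from a "good" and a "bad" mqperfck snapshot.
# The metric names and values are hypothetical examples for illustration.

def compare_snapshots(baseline, current):
    """Return the percentage change of each metric from baseline to current."""
    changes = {}
    for name, base_value in baseline.items():
        if name in current and base_value:
            changes[name] = 100.0 * (current[name] - base_value) / base_value
    return changes

baseline = {"log write latency (us)": 400.0, "MQPUT rate (msg/s)": 5000.0}
current  = {"log write latency (us)": 2600.0, "MQPUT rate (msg/s)": 1200.0}

for metric, pct in compare_snapshots(baseline, current).items():
    print(f"{metric}: {pct:+.0f}%")
```

A large swing in a metric such as log write latency between the two snapshots points at where to investigate first.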

Running the mqperfck tool

mqperfck has a command line interface that lets you specify a queue manager that the tool should connect to, the queues and sender channels it should monitor, and the number of iterations the tool should run for.

Notes:

  • If you set environment variable MQ_CONNECT_TYPE=CLIENT before calling the command, then mqperfck attempts to connect to the queue manager using a client connection.  You can supply the details of the client channel to be used for the connection using IBM MQ environment variables like MQSERVER and MQCCDTURL. See Connecting client applications to queue managers using environment variables in IBM Documentation.
  • To run the command you must have adequate permissions to connect to the queue manager.
For more information on running the command see mqperfck (MQ performance check) in IBM Documentation.
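As a sketch of the client-connection setup described above, the following sets the environment variables before launching the tool. The host, port, and channel in MQSERVER are placeholders, and the mqperfck arguments are deliberately omitted; see the IBM Documentation topic for the actual command syntax.

```python
# Hypothetical sketch: run mqperfck over a client connection by setting
# the IBM MQ environment variables described above.  Channel, host, and
# port are placeholder values.
import os
import subprocess

env = dict(os.environ)
env["MQ_CONNECT_TYPE"] = "CLIENT"
env["MQSERVER"] = "SYSTEM.DEF.SVRCONN/TCP/mqhost.example.com(1414)"

cmd = ["mqperfck"]  # add arguments per the mqperfck topic in IBM Documentation

# subprocess.run(cmd, env=env)  # uncomment on a machine with mqperfck installed
print(env["MQ_CONNECT_TYPE"], env["MQSERVER"])
```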
 
 

Interpreting the performance report output

The html report file name has the following format: 

mqperfcheck_<QMGR_NAME>_YYYY-MM-DD_HHMMSSS-<unique_counter>.html
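If you collect many reports, the queue manager name and date can be recovered from the file name. The parser below is an illustrative sketch assuming the general pattern shown above (name, date, time digits, counter); the sample file name is invented.

```python
import re

# Illustrative parser for the report file name pattern shown above.
PATTERN = re.compile(
    r"mqperfcheck_(?P<qmgr>.+)_(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<time>\d+)-(?P<counter>\d+)\.html"
)

def parse_report_name(filename):
    """Return the name's components as a dict, or None if it doesn't match."""
    m = PATTERN.match(filename)
    return m.groupdict() if m else None

print(parse_report_name("mqperfcheck_QM1_2026-01-12_1230455-1.html"))
```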

The generated HTML report has six sections: 

  • A header section
  • System and queue manager statistics
  • Queue statistics for each queue named in the command parameters
  • Channel status for each channel
  • Native HA and Cross Region Recovery statistics
  • A summary for each queue

The content of the report varies according to the platform on which it runs, so not all of the following statistics are published on all platforms. For instance, some system statistics, such as CPU load and MemFree, are published only on Linux and AIX.

Native HA and Cross Region Recovery statistics are only included if the queue manager is running in a Native HA / CRR configuration.

 

Header section

The header section includes: 

Queue manager name: The connected queue manager name
Hostname: Host name
CPU Count: The CPU count reported by the host
MQ version: MQ version as V.R.M.F
Build level: The build level of the MQ runtime
Queue manager command level: The command level of the connected queue manager
Start time: Start time
End time: End time
Command line arguments: Command line arguments used to run mqperfck


Statistics sections

The majority of the statistics below are published by the queue manager and are described in the MQ documentation here: 
https://www.ibm.com/docs/en/ibm-mq/9.4?topic=trace-metrics-published-system-topics

Minimum, maximum, and average values are calculated across the duration of all iterations (unless otherwise mentioned below), and these values are printed alongside a list of the raw values recorded for each iteration.

 

Queue manager and System Statistics

All sections 

Sample Interval lengths: The length of each interval, printed in microseconds. The interval length is governed by the frequency at which the queue manager publishes statistics. Statistics are usually published every 10 seconds, or when a new monitoring application (such as mqperfck or amqsrua) subscribes to the topics.


CPU performance - platform wide (see also Metrics published on the system topics in IBM Documentation)

User CPU time percentage: Average % time spent by the CPU running non-privileged code.
System CPU time percentage: Average % time spent by the CPU running privileged code.
CPU load - one minute, five minute, and fifteen minute averages: System load averages over a short, medium, and longer time period. These numbers give an indication of how busy the system is. The meaning is platform-dependent, but Wikipedia has a good overview: https://en.wikipedia.org/wiki/Load_(computing). A very general rule of thumb is that a load average below the number of CPU cores is desirable. When the load average exceeds the CPU count, it generally means that processes or threads are blocked.
If MQ performance is suffering and these numbers are high, the machine may not be adequately sized for the load.
However, on Linux the load average includes threads waiting on IO and other tasks in uninterruptible sleep, which makes interpretation more difficult. For more information see: https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
To quote the Brendan Gregg blog post, the system load is "…more useful for relative comparisons: if you know the system runs fine at a load of 20, and it's now at 40, then it's time to dig in with other metrics to see what's going on."
RAM free percentage: % RAM free
RAM total bytes: Total system RAM in MB
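The load-average rule of thumb above can be sketched as a simple check; the threshold is a heuristic from the text, not an MQ-defined limit, and the sample figures are invented.

```python
# Rule-of-thumb check: flag the system when load average reaches or
# exceeds the CPU count, since processes or threads may be blocked.
# This is a heuristic, not an MQ-defined threshold.

def load_status(load_avg, cpu_count):
    if load_avg < cpu_count:
        return "ok"
    return "investigate"

print(load_status(3.2, 8))    # ok
print(load_status(40.0, 8))   # investigate
```

Remember that on Linux the comparison is blurred by tasks in uninterruptible sleep, so treat the result as a prompt to look at other metrics rather than a verdict.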

 

CPU performance - running queue manager (see also Metrics published on the system topics in IBM Documentation)

User CPU time: Estimated average % CPU time used by the queue manager's processes while running non-privileged code.
System CPU time: Estimated average % CPU time used by the queue manager's processes while running privileged code.
RAM total bytes: Estimated MB of RAM used by the queue manager's processes

 

Disk usage - queue manager recovery log (see DISK (platform persistent data stores) in IBM Documentation)

Log - write latency: A rolling average of the time (in microseconds) taken to write an entry to the queue manager transaction log. This is a key metric, particularly for persistent workloads. Monitor this value and, if the system is performing poorly, compare it with values recorded when the system was working well. Slow log writes, for example writes taking multiple milliseconds, are likely to have an impact on overall queue manager performance. For examples of the impact of log disk latency on queue manager throughput, see Section 2.4 of the Persistent Messaging Performance paper published by the MQ Performance Team.
Log - write size: Rolling average of the size (in bytes) of entries written to the queue manager transaction log.
Log - logical bytes written: Logical bytes written to the log during the interval.
Log - physical bytes written: Physical bytes written to the log during the interval. When LogWriteIntegrity=TripleWrite, this number will be higher than the logical bytes written. The ratio of logical to physical bytes written when the queue manager is heavily loaded gives an insight into how much extra writing is occurring because of partial page writes. The difference can be disregarded when the queue manager is lightly loaded: partial pages are much more likely to occur, but they have no significant impact on performance. See Section 2.3 of the Persistent Messaging Performance paper.
Log - current primary space in use: % of primary log space currently in use. 100% indicates all the primary log space is taken. More than 100% indicates that secondary extents are in use, and that more than the intended log capacity is being used.
Log - workload primary space utilization: An approximate indication, over a period of time, of the % of primary log space used. If this is consistently over 100% you may wish to increase primary logs or investigate what is causing high utilization.
Log - slowest write since restart: The time taken (in microseconds) by the slowest (highest latency) individual log write since the queue manager began.
Log - timestamp of slowest write: Date and (local) time when the highest log latency was observed.

No average printed.
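The relationship between logical and physical bytes written can be turned into a simple write-amplification figure; the byte counts below are invented examples, and the number is only meaningful when the queue manager is heavily loaded.

```python
# Sketch: estimate extra log writing caused by partial page writes
# (e.g. under LogWriteIntegrity=TripleWrite) from the interval counts.
# Figures are invented examples.

def write_amplification(logical_bytes, physical_bytes):
    """Ratio of physical to logical bytes written to the recovery log."""
    if logical_bytes == 0:
        return None
    return physical_bytes / logical_bytes

amp = write_amplification(logical_bytes=1_000_000, physical_bytes=1_600_000)
print(f"write amplification: {amp:.2f}x")   # 1.60x
```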

 

Per-Object Statistics Section

Queues

The following statistics are included for each queue requested in the report. 
Minimum, maximum, and average values are given for each of the following statistics alongside raw values for each iteration.

All sections 

Sample Interval lengths: See the previous section for a description
Queue name: The queue name

 

MQPUT and MQPUT1 MQI Statistics

lock contention: % of accesses to the queue which waited for the queue lock to become available.
Queue lock contention is more likely to manifest itself on machines with more cores.
As lock contention grows into double figures, MQI performance is likely to suffer.
An application design which distributes work over a set of queues will alleviate this contention.
Lock contention is likely to increase if the cost of individual MQI operations is high. For example, a getter searching a deep queue with inefficient selection criteria will hold the lock for longer, as will putting a large message if disk latency is high.
rolled back MQPUT count: Number of MQPUT operations which were rolled back.
A high number of rollbacks may indicate an application problem and should be investigated.
MQPUT1 persistent message count: Number of persistent messages put to the queue using MQPUT1.
MQPUT1 is a composite operation which includes MQOPEN, MQPUT, and MQCLOSE and is suitable for putting a single message to a queue, but it is not recommended for applications that put multiple messages to a queue. Such applications should open the queue once and issue multiple MQPUT requests, as described in application design performance considerations.
MQPUT persistent message count: Number of persistent messages put to the queue using MQPUT.
Use persistent messages for essential data and non-persistent messages for everything else.
See Effect of Message Persistence for more information.
MQPUT1 non-persistent message count: Number of non-persistent messages put to the queue using MQPUT1
MQPUT non-persistent message count: Number of non-persistent messages put to the queue using MQPUT
queue avoided puts: Number of messages passed directly from putting applications to waiting getters without needing to be queued.
This is a performance optimisation which is described in section 5.6.1.1 of the performance best practice guide (see: https://ibm-messaging.github.io/mqperf/MQ_Performance_Best_Practices_v1.0.1.pdf) and in conference presentations - e.g. pages 24-25 here: https://www.mqtechconference.com/sessions_v2014/MQ_Internals_DeepDive.pdf
persistent byte count: Count of bytes put in persistent messages
non-persistent byte count: Count of bytes put in non-persistent messages
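The "queue avoided puts" figure is most useful as a percentage of all puts, which is how the summary section reports it. A minimal sketch, with invented counts:

```python
# Sketch: derive the queue-avoidance percentage from the raw MQPUT counts.
# Counts are invented examples.

def queue_avoid_pct(avoided_puts, total_puts):
    """Percentage of puts passed straight to a waiting getter."""
    return 100.0 * avoided_puts / total_puts if total_puts else 0.0

pct = queue_avoid_pct(avoided_puts=9500, total_puts=10000)
print(f"queue avoid: {pct:.1f}%")   # 95.0%
```

A high percentage indicates getters are keeping up with putters; a drop between two snapshots suggests messages are starting to queue.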

 

 

MQGET MQI Statistics 

rolled back MQGET count: Number of MQGET operations which were rolled back.
A high number of rollbacks may indicate an application problem and should be investigated.
MQGET browse fails with MQRC_TRUNCATED_MSG_FAILED: Non-destructive MQGETs (browses) which failed because the buffer provided on the MQGET call was not large enough to contain the message. Most applications resize the buffer and try again when they see this error. Repeatedly trying to get a message is inefficient, so you should design your application to reduce these errors. Section 7.3.1 of the MQ Performance Best Practices Guide suggests using a buffer that can handle 90% of messages.
MQGET browse fails with MQRC_NO_MSG_AVAILABLE: Non-destructive MQGETs (browses) which failed because no message was found. This could be because the queue is empty, or because the getter specified matching options or message selectors which did not match any messages on the queue.
The extended queue metrics provide more information about gets which failed to match a message.
MQGET browse fails: Total number of failed non-destructive MQGETs
destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED: See comments on MQRC_TRUNCATED_MSG_FAILED above
destructive MQGET fails with MQRC_NO_MSG_AVAILABLE: See comments on MQRC_NO_MSG_AVAILABLE above
destructive MQGET fails: Total failed destructive MQGETs during the interval
MQGET browse persistent byte count: Number of bytes of persistent message data browsed
MQGET browse non-persistent byte count: Number of bytes of non-persistent message data browsed
MQGET browse persistent message count: Number of persistent messages browsed
MQGET browse non-persistent message count: Number of non-persistent messages browsed
destructive MQGET persistent byte count: Number of bytes of persistent message data got destructively
destructive MQGET non-persistent byte count: Number of bytes of non-persistent message data got destructively
destructive MQGET persistent message count: Number of persistent messages got destructively
destructive MQGET non-persistent message count: Number of non-persistent messages got destructively
MQGET byte count: Total bytes of message data retrieved
MQGET count: Total number of successful MQGET calls
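The MQGET success percentage reported in the summary section can be derived from these counts; the figures below are invented examples.

```python
# Sketch: percentage of destructive MQGETs that succeeded, from the
# success and failure counts above.  Counts are invented examples.

def get_success_pct(successful_gets, failed_gets):
    total = successful_gets + failed_gets
    return 100.0 * successful_gets / total if total else 0.0

print(f"{get_success_pct(successful_gets=800, failed_gets=200):.1f}%")  # 80.0%
```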

 

 

MQOPEN and MQCLOSE MQI Statistics

MQOPEN count: Number of MQOPEN calls issued during the interval
MQCLOSE count: Number of MQCLOSE calls issued during the interval

 

 

Extended queue metrics

Prior to the release of the mqperfck tool, the following statistics were only available internally in the queue manager. To minimise the performance impact, the queue manager collects these statistics without taking locks, which can lead to minor inconsistencies with the MQGET counts reported above. You should therefore treat these numbers as an approximation of the level of match failures rather than as precise measurements.
 

msg search count: The number of MQGETs where the queue manager searched for a message
(this will be every MQGET that wasn't satisfied by directly passing an MQPUT to a waiting getter - see also "queue avoided puts")
msg not found count: Number of MQGETs where the queue manager checked and failed to find a message.
msg examine count: Number of (matching and non-matching) messages examined to see if they meet the criteria specified on MQGET.
Counts of messages that do not match the criteria are recorded below:
intran get skipped count: Messages examined but skipped because they were held by another (uncommitted) MQGET
intran put skipped count: Messages examined but skipped because the MQPUT had not been committed.
selection mismatch count: Number of messages that were checked but did not match the properties required by the selector. Note that when processing selectors the queue manager needs to parse the message selector and retrieve message properties from disk to check for matches. This can be an expensive operation.
correlid mismatch short count: A count of messages which were examined because MQGET specified MQMO_MATCH_CORREL_ID but skipped because the quick CorrelId hash did not match. Note that MQ keeps hash tables of CorrelIds to quickly locate messages, so MQMO_MATCH_CORREL_ID is much more efficient than any other form of message matching or selection
correlid mismatch long count: Messages examined where a match for the CorrelId hash was found, but where the full CorrelId comparison failed.
msgid mismatch count: Messages examined because MQGET specified MQMO_MATCH_MSG_ID, but skipped because the MsgId did not match the requested MsgId
load msg dtl count: The queue manager decides whether a message matches the criteria by checking in-memory buffers and hash tables whenever possible. This count reflects the number of messages which could not be checked by those more efficient means, and where messages or message headers needed to be loaded from the queue file to check for a match.
Keep queue depths low, and use CorrelId matching to keep this number to a minimum.
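These counts become easier to read when normalised per MQGET, which is how the summary section presents them. A minimal sketch, with invented counts:

```python
# Sketch: per-MQGET averages (searches per get, messages examined per
# get) derived from the extended queue metrics.  Counts are invented.

def per_get_average(metric_count, get_count):
    return metric_count / get_count if get_count else 0.0

gets = 10_000
print(f"searches per get: {per_get_average(12_500, gets):.2f}")           # 1.25
print(f"messages examined per get: {per_get_average(55_000, gets):.2f}")  # 5.50
```

Several messages examined per get is a hint that getters are scanning past non-matching messages, for example because of selectors or deep queues.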

 

 

General queue metrics

Queue depth: The queue depth at the end of the interval
average queue time: The average amount of time (in microseconds) that messages spend on the queue
queue purged count: Number of times the queue was cleared during the interval
messages expired: Number of messages expired during the interval
open input count: Number of queue handles open at the end of the interval which included one of the destructive get (MQOO_INPUT*) options in their call to MQOPEN. (Note: this count may also include handles which also specified the MQOO_BROWSE option on MQOPEN)
open output count: Number of queue handles at the end of the interval that are open for output (MQPUT)
open browse count: Number of queue handles open at the end of the interval which included the MQOO_BROWSE option on their call to MQOPEN. (Note: this count may include handles which also specified one of the MQOO_INPUT* options on MQOPEN)
open publish count: Number of queue handles open at the end of the interval which were opened by queue manager processes to put messages to subscriptions that specified this queue as their destination


Sender Channels

The following statistics are included for each sender channel requested in the report. Minimum, maximum, and average values are given for the following statistics (unless otherwise mentioned below) alongside raw values for each interval.

Note that:

  • Channel statistics are derived from a MQCMD_INQUIRE_CHANNEL_STATUS PCF command which is issued once per interval. For information about these statistics, see MQCMD_INQUIRE_CHANNEL_STATUS (Inquire Channel Status) Response in IBM Documentation.
  • Where the command returns a long and short average for these statistics, the performance tool prints the short average.

Use this information to help understand how your channels are behaving:

 
Channel name: The channel name
Interval length: See previous sections for more information about interval lengths
Channel status (MQCHS_*): Channel state. This gives you a quick indicator of whether the channel is running, stopped, etc. The raw values show how the channel's status has changed over the lifetime of the report. A value of 3 indicates that the channel is running (MQCHS_RUNNING). To map other MQCHS_* values, see MQCHS_* (Command format Channel Status)
Channel start date: The date the channel started
Channel start time: The time the channel started
Network time Indicator: Amount of time, in microseconds, to send a request to the remote end of the channel and receive a response when confirming a batch. This only measures the network time for an operation (any time spent in MQ processing is subtracted from the round trip time), so it is a good indication of network latency.
Check this value if you suspect you might have issues with your network.
The value is based on recent activity over a short period. This metric is useful in checking why a batch takes a long time to complete.
It is the NETTIME value mentioned in topic: Determining whether a channel can move messages fast enough
Batch size Indicator: An indication of the number of messages that the channel is sending in each batch. The value displayed is based on recent activity over a short period. Check IBM MQ documentation for more information about the channel batch size attribute. Note that the channel batch size attribute is the maximum size of a batch. The actual size of a batch can be less; for example, a batch completes when there are no messages left on the transmission queue, or if the batch interval expires.
Time in user exits: Shows the time spent (in microseconds per message) executing user exits. The value displayed is based on recent activity over a short period.
This is the EXITTIME value mentioned in topic: Determining whether a channel can move messages fast enough
Compression time: Shows the time spent (in microseconds per message) compressing message data. The value displayed is based on recent activity over a short period.
This is the COMPTIME value mentioned in topic: Determining whether a channel can move messages fast enough
TLS key reset count: The number of successful TLS secret key resets that have occurred for this channel instance in the interval.
Time on XMITQ: The time, in microseconds, that messages remained on the transmission queue before being retrieved. The time is measured from when the message is put onto the transmission queue until it is retrieved to be sent on the channel and, therefore, includes any interval caused by a delay in the putting application committing the message.
Messages on XMITQ: This parameter applies to cluster sender channels only. It indicates the number of messages available to the channel on the transmission queue.
Buffers sent count: The number of send calls the queue manager made during the interval to transmit data over the channel. This value can be used to check that the channel is moving messages.
Bytes sent count: The number of bytes sent in this interval
Completed batches count: The number of completed batches in this interval
Messages sent count: The number of messages sent in this interval

No average printed.
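The achieved batch size can be derived from the message and batch counts above and compared with the channel's BATCHSZ attribute; the counts below are invented examples.

```python
# Sketch: achieved batch size over an interval, from the channel counts.
# Compare with the channel's BATCHSZ attribute (the maximum batch size).
# Counts are invented examples.

def achieved_batch_size(messages_sent, completed_batches):
    return messages_sent / completed_batches if completed_batches else 0.0

achieved = achieved_batch_size(messages_sent=4_000, completed_batches=80)
print(f"achieved batch size: {achieved:.0f}")   # 50
```

An achieved batch size well below BATCHSZ usually just means the transmission queue is draining between batches or the batch interval is expiring, rather than indicating a problem.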

 

 

Native HA Replication Statistics

The following statistics are included if the queue manager is running in a Native HA configuration:

Replica Name: The replica name
Average network round trip time: The average time taken, in microseconds, for network round trips to this replica. This is just the time spent flowing the data (the time spent processing the data on the remote replica is subtracted from the total network round trip time).
Throttling time percentage: The percentage of the interval that was spent throttling work on the active instance to help this replica catch up. Any non-zero values indicate that this replica is causing the active to run more slowly than it could. If a reduction in performance of this queue manager will impact your business, you should investigate why this replica is not able to keep pace with the active instance and resolve any issues you find.
Backlog bytes: The backlog, in bytes, at the end of the interval of log updates waiting to be sent to this replica.
Backlog average bytes: The short-term average, in bytes, of log updates waiting to be sent to the replica.
Backlog long-term average bytes: The long-term average, in bytes, of log updates waiting to be sent to the replica.
Catch-up time percentage: The percentage of the interval that the replica was out of sync with the active and was attempting to catch up.
Log write average acknowledgement latency: The average time taken, in microseconds, for a log write to be acknowledged by the replica in this interval.
Log write average acknowledgement size: The average number of logged bytes acknowledged in a single response from the replica in this interval. If the average acknowledgement size starts to change, it may indicate a change in the type of synchronous data being logged.
Log - write latency: A rolling average of the time (in microseconds) taken to write an entry to the replica's transaction log.
This reflects the speed of the replica's log disk and is an important metric. See the comments about slow log writes in the "Disk usage - queue manager recovery log" section above.
Log - write size: Rolling average of the size (in bytes) of entries written to the replica's transaction log.
Acknowledged log sequence number: The log sequence number (LSN) that has been acknowledged by the replica as having been written to its recovery log.
Log - timestamp of slowest write: Date and (local) time when the highest log latency was observed.
Log - slowest write since restart: The time taken (in microseconds) by the slowest (highest latency) individual log write since the queue manager began.
Synchronous log bytes sent: The total amount of synchronous log data, in bytes, sent to the replica in this interval. Typically, in-sync appends are much smaller in size than catch-up.
Catch-up log bytes sent: The total amount of catch-up log data, in bytes, sent to the replica in this interval
Synchronous compressed log bytes sent: The amount of compressed synchronous log data, in bytes, sent to the replica during the interval.
Compression takes place when the number of bytes sent exceeds the CompressionThreshold value in the "NativeHALocalInstance" stanza in the qm.ini file.
Monitoring the compression statistics allows the effectiveness of compression to be measured over time.
Synchronous log bytes decompressed: The amount of synchronous log data, in bytes, after decompression that was received by the replica during the interval.
Synchronous log data average compression time: The average time spent compressing synchronous log data during the interval.
Synchronous log data average decompression time: The average time spent decompressing synchronous log data during the interval.
Catch-up compressed log bytes sent: The amount of compressed catch-up log data, in bytes, sent during the interval.
Catch-up log bytes decompressed: The amount of catch-up log data, in bytes, after decompression that was received by the replica during the interval.
Catch-up log data average compression time: The average time spent compressing catch-up log data during the interval.
Catch-up log data average decompression time: The average time spent decompressing catch-up log data during the interval.

No average printed.
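Compression effectiveness over an interval can be expressed as the saving between decompressed and compressed byte counts; the figures below are invented examples.

```python
# Sketch: percentage of log replication traffic saved by compression in
# an interval, from the compressed/decompressed byte counts above.
# Figures are invented examples.

def compression_saving_pct(compressed_bytes, decompressed_bytes):
    if decompressed_bytes == 0:
        return 0.0
    return 100.0 * (1 - compressed_bytes / decompressed_bytes)

pct = compression_saving_pct(compressed_bytes=400_000,
                             decompressed_bytes=1_000_000)
print(f"compression saving: {pct:.0f}%")   # 60%
```

Tracking this figure over time shows whether the CompressionThreshold setting is paying off for your workload.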

 

 

Cross Region Recovery Group Statistics

The following statistics are included if the Native HA queue manager has been configured with a cross region replication group:

Recovery group name: The group name
Average network round trip time: The average time taken, in microseconds, for network round trips to the recovery group leader. This is just the time spent flowing the data (the time spent processing the data on the recovery group is subtracted from the total network round trip time).
Backlog average bytes: The short-term average, in bytes, of log updates waiting to be sent to the replica.
Backlog bytes: The backlog, in bytes, at the end of the interval of log updates waiting to be sent to this replica.
Log bytes sent: The total amount of log data, in bytes, sent to the replica in this interval. The recovery group does not form any part of the quorum for the active instance, so there is no differentiation between in-sync and catch-up replication traffic.
Recovery log sequence number: The LSN the group could recover from
Rebase count: The number of times the group has been rebased
Log data average decompression time: The average time spent decompressing log data during the interval.
Log bytes decompressed: The amount of log data, in bytes, after decompression that was received during the interval.
Log data average compression time: The average time spent compressing log data during the interval.
Compressed log bytes sent: The amount of compressed log data, in bytes, sent during the interval.
Compression takes place when the number of bytes sent exceeds the GroupCompressionThreshold value in the "NativeHALocalInstance" stanza in the qm.ini file.

No average printed.

 

 

Summary / Analysis

The following summary is included for each queue requested in the report:

Queue name: The queue name
Total elapsed time: The total time covered by the report.
Approx. MQPUT rate (successful calls only): The average number of successful MQPUTs per second during the run time of the report.
Approx. MQGET rate (successful calls only): The average number of successful destructive MQGETs per second during the run time of the report.
Approx. MQGET-BROWSE rate (successful calls only): The average number of successful non-destructive MQGETs per second during the run time of the report.
MQGET % success: The percentage of successful destructive MQGET calls
MQGET-BROWSE % success: The percentage of successful non-destructive MQGET calls
Queue avoid %: The percentage of messages passed directly from putting applications to waiting getters without needing to be queued.
MQPUT % rolled back: The percentage of MQPUTs which were rolled back.
MQGET % rolled back: The percentage of MQGETs which were rolled back.
Ave Queue Depth: The average depth of the queue during the run time of the report.
Ave Searches per Get: The average number of MQGETs where the queue manager searched for a message
Ave Msgs examined per Get: The average number of messages per MQGET that were tested for a match
Ave CorrelId short hash mismatches per Get: The average number of messages per MQGET which were rejected because MQGET specified MQMO_MATCH_CORREL_ID but the quick CorrelId hash did not match.
Ave CorrelId long hash mismatches per Get: The average number of messages per MQGET which were rejected because MQGET specified MQMO_MATCH_CORREL_ID and the quick CorrelId hash matched, but when the message was loaded the full comparison failed
Ave MsgId mismatches per Get: The average number of messages per MQGET which were rejected because MQGET specified MQMO_MATCH_MSG_ID but the MsgId did not match
Ave Selector mismatches per Get: The average number of messages per MQGET which were rejected because the message did not match a selector specified by the consumer
Ave Msg loads per Get: The average number of messages that were loaded per MQGET.

 

 
 
 
 


Document Information

Modified date:
12 January 2026

UID

ibm17181268