Moving messages between IBM MQ for z/OS queue managers in a sysplex
tonySharkey
When looking at moving messages between MQ queue managers in a sysplex, there are a number of options for configuration. Recent innovations in the TCP stack such as SMC-R and SMC-D have led us to revisit whether shared queues can still be regarded as the "gold standard".
In summary, this blog will show that for small messages, shared queue still achieves the best throughput rate, but as message size grows and/or workload increases, alternative configurations may perform better.
A bit more detail:
There are a number of different configurations for moving messages between queue managers in a sysplex and this blog aims to show the relative costs, transaction rates and impact on accounting for a subset of these configurations.
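The following configurations have been measured: MCA channels over TCP/IP, MCA channels over SMC-R, MCA channels over SMC-D, intra-group queuing (IGQ), and shared queues.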
In each of these measurements, a queue is used for the request portion of the workload and a separate queue is used for the reply portion. These 2 queues are treated as a pair and are used by all requester and server tasks for an iteration of the workload. As the workload is increased, additional pairs of queues, with associated requester and server tasks, are added. Each queue is allocated a separate MQ channel where applicable.
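To make the shape of this workload concrete, here is a minimal Python sketch that models it with in-memory queue.Queue objects standing in for MQ queues. It does not use the MQ API, and the function names (server, requester, run_workload) are hypothetical. It also simplifies to one requester task and one server task per queue pair, so the requester can match each reply by correlation ID just by checking the next message on its reply queue.

```python
import queue
import threading
import uuid


def server(request_q, reply_q, count):
    """Serve `count` requests: get each request and put a reply with the same correlation ID."""
    for _ in range(count):
        correl_id, payload = request_q.get()
        reply_q.put((correl_id, payload))


def requester(request_q, reply_q, count, results, index):
    """Send `count` requests one at a time and wait for each matching reply."""
    matched = 0
    for _ in range(count):
        correl_id = uuid.uuid4().hex
        request_q.put((correl_id, b"x" * 2048))  # 2KB request payload
        reply_id, _ = reply_q.get()
        # With one requester per pair, the next reply belongs to the request just sent
        if reply_id == correl_id:
            matched += 1
    results[index] = matched


def run_workload(num_pairs, requests_per_pair):
    """Create one request/reply queue pair per iteration; return completed round trips."""
    results = [0] * num_pairs
    threads = []
    for i in range(num_pairs):
        request_q, reply_q = queue.Queue(), queue.Queue()
        threads.append(threading.Thread(target=server, args=(request_q, reply_q, requests_per_pair)))
        threads.append(threading.Thread(
            target=requester, args=(request_q, reply_q, requests_per_pair, results, i)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


print(run_workload(num_pairs=3, requests_per_pair=100))  # 300 round trips
```

Scaling the workload in this model means raising num_pairs, which adds another queue pair with its own requester and server, mirroring how the measurements add pairs of queues.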
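Transaction cost is based upon the cost of the entire request/reply round-trip and includes the following address spaces: the MQ queue manager and channel initiator, TCP/IP, and the batch requester and server applications.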
Also included is the calculated cost to the CF, based on the RMF Coupling Facility report where we have taken the "% busy" number and converted to CPU microseconds, then divided by the transaction rate, to give a CF cost per transaction.
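The CF cost calculation can be sketched as follows. This assumes the RMF "% busy" figure is an average across all CF processors, so it is scaled by the number of processors before dividing by the transaction rate. The function name, processor count and example inputs are hypothetical, not values from these measurements.

```python
def cf_cost_per_transaction_us(percent_busy, cf_processors, transactions_per_sec):
    """Convert RMF CF '% busy' into CF CPU microseconds per transaction.

    Assumes '% busy' is averaged over all CF processors, so the CF consumes
    percent_busy/100 * cf_processors CPU-seconds per elapsed second.
    """
    cf_cpu_us_per_sec = percent_busy / 100.0 * cf_processors * 1_000_000
    return cf_cpu_us_per_sec / transactions_per_sec


# Hypothetical inputs: 50% busy on a 2-processor CF at 20,000 transactions/second
print(cf_cost_per_transaction_us(50.0, 2, 20_000))  # 50.0
```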
Table 1: Transaction Cost by address space - CPU microseconds per transaction for 2KB messages.
Notes on table 1:
Some customers may regard the CF cost separately or as a fixed cost, so the table reports the costs both with and without the CF cost.
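Reporting the cost with and without the CF is a simple sum over address spaces, as in this sketch. The per-address-space figures are hypothetical placeholders, not the Table 1 values.

```python
def total_cost_us(costs_by_space, include_cf=True):
    """Sum per-address-space costs (CPU microseconds/transaction), optionally excluding the CF."""
    return sum(cost for space, cost in costs_by_space.items()
               if include_cf or space != "CF")


# Hypothetical per-transaction costs, not measured values
costs = {
    "queue manager": 20.0,
    "channel initiator": 40.0,
    "TCP/IP": 15.0,
    "applications": 25.0,
    "CF": 10.0,
}
print(total_cost_us(costs))                    # 110.0
print(total_cost_us(costs, include_cf=False))  # 100.0
```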
Chart 1: Transaction Cost
When using MCA channels, there is a large proportion of the transaction cost incurred by the channel initiators which involves the getting of the message from the transmission queue and interfacing with the network protocol (TCP/IP).
Using intra-group queuing (IGQ) effectively moves the cost from the channel initiator to the queue manager address space. It also increases usage of the Coupling Facility. Furthermore, there is an increase in the MQPUT cost as the message is being put to the ‘SYS
When shared queues are used, the queue manager cost is lower than with IGQ but is still 5 times that of the measurements using MCA channels. The application cost also increases, primarily in the requester application address space, because the application performs an MQGET of a specific message and the queue depth is low enough that there is an additional cost each time the depth transitions from 0 to 1. The Coupling Facility was running at 42% busy.
The following chart shows the transaction rate achieved when running the request/reply workload between the 2 queue managers.
Chart 2: Achieved Transaction Rate in each configuration
In these measurements, the TCP/IP channel throughput is limited largely by network latency.
The use of SMC-R and SMC-D has reduced this latency and given a significant improvement in throughput at little additional cost.
Intra-group queuing offers an improved rate over the standard TCP/IP channel, but the benefit depends on the CF, both its responsiveness and its location relative to the LPARs.
The shared queue configuration offers similar throughput to the SMC-R and SMC-D measurement with less administration effort but does add significant load to the Coupling Facility at these transaction volumes.
What happens when the workload is scaled to use more queues?
The measurements thus far use a single pair of queues for the request/reply workload for 2KB messages and are not intended to demonstrate performance characteristics as system limits are approached.
The measurements in this section aim to demonstrate how performance may change as the workload is increased through the use of more pairs of queues.
Chart 3: Total cost per transaction with increasing workload
As the workload increased, the cost increased as follows:
In the measurements using SMC-R, SMC-D and shared queues, additional processors were allocated to allow the workload rate to increase, whereas the IGQ and TCP/IP measurements reached implementation limits.
These increases are based on the total costs from the MQ queue manager and channel initiator address spaces, TCP/IP, batch application address spaces and coupling facility. The batch applications use minimal business logic and typically account for 25% of the total transaction cost in all but the shared queue configuration.
In the shared queue configuration, the application costs account for up to 50% of the total transaction cost. This increase is due to the higher cost of putting messages to the Coupling Facility, a cost that grows as the CF becomes less responsive.
Chart 4: Coupling Facility usage with increasing workload
Chart 4 shows that as the workload increases, the CF becomes busier in the shared queue configuration, reaching more than 90% busy. At this level it may be beneficial to increase the number of CF processors available, because the CF becomes less responsive: requests are converted from synchronous to asynchronous, and synchronous response times start to increase. In these measurements, the synchronous response times increased from 4.7 to 12.8 microseconds.
For the shared queue measurement, where the CF has multiple processors and CF CPU utilization exceeds 60%, we would suggest that more CF processors should be allocated.
With regard to the IGQ measurement, the CF remains at approximately 40% busy regardless of workload, and this appears to be due to IGQ being driven to its limit.
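The 60% guidance for multi-processor CFs could be encoded as a simple monitoring check like the following hypothetical helper. It captures only the rule stated here; a real capacity review would also look at synchronous versus asynchronous request rates and response times, as described above.

```python
def cf_processor_advice(cf_processors, percent_busy, threshold=60.0):
    """Suggest more CF processors once a multi-processor CF exceeds the busy threshold."""
    if cf_processors < 2:
        # The guidance only gives a threshold for CFs with multiple processors
        return "no guidance"
    return "add CF processors" if percent_busy > threshold else "ok"


print(cf_processor_advice(2, 90.0))  # add CF processors
print(cf_processor_advice(2, 42.0))  # ok
print(cf_processor_advice(1, 90.0))  # no guidance
```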
Chart 5: Achieved Transaction Rate with increasing workload
The MCA channels show a steadily increasing transaction rate as the workload increases.
By comparison, the SMC-R and SMC-D measurements show a higher transaction rate, with no indication of being constrained by implementation limits.
The IGQ measurement actually shows a decrease in transaction rate, dropping 24%, due to increasing contention on the IGQ resources.
The shared queue measurement shows good scaling performance until the Coupling Facility becomes the constraining factor.
What happens with larger messages?
The measurements thus far have been limited to messages of 2KB and those measurements using MQ channels are not driving the network particularly efficiently.
For those measurements using shared queues and IGQ, the impact of larger messages will depend on the size of the messages and whether the offloaded portion of the message is stored in Shared Message Data Sets or DB2 BLOBs. Note that DB2 BLOB performance is not discussed in this blog. When messages are offloaded, the Coupling Facility usage is significantly reduced; instead, the DASD subsystem may see an increase in workload.
To demonstrate the performance differences of larger messages, measurements are reported for 64KB and 4MB messages.
Measurements using 64KB messages
The following 2 charts compare the transaction cost and transaction rates of the 5 configurations when using 64KB messages.
With these larger messages, the messages flowing across MQ channels are processed more efficiently, because the MQ headers make up a much smaller proportion of the total payload.
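A rough way to see why larger messages travel more efficiently is to compute the fraction of bytes sent that are headers. The 500-byte per-message header size below is a hypothetical round number chosen for illustration, not a documented MQ figure; the point is only how quickly the fraction shrinks as the payload grows.

```python
HEADER_BYTES = 500  # hypothetical per-message header overhead, not an MQ-documented value


def header_overhead(payload_bytes, header_bytes=HEADER_BYTES):
    """Fraction of the bytes sent that are headers rather than application data."""
    return header_bytes / (payload_bytes + header_bytes)


for label, size in [("2KB", 2 * 1024), ("64KB", 64 * 1024), ("4MB", 4 * 1024 * 1024)]:
    print(f"{label}: {header_overhead(size):.3%} of bytes sent are headers")
```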
Chart 6: Total transaction cost with increasing workload using 64KB messages
In both the IGQ and shared queue measurements, the messages are stored on the shared message data sets, and the remote queue manager reads those data sets directly to access the message data. This means that the Coupling Facility usage is much lower than for the small message workload.
The measurements using MQ channels have the majority of the transaction cost in the MQ channel initiator and TCP/IP address spaces.
Chart 7: Achieved Transaction Rate with increasing workload using 64KB messages
It is with these larger messages that the benefit of both SMC-R and SMC-D can be observed.
While the IGQ throughput has peaked for the workload, the shared queue measurement shows consistent improvements as the workload is increased. This is largely because each queue is defined to a separate structure, which in turn has its own shared message data set. In this measurement, the performance is likely to be constrained by DASD performance.
The SMC-R measurements are able to process more than 3 times the volume of data achieved at the peak rate of the shared queue measurement.
SMC-D achieved 30% more throughput than the peak SMC-R measurement and was able to process over 21,000 transactions per second, which equates to approximately 1350 MB/second outbound and 1350 MB/second inbound to each queue manager.
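The quoted data rate follows from the transaction rate and the message size. This sketch assumes both the request and the reply are 64KB, so each direction carries one message per transaction. Since this blog does not state whether 1 MB was taken as 1,000 KB or 1,024 KB, both are shown; mb_per_second is a hypothetical helper name.

```python
def mb_per_second(transactions_per_sec, msg_size_kb, kb_per_mb=1000):
    """Data rate in one direction when each transaction moves one message that way."""
    return transactions_per_sec * msg_size_kb / kb_per_mb


print(mb_per_second(21_000, 64))        # 1344.0 (1 MB = 1,000 KB)
print(mb_per_second(21_000, 64, 1024))  # 1312.5 (1 MB = 1,024 KB)
```

The decimal interpretation matches the approximately 1350 MB/second quoted above.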
Measurements using 4MB messages
The following 2 charts compare the transaction cost and transaction rates of the 5 configurations when using 4MB messages.
Chart 8: Total transaction cost with increasing workload using 4MB messages
For 4MB messages, the most expensive configurations are those using MQ channels; however, as chart 9 shows, these also give the greatest throughput rate.
In these measurements, the shared queue configuration had the lowest cost due to the use of shared message data sets, which meant that the data was read directly from disk rather than having to be transported over the network. This, of course, puts additional load on the disk subsystem and, as chart 10 demonstrates, does not necessarily give the best performance.
In the measurements using MQ channels, approximately 80% of the total cost is in the MQ channel initiator and TCP/IP address spaces, whereas the IGQ cost is predominantly in the MQ master address spaces and the shared queue cost is largely in the application address spaces.
Chart 9: Achieved Transaction Rate with increasing workload using 4MB messages
In chart 9, the SMC-D transaction rate significantly exceeds all other configurations, achieving more than 3 times the rate of the TCP/IP, IGQ and shared queue measurements.
The SMC-D measurement also demonstrates a 60% improvement over the equivalent SMC-R tests. This equates to approximately 1.6GB per second for the outbound channels and 1.6GB per second for the inbound channels.
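Working the other way, the per-direction data rate implies a message rate. Assuming 4MB messages in each direction, this hypothetical helper shows that 1.6GB per second corresponds to roughly 400 messages per second each way. This is an estimate derived from the quoted figures, not a separately measured rate.

```python
def implied_messages_per_sec(gb_per_sec, msg_size_mb, mb_per_gb=1000):
    """Messages per second in one direction implied by a data rate and a message size."""
    return gb_per_sec * mb_per_gb / msg_size_mb


print(implied_messages_per_sec(1.6, 4))        # approximately 400 (1 GB = 1,000 MB)
print(implied_messages_per_sec(1.6, 4, 1024))  # approximately 410 (1 GB = 1,024 MB)
```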
Both the IGQ and shared queue measurements show little benefit as more queues are made available to transport data.
Chart 10: Breakdown of transaction cost by address space compared with transaction rate
Chart 10 shows transaction cost by each address space for the 5 queue pair measurement with 4MB messages.
As mentioned earlier, the largest proportion of the cost of the MQ channel measurements is in the MQ channel initiator and the TCP/IP address spaces.
From the MQ channel initiator statistics and accounting data, which has been available since MQ for z/OS version 8.0, we can see that approximately 80% of the channel initiator cost is spent in the dispatcher tasks, which are primarily the interface to the network.
The results show that for small messages, shared queue still offers the best transaction rate combined with the minimum use of z/OS CPU time. However, with the availability of SMC-R and SMC-D, customers who are constrained by CF resources can achieve equivalent performance for a small increase in CPU cost.
A rule of thumb for the particular set of measurements using 2KB messages might be that:
For larger messages, where the message cannot be stored entirely in the Coupling Facility, the use of SMC-R and particularly SMC-D can offer significant benefit in throughput, but at increased cost in the MQ channel initiator and TCP/IP address spaces.
Where shared queue or IGQ is not available, the use of SMC-D or SMC-R should be considered as an enhancement to the basic performance of TCP/IP.
In our shared queue measurements using 2KB messages with multiple queues, the CF CPU utilization was higher than is typically suggested for those processor types and it would be advisable in a production environment to ensure additional CF processors are available.
In a system where the Coupling Facility is already being used, the additional load of moving messages, either via IGQ or by using a shared queue with the put on LPAR 1 and the get on LPAR 2, may be enough to overload the CF, so that requests are converted to asynchronous and response times degrade.
Different performance characteristics may be observed with more sets of application queues. For example, as more queues are used in the MCA channel configurations, we would expect the throughput to scale well until some system limit is reached, such as network capacity, CPU or even the IBM MQ dispatcher task being busy.
In the configurations where the Coupling Facility is a significant factor, there are many areas that need to be considered, including the CF CPU utilization; whether requests are processed synchronously or asynchronously, since more requests become asynchronous as the load increases, which could degrade performance; and whether the links to the CF are saturated. Performance report MP16 offers some guidance on what may need to be monitored in a shared queue environment.
This is not an exhaustive list of configurations, for example it is possible to use shared transmit queues.
Measurements were run on a dedicated performance system and different behavior may be observed on a busier system.
Additional testing would be required for configurations where: