White Papers
Abstract
This document gives best practices for setting up and running a WebSphere Message Broker (WMB) performance benchmark on z/OS.
Content
You need to clearly document the objectives of the benchmark. For example, is it:
- To achieve a particular throughput and cost per request for a given architecture
- To achieve a particular throughput and cost per request
- To document every tuning activity and the performance effect of each activity
- To change the architecture to achieve the throughput and cost targets. This could cover changing the broker, MQ and DB2 configurations
- To explore the DB2 database and table structures to improve the throughput and cost targets.
Before you start running a broker performance benchmark you need to have done some preparation and testing.
Preparation
Application
- Your application should be close to what will be running in production. For example, debug code and trace nodes should be removed or deactivated
- You need to be able to measure and report throughput. This might be as simple as using the WebSphere MQ (WMQ) command RESET QSTATS to display the message rate on the output queue (not on the input queue)
- You may need to measure the time taken to process requests. If so, record the time the message was got in the first flow and pass this timestamp through the flows; in the last flow, record the time again, calculate the difference, and report it. Some people store this information in DB2; others accumulate it and display it every 1000 messages, or after a time period (a minimal sketch of this approach follows this list). You cannot use the time when the message was originally put, because there may be a large backlog on the input queue. Nor can you divide the measurement duration by the number of messages, because processing is done in parallel: if you put 2 messages to a queue, each of which takes 1 second to process, and you process both messages in parallel, it takes 1 second to process both messages, not 0.5 seconds per message
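A minimal sketch of the accumulate-and-report approach is shown below. It assumes the first flow recorded the time the message was got and that this timestamp travels with the message (for example as a message property); the class name, the field gotTimeMillis, and the 1000-message reporting interval are illustrative and not part of WMB.

```java
/**
 * Minimal sketch: accumulate end-to-end processing times and report every
 * 1000 messages. Assumes the first flow recorded the time the message was
 * got and that this timestamp (epoch milliseconds) is carried with the
 * message; the field name and interval are illustrative.
 */
public class LatencyRecorder {
    private static final int REPORT_EVERY = 1000;

    private long count = 0;
    private long totalMillis = 0;
    private long maxMillis = 0;

    /** Call from the last flow with the timestamp carried through the flows. */
    public synchronized void record(long gotTimeMillis) {
        long elapsed = System.currentTimeMillis() - gotTimeMillis;
        count++;
        totalMillis += elapsed;
        maxMillis = Math.max(maxMillis, elapsed);
        if (count % REPORT_EVERY == 0) {
            System.out.printf("%d messages: avg %.1f ms, max %d ms%n",
                    count, (double) totalMillis / count, maxMillis);
        }
    }
}
```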
Message Broker
Turn zAAPs and zIIPs off when using WMB accounting to identify the high-use nodes, because the statistics report GCP usage, not total CPU usage. When you have identified the high-use nodes, enable any zAAPs again.
DB2
- Once you have loaded your tables ensure you have done a database reorg to optimise the data and the indexes.
- You will be doing repeated runs. You need the same table content for each run. Set up jobs to restore the tables from backups, or ensure you can delete the rows added in each run.
- Ensure you have done RUNSTATS on the tables. This primes DB2 with information to help it make optimum use of indexes. If you want to use empty tables, run RUNSTATS while the table has sufficient rows, and then delete the rows from the table.
- Use a realistic number of rows in the DB2 tables.
- Use partitioned data for large tables to spread activity across the table (and so across disks), so you do not have activity in just a very small part of a table. This means your test data needs to be representative of production data, with a spread of keys.
- The log datasets should be striped using 4 stripes. This provides benefit when you are logging a lot of data.
- Check your buffer pool usage
- Specify ACCOUNTINGINTERVAL = COMMIT in your ODBC.ini file. If this is not specified, you only get DB2 accounting records when the broker shuts down; if it is specified, you get an accounting record each time the application issues a commit.
- Run a fixed number of messages and review the DB2 accounting data
- Identify where the elapsed time is being spent within DB2
- Review how much time is spent in stored procedures
- Calculate how much out of the total time is spent in DB2.
- Use an SQL activity trace, sort it by the elapsed time to identify the long requests.
- Check that the ODBC statements are all cached, and that none require a full prepare.
- Run multiple threads in parallel and see how the time in DB2 changes. For example you may get row contention as different threads try to update the same rows.
- If the application is inserting records into a table, ensure the table is partitioned and the activity is spread across the table rather than concentrated in one small section. For example, avoid an audit table where records are inserted in time-stamp sequence, as this causes extra DB2 cost and contention (a hypothetical key-spreading sketch follows this list).
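The sketch below illustrates one way of generating test keys that are spread across partitions rather than arriving in time-stamp sequence. The partition count and key format are assumptions; use whatever matches your table's partitioning scheme.

```java
import java.util.UUID;

/**
 * Hypothetical test-data key generator: prefixes each key with a hashed
 * bucket so inserts are spread across table partitions instead of
 * clustering at the end (as a pure timestamp key would). The partition
 * count and key format are assumptions.
 */
public class KeyGenerator {
    private static final int PARTITIONS = 32;   // assumed partitioning of the target table

    public static String nextKey(long sequence) {
        int bucket = (Long.hashCode(sequence) & 0x7fffffff) % PARTITIONS;
        return String.format("%02d-%s", bucket, UUID.randomUUID());
    }

    public static void main(String[] args) {
        // Print a few sample keys to show the spread of leading buckets.
        for (long i = 0; i < 5; i++) {
            System.out.println(nextKey(i));
        }
    }
}
```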
WebSphere MQ
- Check that WMQ applications are using persistent messages within syncpoint. If they are within syncpoint, then increasing the number of messages processed per commit can improve performance (a standalone sketch follows this list).
- The log datasets should be striped using 4 stripes. This is of benefit when you are logging a lot of data.
- For good performance, WMQ queues should not be deep. Once a WMQ buffer pool fills up, gets and puts will write to the page set, and this is many times slower than if the data was held in the buffer pool.
- If a queue depth exceeds a threshold, the test should be considered invalid. If a queue depth is increasing, then increasing the number of instances getting messages from the queue may help, but this may in turn cause increased DB2 contention; decreasing the number of instances may improve overall throughput. If changing the number of instances does not help, throttle the input process to limit the number of messages being processed concurrently and so keep queue depths within limits.
- Turn off WMQ internal trace
- You should be using the NOHARDENBO option by default. Using HARDENBO causes a log force for every get, and means you can only process one message per commit.
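To see the effect of processing several persistent messages per commit, a standalone test driver along the lines of the sketch below can be used with the WebSphere MQ classes for Java. The batch size, wait interval, and error handling are illustrative, and the constants class may differ by MQ client version (CMQC here, MQC in older clients); within a message flow itself, the number of messages per commit is controlled by the broker's input node and flow settings rather than by application code.

```java
import com.ibm.mq.MQException;
import com.ibm.mq.MQGetMessageOptions;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

/** Sketch of a test driver that gets persistent messages under syncpoint
 *  and commits them in batches, so there is one log force per batch rather
 *  than one per message. Batch size and wait interval are illustrative. */
public class BatchedGetter {

    public static void drain(MQQueueManager qmgr, String queueName, int batchSize)
            throws MQException {
        MQQueue queue = qmgr.accessQueue(queueName, CMQC.MQOO_INPUT_SHARED);
        MQGetMessageOptions gmo = new MQGetMessageOptions();
        gmo.options = CMQC.MQGMO_SYNCPOINT | CMQC.MQGMO_WAIT | CMQC.MQGMO_FAIL_IF_QUIESCING;
        gmo.waitInterval = 1000;                       // wait up to 1 second for a message
        int inBatch = 0;
        try {
            while (true) {
                MQMessage msg = new MQMessage();
                queue.get(msg, gmo);
                // ... process the message here ...
                if (++inBatch >= batchSize) {
                    qmgr.commit();                     // one commit (log force) per batch
                    inBatch = 0;
                }
            }
        } catch (MQException e) {
            if (e.reasonCode == CMQC.MQRC_NO_MSG_AVAILABLE) {
                if (inBatch > 0) qmgr.commit();        // commit any partial batch
            } else {
                qmgr.backout();
                throw e;
            }
        } finally {
            queue.close();
        }
    }
}
```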
C runtime and Language Environment (LE)
- Use the LE option RPTSTG to report storage usage, and implement its recommendations for heap and stack sizes. This will reduce the number of storage calls. After you have tuned the storage, remove this option. If you significantly change what is running in the address space, use RPTSTG again to see if you need to change the storage definitions.
- Use IBM Java Health Center to look at the Java within your flows, identify the hot spots, and see if you can reduce the CPU.
- Use tools like IBM Application Performance Analyzer for z/OS (5655-W71) to identify where CPU and elapsed time is being spent. Work to minimise the expensive components.
Preparing a measurement
- You need to be able to easily check that the run was successful (a simple checking sketch follows this list). For example:
- the number of messages out = number of messages in
- the number of messages on the error queue is zero
- there were 0 rollbacks
- the WMQ queues did not have over 100 messages per queue
- each request was processed within 1 second.
- there were no application errors reported
- Process a fixed number of messages. Check the WMQ and DB2 statistics to ensure you have the expected number of requests. For example,
- If you process 1000 requests and see 2000 identical select statements, there are two DB2 requests per input message. Can the result of the first request be saved and reused, eliminating the need for the duplicate select?
- Are there multiple hops between flows using WMQ? Can these flows be combined to eliminate the hops? Each hop requires taking the data, flattening it into a message, putting and getting the message, and parsing the message again. This can add significantly to the costs if done repeatedly.
- You may need to set up WLM reporting groups so you can record the CPU used by each address space, or group of address spaces.
- Determine how you will measure CPU per request. Do you need to remove the costs of loading queues and of your monitoring tools?
- Do you need to estimate the CPU used by dummy code, which replaces functions you do not have?
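A simple way to automate the success checks and the cost-per-request calculation is a small post-run program such as the hypothetical sketch below. All of the numbers shown are placeholders; populate them from your own queue statistics, timing flow, and RMF/SMF reports, and adjust the thresholds to your benchmark's criteria.

```java
/** Hypothetical run summary check; every value is a placeholder to be
 *  populated from your queue statistics, timing flow, and RMF/SMF reports. */
public class RunCheck {
    public static void main(String[] args) {
        long messagesIn = 10_000, messagesOut = 10_000, errorQueueDepth = 0, rollbacks = 0;
        long maxQueueDepthSeen = 85;         // from WMQ statistics
        double maxRequestSeconds = 0.8;      // from the timing flow
        double cpuSecondsUsed = 312.5;       // broker + MQ + DB2 address spaces, from WLM report classes
        double cpuSecondsOverhead = 12.5;    // queue loading, monitoring tools, dummy/stub code estimate

        boolean valid = messagesOut == messagesIn
                && errorQueueDepth == 0
                && rollbacks == 0
                && maxQueueDepthSeen <= 100
                && maxRequestSeconds <= 1.0;

        // CPU per request = (total CPU - overheads) / messages processed
        double cpuMillisPerRequest = (cpuSecondsUsed - cpuSecondsOverhead) * 1000.0 / messagesOut;
        System.out.printf("run valid=%b, CPU per request=%.3f ms%n", valid, cpuMillisPerRequest);
    }
}
```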
Running the test
- Start with a low number of broker flow instances and increase them as required. Sometimes reducing the number of instances can improve the throughput as it may decrease (DB2) contention on a resource
- Ensure that WMQ queues have low numbers of messages - perhaps less than 200. If the depth of a queue continually increases the test is not valid. Change the number of instances, check persistent messages are being processed within sync point by looking at the WMQ stats, or throttle the work coming in.
- Identify where the CPU and elapsed time is being spent and try to reduce them
- Check the DB2 buffer pools to make sure they are large enough
- Use DB2 accounting to identify how much of the total CPU/request is in DB2 and compare with the total cost per request
- Use DB2 accounting to identify how much of the elapsed time/request is in DB2, and compare with the average elapsed time per request
- From the DB2 accounting, check the ODBC statements are being cached.
- Identify any high use stored procedures and work to reduce the complexity
- If WMQ buffer pools fill up, move queues with a high put rate but a low get rate to their own page set and buffer pool
- Monitor the console for messages indicating that WMQ buffer pools are filling up
- Use a 1 minute SMF interval during the tests
- During the initial phase of the benchmark you can make changes to several areas before each run. Once you get to the formal benchmark runs, you should change only one thing from the previous run. You need at least three runs producing the same results for a run to be considered a golden run (a simple consistency check is sketched after this list).
- Use broker stats to identify the hot nodes. Investigate why any CPU time is high.
- As a general approach reduce bottlenecks so the solution scales and you can use the LPAR at over 90% utilised, then focus on reducing CPU usage.
- Maintain a spreadsheet to keep track of changes to the environment and applications, and of progress.
- For 'golden runs', keep (and offload) the reports from RMF, DB2, MQ, and the broker so they can be referred to at a later date.
- If you make a change which has a big impact, you should ensure that the configuration is optimal, for example you might need to change the number of instances to stop queue depth from building up.
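As a hypothetical illustration of "three runs producing the same results", the sketch below compares the key results of the last three runs and flags them as consistent if they fall within a tolerance. The 2% tolerance and the choice of metrics are assumptions; use whatever results your benchmark defines as significant.

```java
/**
 * Hypothetical consistency check for a golden run: the last three runs'
 * key results must agree within a tolerance. The 2% tolerance and the
 * metrics chosen (throughput and CPU per request) are assumptions.
 */
public class GoldenRunCheck {
    public static boolean consistent(double[] values, double tolerance) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (max - min) / min <= tolerance;
    }

    public static void main(String[] args) {
        double[] throughputPerSecond = {812.0, 805.5, 809.3};   // last three runs (placeholders)
        double[] cpuMillisPerRequest = {3.11, 3.08, 3.12};
        boolean golden = consistent(throughputPerSecond, 0.02)
                && consistent(cpuMillisPerRequest, 0.02);
        System.out.println("golden run candidate: " + golden);
    }
}
```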
The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both:
DB2
Language Environment
WebSphere
z/OS
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Original Publication Date
26 June 2012
[{"Product":{"code":"SSKM8N","label":"WebSphere Message Broker"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":"Performance","Platform":[{"code":"PF035","label":"z\/OS"}],"Version":"8.0;7.0;6.0;5.0","Edition":"","Line of Business":{"code":"LOB36","label":"IBM Automation"}}]
Was this topic helpful?
Document Information
Modified date:
17 June 2018
UID
swg27027548