Improve the performance of your WebSphere Business Integration Message Broker V5 message flow
Common causes of performance problems
Problems with the performance of a WebSphere Business Integration Message Broker message flow usually take one of two forms. The first and most obvious is that, for some reason, the required message throughput cannot be achieved. The second is that, although the required level of message throughput is being achieved, the CPU cost is far greater than expected. When the level of performance from a message flow falls short of expectations, it is usually because of one of the following reasons:
- A lack of resources with which to run the message flow
- A poorly configured environment
- Poor response time from a dependent application
- An inefficient processing algorithm for the message flow
- Inefficient or excessive ESQL processing within the message flow
Let's look at each of these factors briefly.
Lack of resources to run the message flow
Message processing with a message broker is typically a CPU-intensive process. This is understandable because the processing involves parsing of incoming messages, manipulation of data in the message, storage of the data, and creation of one or more output messages. Therefore, it is important to ensure that there is sufficient CPU available to run a message flow. For example, if a CPU is already 90% busy and your message flow requires 30% CPU to process messages at the required rate, it will not be possible to process the messages without some impact on response times. Run the same message flow on a system with no other work running on it and you will get better results. Ensure that there is sufficient CPU to run a message flow before conducting any performance tests. If there is not sufficient CPU during peak times, consider running performance tests at off-peak times or on another machine.
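The CPU arithmetic above can be made concrete. The following sketch is purely illustrative (the target rate and function name are assumptions, not product figures): if a flow needs more CPU than is free, throughput degrades roughly in proportion, assuming the flow is CPU-bound.

```python
def achievable_rate(target_rate_msgs_per_sec, cpu_needed_pct, cpu_free_pct):
    """Estimate the message rate achievable with the CPU headroom available.

    Assumes message processing is CPU-bound, so throughput scales
    roughly linearly with the CPU the flow can actually get.
    """
    if cpu_free_pct >= cpu_needed_pct:
        return target_rate_msgs_per_sec
    return target_rate_msgs_per_sec * cpu_free_pct / cpu_needed_pct

# The scenario from the text: the flow needs 30% CPU for its target
# rate, but the machine is already 90% busy, leaving only 10% free.
# The 100 msg/s target here is a hypothetical figure.
rate = achievable_rate(100.0, cpu_needed_pct=30.0, cpu_free_pct=10.0)
```

With only a third of the needed CPU available, the flow can reach only about a third of its target rate.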
There are similar issues for I/O processing. If there is already a high level of persistent message processing taking place within the broker queue manager, its ability to process an additional high level load of persistent messages is going to be significantly limited.
Investigations into resource usage (CPU, memory, and I/O capacity) require the use of a system-level monitor to determine the overall level of activity on a machine, as well as the resources available to and used by a Message Broker and the message flows it is running. For more information on how to use a system monitor, see Tuning a WebSphere MQ Integrator Broker. This information is not covered in this article.
A poorly configured environment
Even a message flow with the most efficient algorithm and coding will struggle to operate efficiently in a poorly configured environment. The message broker and its dependent components, such as the broker database manager and broker queue manager, must be well tuned. Otherwise, what is seen as poor message flow performance may be due to the configuration of the broker queue manager, or another component. See Tuning a WebSphere MQ Integrator Broker for details on how to tune a message broker and its associated components. For the purposes of this article, we'll assume that the environment is properly configured.
Poor response time from a dependent application
When a sequence of message flows or applications interacts with each other, you need to be aware of the response time and message throughput provided by all of the processing components. You may find that what you thought was a problem with one message flow is, in fact, a problem with another dependent message flow or application. To detect this type of problem, you'll have to investigate the runtime performance of the message flows and connected applications.
An inefficient processing algorithm for the message flow
A good algorithm for message processing is essential in developing well-performing message flows. By a good algorithm, we mean the choice and sequencing of processing nodes within a message flow. A poor algorithm often leads to significant, unnecessary processing by a message flow. For example, parsing an input message multiple times is not recommended, but sometimes happens, especially in message flows which have to process more than one type of message. It is important, therefore, that you fully understand the facilities provided by WebSphere Business Integration Message Broker and that you are aware of current design best practices.
Inefficient or excessive ESQL processing in the message flow
There are often a number of different ways to implement a processing algorithm in ESQL within a Compute node. It is important that you find the most efficient in order to get the highest level of performance possible from a message flow. In order to identify and resolve such issues, you'll need to investigate the runtime performance of the message flow.
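A common example of inefficient ESQL is re-evaluating an expensive expression on every iteration of a WHILE loop, for instance calling CARDINALITY in the loop condition rather than caching its result in a variable before the loop. The sketch below illustrates the principle in Python rather than ESQL (the function and variable names are invented for illustration), with a call counter standing in for the cost of re-evaluating the expression:

```python
calls = {"count": 0}

def cardinality(message):
    """Stand-in for an expensive expression such as ESQL's CARDINALITY(...),
    which a WHILE condition would otherwise re-evaluate on every iteration."""
    calls["count"] += 1
    return len(message)

message = ["Product"] * 50

# Inefficient: the expensive expression sits in the loop condition,
# so it is evaluated once per iteration plus the final failing test.
calls["count"] = 0
i = 0
while i < cardinality(message):
    i += 1
evaluations_in_condition = calls["count"]

# Efficient: evaluate once, cache the result, loop on the cached value.
calls["count"] = 0
n = cardinality(message)
i = 0
while i < n:
    i += 1
evaluations_cached = calls["count"]
```

For a 50-element message the cached form makes one evaluation instead of 51; the saving grows with message size.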
From the previous sections, you can see that two types of investigation are required to identify the cause of a performance problem, regardless of whether it is caused by poor throughput or excessive resource usage. The first is to monitor the use of resources at the system level and ensure that there are sufficient resources available for the message flows to run. The techniques for doing this and the tools to use are discussed in Tuning a WebSphere MQ Integrator Broker.
The second type of investigation is to examine runtime performance in the message broker. You can do this by using the facilities that are provided by WebSphere Business Integration Message Broker, WebSphere MQ Integrator, and MQSeries Integrator. It is recommended that you examine the runtime performance in the Message Broker first. Once you have an efficient message flow, you should then ensure that it has sufficient CPU available to process messages. In this article, we'll assume that the message flow is functionally correct and that there are no bugs or error processing taking place.
WebSphere Business Integration Message Broker, WebSphere MQ Integrator, and MQSeries Integrator provide a trace facility in all versions of the product. As its name implies, the trace can be used to monitor message processing activity, and it can provide details of message flow execution down to the individual ESQL statement level. Additionally, WebSphere Business Integration Message Broker V5 introduced a new facility called Accounting and Statistics, which helps you understand the processing profile of a broker or execution group at the message flow, node, or terminal level. Both facilities are briefly described in Debug Facilities, followed by an example of how both can be used to debug a performance problem in Identify a Performance Problem: An Example.
Trace and Accounting and Statistics are useful in identifying a number of problems, such as poor response times from dependent applications or subsystems such as databases, and inefficient or excessive processing within the message flow. In some cases they can also highlight an inefficient algorithm, although this type of problem is usually discovered by investigating message flow operation in detail rather than as a direct output from the facilities.
This section describes the Trace and Accounting and Statistics facilities provided with WebSphere Business Integration Message Broker V5, WebSphere MQ Integrator V2.1, and MQSeries Integrator V2.
The trace facilities provided with WebSphere Business Integration Message Broker, WebSphere MQ Integrator and MQSeries Integrator consist of product trace and the trace node.
There are two main forms of product trace available: user trace and service trace. There is also an ODBC trace, which is typically used only to trace interactions with a database, and only when investigations point to specific issues with database access. User trace can be used for message flow debugging and profiling. With analysis, it enables you to determine, for example, which nodes have the highest elapsed times. Service trace provides a much more detailed trace than user trace and is typically used only to resolve product behavior issues; it is very unlikely to be used to resolve a performance problem. We will focus on user trace in this article, although the techniques presented can be applied to other forms of trace.
The collection of trace data is optional. By default, tracing is inactive and generates no overhead. You can start and stop tracing using the mqsichangetrace command or the Message Broker Toolkit, and you can activate tracing for the Message Broker, the Configuration Manager, or the Message Broker Toolkit. You don't need to change a message flow in order to use product trace. Trace information is written to a trace file, which must be processed before it can be read: the mqsireadlog command produces a file in XML format, which in turn must be processed with the mqsiformatlog command to produce a readable report.
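The read-then-format pipeline lends itself to scripting. The sketch below only assembles the command lines rather than executing them; the flag spellings (-u for user trace, -e for the execution group, -o and -i for output and input files) are assumptions based on my recollection of the product commands, so verify them against the documentation for your installation.

```python
def trace_format_commands(broker, execution_group, xml_file, report_file):
    """Assemble the two-step pipeline described above: mqsireadlog dumps
    the user trace as XML, and mqsiformatlog turns that XML into a
    readable report. Flag names are assumptions -- check the product docs."""
    read_cmd = ["mqsireadlog", broker, "-u", "-e", execution_group, "-o", xml_file]
    format_cmd = ["mqsiformatlog", "-i", xml_file, "-o", report_file]
    return read_cmd, format_cmd

# Hypothetical broker and file names; pass each list to subprocess.run()
# on a machine where the broker is installed.
read_cmd, format_cmd = trace_format_commands(
    "MYBROKER", "default", "usertrace.xml", "usertrace.txt")
```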
Trace records the processing sequence of a message through a message flow. It does not show the contents of a message being processed. To show message contents, you must use the trace node in the message flow.
You can use the trace node to generate trace records that include text, message content, and date and time information, to help you to monitor the behavior of the message flow. You can specify the location to which the trace is written: the user trace file, a specified file, or the local error log. In order to use trace nodes, you must modify the message flow to add and remove them. Product trace, by contrast, may be most appropriate in situations where information on execution is required but it is not possible to change the message flow.
It is worth noting that the trace node results in a processing overhead if it is connected to terminals along which messages travel. This is true even if no trace is written by the trace node. Therefore, it's recommended that you ensure that no trace node has its input terminal connected to an active path through the message flow unless you specifically want to do this for error reporting or problem diagnosis.
In this article we'll focus on the use of trace in the Message Broker. For detailed instructions on how to collect and format trace, consult the product documentation. This article focuses on the use of trace for analysis rather than its collection.
The use of product tracing to detect performance problems enables you to see the dynamic nature of message flow execution. Although a code review of a message flow may be useful in highlighting potential problems, dynamic analysis is the more useful because it enables you to see what is happening at message processing time. Such analysis is very difficult to undertake in real time and so trace information is captured during the course of execution so that you can see what was happening at that time. This is not an ideal approach since the overhead of trace can be significant, depending on the type of trace, and it will cause distortion of true processing times. However, though not ideal, using trace facilities is a practical way to observe the relative costs of different pieces of processing.
Accounting and Statistics Facility
Message flow accounting and statistics data records dynamic information about the runtime behavior of a message flow. For example, this data indicates how many messages are processed and how large those messages are, as well as CPU usage and elapsed processing times. Accounting and statistics data can provide information at the message flow, thread, node and terminal level. This data has a wide range of uses, including performance analysis, message flow profiling and charge back based on usage. This facility is available only in WebSphere Business Integration Message Broker V5.
You can collect two types of data: snapshot and archive. Snapshot data is collected for an interval of approximately 20 seconds. The exact interval depends on the level of activity in both the system and message broker. At the end of an interval the data is written to the destination and the next interval is started. Archive data is collected over a longer period, which can be anything from 10 to 14400 minutes (240 hours). The time is specified at broker definition, but you can change it using the mqsichangebroker command.
You can collect accounting and statistics data only for messages that start with an MQInput node. Message flows that start with a custom input node, an MQeInput node, or real-time Input node will not collect accounting and statistics data.
When requesting collection of accounting and statistics data, you can specify where to send the data. There are a number of options:
- Publish the data using the Publish/Subscribe facility. The data is published as an XML message and is available to subscribing applications that register with the appropriate topic. The topic name depends on the facility being monitored and has the format $SYS/Broker/<brokerName>/StatisticsAccounting/<recordType>/<executionGroupLabel>/<messageFlowLabel>, where recordType is set to snapshot or archive, and the broker, execution group, and message flow names are specified according to the subscriber's requirements.
- Write the data to the user trace log. The data is written even if trace is currently turned off.
- Write the data to SMF (z/OS only). WebSphere Business Integration Message Broker V5 writes accounting and statistics data as Type 117 records. These records can be processed by any tool or facility that is capable of processing SMF records.
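Subscribing applications need the topic string in the format quoted above. A small helper that assembles it (the function name and the broker, execution group, and flow names in the usage line are hypothetical):

```python
def stats_topic(broker, execution_group, message_flow, record_type="snapshot"):
    """Build the publish/subscribe topic under which accounting and
    statistics data is published, using the format quoted above."""
    if record_type not in ("snapshot", "archive"):
        raise ValueError("recordType must be 'snapshot' or 'archive'")
    return ("$SYS/Broker/%s/StatisticsAccounting/%s/%s/%s"
            % (broker, record_type, execution_group, message_flow))

# Hypothetical names, for illustration only:
topic = stats_topic("VKB2BRK", "default", "OrderRouting", record_type="archive")
```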
The collection of accounting and statistics data is optional. By default it is switched off. You must specifically request it at the message flow, execution group, or message broker level. You can start or stop collection dynamically using the mqsichangeflowstats command.
For detailed instructions on how to collect and format accounting and statistics information, refer to the product documentation. This article focuses on how to use accounting and statistics data for analysis rather than how to collect the data.
Identify a Performance Problem: An Example
This section shows how we used the trace and accounting and statistics facilities to identify a performance problem in a message flow called Order Routing. The message flow was failing to achieve its performance target. Message throughput was only around 0.1 messages per second. A reasonable rate for this message flow would be at least 100 messages per second given the message flow complexity and the speed of the machine on which it was running. To help assess what is a reasonable rate for a message flow to process messages, consult the WebSphere Business Integration Message Broker performance report relevant to the platform on which you are running for examples of similar types of processing.
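When sizing a problem like this, it helps to keep in mind the simple relationship between per-message elapsed time and throughput: a single-instance flow that takes t seconds per message can deliver at most 1/t messages per second. A quick sketch (the function name is invented for illustration):

```python
def max_throughput(seconds_per_message, instances=1):
    """Upper bound on messages per second for a flow whose messages each
    take seconds_per_message to process, with the given number of
    message flow instances running in parallel."""
    return instances / seconds_per_message

# At roughly 16 seconds per message (the figure the trace analysis below
# eventually reveals), a single instance manages only about 0.06 msg/s --
# the same order as the 0.1 msg/s observed.
observed = max_throughput(16.0)

# A target of 100 msg/s implies a per-message budget of 10 ms.
budget = 1.0 / 100.0
```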
The message flow in our example consists of one MQInput node, one Filter node, one Compute node and three MQOutput nodes. The message flow structure is shown in Figure 1.
A 6K XML message is received in the MQInput node called IN. Processing passes to a filter node called ISORDER, where a simple ESQL expression is applied to the message. In this test, all messages were sent out on the True terminal of the filter node to a compute node called COMPUTE. Finally, the message is written to an MQOutput node called ORDER_OUT.
Figure 1. Order Routing Message Flow
The processing used in the compute node called COMPUTE is shown in Listing 1:
Listing 1. ESQL used in the node COMPUTE
CREATE COMPUTE MODULE OrderRouting_Compute1
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    CALL CopyEntireMessage();
    DECLARE i INTEGER;
    DECLARE total INTEGER;
    SET total = 0;
    SET i = 1;
    WHILE i <= (CARDINALITY(InputRoot.XML.Test.Product)) DO
      SET total = total + CAST(InputRoot.XML.Test.Product[i].Price AS INTEGER);
      DECLARE j INTEGER;
      SET j = 0;
      WHILE j < 10000 DO
        SET OutputRoot.XML.Test.TotalPriceChar = CAST(total AS CHARACTER) || 'Dollars';
        SET j = j + 1;
      END WHILE;
      SET OutputRoot.XML.Test.TotalPrice = total;
      SET i = i + 1;
    END WHILE;
    RETURN TRUE;
  END;

  CREATE PROCEDURE CopyEntireMessage()
  BEGIN
    SET OutputRoot = InputRoot;
  END;
END MODULE;
The first stage in identifying the cause of the problem was to look at the CPU utilization of the execution group running the message flow. We did this to ensure that the message flow was capable of fully utilizing a CPU (and so was not I/O bound) and that there was sufficient spare CPU in the system to allow the message flow to utilize one processor. During the tests, this was the only message flow running. There was no other work running on the system, so shortage of CPU was not an issue. When examining the ability of a message flow to fully utilize a processor, it is important to ensure that there are always messages in the input queue. If not, the message flow will become idle and CPU utilization will fail to reach the maximum. When running such tests, you also need to ensure that all debug processing and trace nodes along the critical path have been removed from the message flow.
Having made sure that there was sufficient CPU capacity to run the message flow, we needed to proceed with more detailed analysis. The fact that the message flow fully utilized one CPU and still got such a very low message rate pointed very strongly to some excessive processing in the flow. Although the exact location of the problem is not obvious yet, it should become clearer as we look more deeply into what is happening.
At this point, we took two different approaches. The first, based solely on the use of user trace, is described in the section Isolate the problem using trace. The second approach using a combination of accounting and statistics and user trace is presented in the section Isolate the problem using accounting and statistics.
Isolate the problem using trace
Two traces were taken of the message flow processing the same input messages. The first trace was a user trace of type "normal." The second was a user trace of type "debug." When collecting such traces, it's best to use the same messages each time so that you have details of exactly the same execution path being taken. Where there are significantly different types of message being processed within the same message flow, you may want to analyze each type of message separately, taking a pair of traces for each case. In our case, two messages were processed in the message flow for each trace. You may want to run with more than this in order to see if there is any significant variation in the elapsed times of nodes.
The user normal trace provides a way of observing message flow entry and exit, and node entry and exit, with timestamps and with the least possible overhead. The user debug trace is used to understand the details of execution within a node once a processing-intensive node has been identified. It is not used to determine node transition times because of its high overhead, but it does provide a great deal of detail and will show every node and ESQL statement being executed, which the user normal trace does not.
Where the sequence of processing involves more than one message flow and possibly other applications, it is best to capture the whole sequence of processing in one run. By doing this, you'll capture the response times of other message flows and applications, which is necessary to ensure that the problem is not actually poor response time from another message flow or application. It is recommended that you use only one instance of each message flow.
Once we collected the traces, we formatted them using the mqsireadlog and mqsiformatlog commands.
We checked the formatted trace to ensure that the messages were being processed as expected. Check that the sequence of processing nodes listed in the trace is as expected. If not investigate the reason and recollect the traces. In this case, the processing sequence was as expected. The message was received by the MQInput node IN. It was then propagated to the TRUE terminal of the ISORDER filter node. Following this, the message was processed by the COMPUTE node before finally being sent to the MQOutput node ORDER_OUT.
The next step in the analysis was to determine where the time was being spent. Key to this is the ability to recognize the entry to and exit from a node. Message flow invocation in the trace is marked by the string "Message received and propagated to 'out' terminal of MQ Input node". Node exit is marked by either of the strings "Message propagated to" or "Message successfully output". Message flow exit is marked by the start of the next message flow; otherwise, it is not possible to recognize message flow exit without knowing the nodes in the message flow.
With a very simple message flow, you may be able to identify processing node and message flow transition times by visually scanning the formatted trace. With large message flows or where there are multiple message flows involved, it's much better to automate the process. You can do this easily with a script such as the Perl script mqsitraceanalyse, which is available as a download on this page. This script is an example of the type of processing that can be performed on the user normal trace of a message flow that uses WebSphere MQ input and output messages.
It makes sense to first run the script against the user normal trace, as this carries the least overhead. The overhead of the user debug trace is significantly higher than that of the user normal trace because it traces flow execution in much more detail, covering the execution of every ESQL statement. This distorts the timings significantly.
You can use the mqsitraceanalyse script in one of two ways. The first way is to use it to mark the flow entry and exit points in the trace and calculate the elapsed times in nodes. The output is written to a text file and viewed using an editor. The second way is to have the script calculate processing node and message flow elapsed times as above, but produce the output in a comma separated variable (csv) format. This allows the data to be loaded into a spreadsheet, such as Lotus® 1-2-3 or Microsoft® Excel, for further analysis.
mqsitraceanalyse takes the following parameters:
- --f specifies the input file name
- --l specifies the level of output. Valid values are 0 and 1. 0 indicates that only node exit points will be output. 1 indicates that all trace read will be output. The default value is 0.
- --t specifies which thread the script should process. The default is all threads in the input file.
- --s specifies that the output file should be in a csv format. In this case, only message flow entry, exit, and node exit are reported. y indicates that a csv file should be output. n indicates that annotated trace should be output. The default value is n.
- --v specifies which product the trace is taken from. Valid values are 5 and 2. 5 indicates that the WebSphere Business Integration Message Broker V5 trace is to be analyzed. 2 indicates that WebSphere MQ Integrator V2.1 or MQSI V2 trace is to be analyzed. The default value is 5.
When using the script, it is best to process the trace from one thread at a time. Look at the formatted trace before running the script and determine which thread contains the processing that you wish to analyze.
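Narrowing a formatted trace down to a single thread is straightforward once you notice that the thread number appears as the second whitespace-delimited field of each trace line (as in Listing 2). A sketch of that filtering step (the function name and sample lines are constructed for illustration):

```python
def lines_for_thread(trace_lines, thread):
    """Keep only formatted-trace lines whose second whitespace-delimited
    field matches the given thread number."""
    wanted = str(thread)
    out = []
    for line in trace_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == wanted:
            out.append(line)
    return out

sample = [
    "11:06:58.746192 27 Message received and propagated to out terminal of MQ Input node OrderRouting.IN.",
    "11:06:58.747001 31 Message received and propagated to out terminal of MQ Input node Other.IN.",
]
only_27 = lines_for_thread(sample, 27)
```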
The mqsitraceanalyse script was used to analyze the user normal trace of the Order Routing flow by issuing the following command:

mqsitraceanalyse.pl --f=OrderRouting_usernormal.txt --t=27
This generated the output shown in Listing 2.
Listing 2. Sample Analyzed Trace
**************** Start of Flow at 11:06:58.746192 *****
11:06:58.746192 27 Message received and propagated to out terminal of MQ Input node OrderRouting.IN.
11:06:58.747360 27 Message propagated to true terminal of filter node OrderRouting.ISORDER.
0000001168 ***** Left Node - Elapsed time since last node is 0000001168 micro seconds *****
11:07:14.711770 27 Message propagated to out terminal of compute node OrderRouting.COMPUTE.
0015964410 ***** Left Node - Elapsed time since last node is 0015964410 micro seconds *****
11:07:14.714373 27 Message successfully output by output node OrderRouting.ORDER_OUT to queue ORDER_OUT on queue manager .
0000002603 ***** Message Put - Elapsed time since last node is 0000002603 micro seconds *****
0015968181 ***** End of Flow at 11:07:14.714373 - Elapsed time of flow is 0015968181 micro seconds, 15.968181 seconds *****
**************** Start of Flow at 11:07:14.714895 *****
11:07:14.714895 27 Message received and propagated to out terminal of MQ Input node OrderRouting.IN.
11:07:14.715862 27 Message propagated to true terminal of filter node OrderRouting.ISORDER.
0000000966 ***** Left Node - Elapsed time since last node is 0000000966 micro seconds *****
11:07:30.563408 27 Message propagated to out terminal of compute node OrderRouting.COMPUTE.
0015847546 ***** Left Node - Elapsed time since last node is 0015847546 micro seconds *****
11:07:30.566096 27 Message successfully output by output node OrderRouting.ORDER_OUT to queue ORDER_OUT on queue manager .
0000002688 ***** Message Put - Elapsed time since last node is 0000002688 micro seconds *****
0015851201 ***** End of Flow at 11:07:30.566096 - Elapsed time of flow is 0015851201 micro seconds, 15.851201 seconds *****
The analyzed trace above contains the result of two messages being processed in the message flow. The start of a message flow invocation is marked with a comment of the form:

**************** Start of Flow at hh:mm:ss.mmmmmm *****

The end of a message flow is marked with a comment of the form:

***** End of Flow at hh:mm:ss.mmmmmm - Elapsed time of flow is mmmmmmmm micro seconds, ss.mmmmmm seconds *****

The end of a processing node is marked with a comment of the form:

***** Left Node - Elapsed time since last node is mmmmmmmm micro seconds *****
The elapsed times for the processing nodes and message flows are repeated in the first column so that you can quickly scan the times when viewing the output.
The output in Listing 2 shows that each of the invocations of the message flow took around 16 seconds to complete. With such a high elapsed time, it is no wonder that the message rate was so low. Examination of the individual node elapsed times shows that the majority of the time is spent in the node called COMPUTE. This is very useful information, because it isolates the problem to one processing node. Where multiple nodes have significant elapsed times, start with the nodes that have the largest times. It doesn't make sense to spend time trying to improve the performance of a node which only took 1% of the time.
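The node elapsed times reported by the script are simple differences between consecutive trace timestamps. A sketch of that calculation (the function name is invented), applied to two timestamps from Listing 2:

```python
from datetime import datetime, timedelta

def elapsed_microseconds(earlier, later, fmt="%H:%M:%S.%f"):
    """Difference between two hh:mm:ss.mmmmmm trace timestamps, in
    microseconds. Assumes both timestamps fall on the same day."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta // timedelta(microseconds=1)

# Filter node exit to compute node exit, first message in Listing 2:
cost = elapsed_microseconds("11:06:58.747360", "11:07:14.711770")
# cost is 15964410 microseconds, matching the Left Node line in the trace
```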
Having identified the node with the highest elapsed time, we now needed to understand why the time was so high. This was beyond the scope of the user normal trace and we now needed to use the user debug trace in our analysis.
By examining the user debug trace to see what was happening within the node COMPUTE, we observed that the statement

SET OutputRoot.XML.Test.TotalPriceChar = CAST(total as CHARACTER) || 'Dollars';

was being repeatedly executed when it should have been executed only once. Counting the entries in the user debug trace confirmed what the ESQL source implies: the inner WHILE loop caused the statement to run thousands of times per message. This is a programming error. The WHILE loop around the SET statement had been placed there in error when the message flow was developed. Removing the WHILE loop and re-running the message flow resulted in a message rate of over 100 messages per second. Problem solved!
In this case, it was easy to identify the cause of the problem. In a more complex message flow, or where there are multiple message flows in sequence, identifying the most expensive nodes may not be as straightforward. This is where the second use of the script comes in handy. Running the script with the --s=y option produces a comma separated variable (csv) file, which can be loaded into a spreadsheet for further analysis. The command used to generate such a file is:

mqsitraceanalyse.pl --f=OrderRouting_usernormal.txt --s=y --t=27
The data in the csv format file consists of eight columns, as follows:
- Column 1 contains a sequence number. This is a sequentially allocated number for each line of reduced trace information that is produced. This value is useful for sorting the trace back into its original sequence without having to reload the data. This number is allocated by mqsitraceanalyse.
- Column 2 contains the node name that was extracted at the time that node entry and exit points were detected.
- Column 3 contains the node transition time in microseconds that was calculated using time stamps in the user normal trace.
- Column 4 contains a description that was added to each line of trace generated as a result of node exit.
- Column 5 contains the elapsed time of the message flow in seconds. This is calculated as the trace is analyzed.
- Column 6 contains a description that was added to each line of trace generated as a result of message flow exit.
- Column 7 contains the timestamp for the event in the user trace.
- Column 8 contains the thread number on which the message flow was running.
Once the output file has been loaded into a spreadsheet, you can analyze it. Table 1 shows output that has been loaded into a spreadsheet.
Table 1. Processed trace loaded into a spreadsheet
| Sequence Number | Node Name | Node Elapsed Time | Node Entry/Exit | Message Flow Elapsed Time | Message Flow Entry/Exit | Timestamp | Thread Number |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | Number of Message Flow Entries | 2 | - | - | - | - | - |
| - | Number of Message Flow Exits | 2 | - | - | - | - | - |
| - | Number of Nodes | 6 | - | - | - | - | - |
| - | Maximum Node Elapsed Time | 15964410 | - | - | - | - | - |
| - | Minimum Node Elapsed Time | 966 | - | - | - | - | - |
| - | Average Node Elapsed Time | 5303230 | - | - | - | - | - |
The data loaded into the spreadsheet consists of two invocations of the OrderRouting message flow. Some simple analysis has been performed on the processed trace. This is placed at the top of the spreadsheet just below the column headings. The analysis consists of counting the number of message flows and processing nodes, as well as reporting the maximum, minimum, and average node elapsed times. The number of message flows was determined by counting the number of Flow Entry and Flow Exit values in column 6. The number of nodes was determined by counting the number of Node Exit values. These counts are useful for ensuring that you have all of the required message flows and nodes in a trace and that the trace has not been truncated. In this case, two instances of Flow Entry and Flow Exit are counted, showing that we have two complete message flows. When loading analyzed trace into the spreadsheet, it is best to minimize the amount of data that is loaded and remove any trace entries that are not relevant to the message flow under investigation.
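The simple analysis described above can also be reproduced outside a spreadsheet. The sketch below uses Python's csv module against data laid out as described earlier (the function name and the constructed sample rows are illustrative assumptions; the six elapsed times are the node figures from Listing 2):

```python
import csv
import io

def node_time_stats(csv_text):
    """Compute the maximum, minimum, and average node elapsed time
    (column 3) over all Node Exit rows, as shown in Table 1."""
    times = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Column 4 describes the event; keep only node-exit rows.
        if len(row) >= 4 and "Node Exit" in row[3]:
            times.append(int(row[2]))
    return max(times), min(times), sum(times) // len(times)

# Six node-exit rows with the elapsed times seen in Listing 2:
sample = "\n".join(
    "%d,%s,%d,Node Exit,,,," % (i, name, t)
    for i, (name, t) in enumerate(
        [("ISORDER", 1168), ("COMPUTE", 15964410), ("ORDER_OUT", 2603),
         ("ISORDER", 966), ("COMPUTE", 15847546), ("ORDER_OUT", 2688)], 1))

highest, lowest, average = node_time_stats(sample)
# These reproduce the Maximum, Minimum, and Average rows of Table 1.
```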
Using a spreadsheet also allows you to easily sort the data based on a particular column, so that you could, for example, find the processing node with the highest elapsed time in a very long message flow or sequence of message flows. By re-sorting on the sequence number you can get back to the original sequence.
As with the annotated trace, you can see that the two executions of the message flow took around 16 seconds and that the majority of the time in each case was spent in the node COMPUTE. Again, we need to use the user debug trace to understand the sequence of processing within the node with the highest elapsed time.
In this example, the cause of the high response time and, therefore, poor message throughput was one statement being repeatedly executed inside a loop when it should not have been. Causes will vary. Some common causes for high transition times include:
- Processing loops in ESQL being executed many more times than was originally expected.
- Database access times taking significantly longer than expected. This could apply equally to read, update, or insert activity.
- A message being parsed many times unnecessarily.
- A message being processed a byte at a time as part of a BLOB, when it would be better parsed using the MRM parser, for example.
- Filter expressions taking a long time to be evaluated. This could indicate that the field being selected was at the end of a large message, requiring a large amount of parsing to be performed.
Whatever the reason, it is important to identify the cause and correct it. This may involve modifying the message flow or even the messages being processed. Having corrected the most obvious problems, it is well worth repeating the trace capture and analysis phases to ensure that any changes made have had the desired effect and that there are no outstanding problems.
Isolate the problem using accounting and statistics
The first part of our analysis involved collecting and formatting accounting and statistics data and user debug trace. The accounting and statistics data is used to identify where the performance problem lies. The user debug trace is used to allow us to see precisely which ESQL statements are being executed within a processing node and how many times those statements are run.
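Counting how many times each statement runs can be automated once the user debug trace has been formatted to a text file. The sketch below is a minimal example; the `BIP2537I ... line <n>` message format in the regular expression is an assumption about how your formatted trace reports executing statements, so verify it against your own trace output and adjust the pattern as needed.

```python
# Minimal sketch: count how often each ESQL statement line appears in a
# formatted user debug trace. The "BIP2537I ... line <n>" format is an
# assumption; adjust the regular expression to match your trace.
import re
from collections import Counter

STATEMENT = re.compile(r"BIP2537I.*?line (\d+)")

def count_statements(trace_lines):
    counts = Counter()
    for line in trace_lines:
        m = STATEMENT.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

trace = [
    "BIP2537I: Node 'COMPUTE': Executing statement at ('.main', line 4)",
    "BIP2537I: Node 'COMPUTE': Executing statement at ('.main', line 5)",
    "BIP2537I: Node 'COMPUTE': Executing statement at ('.main', line 5)",
]
for line_no, n in count_statements(trace).most_common():
    print("ESQL line %d executed %d times" % (line_no, n))
```

A statement count far higher than the number of messages processed is the signature of the looping problem described earlier.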
We used the same message flow for this part of our analysis as for the trace-based analysis.
We issued the command
mqsichangeflowstats VKB2BRK -a -g -j -c active -r -t basic -n basic -o xml
to the broker, called VKB2BRK, to start collection of the data.
We specified the following options:
- -a to specify archive mode
- -g to specify that data should be collected in all execution groups
- -j to specify that data should be collected for all message flows
- -c active to turn on the accounting and stats data collection
- -r to reset the archive log
- -t basic to specify that basic thread level data is to be recorded
- -n basic to specify that basic level node data is to be recorded
- -o xml to publish the output in XML format
For a more detailed explanation of the mqsichangeflowstats command, refer to the WebSphere Business Integration Message Broker help.
To ensure that the required data is being collected, you can use the
mqsireportflowstats command. In our case, issuing the command
mqsireportflowstats VKB2BRK -a -s -g -j
resulted in the following output:
Listing 3. Sample mqsireportflowstats output
BIP8187I: Statistics Archive settings for flow OrderRouting in execution group default - On?: inactive, ThreadDataLevel: none, NodeDataLevel: none, OutputFormat: usertrace, AccountingOrigin none.
BIP8187I: Statistics Snapshot settings for flow OrderRouting in execution group default - On?: inactive, ThreadDataLevel: none, NodeDataLevel: none, OutputFormat: usertrace, AccountingOrigin none.
The options used on the mqsireportflowstats command were as follows:
- -a to specify that details on archive settings were required
- -s to specify that details on the snapshot settings were required
- -g to specify that the details for all execution groups were required
- -j to specify that details for all message flows were required
To have the broker publish the accounting and statistics information it has
collected, you can either wait for the archive interval to expire or
reissue the mqsichangeflowstats command changing one of the
parameters. For example, change nodelevel from basic to
advanced. In this example, we had the broker publish the
accounting and statistics information by issuing the command
mqsichangeflowstats VKB2BRK -a -g -j -c active -t basic -n advanced -o xml
Following this command, the broker continues to produce accounting and
statistics information, but with the new parameters. If you want to stop
accounting and statistics, follow the command with the -c inactive
option. For example,
mqsichangeflowstats VKB2BRK -a -g -j -c inactive -t basic -n basic -o xml
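This start / republish / stop sequence lends itself to scripting. The sketch below simply assembles the three command lines shown above; it does not execute them, so you can review the options (and pass each list to `subprocess.run` yourself) once you have verified them against your environment.

```python
# Minimal sketch: assemble the three mqsichangeflowstats invocations used
# above (start collection, republish with a changed parameter, stop).
# The commands are built as argument lists but deliberately not executed.

def flowstats_commands(broker):
    base = ["mqsichangeflowstats", broker, "-a", "-g", "-j"]
    # -r resets the archive log when collection is first started.
    start = base + ["-c", "active", "-r", "-t", "basic", "-n", "basic", "-o", "xml"]
    # Changing nodelevel from basic to advanced makes the broker publish
    # the accounting and statistics data collected so far.
    republish = base + ["-c", "active", "-t", "basic", "-n", "advanced", "-o", "xml"]
    stop = base + ["-c", "inactive", "-t", "basic", "-n", "basic", "-o", "xml"]
    return start, republish, stop

start, republish, stop = flowstats_commands("VKB2BRK")
print(" ".join(start))
```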
When accounting and statistics information collection was activated, we
requested it in the format of an XML message published using the
Publish/Subscribe mechanism. This results in a message being published by
the Message Broker to a topic whose name is of the form:
$SYS/Broker/&lt;brokerName&gt;/StatisticsAccounting/&lt;recordType&gt;/&lt;executionGroupLabel&gt;/&lt;messageFlowLabel&gt;
The exact topic name under which the information is published depends on the name of the broker, the type of statistics (Archive or Snapshot), and the execution group and message flow label values.
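Building the topic string from these parts is easy to get wrong by hand. The sketch below assembles it programmatically; the level ordering is taken from the example topic used later in this article ($SYS/Broker/VKB2BRK/StatisticsAccounting/Archive/default/OrderRouting), so treat it as illustrative and confirm it against the product documentation for your broker version.

```python
# Minimal sketch: build the statistics publication topic from its parts.
# The level ordering follows the example topic used in this article;
# confirm it against the product documentation for your broker version.

def stats_topic(broker, record_type=None, execution_group=None, flow=None):
    parts = ["$SYS/Broker", broker, "StatisticsAccounting"]
    for level in (record_type, execution_group, flow):
        if level is None:
            # A trailing "#" wildcard subscribes to every publication
            # below this level of the topic tree.
            parts.append("#")
            break
        parts.append(level)
    return "/".join(parts)

# Exact topic for one flow's archive data:
print(stats_topic("VKB2BRK", "Archive", "default", "OrderRouting"))
# Wildcard subscription for all statistics in the broker:
print(stats_topic("VKB2BRK"))
```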
In order to obtain the accounting and statistics information, you need to
issue a subscription to the Message Broker on the relevant topic, and then
issue an MQGET against the publish queue that is named in the
subscription request. A subscription for the topic
$SYS/Broker/VKB2BRK/StatisticsAccounting/#
was issued to the Message Broker VKB2BRK by sending a WebSphere MQ message
to the SYSTEM.BROKER.CONTROL.QUEUE containing an RFH2 header with a
publish/subscribe folder. In this case the wildcard character # was used
after the StatisticsAccounting level to specify that the application
should receive all publications for all execution groups and message flows
in the broker VKB2BRK.
You need a program to issue the subscription request. You may have an existing application that issues RFH2 messages that can be reused. If not, consider using the utility provided in SupportPac IH03. This utility is a GUI-based program that helps with developing and testing WebSphere MQ and WebSphere Business Integration Message Broker V5 applications. Test messages are stored as files, which are then read by the application and written to a WebSphere MQ queue. To receive accounting and statistics information using this utility, do the following:
- Start the RFHUTIL program if you are running on the same machine as the Message Broker. If the Message Broker is remote, use the RFHUTILC program.
- Ensure that you have connectivity to the Message Broker queue manager by writing a message to an application queue and retrieving it using the Main tab of RFHUTIL. You don't need to specify any message data to write a message, just put values in the Queue Manager Name and Queue Name fields. Then select Write Q to write a message and Read Q to read the message. Resolve any connection problems between RFHUTIL and the Message Broker.
- With connectivity established, the next step is to issue the subscription. To do this, click the PubSub tab of RFHUTIL.
- Select a request type of Subscribe.
- Enter the name of the topic that the accounting and statistics are published to. For example, $SYS/Broker/VKB2BRK/StatisticsAccounting/Archive/default/OrderRouting.
- Enter the name of the Message Broker queue manager in the Queue Manager to Connect to field.
- Enter the name of the queue that accounting and statistics messages are to be written to in the Publish Queue field.
- Select or enter any other options that you want, such as retained for retained publications.
- Select the Process Request tab of RFHUTIL. Note: When using the RFHUTILC version of the program you may have to use the Close Q option on the Main tab in order to complete the sending of the message.
- Run the message flow and issue the mqsichangeflowstats commands to generate accounting and statistics information.
- Wait for 30 seconds after issuing the second mqsichangeflowstats command.
- Click the Main tab of RFHUTIL.
- Enter a value for the Message Broker queue manager name in the Queue Manager Name field.
- Enter a value in the Queue Name field of the application queue that was named in the subscription request, the publish queue.
- Click Read Q. The message is then read from the queue.
- Click the Data tab on RFHUTIL. The Message Data area of the tool should contain the requested accounting and statistics information.
- Initially the XML is unformatted. To format it click the XML button in the Data Format area of the window.
- You now have a formatted accounting and statistics XML message. If you want, you can save the data to a file for later analysis by using the Write File button on the Main tab of RFHUTIL.
The great benefit of using this utility is that you don't need to write any application code. By issuing more Read Q requests against the application queue named in the subscription request you can retrieve further sets of accounting and statistics information as it is published.
Listing 4 shows the accounting and statistics message which was retrieved following the processing of one message through the OrderRouting message flow.
Listing 4. Accounting and statistics for the OrderRouting message flow.
<WMQIStatisticsAccounting RecordType="Archive" RecordCode="StatsSettingsModified">
  <MessageFlow BrokerLabel="VKB2BRK" BrokerUUID="c8e82ffb-f900-0000-0080-abb130c11c36"
    ExecutionGroupName="default" ExecutionGroupUUID="e24e017b-fa00-0000-0080-abb130c11c36"
    MessageFlowName="OrderRouting" StartDate="2004-03-02" StartTime="17:12:40.951016"
    EndDate="2004-03-02" EndTime="17:13:49.721515" TotalElapsedTime="16230860"
    MaximumElapsedTime="16230860" MinimumElapsedTime="16230860" TotalCPUTime="11510642"
    MaximumCPUTime="11510642" MinimumCPUTime="11510642"
    CPUTimeWaitingForInputMessage="1462" ElapsedTimeWaitingForInputMessage="52537801"
    TotalInputMessages="1" TotalSizeOfInputMessages="6168"
    MaximumSizeOfInputMessages="6168" MinimumSizeOfInputMessages="6168"
    NumberOfThreadsInPool="1" TimesMaximumNumberOfThreadsReached="1"
    TotalNumberOfMQErrors="0" TotalNumberOfMessagesWithErrors="0"
    TotalNumberOfErrorsProcessingMessages="0"
    TotalNumberOfTimeOutsWaitingForRepliesToAggregateMessages="0"
    TotalNumberOfCommits="1" TotalNumberOfBackouts="0" AccountingOrigin="Anonymous"/>
  <Threads Number="1">
    <ThreadStatistics Number="21" TotalNumberOfInputMessages="1"
      TotalElapsedTime="16230860" TotalCPUTime="11510642"
      CPUTimeWaitingForInputMessage="1462" ElapsedTimeWaitingForInputMessage="52537801"
      TotalSizeOfInputMessages="6168" MaximumSizeOfInputMessages="6168"
      MinimumSizeOfInputMessages="6168"/>
  </Threads>
  <Nodes Number="6">
    <NodeStatistics Label="COMPUTE" Type="ComputeNode" TotalElapsedTime="16227050"
      MaximumElapsedTime="16227050" MinimumElapsedTime="16227050" TotalCPUTime="11507949"
      MaximumCPUTime="11507949" MinimumCPUTime="11507949" CountOfInvocations="1"
      NumberOfInputTerminals="1" NumberOfOutputTerminals="2"/>
    <NodeStatistics Label="IN" Type="MQInputNode" TotalElapsedTime="132"
      MaximumElapsedTime="132" MinimumElapsedTime="132" TotalCPUTime="102"
      MaximumCPUTime="102" MinimumCPUTime="102" CountOfInvocations="1"
      NumberOfInputTerminals="0" NumberOfOutputTerminals="3"/>
    <NodeStatistics Label="ISORDER" Type="FilterNode" TotalElapsedTime="965"
      MaximumElapsedTime="965" MinimumElapsedTime="965" TotalCPUTime="605"
      MaximumCPUTime="605" MinimumCPUTime="605" CountOfInvocations="1"
      NumberOfInputTerminals="1" NumberOfOutputTerminals="4"/>
    <NodeStatistics Label="ORDER_OUT" Type="MQOutputNode" TotalElapsedTime="2713"
      MaximumElapsedTime="2713" MinimumElapsedTime="2713" TotalCPUTime="1986"
      MaximumCPUTime="1986" MinimumCPUTime="1986" CountOfInvocations="1"
      NumberOfInputTerminals="1" NumberOfOutputTerminals="2"/>
    <NodeStatistics Label="RETURN_OUT" Type="MQOutputNode" TotalElapsedTime="0"
      MaximumElapsedTime="0" MinimumElapsedTime="0" TotalCPUTime="0"
      MaximumCPUTime="0" MinimumCPUTime="0" CountOfInvocations="0"
      NumberOfInputTerminals="1" NumberOfOutputTerminals="2"/>
    <NodeStatistics Label="UNKNOWN_OUT" Type="MQOutputNode" TotalElapsedTime="0"
      MaximumElapsedTime="0" MinimumElapsedTime="0" TotalCPUTime="0"
      MaximumCPUTime="0" MinimumCPUTime="0" CountOfInvocations="0"
      NumberOfInputTerminals="1" NumberOfOutputTerminals="2"/>
  </Nodes>
</WMQIStatisticsAccounting>
The accounting and statistics message consists of the MessageFlow, Threads, Nodes, and NodeStatistics folders within the WMQIStatisticsAccounting folder. Together they provide information about the message flow, its nodes, and the time taken to process messages.
The folder MessageFlow contains information about the broker name, broker UUID, execution group name, message flow name, the time the statistics were collected, information on overall CPU used, and so on. This allows you to identify the accounting and statistics details and ensure that you have the correct message.
Of most interest in this example are the NodeStatistics folders since they show what is happening at the node level. There are six such folders, one for each node in the message flow. Let's look at the folder for the node COMPUTE in more detail. This is shown in Listing 5.
Listing 5. NodeStatistics folder for the node COMPUTE in message flow OrderRouting
<NodeStatistics Label="COMPUTE" Type="ComputeNode" TotalElapsedTime="16227050"
  MaximumElapsedTime="16227050" MinimumElapsedTime="16227050" TotalCPUTime="11507949"
  MaximumCPUTime="11507949" MinimumCPUTime="11507949" CountOfInvocations="1"
  NumberOfInputTerminals="1" NumberOfOutputTerminals="2"/>
The NodeStatistics folder contains the following information:
- The Label attribute, which identifies the node name.
- The Type attribute, which identifies the type of node. In this case, it is a ComputeNode.
- The TotalElapsedTime, MaximumElapsedTime, and MinimumElapsedTime attributes, which identify the total, maximum, and minimum elapsed times in microseconds that the node took to process messages. In this case, they are all the same as only one message was processed.
- The TotalCPUTime, MaximumCPUTime, and MinimumCPUTime attributes, which identify the total, maximum, and minimum CPU times in microseconds that the node took to process messages. In this case, they are all the same as only one message was processed.
- The CountOfInvocations attribute, which identifies how many messages have been processed in the node.
- The NumberOfInputTerminals and NumberOfOutputTerminals attributes, which identify how many input and output terminals the processing node has.
As we examine each of the NodeStatistics folders for the message flow, it becomes obvious that the majority of the time is spent in the node COMPUTE. This highlights the most expensive processing node. As with the trace-based approach, the next step is to consult the user debug trace to see why so much time is consumed in the node COMPUTE.
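This scan over the NodeStatistics folders is easy to automate once the message has been saved to a file. The sketch below assumes an XML document shaped like Listing 4 (the sample here is cut down to three nodes) and ranks the nodes by TotalElapsedTime using Python's standard XML parser.

```python
# Minimal sketch: rank the nodes in an accounting and statistics message
# by TotalElapsedTime. Assumes an XML document shaped like Listing 4.
import xml.etree.ElementTree as ET

def rank_nodes(stats_xml):
    root = ET.fromstring(stats_xml)
    nodes = root.findall(".//NodeStatistics")
    # Sort node (label, elapsed-microseconds) pairs, slowest first.
    return sorted(
        ((n.get("Label"), int(n.get("TotalElapsedTime"))) for n in nodes),
        key=lambda pair: pair[1],
        reverse=True,
    )

sample = """<WMQIStatisticsAccounting RecordType="Archive">
  <Nodes Number="3">
    <NodeStatistics Label="COMPUTE" TotalElapsedTime="16227050"/>
    <NodeStatistics Label="ORDER_OUT" TotalElapsedTime="2713"/>
    <NodeStatistics Label="IN" TotalElapsedTime="132"/>
  </Nodes>
</WMQIStatisticsAccounting>"""

for label, elapsed in rank_nodes(sample):
    print("%-12s %10d us" % (label, elapsed))
```

Run against the full Listing 4 message, the same function immediately surfaces COMPUTE as the dominant node.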
The accounting and statistics message also contains information at the thread level in the Threads and ThreadStatistics folders. In this case, we did not need this information to debug the problem.
This section has shown how you can identify problem nodes using the basic level of statistics. You may want to use the advanced level to get more information. With the advanced level, you get TerminalStatistics for each node. These give the count of messages processed for each terminal within a node. This information would be useful for understanding the path of execution through a message flow.
This article has shown you how to investigate the performance of a message flow and to understand where processing time is being spent within the message flow. Once you understand which nodes or pieces of ESQL are consuming the most CPU, you will be able to optimize the use of nodes, ESQL, or both in such areas and so improve the efficiency of the message flow.
Although you may have initially come at this subject from the need to resolve a performance problem, we recommend that you also use the techniques in this article to profile your most important message flows, even those without any obvious problems, to identify any hotspots and areas for improvement. By profiling message flows in such detail, you'll develop a detailed understanding of the resource usage pattern of each message flow. Such information is useful for capacity planning and will enable you to more accurately predict the resources required to run similar applications or to process an increased volume of messages.