Using HealthCenter to troubleshoot Java compute node performance in IBM Integration Bus V9
Geza Geleji
Through Java compute nodes, IBM Integration Bus version 9 allows arbitrary Java code to be executed within message flows, giving Java developers the freedom to transform messages in any way that can be expressed in Java. This post explains how to apply the troubleshooting features embedded in the IBM Java Virtual Machine (JVM) when investigating potential problems in code running within Java compute nodes.
The HealthCenter add-on of the IBM Support Assistant provides a GUI client for this purpose.
HealthCenter can connect to an appropriately configured IBM JVM over TCP to download and display diagnostic data. The user must supply the address and TCP port of a HealthCenter agent running inside the JVM. The JVM starts the HealthCenter agent when it is launched with the -Xhealthcenter command-line option. By default, agent port numbers are allocated starting from 1972; if a port is unavailable, the next one is tried until an available one is found. For a particular JVM, the best way to find the port on which the agent is listening depends on the operating system (utilities such as netstat or lsof typically work). In IBM Integration Bus, Java compute nodes always run in a process called DataFlowEngine. HealthCenter agents started by IIB bind to ports in the order their processes are started. The bipbroker process starts first, so it normally binds to port 1972. The next process to start is biphttplistener, which thus binds to port 1973. The DataFlowEngine processes (one per Integration Server) are started last; they bind to ports 1974 and upwards.
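Because the port an agent ends up on depends on process start order, it can be handy to probe the default range programmatically. The following standalone sketch (not part of IIB or HealthCenter; the class name and range are arbitrary) reports which ports on a host accept a TCP connection:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class AgentPortProbe {
    // Returns the ports in [first, last] on which something is listening.
    // On an IIB host, ports from 1972 upwards are candidates for agents.
    static List<Integer> findListeningPorts(String host, int first, int last) {
        List<Integer> open = new ArrayList<>();
        for (int port = first; port <= last; port++) {
            try (Socket s = new Socket()) {
                // Short timeout: we only care whether something is listening.
                s.connect(new InetSocketAddress(host, port), 200);
                open.add(port);
            } catch (IOException e) {
                // Nothing listening on this port; try the next one.
            }
        }
        return open;
    }

    public static void main(String[] args) {
        System.out.println(findListeningPorts("localhost", 1972, 1980));
    }
}
```

Tools like netstat or lsof remain the authoritative way to map a port back to a specific DataFlowEngine process ID.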
We will use Memory Analyzer in this post to view some data captured by HealthCenter. Further information about these tools can be found in the IBM
Starting the HealthCenter agent
To start the HealthCenter agent, we need to create a start-up script for IIB in a special location. All the script needs to do is set an environment variable called IBM_JAVA_OPTIONS to a JVM-specific string. This will cause all IIB processes to run an embedded HealthCenter agent to which we may connect later using HealthCenter.
Although this post focuses on running the agent with a network connection from the HealthCenter GUI, the agent can also run in "headless" mode, in which no network connection is necessary: the agent writes the captured data to a file instead of sending it over the network. Some interactive features, such as triggering a thread, heap, or system dump, are not available in this mode. For further details, please refer to Conf
To make sure that IBM Integration Bus starts the DataFlowEngine JVMs with the HealthCenter agent, we should create a file called setenv.cmd in the IIB profile directory (by default, C:\P
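As a sketch, the file only needs to set the variable described above; real deployments may append further JVM options to the same variable:

```shell
rem setenv.cmd - read by IIB at start-up from the profile directory
set IBM_JAVA_OPTIONS=-Xhealthcenter
```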
This causes all IBM Integration Bus JVMs to start the HealthCenter agent, allowing unsecured inbound TCP connections from HealthCenter.
When running IBM Integration Bus on Linux, a process quite similar to the above needs to be completed to ensure that IIB starts the HealthCenter agent. Let us create a file called setenv.sh in the IIB profile directory (/va
The above causes IIB to start the JVMs with the HealthCenter agent, permitting unsecured inbound TCP connections from HealthCenter. We need to make sure that the file is readable by the user running IIB, then source the mqsiprofile script into the current shell before starting the Integration Node. If a previous version of the file has already been sourced, we will need to start a new shell to be able to do this.
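A minimal setenv.sh, assuming only the -Xhealthcenter option described above, could look like this (source mqsiprofile from your installation's bin directory afterwards):

```shell
# setenv.sh - read by IIB at start-up from the profile directory
export IBM_JAVA_OPTIONS=-Xhealthcenter
```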
The same effect can be achieved on z/OS by manipulating the ENVFILE (this is normally created in the comp
Although IBM Integration Bus runs a slightly different set of processes on z/OS, the Integration Server processes will still be assigned port numbers starting from 1974 by default.
Running the HealthCenter client
Once we have an IIB instance running with the embedded HealthCenter agents, it should be straightforward to connect HealthCenter to one of its processes. The following is one of the dialog boxes we are greeted with after starting the HealthCenter client inside IBM Support Assistant:
With the above setting, the client would connect to the DataFlowEngine process executing the first Integration Server of the IIB instance running on the same host as the client. To see how HealthCenter reacts to problems, let us create the following message flow:
Make sure that on the "Input Message Parsing" property page of the HTTP Input node, the "Message domain" property is set to XMLNSC. We may now try out HealthCenter and see how we can use it to solve problems.
Confirming the process being monitored
If we have any doubt later that our client is connected to the right HealthCenter agent, we may easily confirm this by adding a dummy inner class with a peculiar name to our Java Compute node:
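In the node itself the marker would be an inner class of the compute node class; this standalone sketch (with a made-up class name) just shows the idea:

```java
public class MarkerDemo {
    // A do-nothing inner class whose only purpose is to have a name that
    // is easy to spot in the HealthCenter class histogram. The name is
    // arbitrary; choose anything unlikely to clash with real classes.
    static class ZyxMarkerForHealthCenterCheck42 {
    }

    public static void main(String[] args) {
        // Instantiating it once guarantees the class is loaded.
        Object marker = new ZyxMarkerForHealthCenterCheck42();
        System.out.println(marker.getClass().getSimpleName());
    }
}
```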
After (re-)deploying the flow, all we need to do is navigate to the "Class histogram" view in the HealthCenter client GUI, press "Collect histogram data" and verify the presence of our dummy class:
The "Configuration" view of the "Environment" section, specifically the "Java Parameters" property, provides some other interesting details: heap settings, installation and configuration directories, the names and UUIDs of the Integration Node and Server, as well as the name of the queue manager:
Detecting a memory leak
Let us now enter the following code in the Java Compute node (this code is provided as-is and should only be used for learning purposes):
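The original listing is not reproduced here, but the defect it demonstrates can be sketched standalone. In a real node, the class would extend MbJavaComputeNode and the retained objects would be MbMessage instances received by evaluate(); this self-contained approximation keeps a reference to every "message" in a static LinkedList:

```java
import java.util.LinkedList;
import java.util.List;

public class LeakyCompute {
    // Defect: every message processed is added to this static list and
    // never removed, so the heap grows without bound.
    static final List<byte[]> seenMessages = new LinkedList<>();

    // Stand-in for the node's evaluate() method: in a real Java compute
    // node this would receive an MbMessageAssembly, not a byte[].
    static void evaluate(byte[] message) {
        seenMessages.add(message);   // the leak: unbounded retention
        // ... message transformation would go here ...
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            evaluate(new byte[1024]);  // simulate 1 KiB messages
        }
        System.out.println("Retained messages: " + seenMessages.size());
    }
}
```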
The above code is clearly defective. We store each message that passes through the node in a LinkedList; this constitutes a memory leak. To see how HealthCenter can help us detect it, let us deploy the flow to our Integration Server and point a single-threaded client to the address specified on the HTTP Input node. Using the Integration Server's embedded HTTP listener may provide a slightly better demonstration, as doing so would ensure that all functionality exercised by the test is contained within the same DataFlowEngine process. We should use the following sample XML message:
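Any small, well-formed document will do for this purpose; the element names below are made up for illustration:

```xml
<order>
  <id>1</id>
  <item>sample</item>
</order>
```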
We shouldn't have to wait for long before the "Used heap (after collection)" view of the "Garbage Collection" section begins to present an alarming sight:
We may check on the "Summary" tab that there have indeed been numerous garbage collection cycles, yet heap usage keeps steadily increasing. In the "Monitored JVM" menu of the HealthCenter client, we may select the "Request a dump..." option, then choose "System Dump" to get a complete snapshot of the Java heap. This snapshot is normally written to the "common/errors" subdirectory of the IBM Integration Bus configuration data directory, and we may analyze it using Memory Analyzer.
One of the very first reports that Memory Analyzer offers pinpoints the source of the leak, namely, the com.
We may delve deeper into the heap dump, and learn that there is a LinkedList object in this class referencing a very large number of MbMessage objects. Using this information, the problem should be easy to solve. Clearly, real-life problems sometimes prove to be more difficult to fix than this one.
Detecting lock contention and CPU time consumption imbalances
To construct our second example, let us modify the code in the Java Compute node by replacing
If we now run, say, twenty instances of the message flow with twenty concurrent clients, we may observe a heavy performance degradation in comparison to the case where the Java Compute node code doesn't include the above excerpt. Let us remark that the performance overhead of the "MEMORY LEAK" section is actually quite small compared to that of the "PERFORMANCE + LOCKING ISSUE" section, so the presence or absence of the former does not make a significant difference in terms of message throughput — at least not as long as there is enough heap available.
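The "PERFORMANCE + LOCKING ISSUE" listing is not shown above, but its effect can be approximated standalone: worker threads all synchronize on a single shared lock while doing CPU-bound work inside the critical section, which serializes them exactly as the "Monitors" view will report (thread and iteration counts below are arbitrary):

```java
public class ContendedCompute {
    // A single class-level lock shared by every thread - the source of
    // the contention reported in HealthCenter's "Monitors" view.
    static final Object LOCK = new Object();
    static long counter = 0;

    // CPU-bound work done while holding the lock, so only one thread
    // can make progress at a time.
    static void doWork(int iterations) {
        synchronized (LOCK) {
            for (int i = 0; i < iterations; i++) {
                counter += i % 7;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[20];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int call = 0; call < 100; call++) {
                    doWork(10_000);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("counter = " + counter);
    }
}
```

Moving the loop outside the synchronized block, or giving each thread its own accumulator, removes the serialization.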
What can HealthCenter say to help us find the cause of the performance difference (supposing we don't already know...)? Let us first turn to the "Method profile" view of the "Profiling" section:
In the figure above,
When the "Tree (%)" value equals the "Self (%)" value, the profiler does not trace calls any deeper: any code in methods called by this method is treated as part of the calling method itself. This is useful because it typically applies to Java built-in code, and we want to optimize our own code, not the Java runtime's.
We can see that almost all CPU time is being spent in the java
The "Monitors" view of the "Locking" section shows another problem:
In this example, of 81854 successful attempts to get hold of a certain lock, the requester had to wait in 81852 cases because the lock was already owned by another thread. As shown in the "% miss" column, virtually all attempts to acquire the lock resulted in a miss. It is beyond the scope of this post to give advice on what miss rate is appropriate and what isn't; however, a high miss rate and/or hold time typically indicates that the code needs to be optimized. The column "Name" shows the name and type of the lock: it is often a Java object, represented by the name of its class; in our example, the lock is a Java class, so the class name is displayed. Note that the bottom five monitors in the above screenshot are simple types, and are thus represented by their Java type signatures.
This post explained how to start IBM Integration Bus with an embedded HealthCenter agent, and how to connect the HealthCenter GUI client to the agent. A simple check was described to allow us to ensure that the client is connected to the right agent. Through examples with artificial performance bottlenecks, we've seen how HealthCenter can be used to identify various common performance problems in Java compute nodes such as memory leaks, excessive CPU time being spent in a method, and lock contention.