You might have heard of the 90/10 rule: 90% of the time is spent in executing 10% of the code. Optimizing 90% of the code will have little impact on overall performance. You need to pinpoint and improve performance bottlenecks -- the 10% of the code where 90% of the execution time is spent. IBM® Rational® Application Developer (IRAD) can help you in achieving that goal. It allows you to collect execution-time data for your Java™ application with varying degrees of details, and provides effective visualization of the data so that you can quickly identify performance bottlenecks.
In the rest of this article, you will learn how to use IRAD to profile an application for execution time analysis, and examine the information displayed in various relevant views. Later, you will explore the three profiling sets that come built-in with IRAD -- ready-to-use for collecting execution time data -- and how they map to different profiling types for execution-time analysis. You will also learn about the granularity of the data needed for various views, and which profiling set (or profiling type options) to use to get them populated.
This article will demonstrate IRAD's execution time analysis capabilities using a Maximum Contiguous Subsequence Sum problem: given a sequence of (possibly negative) integers, identify the contiguous subsequence with the maximum sum. Lets say the sequence is {a0, a1, ..., aN-1}. You have to find the contiguous subsequence {ai, ai+1, ..., aj}, such that the sum of the subsequence (ai + ai+1 + ... + aj) is maximum over all possible contiguous subsequences. For the sake of mathematical completeness, the maximum contiguous subsequence sum is zero if all the integers are negative.
Here is an example. The sequence {1, -2, 11, -9, 13, 3} has several contiguous subsequences, such as {1, -2}, {1, -2, 11}, {11}, {11, -9}, and so forth. The sums of these four subsequences are 1-2=-1, 1-2+11=10, 11, and 11-9=2, respectively. Among these as well as other possible contiguous subsequences, the subsequence {11, -9, 13, 3} has the maximum sum, which is 18. There is no other contiguous subsequence with a sum more than 18.
Listing 1 shows a simple and straight-forward
solution. The solve method in the
MaxContSubSum class computes the sum of all
possible contiguous subsequences, and remembers the maximum sum seen so
far. Instead of taking input from user, the program uses a random number
generator with an unchanging seed to create the input sequence. It
insures that input sequence is the same for all runs of the program, so
that performance measurements are repeatable and comparable.
Listing 1. A solution for the Maximum Contiguous Subsequence Sum problem (MaxContSubSum.java)
import java.util.Random;
public class MaxContSubSum {
public MaxContSubSum(int[] a) {
this.a = a;
maxsum = seqstart = seqend = 0;
}
public int solve() {
final int N = a.length;
maxsum = 0;
for (int i=0; i<N; i++) {
for (int j=i; j<N; j++) {
int sum = 0;
for (int k=i; k<=j; k++)
sum += a[k];
if (sum > maxsum) {
maxsum = sum; seqstart = i; seqend = j;
}
}
}
return maxsum;
}
public void printResults() {
System.out.println("Sequence:");
for (int i=0; i<a.length; i++) {
System.out.print(" " + a[i]);
}
System.out.println();
System.out.println("Max Sum = " + maxsum
+ " SubSeq Start = " + seqstart
+ " SubSeq End = " + seqend);
System.out.println("Sub Sequence:");
for (int i=seqstart; i<=seqend; i++) {
System.out.print(" " + a[i]);
}
System.out.println();
}
public static int[] init(int N, int d) {
int[] a = new int[N];
Random r = new Random(N+d);
for (int i = 0; i < N; i++) {
a[i] = r.nextInt() % d;
}
return a;
}
public static void main(String[] args) {
int[] a = init(1000, 1000);
MaxContSubSum solver = new MaxContSubSum(a);
solver.solve();
solver.printResults();
}
private int[] a;
private int maxsum;
private int seqstart;
private int seqend;
}
|
Collecting execution time data
Before you start profiling the MaxContSubSum
class, make sure that the IBM® Agent Controller is running.
It is packaged with IRAD and gets installed along with it.
On a Microsoft® Windows® platform, it usually runs as a service
(see Resources for more details on the IBM®
Agent Controller).
Here are the steps to profile your application:
- Start the Profile dialog by clicking Run menu and selecting Profile.
- Create a configuration for MaxContSubSum under the Java application.
- Enter the project name, the application name, and other argument details in the Main tab.
- Go to the Profiling tab and select the Execution History -- Full Graphical Detail profiling set (Figure 1 shows the Profile dialog with annotations highlight the Profiling tab and Profiling Set selection).
- Click the Profile button. IRAD will switch to the Profiling and Logging perspective, and execute your application to collect execution time data.
There will be a Profiling Monitor view in the perspective showing a monitor for the program being profiled. After the program terminates, select the Execution Time Analysis entry and right-click, then select Open With from the pop-up menu. This will bring up a list of views (Figure 2) that you can use to visualize the execution time data. You can click on the view you are interested in, and IRAD will bring up that view. The next section will examine each of these views and the data they display.
Figure 1. Profiling dialog
Figure 2. Views for visualizing execution time data
The Execution Statistics view is the simplest. It shows
Base Time, Average Base Time, Cumulative Time, and Number of Calls
for each method. It can be configured to suit your needs. For
example, you can choose to see package, class, or method-level
information, with times shown in seconds or as a percentage.
You can also sort the information by clicking a column heading.
Figure 3 shows the Execution
Statistics view, with method-level information and percentage time
(the
and
icons on the top-right corner are selected). It is sorted on Cumulative
Time (see the < in the column heading). Looking at the view, you can
identify that the solve method of the
MaxContSubSum class consumes the maximum
time (37.63%), and hence it should be optimized (you will do it
later in this article).
Figure 3. Execution Statistics view
The Performance Call Graph view shows the same information in an innovative graphical manner that makes it easy to identify performance bottlenecks. The view shows the 20 most time-consuming nodes (a node can represent a method, a process, or a thread). An arc (line connecting two nodes) represents the dynamic call structure, and its thickness indicates the amount of time spent along that call-path. To optimize your application, you need to focus on the thickest lines.
Looking at the Figure 4, you can
identify that the path from Thread:main to
MaxContSubSum:main to
MaxContSubSum:Solve
is responsible for most of the execution time because the arcs along
this path are thickest. The conclusion is same as that drawn from the
Execution Statistics view -- you need to optimize the
solve method -- but the Performance Call
Graph view is more effective due to its graphical nature. It also
allows you to search the call graph for a method, focus on a sub-tree,
or make other graph manipulations. Lets say you identify a node that
represents a bottleneck. You can right-click on the node, select
Subtree from the menu, and click on Focus on Subtree.
Then you can drill down to the method within the sub-tree that is
responsible for performance degradation. Similarly, you can select
options other than Max Path to Root from the list, depending
upon your needs.
Figure 4. Performance Call Graph view
You can use the Method Details view for further investigating
a method. In the Performance Call Graph view, select Show Method
Details from the menu (see Figure 4).
The Method Detail view for the main
method of the MaxContSubSum class (shown
in Figure 5) reconfirms your findings.
It shows that the method solve consumed
67.77% of the descendent time.
Figure 5. Method Details view
The Execution Flow view allows you to examine the threads within a program against the time line. It is essentially the trace of the program execution. If you prefer to see data in tabular format, click on the Table icon (next to Home icon) in the Execution Flow view (shown in Figure 6), and it will bring up the Execution Flow Table view.
Figure 6. Execution Flow view and Execution Flow Table view
The UML2 Trace Interaction view is more sophisticated in displaying trace information in various granularities using UML Sequence Diagrams. You can see this view using the menu shown in Figure 2. You can choose to examine thread, class, or object interactions. All of these are shown in Figure 7. In most cases, thread interactions and class interactions should suffice. The object interactions are finer-grained than the class interactions. Predictably, class interactions display cumulative information for all instances of a class, and object interactions display each instance separately.
Notice the object ids shown in the object interactions. Also, in class
interactions, the calls from the static main
method of MaxContSubSum class to its own
constructor and solve methods are shown as
self-pointing arrows. But in object interactions, they are shown as an
arrow from the MaxContSubSum class to its
instance with object id 7621. On the other hand, in class interactions,
the call to the Object constructor is shown as
an arrow from the MaxContSubSum class to the
Object class.
Figure 7. UML2 Trace Interactions view (Thread, Class and Object interactions)
By analyzing the Performance Call Graph, Method Details, and
Execution Statistics, you know that the solve
method is a performance bottleneck. In this section, you will optimize
the solve method and examine how it improves
the overall performance. To inspect the source code, you can right-click
on the solve method and select Open Source
(see Figure 4); it will switch to the Java
perspective (to get back to views, switch to the Profile and Logging
perspective).
The solve method in
Listing 1 is an obvious implementation. It
computes sums of all possible sub-sequences and does an exhaustive
search to find the one with the maximum sum. It is a cubic algorithm,
which means that the execution time for a sequence of length N is
proportional to the cube of N. It is an over-simplification, but
typically, every nested for loop has a
multiplicative effect on the execution time. In this algorithm, there
are three nested for loops leading to
cubic execution time. If we could somehow eliminate one
for loop, the algorithm will become
quadratic, and execution time will improve drastically.
Let us examine the inner-most for loop.
It computes the sum of sub-sequence between the i'th and j'th elements
of the original sequence. Let's call this value Sum(i, j). The second
for loop increments j, and computes this
value from scratch. But if you notice, Sum(i, j +1) = Sum(i, j) + a[j+1],
where a[j+1] is the (j+1)'th element of the sequence. So you can reuse
the value from the previous iteration to compute it for the current
iteration. The Listing 2 shows a
modified solve method with only two nested
for loops (the rest of the code remains the
same as Listing 1).
Listing 2. Improved solve method
public class MaxContSubSum {
// other methods
public int solve() {
final int N = a.length;
maxsum = 0;
for (int i=0; i<N; i++) {
int sum = 0;
for (int j=i; j<N; j++) {
sum += a[j];
if (sum > maxsum) {
maxsum = sum;
seqstart = i; seqend = j;
}
}
}
return maxsum;
}
// other methods
}
|
Now collect execution time data for the modified program. Figure 8 shows the Execution Statistics view for the new data. The contribution of the solve method to execution time has been reduced from 37.63% (Figure 3) to 0.79%. A comparison of execution time in seconds (instead of percentage) is shown in Figure 9. With this simple optimization, the execution time for solve has decreased from 0.375 seconds to 0.005 seconds, reducing the cumulative time for main to half (from 0.585 seconds to 0.247 seconds).
It is possible to eliminate another for
loop from the solve method and make it a linear time algorithm (see
Resources), but its impact on overall
performance will be insignificant, because now it contributes less
than one percent to the execution time. The Performance Call Graph
in Figure 10 declares
loudly that you should rather focus your efforts on optimizing the
printResults method, an exercise left to you.
Figure 8. Execution Statistics with optimized solve method
Figure 9. Comparison of execution time before and after optimizing the solve method
Figure 10. Performance call graph after optimizing the solve method
Profiling sets and profiling types
So far, you have collected execution time data using the Execution History - Full Graphical Detail profiling set (Figure 1). But IRAD comes with three built-in profiling sets for Execution History: Statistical, Performance Call Graph, and Full Graphical Detail. Each loads the data with a varying degree of details. The Statistical profiling set loads the least amount of data, just enough to show Execution Statistics view. The Performance Call Graph profiling set loads time spent along various call-paths, enough to show the Execution Statistics view as well as the Performance Call Graph and Method Details views. The Full Graphical Details profiling set loads the maximum amount of data. It loads full trace details, and hence can show the Execution Flow and UML2 Trace Interactions views in addition to Execution Statistics, Performance Call Graph and Method Details views. Most of the time, the Performance Call Graph profiling set will satisfy your needs.
The built-in profiling sets have been created using profiling types. You can modify them, or create new profiling sets. For that, you will have to understand profiling types. In the Profiling Dialog (Figure 1), if you click Add, it will start the Profiling Set wizard. Give a name to your profiling set and click Next, which will start the dialog shown in Figure 11. You can see two profiling types under Time Analysis: Call Graph Analysis and Execution Time Analysis.
The Performance Call Graph profiling set uses Call Graph Analysis profiling type. The Statistical profiling set uses the Execution Time Analysis profiling type with the Show execution statistics (compressed data) option selected. The Full Graphical Details profiling set also uses the Execution Time Analysis profiling type, but with the Show execution flow graphical details option selected.
There is an option to Show instance level information, which collects the data needed to show Object Interactions in the UML2 Trace Interactions view. The Full Graphical Details profiling set is created without selecting this option, so you cant see Object Interactions. If you are interested in Object Interaction, select this option. However remember that it collects data for each object instance instead of aggregating it to the class level, which results in the amount of data increasing exponentially. You will very rarely require data fine-grained to this extent. Table 1 summarizes the various profiling types and options, corresponding to the built-in profiling set in IRAD, and the views that are available depending on the granularity of the collected data.
Figure 11. Profiling types and options for collecting execution time data
| Profiling Type | Call Graph Analysis | Execution Time Analysis | ||||
| Options Selected | Execution statistics (compress data) | Execution flow graphical details | ||||
| Instance-level information | ||||||
| Corresponding Profiling Set in IRAD | Performance Call Graph | Statistical | Full Graphical Details | None | ||
| Views | Execution Statistics | Yes | Yes | Yes | Yes | |
| Performance Call Graph | Yes | Yes | Yes | |||
| Method Details | Yes | Yes | Yes | |||
| Execution Flow | Yes | Yes | ||||
| UML2 Trace Interactions | Thread | Yes | Yes | |||
| Class | Yes | Yes | ||||
| Object | Yes | |||||
Optimizing every part of your code is not worth the effort. You should focus your efforts on code segments where returns are higher, or rather highest. IRAD helps you to identify these code segments (performance bottlenecks). The first step is to collect execution time data using one of IRADs built-in profiling sets. In most cases, the Performance Call Graph profiling set should suffice, but you can customize existing profiling sets or create new ones by manipulating profiling type options to suit your needs.
Then you can use various IRAD views for effective visualization of the execution time data. The Performance Call Graph view is the most important because it helps you identify a performance bottleneck quickly. The Method Details view gives you vital information about a method: execution time statistics and time break-down for its callers and descendents; it helps you verify your findings. Other views provide you more insight about runtime characteristics of the code. Once you have identified a performance bottleneck, optimize it, and collect another set of execution time data. Compare both sets to quantify the improvements. When you are satisfied, move on to the next bottleneck, and repeat the process. Exploit IRAD to the fullest, and make your Java application a winner!
Learn
-
For specific information about how to set up and start IBM Agent Controller,
consult the Rational
Application Developer V6 Programming Guide.
-
For an overview of IRADs profiling capabilities, see
Runtime
problem determination with IBM Rational Application Developer.
-
For profiling concepts in Eclipse, see
An
introduction to profiling Java applications.
-
Check out the IBM
Rational Application Developer page at developerWorks.
-
For the source code
of a linear time algorithm for Maximum Contiguous Subsequence Sum problem, see
Data
Structures and Problem Solving Using Java, 3rd ed.
by Mark Allen Weiss (Addison-Wesley, 2005).
Get products and technologies
-
IBM trial products
for download: Build your next development project using with IBM trial
software, available for download directly from developerWorks.
Discuss
-
Rational Software Architect, Software Modeler, Application Developer and Web Developer forum: Ask questions about Rational Application Developer.
-
developerWorks blogs: Participate in the developerWorks community.

Satish Chandra Gupta is a developer in the IBM Rational® PurifyPlus® group in Bangalore, India. His interests include compilers, programming languages, runtime analysis, Java memory leaks, type theory, software engineering, and software development environments. His research has been published in ACM/IEEE conferences. He received a B.Tech. from the Indian Institute of Technology in Kanpur (India), and an M.S. from the University of Wisconsin in Milwaukee (USA).
Comments (Undergoing maintenance)





