 | Level: Introductory Satish Chandra Gupta (satish.gupta@in.ibm.com), Programmer, RAD/PurifyPlus, Rational Software, IBM
04 Oct 2005 Updated 10 Nov 2005 Performance bottlenecks are code segments that
contribute significantly to the overall application execution time,
and are responsible for slowing down or even halting an application.
Identifying and improving these code segments has tremendous impact
on an application's performance. This article describes how to
identify performance bottlenecks in a Java application by collecting
and visualizing execution time data using IBM® Rational®
Application Developer.
Performance bottlenecks
You might have heard of the 90/10 rule: 90% of the time is spent in executing 10% of the code. Optimizing 90% of the code will have little impact on overall performance. You need to pinpoint and improve performance bottlenecks -- the 10% of the code where 90% of the execution time is spent. IBM® Rational® Application Developer (IRAD) can help you in achieving that goal. It allows you to collect execution-time data for
your Java™ application with varying degrees of details, and provides
effective visualization of the data so that you can quickly identify performance bottlenecks.
In the rest of this article, you will learn how to use IRAD to profile an application for execution time analysis, and examine the information displayed in various relevant views. Later, you will explore the three profiling sets that come built-in with IRAD --
ready-to-use for collecting execution time data -- and how they map to different profiling types for execution-time analysis. You will also learn about the granularity of the data needed for various views, and which profiling set (or profiling type options) to use
to get them populated.
Execution time analysis
This article will demonstrate IRAD's execution time analysis capabilities using a Maximum Contiguous Subsequence Sum problem: given a sequence of (possibly negative) integers, identify the contiguous subsequence with the maximum sum. Lets say the sequence
is {a0, a1, ..., aN-1}.
You have to find the contiguous subsequence
{ai, ai+1, ..., aj}, such that the sum
of the subsequence (ai + ai+1 + ... + aj)
is maximum over all possible contiguous subsequences. For the sake of
mathematical completeness, the maximum contiguous subsequence sum is
zero if all the integers are negative.
Here is an example. The sequence {1, -2, 11, -9, 13, 3} has several contiguous subsequences, such as {1, -2}, {1, -2, 11}, {11}, {11, -9}, and so forth. The sums of these four subsequences are 1-2=-1, 1-2+11=10, 11, and 11-9=2, respectively. Among these as well as other possible contiguous subsequences, the subsequence {11, -9, 13, 3} has the maximum sum, which is 18. There is no other contiguous subsequence with a sum more than 18.
Listing 1 shows a simple and straight-forward
solution. The solve method in the
MaxContSubSum class computes the sum of all
possible contiguous subsequences, and remembers the maximum sum seen so
far. Instead of taking input from user, the program uses a random number
generator with an unchanging seed to create the input sequence. It
insures that input sequence is the same for all runs of the program, so
that performance measurements are repeatable and comparable.
Listing 1. A solution for the
Maximum Contiguous Subsequence Sum problem (MaxContSubSum.java)
import java.util.Random;
public class MaxContSubSum {
public MaxContSubSum(int[] a) {
this.a = a;
maxsum = seqstart = seqend = 0;
}
public int solve() {
final int N = a.length;
maxsum = 0;
for (int i=0; i<N; i++) {
for (int j=i; j<N; j++) {
int sum = 0;
for (int k=i; k<=j; k++)
sum += a[k];
if (sum > maxsum) {
maxsum = sum; seqstart = i; seqend = j;
}
}
}
return maxsum;
}
public void printResults() {
System.out.println("Sequence:");
for (int i=0; i<a.length; i++) {
System.out.print(" " + a[i]);
}
System.out.println();
System.out.println("Max Sum = " + maxsum
+ " SubSeq Start = " + seqstart
+ " SubSeq End = " + seqend);
System.out.println("Sub Sequence:");
for (int i=seqstart; i<=seqend; i++) {
System.out.print(" " + a[i]);
}
System.out.println();
}
public static int[] init(int N, int d) {
int[] a = new int[N];
Random r = new Random(N+d);
for (int i = 0; i < N; i++) {
a[i] = r.nextInt() % d;
}
return a;
}
public static void main(String[] args) {
int[] a = init(1000, 1000);
MaxContSubSum solver = new MaxContSubSum(a);
solver.solve();
solver.printResults();
}
private int[] a;
private int maxsum;
private int seqstart;
private int seqend;
}
|
Collecting execution time data
Before you start profiling the MaxContSubSum
class, make sure that the IBM® Agent Controller is running.
It is packaged with IRAD and gets installed along with it.
On a Microsoft® Windows® platform, it usually runs as a service
(see Resources for more details on the IBM®
Agent Controller).
Here are the steps to profile your application:
- Start the Profile dialog by clicking Run menu and selecting
Profile.
- Create a configuration for MaxContSubSum under the Java application.
- Enter the project name, the application name, and other argument details in the Main tab.
- Go to the Profiling tab and select the Execution
History -- Full Graphical Detail profiling set
(Figure 1 shows the Profile
dialog with annotations highlight the Profiling tab and Profiling
Set selection).
- Click the Profile button. IRAD will switch to the Profiling
and Logging perspective, and execute your application to collect
execution time data.
There will be a Profiling Monitor view in the perspective
showing a monitor for the program being profiled. After the program
terminates, select the Execution Time Analysis entry and
right-click, then select Open With from the pop-up menu. This
will bring up a list of views (Figure 2)
that you can use to visualize the execution time data. You can click
on the view you are interested in, and IRAD will bring up that view.
The next section will examine each of these views and the data they
display.
Figure 1. Profiling dialog
Figure 2. Views for visualizing execution time data
Views
The Execution Statistics view is the simplest. It shows
Base Time, Average Base Time, Cumulative Time, and Number of Calls
for each method. It can be configured to suit your needs. For
example, you can choose to see package, class, or method-level
information, with times shown in seconds or as a percentage.
You can also sort the information by clicking a column heading.
Figure 3 shows the Execution
Statistics view, with method-level information and percentage time
(the and
icons on the top-right corner are selected). It is sorted on Cumulative
Time (see the < in the column heading). Looking at the view, you can
identify that the solve method of the
MaxContSubSum class consumes the maximum
time (37.63%), and hence it should be optimized (you will do it
later in this article).
Figure 3. Execution Statistics view
The Performance Call Graph view shows the same information
in an innovative graphical manner that makes it easy to identify
performance bottlenecks. The view shows the 20 most time-consuming
nodes (a node can represent a method, a process, or a thread). An
arc (line connecting two nodes) represents the dynamic call structure,
and its thickness indicates the amount of time spent along that
call-path. To optimize your application, you need to focus on the
thickest lines.
Looking at the Figure 4, you can
identify that the path from Thread:main to
MaxContSubSum:main to
MaxContSubSum:Solve
is responsible for most of the execution time because the arcs along
this path are thickest. The conclusion is same as that drawn from the
Execution Statistics view -- you need to optimize the
solve method -- but the Performance Call
Graph view is more effective due to its graphical nature. It also
allows you to search the call graph for a method, focus on a sub-tree,
or make other graph manipulations. Lets say you identify a node that
represents a bottleneck. You can right-click on the node, select
Subtree from the menu, and click on Focus on Subtree.
Then you can drill down to the method within the sub-tree that is
responsible for performance degradation. Similarly, you can select
options other than Max Path to Root from the list, depending
upon your needs.
Figure 4. Performance Call Graph view
You can use the Method Details view for further investigating
a method. In the Performance Call Graph view, select Show Method
Details from the menu (see Figure 4).
The Method Detail view for the main
method of the MaxContSubSum class (shown
in Figure 5) reconfirms your findings.
It shows that the method solve consumed
67.77% of the descendent time.
Figure 5. Method Details view
The Execution Flow view allows you to examine the threads
within a program against the time line. It is essentially the trace
of the program execution. If you prefer to see data in tabular format,
click on the Table icon (next to Home icon) in the Execution Flow view
(shown in Figure 6), and it will bring
up the Execution Flow Table view.
Figure 6. Execution Flow view and Execution Flow Table view
The UML2 Trace Interaction view is more sophisticated in
displaying trace information in various granularities using UML
Sequence Diagrams. You can see this view using the menu shown in
Figure 2. You can choose to examine
thread, class, or object interactions. All of these are shown in
Figure 7. In most cases, thread
interactions and class interactions should suffice. The object
interactions are finer-grained than the class interactions.
Predictably, class interactions display cumulative information
for all instances of a class, and object interactions display
each instance separately.
Notice the object ids shown in the object interactions. Also, in class
interactions, the calls from the static main
method of MaxContSubSum class to its own
constructor and solve methods are shown as
self-pointing arrows. But in object interactions, they are shown as an
arrow from the MaxContSubSum class to its
instance with object id 7621. On the other hand, in class interactions,
the call to the Object constructor is shown as
an arrow from the MaxContSubSum class to the
Object class.
Figure 7. UML2 Trace Interactions view (Thread, Class and Object interactions)
Improving performance
By analyzing the Performance Call Graph, Method Details, and
Execution Statistics, you know that the solve
method is a performance bottleneck. In this section, you will optimize
the solve method and examine how it improves
the overall performance. To inspect the source code, you can right-click
on the solve method and select Open Source
(see Figure 4); it will switch to the Java
perspective (to get back to views, switch to the Profile and Logging
perspective).
The solve method in
Listing 1 is an obvious implementation. It
computes sums of all possible sub-sequences and does an exhaustive
search to find the one with the maximum sum. It is a cubic algorithm,
which means that the execution time for a sequence of length N is
proportional to the cube of N. It is an over-simplification, but
typically, every nested for loop has a
multiplicative effect on the execution time. In this algorithm, there
are three nested for loops leading to
cubic execution time. If we could somehow eliminate one
for loop, the algorithm will become
quadratic, and execution time will improve drastically.
Let us examine the inner-most for loop.
It computes the sum of sub-sequence between the i'th and j'th elements
of the original sequence. Let's call this value Sum(i, j). The second
for loop increments j, and computes this
value from scratch. But if you notice, Sum(i, j +1) = Sum(i, j) + a[j+1],
where a[j+1] is the (j+1)'th element of the sequence. So you can reuse
the value from the previous iteration to compute it for the current
iteration. The Listing 2 shows a
modified solve method with only two nested
for loops (the rest of the code remains the
same as Listing 1).
Listing 2. Improved solve method
public class MaxContSubSum {
// other methods
public int solve() {
final int N = a.length;
maxsum = 0;
for (int i=0; i<N; i++) {
int sum = 0;
for (int j=i; j<N; j++) {
sum += a[j];
if (sum > maxsum) {
maxsum = sum;
seqstart = i; seqend = j;
}
}
}
return maxsum;
}
// other methods
}
|
Now collect execution time data for the modified program.
Figure 8 shows the
Execution Statistics view for the new data. The contribution of
the solve method to execution time has been reduced from 37.63%
(Figure 3) to 0.79%. A comparison
of execution time in seconds (instead of percentage) is shown in
Figure 9. With this simple optimization,
the execution time for solve has decreased from 0.375 seconds to 0.005
seconds, reducing the cumulative time for main to half (from 0.585
seconds to 0.247 seconds).
It is possible to eliminate another for
loop from the solve method and make it a linear time algorithm (see
Resources), but its impact on overall
performance will be insignificant, because now it contributes less
than one percent to the execution time. The Performance Call Graph
in Figure 10 declares
loudly that you should rather focus your efforts on optimizing the
printResults method, an exercise left to you.
Figure 8. Execution Statistics with optimized solve method
Figure 9. Comparison of execution time before and after optimizing the solve method
Figure 10. Performance call graph after optimizing the solve method
Profiling sets and profiling types
So far, you have collected execution time data using the Execution
History - Full Graphical Detail profiling set
(Figure 1). But IRAD comes with three
built-in profiling sets for Execution History: Statistical, Performance
Call Graph, and Full Graphical Detail. Each loads the data with a varying
degree of details. The Statistical profiling set loads the least amount
of data, just enough to show Execution Statistics view. The Performance
Call Graph profiling set loads time spent along various call-paths,
enough to show the Execution Statistics view as well as the Performance
Call Graph and Method Details views. The Full Graphical Details profiling
set loads the maximum amount of data. It loads full trace details, and
hence can show the Execution Flow and UML2 Trace Interactions views in
addition to Execution Statistics, Performance Call Graph and Method
Details views. Most of the time, the Performance Call Graph profiling set
will satisfy your needs.
The built-in profiling sets have been created using profiling
types. You can modify them, or create new profiling sets.
For that, you will have to understand profiling types. In the Profiling
Dialog (Figure 1), if you click
Add, it will start the Profiling Set wizard. Give a name to your
profiling set and click Next, which will start the dialog shown
in Figure 11. You can see two profiling
types under Time Analysis: Call Graph Analysis and Execution Time Analysis.
The Performance Call Graph profiling set uses Call Graph Analysis
profiling type. The Statistical profiling set uses the Execution Time
Analysis profiling type with the Show execution statistics
(compressed data) option selected. The Full Graphical Details
profiling set also uses the Execution Time Analysis profiling type, but
with the Show execution flow graphical details option selected.
There is an option to Show instance level information, which
collects the data needed to show Object Interactions in the UML2 Trace
Interactions view. The Full Graphical Details profiling set is created
without selecting this option, so you cant see Object Interactions. If
you are interested in Object Interaction, select this option. However
remember that it collects data for each object instance instead of
aggregating it to the class level, which results in the amount of data
increasing exponentially. You will very rarely require data fine-grained
to this extent. Table 1 summarizes the various
profiling types and options, corresponding to the built-in
profiling set in IRAD, and the views that are available
depending on the granularity of the collected data.
Figure 11. Profiling types and options for collecting execution time data
Table 1. Profiling types, built-in profiling sets, and the available views
| Profiling Type | Call Graph Analysis | Execution Time Analysis | | Options Selected | | Execution statistics (compress data) | Execution flow graphical details | | | Instance-level information | | Corresponding Profiling Set in IRAD | Performance Call Graph | Statistical | Full Graphical Details | None | | Views | Execution Statistics | Yes | Yes | Yes | Yes | | Performance Call Graph | Yes | | Yes | Yes | | Method Details | Yes | | Yes | Yes | | Execution Flow | | | Yes | Yes | | UML2 Trace Interactions | Thread | | | Yes | Yes | | Class | | | Yes | Yes | | Object | | | | Yes |
Conclusions
Optimizing every part of your code is not worth the effort. You
should focus your efforts on code segments where returns are higher,
or rather highest. IRAD helps you to identify these code segments
(performance bottlenecks). The first step is to collect execution
time data using one of IRADs built-in profiling sets. In
most cases, the Performance Call Graph profiling set should suffice,
but you can customize existing profiling sets or create new ones by
manipulating profiling type options to suit your needs.
Then you can use various IRAD views for effective
visualization of the execution time data. The Performance Call
Graph view is the most important because it helps you identify a
performance bottleneck quickly. The Method Details view gives you
vital information about a method: execution time statistics and
time break-down for its callers and descendents; it helps you verify
your findings. Other views provide you more insight about runtime
characteristics of the code. Once you have identified a performance
bottleneck, optimize it, and collect another set of execution time
data. Compare both sets to quantify the improvements. When you are
satisfied, move on to the next bottleneck, and repeat the process.
Exploit IRAD to the fullest, and make your Java application a winner!
Resources Learn
Get products and technologies
Discuss
About the author  | 
|  | Satish Chandra Gupta is a developer in the IBM Rational® PurifyPlus® group in Bangalore, India. His interests include compilers, programming languages, runtime analysis, Java memory leaks, type theory, software engineering, and software development environments. His research has been published in ACM/IEEE conferences. He received a B.Tech. from the Indian Institute of Technology in Kanpur (India), and an M.S. from the University of Wisconsin in Milwaukee (USA). |
Rate this page
|  |