Need for speed -- Eliminating performance bottlenecks

Optimizing Java Programs Using IBM Rational Application Developer 6.0.1

Performance bottlenecks are code segments that contribute significantly to the overall application execution time, and are responsible for slowing down or even halting an application. Identifying and improving these code segments has tremendous impact on an application's performance. This article describes how to identify performance bottlenecks in a Java application by collecting and visualizing execution time data using IBM® Rational® Application Developer.

Satish Chandra Gupta (satish.gupta@in.ibm.com), Programmer, RAD/PurifyPlus, Rational Software, IBM

Satish Chandra GuptaSatish Chandra Gupta is a developer in the IBM Rational® PurifyPlus® group in Bangalore, India. His interests include compilers, programming languages, runtime analysis, Java memory leaks, type theory, software engineering, and software development environments. His research has been published in ACM/IEEE conferences. He received a B.Tech. from the Indian Institute of Technology in Kanpur (India), and an M.S. from the University of Wisconsin in Milwaukee (USA).


developerWorks Contributing author
        level

10 November 2005 (First published 04 October 2005)

Performance bottlenecks

You might have heard of the 90/10 rule: 90% of the time is spent in executing 10% of the code. Optimizing 90% of the code will have little impact on overall performance. You need to pinpoint and improve performance bottlenecks -- the 10% of the code where 90% of the execution time is spent. IBM® Rational® Application Developer (IRAD) can help you in achieving that goal. It allows you to collect execution-time data for your Java™ application with varying degrees of details, and provides effective visualization of the data so that you can quickly identify performance bottlenecks.

In the rest of this article, you will learn how to use IRAD to profile an application for execution time analysis, and examine the information displayed in various relevant views. Later, you will explore the three profiling sets that come built-in with IRAD -- ready-to-use for collecting execution time data -- and how they map to different profiling types for execution-time analysis. You will also learn about the granularity of the data needed for various views, and which profiling set (or profiling type options) to use to get them populated.


Execution time analysis

This article will demonstrate IRAD's execution time analysis capabilities using a Maximum Contiguous Subsequence Sum problem: given a sequence of (possibly negative) integers, identify the contiguous subsequence with the maximum sum. Lets say the sequence is {a0, a1, ..., aN-1}. You have to find the contiguous subsequence {ai, ai+1, ..., aj}, such that the sum of the subsequence (ai + ai+1 + ... + aj) is maximum over all possible contiguous subsequences. For the sake of mathematical completeness, the maximum contiguous subsequence sum is zero if all the integers are negative.

Here is an example. The sequence {1, -2, 11, -9, 13, 3} has several contiguous subsequences, such as {1, -2}, {1, -2, 11}, {11}, {11, -9}, and so forth. The sums of these four subsequences are 1-2=-1, 1-2+11=10, 11, and 11-9=2, respectively. Among these as well as other possible contiguous subsequences, the subsequence {11, -9, 13, 3} has the maximum sum, which is 18. There is no other contiguous subsequence with a sum more than 18.

Listing 1 shows a simple and straight-forward solution. The solve method in the MaxContSubSum class computes the sum of all possible contiguous subsequences, and remembers the maximum sum seen so far. Instead of taking input from user, the program uses a random number generator with an unchanging seed to create the input sequence. It insures that input sequence is the same for all runs of the program, so that performance measurements are repeatable and comparable.

Listing 1. A solution for the Maximum Contiguous Subsequence Sum problem (MaxContSubSum.java)
import java.util.Random;

public class MaxContSubSum {
    public MaxContSubSum(int[] a) {
        this.a = a;
        maxsum = seqstart = seqend = 0;
    }
    public int solve() {
        final int N = a.length;
        maxsum = 0;
        for (int i=0; i<N; i++) {
            for (int j=i; j<N; j++) {
                int sum = 0;
                for (int k=i; k<=j; k++)
                    sum += a[k];
                if (sum > maxsum) {
                    maxsum = sum; seqstart = i; seqend = j;
                }
            }
        }
        return maxsum;
    }
    public void printResults() {
        System.out.println("Sequence:");
        for (int i=0; i<a.length; i++) {
            System.out.print(" " + a[i]);
        }
        System.out.println();
        System.out.println("Max Sum = " + maxsum
                + "    SubSeq Start = " + seqstart
                + "    SubSeq End = " + seqend);
        System.out.println("Sub Sequence:");
        for (int i=seqstart; i<=seqend; i++) {
            System.out.print(" " + a[i]);
        }
        System.out.println();
    }
    public static int[] init(int N, int d) {
        int[] a = new int[N];
        Random r = new Random(N+d);
        for (int i = 0; i < N; i++) {
            a[i] = r.nextInt() % d;
        }
        return a;
    }
    public static void main(String[] args) {
        int[] a = init(1000, 1000);
        MaxContSubSum solver = new MaxContSubSum(a);
        solver.solve();
        solver.printResults();
    }
    private int[] a;
    private int   maxsum;
    private int   seqstart;
    private int   seqend;
}

Collecting execution time data

Before you start profiling the MaxContSubSum class, make sure that the IBM® Agent Controller is running. It is packaged with IRAD and gets installed along with it. On a Microsoft® Windows® platform, it usually runs as a service (see Resources for more details on the IBM® Agent Controller).

Here are the steps to profile your application:

  1. Start the Profile dialog by clicking Run menu and selecting Profile.
  2. Create a configuration for MaxContSubSum under the Java application.
  3. Enter the project name, the application name, and other argument details in the Main tab.
  4. Go to the Profiling tab and select the Execution History -- Full Graphical Detail profiling set (Figure 1 shows the Profile dialog with annotations highlight the Profiling tab and Profiling Set selection).
  5. Click the Profile button. IRAD will switch to the Profiling and Logging perspective, and execute your application to collect execution time data.

There will be a Profiling Monitor view in the perspective showing a monitor for the program being profiled. After the program terminates, select the Execution Time Analysis entry and right-click, then select Open With from the pop-up menu. This will bring up a list of views (Figure 2) that you can use to visualize the execution time data. You can click on the view you are interested in, and IRAD will bring up that view. The next section will examine each of these views and the data they display.

Figure 1. Profiling dialog
Profiling dialog
Figure 2. Views for visualizing execution time data
Views for visualizing execution time data

Views

The Execution Statistics view is the simplest. It shows Base Time, Average Base Time, Cumulative Time, and Number of Calls for each method. It can be configured to suit your needs. For example, you can choose to see package, class, or method-level information, with times shown in seconds or as a percentage. You can also sort the information by clicking a column heading. Figure 3 shows the Execution Statistics view, with method-level information and percentage time (the M and % icons on the top-right corner are selected). It is sorted on Cumulative Time (see the < in the column heading). Looking at the view, you can identify that the solve method of the MaxContSubSum class consumes the maximum time (37.63%), and hence it should be optimized (you will do it later in this article).

Figure 3. Execution Statistics view
Execution Statistics view

The Performance Call Graph view shows the same information in an innovative graphical manner that makes it easy to identify performance bottlenecks. The view shows the 20 most time-consuming nodes (a node can represent a method, a process, or a thread). An arc (line connecting two nodes) represents the dynamic call structure, and its thickness indicates the amount of time spent along that call-path. To optimize your application, you need to focus on the thickest lines.

Looking at the Figure 4, you can identify that the path from Thread:main to MaxContSubSum:main to MaxContSubSum:Solve is responsible for most of the execution time because the arcs along this path are thickest. The conclusion is same as that drawn from the Execution Statistics view -- you need to optimize the solve method -- but the Performance Call Graph view is more effective due to its graphical nature. It also allows you to search the call graph for a method, focus on a sub-tree, or make other graph manipulations. Lets say you identify a node that represents a bottleneck. You can right-click on the node, select Subtree from the menu, and click on Focus on Subtree. Then you can drill down to the method within the sub-tree that is responsible for performance degradation. Similarly, you can select options other than Max Path to Root from the list, depending upon your needs.

Figure 4. Performance Call Graph view
Performance Call Graph view

You can use the Method Details view for further investigating a method. In the Performance Call Graph view, select Show Method Details from the menu (see Figure 4). The Method Detail view for the main method of the MaxContSubSum class (shown in Figure 5) reconfirms your findings. It shows that the method solve consumed 67.77% of the descendent time.

Figure 5. Method Details view
Method Details view

The Execution Flow view allows you to examine the threads within a program against the time line. It is essentially the trace of the program execution. If you prefer to see data in tabular format, click on the Table icon (next to Home icon) in the Execution Flow view (shown in Figure 6), and it will bring up the Execution Flow Table view.

Figure 6. Execution Flow view and Execution Flow Table view
Execution Flow view and Execution Flow Table view

The UML2 Trace Interaction view is more sophisticated in displaying trace information in various granularities using UML Sequence Diagrams. You can see this view using the menu shown in Figure 2. You can choose to examine thread, class, or object interactions. All of these are shown in Figure 7. In most cases, thread interactions and class interactions should suffice. The object interactions are finer-grained than the class interactions. Predictably, class interactions display cumulative information for all instances of a class, and object interactions display each instance separately.

Notice the object ids shown in the object interactions. Also, in class interactions, the calls from the static main method of MaxContSubSum class to its own constructor and solve methods are shown as self-pointing arrows. But in object interactions, they are shown as an arrow from the MaxContSubSum class to its instance with object id 7621. On the other hand, in class interactions, the call to the Object constructor is shown as an arrow from the MaxContSubSum class to the Object class.

Figure 7. UML2 Trace Interactions view (Thread, Class and Object interactions)
UML2 Trace Interactions view (Thread, Class and Object interactions)

Improving performance

By analyzing the Performance Call Graph, Method Details, and Execution Statistics, you know that the solve method is a performance bottleneck. In this section, you will optimize the solve method and examine how it improves the overall performance. To inspect the source code, you can right-click on the solve method and select Open Source (see Figure 4); it will switch to the Java perspective (to get back to views, switch to the Profile and Logging perspective).

The solve method in Listing 1 is an obvious implementation. It computes sums of all possible sub-sequences and does an exhaustive search to find the one with the maximum sum. It is a cubic algorithm, which means that the execution time for a sequence of length N is proportional to the cube of N. It is an over-simplification, but typically, every nested for loop has a multiplicative effect on the execution time. In this algorithm, there are three nested for loops leading to cubic execution time. If we could somehow eliminate one for loop, the algorithm will become quadratic, and execution time will improve drastically.

Let us examine the inner-most for loop. It computes the sum of sub-sequence between the i'th and j'th elements of the original sequence. Let's call this value Sum(i, j). The second for loop increments j, and computes this value from scratch. But if you notice, Sum(i, j +1) = Sum(i, j) + a[j+1], where a[j+1] is the (j+1)'th element of the sequence. So you can reuse the value from the previous iteration to compute it for the current iteration. The Listing 2 shows a modified solve method with only two nested for loops (the rest of the code remains the same as Listing 1).

Listing 2. Improved solve method
public class MaxContSubSum {
    // other methods
    public int solve() {
        final int N = a.length;
        maxsum = 0;
        for (int i=0; i<N; i++) {
            int sum = 0;
            for (int j=i; j<N; j++) {
                sum += a[j];
                if (sum > maxsum) {
                    maxsum = sum;
                    seqstart = i; seqend = j;
                }
            }
        }
        return maxsum;    
    }
    // other methods
}

Now collect execution time data for the modified program. Figure 8 shows the Execution Statistics view for the new data. The contribution of the solve method to execution time has been reduced from 37.63% (Figure 3) to 0.79%. A comparison of execution time in seconds (instead of percentage) is shown in Figure 9. With this simple optimization, the execution time for solve has decreased from 0.375 seconds to 0.005 seconds, reducing the cumulative time for main to half (from 0.585 seconds to 0.247 seconds).

It is possible to eliminate another for loop from the solve method and make it a linear time algorithm (see Resources), but its impact on overall performance will be insignificant, because now it contributes less than one percent to the execution time. The Performance Call Graph in Figure 10 declares loudly that you should rather focus your efforts on optimizing the printResults method, an exercise left to you.

Figure 8. Execution Statistics with optimized solve method
Execution Statistics with optimized solve method
Figure 9. Comparison of execution time before and after optimizing the solve method
Comparison of execution time before and after optimizing the solve method
Figure 10. Performance call graph after optimizing the solve method
Performance call graph after optimizing the solve method

Profiling sets and profiling types

So far, you have collected execution time data using the Execution History - Full Graphical Detailprofiling set (Figure 1). But IRAD comes with three built-in profiling sets for Execution History: Statistical, Performance Call Graph, and Full Graphical Detail. Each loads the data with a varying degree of details. The Statistical profiling set loads the least amount of data, just enough to show Execution Statistics view. The Performance Call Graph profiling set loads time spent along various call-paths, enough to show the Execution Statistics view as well as the Performance Call Graph and Method Details views. The Full Graphical Details profiling set loads the maximum amount of data. It loads full trace details, and hence can show the Execution Flow and UML2 Trace Interactions views in addition to Execution Statistics, Performance Call Graph and Method Details views. Most of the time, the Performance Call Graph profiling set will satisfy your needs.

The built-in profiling sets have been created using profiling types. You can modify them, or create new profiling sets. For that, you will have to understand profiling types. In the Profiling Dialog (Figure 1), if you click Add, it will start the Profiling Set wizard. Give a name to your profiling set and click Next, which will start the dialog shown in Figure 11. You can see two profiling types under Time Analysis: Call Graph Analysis and Execution Time Analysis.

The Performance Call Graph profiling set uses Call Graph Analysis profiling type. The Statistical profiling set uses the Execution Time Analysis profiling type with the Show execution statistics (compressed data) option selected. The Full Graphical Details profiling set also uses the Execution Time Analysis profiling type, but with the Show execution flow graphical details option selected.

There is an option to Show instance level information, which collects the data needed to show Object Interactions in the UML2 Trace Interactions view. The Full Graphical Details profiling set is created without selecting this option, so you cant see Object Interactions. If you are interested in Object Interaction, select this option. However remember that it collects data for each object instance instead of aggregating it to the class level, which results in the amount of data increasing exponentially. You will very rarely require data fine-grained to this extent. Table 1 summarizes the various profiling types and options, corresponding to the built-in profiling set in IRAD, and the views that are available depending on the granularity of the collected data.

Figure 11. Profiling types and options for collecting execution time data
Profiling types and options for collecting execution time data
Table 1. Profiling types, built-in profiling sets, and the available views
Profiling TypeCall Graph AnalysisExecution Time Analysis
Options Selected Execution statistics (compress data)Execution flow graphical details
 Instance-level information
Corresponding Profiling Set in IRADPerformance Call GraphStatisticalFull Graphical DetailsNone
ViewsExecution StatisticsYesYesYesYes
Performance Call GraphYes YesYes
Method DetailsYes YesYes
Execution Flow  YesYes
UML2 Trace InteractionsThread  YesYes
Class  YesYes
Object   Yes

Conclusions

Optimizing every part of your code is not worth the effort. You should focus your efforts on code segments where returns are higher, or rather highest. IRAD helps you to identify these code segments (performance bottlenecks). The first step is to collect execution time data using one of IRADs built-in profiling sets. In most cases, the Performance Call Graph profiling set should suffice, but you can customize existing profiling sets or create new ones by manipulating profiling type options to suit your needs.

Then you can use various IRAD views for effective visualization of the execution time data. The Performance Call Graph view is the most important because it helps you identify a performance bottleneck quickly. The Method Details view gives you vital information about a method: execution time statistics and time break-down for its callers and descendents; it helps you verify your findings. Other views provide you more insight about runtime characteristics of the code. Once you have identified a performance bottleneck, optimize it, and collect another set of execution time data. Compare both sets to quantify the improvements. When you are satisfied, move on to the next bottleneck, and repeat the process. Exploit IRAD to the fullest, and make your Java application a winner!

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Rational software on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Rational, Java technology, DevOps
ArticleID=94707
ArticleTitle=Need for speed -- Eliminating performance bottlenecks
publish-date=11102005