Skip to main content

Classworking toolkit: Performance tracing with aspects

Can aspects clean up code pollution? Using the AspectWerkz framework for performance tracing is a first step.

Dennis Sosnoski (dms@sosnoski.com), Consultant, Sosnoski Software Solutions, Inc.
Author photo
Dennis Sosnoski is the founder and lead consultant of Seattle-area Java technology consulting company Sosnoski Software Solutions, Inc., specialists in XML and Web services training and consulting. His professional software development experience spans over 30 years, with the last several years focused on server-side XML and Java technologies. Dennis is a frequent speaker at conferences nationwide. He's also the lead developer of the open source JiBX XML Data Binding framework built around the Java classworking technology.

Summary:  In this edition of Classworking toolkit, consultant Dennis Sosnoski takes up where he left off last month by applying an aspect-oriented approach to performance analysis. He investigates a client application using the Apache Axis Web services framework to find where the execution time is being spent. Along the way, Dennis looks into the issue of measuring time intervals in Java ™ code and shows how you can find the granularity and overhead of the timer implementation on your system.

View more content in this series

Date:  05 Apr 2005
Level:  Introductory
Activity:  1844 views

Last month I showed how you could use the AspectWerkz framework to implement a basic performance tracing aspect. This month I'm going to take this aspect further, first adding some convenience features and then applying it in the analysis of a complex real world body of code -- the Apache Axis Web services framework. You'll see both the benefits and the limitations of my aspect-oriented approach to performance tracing, and end up with what I see as the general usefulness of aspect-oriented programming (AOP) to logging and tracing in general.

Before I get down and dirty with Axis, there's one other issue to cover. Both in the performance tracing aspect from last month and in my earlier articles on Java programming dynamics (see Resources) I've used method timing information to look at application performance issues. This approach is one I've personally found very convenient on a number of occasions, but if I'm going to be demonstrating it as much as I have, some side issues deserve to be discussed. I'll take care of those issues in the first part of this column.

A timely digression

The main purposes for my performance tracing aspect is to monitor the execution of a program and give a top-down view of where the execution time is being spent (as opposed to the bottom-up view you generally get with a profiler). To really apply it effectively is going to require some understanding of how time is being measured.

The Java programming language has had one timing method available as part of the standard API since the earliest releases. This method is java.lang.System.currentTimeMillis(), which returns the elapsed time in milliseconds since midnight, January 1, 1970 UTC. This method is very useful, especially because it's available on every version of the Java platform. I used it in my initial aspect demonstration code last month for this reason. However, currentTimeMillis() does have some drawbacks.

Ask the expert: Dennis Sosnoski on JVM and bytecode issues

For comments or questions about the material covered in this article series, as well as anything else that pertains to Java bytecode, the Java binary class format, or general JVM issues, visit the JVM and Bytecode discussion forum, moderated by Dennis Sosnoski.

One issue with currentTimeMillis() is that even though the time value is returned as a count of milliseconds, there are no guarantees about the granularity or accuracy of the count returned. On some systems the time is a true count of milliseconds, changing approximately every millisecond. On other systems the time may only change at intervals of 10 or 20 milliseconds, and the accuracy may vary from one change to the next.

Another limitation is just the basic timescale inherent in the method definition. Back when the Java platform first came into use, a millisecond was still a fairly short time interval (especially for an interpreted language). Today, with CPU clock speeds roughly a hundred times faster than when the Java platform was first released, a millisecond has become a relatively long time span.

Beyond these issues with currentTimeMillis(), there's also another concern with using a timer method of any form for monitoring program execution. The concern is that the timer method itself -- which probably ends up needing to go down to the operating system level to get access to a timer -- adds some overhead, and if the timer method overhead is too large (or the method is called too often), it's going to seriously distort the timing measurements. To make intelligent use of my performance tracing aspect, I need to have at least a rough idea of how much overhead is added by the timer method call.

Timing currentTimeMillis()

How much do I need to be concerned about using currentTimeMillis()? To find out, I wrote a program to perform some tests. This program, shown in Listing 1, takes a very simple approach to checking the granularity, accuracy, and overhead of currentTimeMillis() by repeatedly calling the method until the return value changes. The magnitude of the changes tells me the granularity of the timer; the number of calls between changes lets me derive the overhead per call; and the consistency of both the magnitude of the changes and the number of calls between changes gives me a check on the accuracy.


Listing 1. currentTimeMillis() granularity

public class MilliGran
{
    // round value to a single digit after decimal point
    private static String roundOneDecimal(double value) {
        int whole = (int)value;
        return whole + "." + (int)((value-whole)*10.0);
    }
    
    // print average and deviation with lead text
    private static void print(String lead, double sum,
        double ssqr, int reps) {
        double mean = sum / reps;
        double stddev = Math.sqrt(ssqr/reps-mean*mean);
        System.out.println(lead + roundOneDecimal(mean) +
            ", with standard deviation " + roundOneDecimal(stddev));
    }
    
    public static void main(String[] argv) {
        
        // get control values from command line
        int reps = Integer.parseInt(argv[0]);
        int sets = Integer.parseInt(argv[1]);
        
        // loop until time rolls over
        long start = System.currentTimeMillis();
        long last;
        while ((last = System.currentTimeMillis()) == start);
        
        // loop for count of sets specified
        int[] diffs = new int[reps];
        for (int i = 0; i < sets; i++) {
            double lsum = 0.0;
            double lsumsq = 0.0;
            double dsum = 0.0;
            double dsumsq = 0.0;
            for (int j = 0; j < reps; j++) {
                
                // count loops until result changes
                int loop = 0;
                last = System.currentTimeMillis();
                long now;
                do {
                    loop++;
                    now = System.currentTimeMillis();
                } while (now == last);
                
                // accumulate statistics information
                lsum += loop;
                lsumsq += loop*loop;
                int diff = (int)(now-last);
                dsum += diff;
                dsumsq += diff*diff;
                diffs[j] = diff;
            }
            
            // print results for set
            System.out.println("Set " + i + " results:");
            print(" Average time change ", dsum, dsumsq, reps);
            print(" Average calls before change ", lsum, lsumsq, reps);
            System.out.println(" Average time per call " +
                roundOneDecimal(dsum / lsum * 1000000.0) + " nanoseconds");
            System.out.print(" Differences:");
            for (int j = 0; j < reps; j++) {
                System.out.print(" " + diffs[j]);
            }
            System.out.println();
        }
    }
}

Listing 2 shows the results from running this program on my notebook computer (a relatively slow 1GHz PIIIm, running Mandrake Linux 10.0) using JDK 1.4.2. The results show the time changes with currentTimeMillis() are always a single millisecond over this sample, and the overhead of calling the method averages only about 3 microseconds. Because both these values are consistent (within 1 percent in this case) it's at least reasonable to assume that the timer value is accurate.


Listing 2. Results from running the currentTimeMillis() test

[dennis@notebook time]$ java -cp . MilliGran 10 4
Set 0 results:
 Average time change 1.0, with standard deviation 0.0
 Average calls before change 332.8, with standard deviation 39.4
 Average time per call 3004.8 nanoseconds
 Differences: 1 1 1 1 1 1 1 1 1 1
Set 1 results:
 Average time change 1.0, with standard deviation 0.0
 Average calls before change 338.6, with standard deviation 38.7
 Average time per call 2953.3 nanoseconds
 Differences: 1 1 1 1 1 1 1 1 1 1
Set 2 results:
 Average time change 1.0, with standard deviation 0.0
 Average calls before change 333.6, with standard deviation 33.5
 Average time per call 2996.7 nanoseconds
 Differences: 1 1 1 1 1 1 1 1 1 1
Set 3 results:
 Average time change 1.0, with standard deviation 0.0
 Average calls before change 336.8, with standard deviation 41.4
 Average time per call 2969.1 nanoseconds
 Differences: 1 1 1 1 1 1 1 1 1 1

Zooming in with nanoTime()

In JDK 5.0, Sun added an alternative method to provide better timing information: java.lang.System.nanoTime(). As the name implies, nanoTime() returns a time value in nanoseconds, providing a precision a million times greater than the currentTimeMillis() method. Even though most of today's systems will not provide times with nanosecond resolution, the actual values should at least be much better than can be obtained with the older method.

It's easy to modify the Listing 1 code to use the nanoTime() method instead of currentTimeMillis(). I won't show the modified code here (though it's included in the code download from the Resources section), but Listing 3 shows the results from running the modified program, this time using JDK 5.0:


Listing 3. Results from running the nanoTime() test

[dennis@notebook time]$ java -cp . NanoGran 10 4
Set 0 results:
 Average time change 2800.0, with standard deviation 400.0
 Average calls before change 1.0, with standard deviation 0.0
 Average time per call 2800.0 nanoseconds
 Differences: 3000 3000 3000 3000 2000 2000 3000 3000 3000 3000
Set 1 results:
 Average time change 2900.0, with standard deviation 300.0
 Average calls before change 1.0, with standard deviation 0.0
 Average time per call 2900.0 nanoseconds
 Differences: 3000 3000 3000 3000 3000 3000 2000 3000 3000 3000
Set 2 results:
 Average time change 2700.0, with standard deviation 458.2
 Average calls before change 1.0, with standard deviation 0.0
 Average time per call 2700.0 nanoseconds
 Differences: 2000 3000 3000 2000 3000 3000 3000 3000 3000 2000
Set 3 results:
 Average time change 2900.0, with standard deviation 300.0
 Average calls before change 1.0, with standard deviation 0.0
 Average time per call 2900.0 nanoseconds
 Differences: 3000 3000 3000 3000 2000 3000 3000 3000 3000 3000

As you can see in the Listing 3 results, the nanoTime() method runs slightly faster than currentTimeMillis(). But more important, it returns a different time value each time it's called, with apparent granularity of 1000 nanoseconds (or 1 microsecond). How can I tell the granularity? While I can't be certain based on these results, the fact that the differences between the result values returned by successive calls to nanoTime() is always either 2 or 3 microseconds is a pretty good indication.


Buffing up the aspect

It's easy to change the performance tracing aspect from the last column to use nanoTime(). While I'm at it, I'll make some other changes to improve the usability of the code and make it more flexible for my testing. Listing 4 gives the modified code in its entirety:


Listing 4. Performance tracing aspect code

public class TraceAspect
{
    private static final long DEFAULT_MINIMUM = 1000000;
    
    private final long m_minimumNanoseconds;
    private final int m_maximumNesting;
    private final String m_triggerClass;
    private final String m_triggerName;
    private int m_matchDepth;
    private int m_nestingDepth;
    private ArrayList m_pendingPrints = new ArrayList();
    
    public TraceAspect(AspectContext context) {
        
        // set minimum time value for tracing a method
        String value = context.getParameter("minimum-time");
        m_minimumNanoseconds = (value == null) ?
            DEFAULT_MINIMUM : Long.parseLong(value);
        
        // set maximum depth to be printed
        value = context.getParameter("maximum-depth");
        m_maximumNesting = (value == null) ?
            Integer.MAX_VALUE : Integer.parseInt(value);
        
        // set trigger class and method name if supplied
        value = context.getParameter("trigger-method");
        if (value == null) {
            m_matchDepth = 1;
            m_triggerClass = null;
            m_triggerName = null;
        } else {
            int split = value.lastIndexOf('.');
            m_triggerClass = value.substring(0, split);
            m_triggerName = value.substring(split+1);
        }
    }
    
    private void indent(int depth) {
        for (int i = 0; i < depth; i++) {
            System.out.print((i % 2 == 0) ? ' ' : '-');
        }
    }
    
    private void indent() {
        indent(m_nestingDepth);
    }
    
    private void printFullMethod(JoinPoint call) {
        MethodSignature signature = (MethodSignature)call.getSignature();
        System.out.print(signature.getDeclaringType().getName());
        System.out.print('.');
        System.out.print(signature.getName());
    }

    public Object trace(JoinPoint join) throws Throwable {
        
        // check for match on trigger
        boolean match = false;
        MethodSignature signature = (MethodSignature)join.getSignature();
        if (signature.getName().equals(m_triggerName)) {
            if (signature.getDeclaringType().getName().equals(m_triggerClass)) {
                match = true;
                m_matchDepth++;
            }
        }
        
        // check if execution is to be processed
        if (m_matchDepth == 0 || m_nestingDepth > m_maximumNesting) {
            
            // skip any timing or printing
            return join.proceed();
            
        } else {
            
            // add joinpoint to list pending print
            m_pendingPrints.add(join);
            m_nestingDepth++;
            
            // time joinpoint execution
            long start = System.nanoTime();
            Object result = null;
            Throwable thrown = null;
            try {
                result = join.proceed();
            } catch (Throwable t) {
                thrown = t;
            } finally {
                
                // adjust depths
                m_nestingDepth--;
                if (match) {
                    m_matchDepth--;
                }
                
                // remove joinpoint if still on pending list
                //  (will always be last in list)
                boolean entered = true;
                if (m_pendingPrints.size() > 0) {
                    m_pendingPrints.remove(m_pendingPrints.size()-1);
                    entered = false;
                }
                
                // check if execution time above cutoff
                long time = System.nanoTime() - start;
                if (time >= m_minimumNanoseconds || thrown != null) {
                    
                    // print all unprinted "entering" lines
                    int count = m_pendingPrints.size();
                    for (int i = 0; i < count; i++) {
                        
                        // print entering information for nesting call
                        //  (nesting depth based on start of pendings)
                        indent(m_nestingDepth-count+i);
                        JoinPoint prior = (JoinPoint)m_pendingPrints.get(i);
                        System.out.print('>');
                        printFullMethod(prior);
                        System.out.println();
                    }
                    
                    // clear all pendings now that they're printed
                    m_pendingPrints.clear();
                    
                    // print exiting information including time
                    indent();
                    System.out.print(entered ? '<' : '*');
                    printFullMethod(join);
                    System.out.print(" took ");
                    long tenthmillis = (time+50000)/100000;
                    System.out.print(tenthmillis / 10);
                    System.out.print('.');
                    System.out.print(tenthmillis % 10);
                    System.out.print(" ms.");
                    
                    // append exception information
                    if (thrown != null) {
                        System.out.println(" threw " +
                            thrown.getClass().getName() +
                            ": " + thrown.getMessage());
                        throw thrown;
                    } else {
                        System.out.println();
                    }
                }
            }
            return result;
        }
    }
}

The basic principles are still the same as last month, but this code adds a few bells and a couple of whistles. Besides using nanoTime(), I've also added three optional configuration parameters to the aspect definition. These parameters control:

  • The minimum measured time for a method call to be included in the trace output
  • The maximum depth of nesting to be included in the output
  • A trigger method used to turn on the trace so that only the calls made by this method (directly or indirectly) will be included in the output

AspectWerkz on Linux

I did run into a pair of minor quirks running AspectWerkz on Linux. First off, the supplied shell scripts wouldn't run on my system because they used DOS-style line endings. Once I changed the line endings, they worked for the 1.0 and 2.0 RCX releases, but not for 1.1 (I didn't bother tracking down the problem there). Hopefully, this issue will be corrected in the 2.0 release.

Second, I ran into compatibility problems between AspectWerkz 1.0 and 2.0 RC3 that was apparently Linux-specific, which forced me to include all the JAR files referenced by the Axis code in my classpath when using AspectWerkz 2.0, even though some of these JARs (including the Servlet API definitions) are not normally required on the client. The AspectWerkz developers weren't able to duplicate my problem on Windows, but if you use AspectWerkz 2.0 on Linux, you might want to be alert for the same type of problem.

The added constructor code uses a signature defined by AspectWerkz 2.0 to get access to context information, which includes any parameter definitions for the aspect. I'll demonstrate how these parameters are defined in the XML configuration files as they come into use.

At the beginning of the actual trace() method, I've added logic to test for a match on the trigger method name and class, and to just directly execute the joinpoint method call without timing if I know the call is not going to be printed (either because the trigger method isn't in the current nesting of calls, or because we've gotten too deeply nested for the calls to be of interest). I've also added throwable handling to the basic timing code, so that when a method exits through a throw rather than a normal return the basic thrown information will be included in the trace.

Finally, I changed the formatting of the trace output to make it a little more readable. Instead of just using spaces for indenting, this code will alternate spaces with hyphens, and when a method call is at the deepest level of nesting it'll be printed with a leading asterisk rather than the less-than sign (<) used to show the exit from a nesting level.


Analyzing Axis

With the new and improved performance tracing aspect of Listing 4 set to go, all I need is a target application to examine under my aspect microscope. I've chosen the Apache Axis Web services framework, running a client application to avoid complicating everything with server code. Because my main line of work is Web services, Axis has a natural appeal. I've spent at least a couple of weeks going through the Axis code previously, so I've got a decent familiarity with the structure -- but I also remember how painful it was when I first tried to track down the often convoluted paths taken by the code.

For readers who aren't working with Web services, here's a quick overview. Axis is an implementation of the JAX-RPC standard for Java Web services. It supports both client applications (which originate requests in the form of XML messages) and server applications (which generate XML response messages for the received requests), with most of the Axis code shared by the two sides. Given the nature of a Web services message exchange, it's pretty clear how the framework has to function at a high level. On the client side, this point could be summarized as follows: Accept the data for a request from the user code, combine that data with configuration information to generate the request XML, send the request and wait for a response, extract the data from a response, and finally deliver that data to the user code. It's the details of how this processing actually occurs within the Axis framework that tend to be very confusing.

I've got an existing Axis client application I've used in prior Web services performance tests (see Resources for a link), so for this column I'll look at using the performance tracing aspect with that client application. I'll also run the corresponding Axis server application so the client has something to generate responses, but won't include that in the performance trace. The actual purpose of the application is to retrieve information from a database of earthquakes on the server.

The view from on high

For a first look at Axis execution, I want to see a basic time distribution that gives me an idea which parts of the code are taking the longest to execute (along with the paths that led to those parts, of course). To accomplish this task, I'll use a configuration for my aspect that tells it to intercept every method call within both my own application code and the Axis code, but only to go to a maximum nesting depth of seven -- that way I won't be overwhelmed by the amount of detail. Listing 5 gives the AspectWerkz configuration file I'll use:


Listing 5. First look configuration

<aspectwerkz>
  <system id="AspectWerkzExample">
    <package name="com.sosnoski.aspectwerkz">
      <aspect class="TraceAspect">
        <pointcut name="traceexecution">
          execution(* org.apache.axis..*(..)) || execution(* com.sosnoski.seismic..*(..))
        </pointcut>
        <advice name="trace" type="around" bind-to="traceexecution"/>
        <param name="maximum-depth" value="7"/>
      </aspect>
    </package>
  </system>
</aspectwerkz>

Most of the Listing 5 configuration should look familiar to readers of my last column. The main differences from the configurations I used then are that I'm using a more complex expression to define the pointcut for my aspect, and I've added a parameter value to define the maximum nesting depth. In my earlier configurations, I used an attribute of the pointcut element to define my expression; with the longer expression, it's easier to make it the content of the element. AspectWerkz allows both forms to be used interchangeably. The pointcut expression says I want to include all method calls in all classes from packages in both the org.apache.axis and com.sosnoski.seismic hierarchies.

For my testing with Axis, I normally run multiple passes (request-response exchanges) through the code, so that the first pass will handle initializations and the following passes can run at full speed. In the tests with my performance trace aspect, I'll just use two passes, and I'll only show the output from the second pass. Listing 6 gives this output when the program is run through AspectWerkz using the Listing 5 configuration. Note that even though I'm using nanoTime() for timing, I've converted all the time values to milliseconds to keep the sizes of the numbers reasonable for printing.


Listing 6. First look performance trace

Running query for date range from 2002-04-02 13:12:53 to 2002-12-15 04:08:40,
  longitude range from 40.71031 to 160.91571,
  latitude range from -70.6799 to 22.293274,
 ->com.sosnoski.seismic.axislit.AxisDocLitClient.runQuery
 - >com.sosnoski.seismic.axislit.SeismicBindingStub.matchQuakes
 - ->org.apache.axis.client.Call.invoke
 - - >org.apache.axis.client.Call.invoke
 - - ->org.apache.axis.client.Call.invoke
 - - - *org.apache.axis.client.Call.invoke took 591.4 ms.
 - - - *org.apache.axis.message.RPCElement.getParams took 1447.0 ms.
 - - -<org.apache.axis.client.Call.invoke took 2042.0 ms.
 - - <org.apache.axis.client.Call.invoke took 2042.9 ms.
 - -<org.apache.axis.client.Call.invoke took 2043.1 ms.
 - ->org.apache.axis.client.Stub.extractAttachments
 - - >org.apache.axis.Message.getAttachments
 - - ->org.apache.axis.attachments.AttachmentsImpl.getAttachmentCount
 - - - *org.apache.axis.SOAPPart.saveChanges took 155.7 ms.
 - - -<org.apache.axis.attachments.AttachmentsImpl.getAttachmentCount took 157.5 ms.
 - - <org.apache.axis.Message.getAttachments took 157.7 ms.
 - -<org.apache.axis.client.Stub.extractAttachments took 157.9 ms.
 - <com.sosnoski.seismic.axislit.SeismicBindingStub.matchQuakes took 2202.4 ms.
 -<com.sosnoski.seismic.axislit.AxisDocLitClient.runQuery took 2202.9 ms.
Results from query:
Seismic region New Zealand Region has 1 regions and 2 matching quakes
Seismic region Bismarck and Solomon Islands has 4 regions and 30 matching quakes
Seismic region New Guinea has 9 regions and 55 matching quakes
Seismic region Caroline Islands Area has 2 regions and 8 matching quakes
Seismic region Guan to Japan has 2 regions and 10 matching quakes
Seismic region Taiwan Area has 1 regions and 2 matching quakes
Seismic region Philippine Islands has 5 regions and 22 matching quakes
Seismic region Bornea - Sulawesi has 10 regions and 37 matching quakes
Seismic region Sunda Arc has 12 regions and 49 matching quakes
Seismic region Myanmar and Southeast Asia has 1 regions and 2 matching quakes
Seismic region India - Xizand - Sichuan - Yunnan has 1 regions and 2 matching quakes
Seismic region Indian Ocean has 9 regions and 27 matching quakes
Seismic region Africa has 3 regions and 3 matching quakes
Seismic region Macquarie Loop has 1 regions and 2 matching quakes
Seismic region Andaman Islands to Sumatera has 4 regions and 14 matching quakes
 -*com.sosnoski.seismic.axislit.AxisDocLitClient.summarize took 1.7 ms.
Result match count 265 in 2203 ms.

My client test program prints results to the console, so these come out intermixed with the performance trace information in Listing 6. Ignoring the client program output, the trace information shows me at a glance where the time is being spent -- more than 90 percent of the total time was spent in the org.apache.axis.client.Call.invoke() method (the trace output shown in bold type in Listing 6). Less than 30 percent of that time was spent in a call to another method with the same name (or in a recursive call to the same method -- I can't really tell from the trace), while essentially all the remaining time was spent in a call to the org.apache.axis.message.RPCElement.getParams() method.

Paying the overhead

By comparing the total execution time for the request shown in the last line of Listing 6 (2203 milliseconds) with the timing for requests run without going through AspectWerkz, I can gain an idea of how much overhead I've added to the program with my monitoring. In this case, it's about a factor of two -- the times when I run the program directly are typically in the range of 1000 to 1100 milliseconds.

Why is the program running only half as fast with the aspect applied? There are two separate components to the added overhead. The first is the overhead added by AspectWerkz's bytecode manipulations, which allow it to call my aspect anytime one of the target methods is being called. The second is the overhead added by my timing code and by the trace output to the console.

I can see how these two components compare by commenting out everything except return join.proceed(); in the body of my trace() method, so that the only overhead is that added by the method call interceptions. It turns out it runs only about 10 percent faster than the full output shown in Listing 6, so I know that in this case almost all the overhead is due to the calls to my aspect.

I'd be able to reduce that overhead if I could be more selective about the pointcut methods -- for instance, to eliminate interceptions of trivial get methods that just return a field value. These trivial get methods probably make up the bulk of the method calls, so eliminating them would likely lower the overhead substantially. Unfortunately there's no consistent way to do that with the Axis code based on method names, which follows the standard Java language idiom of using get-type names for both trivial and complex operations. AspectWerkz could easily distinguish these trivial methods from ones that did at least some amount of processing at the time it's applying the aspect, but there's no way to tell it to exclude these methods from the pointcut. So I'm stuck with the interception overhead unless I want to either reduce the number of classes in the pointcut or list methods individually to be excluded from the pointcut. Neither is a very appealing option, so I'll live with the overhead and see how much useful information I can gain before my approach breaks down.

Drilling down

Referring back to Listing 6, more than 90 percent of the total time was spent in the org.apache.axis.client.Call.invoke() method. That seems like a natural target for a more detailed performance trace. I'll use the Listing 7 configuration to get a more detailed view:


Listing 7. Drill-down configuration

<aspectwerkz>
  <system id="AspectWerkzExample">
    <package name="com.sosnoski.aspectwerkz">
      <aspect class="TraceAspect">
        <pointcut name="traceexecution">
          execution(* org.apache.axis..*(..)) || execution(* com.sosnoski.seismic..*(..))
        </pointcut>
        <advice name="trace" type="around" bind-to="traceexecution"/>
        <param name="maximum-depth" value="5"/>
        <param name="trigger-method" value="org.apache.axis.client.Call.invoke"/>
      </aspect>
    </package>
  </system>
</aspectwerkz>

The Listing 7 configuration keeps the same pointcut definition as before, but adds a trigger method parameter for the aspect and reduces the maximum nesting depth to 5. Listing 8 gives the output generated by this trace configuration (without the client program output, which stays the same):


Listing 8. Drill-down trace output

>org.apache.axis.client.Call.invoke
 >org.apache.axis.client.Call.invoke
 ->org.apache.axis.client.Call.invoke
 - >org.apache.axis.client.Call.invoke
 - ->org.apache.axis.MessageContext.setTargetService
 - - *org.apache.axis.AxisEngine.getService took 14.9 ms.
 - -<org.apache.axis.MessageContext.setTargetService took 16.7 ms.
 - ->org.apache.axis.client.Transport.setupMessageContext
 - - *org.apache.axis.transport.http.HTTPTransport.setupMessageContextImpl took 2.8 ms.
 - -<org.apache.axis.client.Transport.setupMessageContext took 3.4 ms.
 - ->org.apache.axis.message.MessageElement.output
 - - *org.apache.axis.message.SOAPEnvelope.outputImpl took 5.8 ms.
 - -<org.apache.axis.message.MessageElement.output took 6.5 ms.
 - ->org.apache.axis.client.Call.invokeEngine
 - - *org.apache.axis.client.AxisClient.invoke took 572.3 ms.
 - -<org.apache.axis.client.Call.invokeEngine took 573.4 ms.
 - >org.apache.axis.client.Call.invoke took 602.6 ms.
 - >org.apache.axis.message.RPCElement.getParams
 - ->org.apache.axis.message.RPCElement.deserialize
 - - *org.apache.axis.message.MessageElement.publishToHandler took 1447.2 ms.
 - -<org.apache.axis.message.RPCElement.deserialize took 1449.0 ms.
 - <org.apache.axis.message.RPCElement.getParams took 1449.4 ms.
 -<org.apache.axis.client.Call.invoke took 2054.9 ms.
 <org.apache.axis.client.Call.invoke took 2057.1 ms.
<org.apache.axis.client.Call.invoke took 2057.4 ms.

This more detailed view of the execution time is starting to show some good details. I can now see that the innermost call to org.apache.axis.client.Call.invoke() is the call that wraps the request processing, including the call to the server. This call has to wait for the server response before returning, so clearly there is a lot of processing going on that's outside what's being covered in the trace -- all the server code is running in a separate JVM, without AspectWerkz. The real client processing load is showing up in the org.apache.axis.message.RPCElement.getParams() call, and I can see that essentially all of that processing time is spent in a call to org.apache.axis.message.MessageElement.publishToHandler().

Beyond the limits

I'll try drilling down further, into the org.apache.axis.message.MessageElement.publishToHandler() method identified by the Listing 8 output as the place where most of the client processing occurs. Listing 9 shows a configuration to do this. Here I've dropped my client code from the pointcut, because it's no longer of interest, and I've also lowered the minimum time for a method call to be included in the trace down to .4 milliseconds (expressed in nanoseconds). Finally, I changed the trigger method used to activate the trace to the new method of interest.


Listing 9. Second-level drill-down configuration

<aspectwerkz>
  <system id="AspectWerkzExample">
    <package name="com.sosnoski.aspectwerkz">
      <aspect class="TraceAspect">
        <pointcut name="traceexecution"
          expression="execution(* org.apache.axis..*(..))"/>
        <advice name="trace" type="around" bind-to="traceexecution"/>
        <param name="minimum-time" value="400000"/>
        <param name="maximum-depth" value="5"/>
        <param name="trigger-method" value="org.apache.axis.message.MessageElement.publishToHandler"/>
      </aspect>
    </package>
  </system>
</aspectwerkz>

Listing 10 shows a sample of the output generated by the Listing 9 trace configuration. The full output is very long and very repetitive, which is why I've only included a sample. At this point we're looking at the details of the Axis XML message handling, and the times taken by individual method calls are really not very significant -- it's the sheer number of calls made in processing all the individual components of the message that add up.


Listing 10. Second-level drill-down trace output

>org.apache.axis.message.MessageElement.publishToHandler
 >org.apache.axis.message.SAX2EventRecorder.replay
 ->org.apache.axis.encoding.DeserializationContext.startElement
 - *org.apache.axis.message.RPCHandler.onStartChild took 0.7 ms.
 - >org.apache.axis.encoding.ser.BeanDeserializer.startElement
 - ->org.apache.axis.encoding.DeserializerImpl.startElement
 - - *org.apache.axis.encoding.ser.BeanDeserializer.onStartElement took 12.8 ms.
 - -<org.apache.axis.encoding.DeserializerImpl.startElement took 13.1 ms.
 - <org.apache.axis.encoding.ser.BeanDeserializer.startElement took 13.1 ms.
 -<org.apache.axis.encoding.DeserializationContext.startElement took 15.9 ms.
 ->org.apache.axis.encoding.DeserializationContext.startElement
 - *org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 0.6 ms.
 -<org.apache.axis.encoding.DeserializationContext.startElement took 1.0 ms.
 -*org.apache.axis.encoding.DeserializationContext.startElement took 0.6 ms.
 -*org.apache.axis.encoding.DeserializationContext.endElement took 0.5 ms.
 ->org.apache.axis.encoding.DeserializationContext.startElement
 - *org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 0.5 ms.
 -<org.apache.axis.encoding.DeserializationContext.startElement took 0.9 ms.
 ->org.apache.axis.encoding.DeserializationContext.startElement
 - >org.apache.axis.encoding.ser.BeanDeserializer.onStartChild
 - -*org.apache.axis.encoding.DeserializationContext.getTypeFromAttributes took 1.3 ms.
 - <org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 1.9 ms.
 -<org.apache.axis.encoding.DeserializationContext.startElement took 2.5 ms.
 -*org.apache.axis.encoding.DeserializationContext.endElement took 0.6 ms.
 ->org.apache.axis.encoding.DeserializationContext.startElement
 - *org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 0.4 ms.
 -<org.apache.axis.encoding.DeserializationContext.startElement took 0.8 ms.
 ->org.apache.axis.encoding.DeserializationContext.startElement
 - >org.apache.axis.encoding.ser.BeanDeserializer.startElement
 - ->org.apache.axis.encoding.DeserializerImpl.startElement
 - - *org.apache.axis.encoding.ser.BeanDeserializer.onStartElement took 121.8 ms.
 - -<org.apache.axis.encoding.DeserializerImpl.startElement took 123.3 ms.
 - <org.apache.axis.encoding.ser.BeanDeserializer.startElement took 123.3 ms.
 -<org.apache.axis.encoding.DeserializationContext.startElement took 123.9 ms.
 ->org.apache.axis.encoding.DeserializationContext.endElement
 - *org.apache.axis.encoding.DeserializerImpl.endElement took 0.5 ms.
 -<org.apache.axis.encoding.DeserializationContext.endElement took 0.8 ms.
 ...

Listing 10 demonstrates the limits of what I can usefully do with my performance trace aspect. It's great for quickly getting a perspective on the major time sinks in an application's execution, but may not be very useful in actually isolating which code is running slow and can be improved. The latter issue is one that conventional execution profilers address very well, though, so all is not lost. Both performance tracing and execution profilers can be useful tools in analyzing and improving application performance.


Conclusions

This column shows how trace-type aspects can supply useful information about the execution of a program. Can they replace logging statements embedded in the program code? Not completely, in my view, because logging is often used to record specific information of interest to the developer or user that can only be accessed inside the logic of a method (where pointcuts cannot be defined). This information may not even be generated if logging is disabled, for performance reasons. The Axis code I analyzed in this column contains many logging statements providing this type of specific information.

Other types of logging statements consist of simple method entry and exit information. It'd be easy to replace these with an external aspect that could be activated at runtime. This aspect could be made to use the same logging framework as the embedded logging code so that the merged output gave a combined view of the program execution. I'm not going to write an aspect of this type, but hopefully these columns have made it clear how you can use this approach in your application.

Next month I'll look at the ASM bytecode manipulation framework. ASM is an alternative to the Javassist and BCEL frameworks I covered in my earlier articles on classworking. It makes some impressive performance claims, so I'll look into both performance and ease of use, rewriting one of the same examples I used for Javassist and BCEL to instead work with ASM.



Download

NameSizeDownload method
j-cwt04055-code.zip9 KB HTTP

Information about download methods


Resources

  • Be sure to read all the installments in the Classworking toolkit series by Dennis Sosnoski.

  • For more information about AspectWerkz, go straight to the project home page for comprehensive documentation and links to articles all around the Web.

  • Learn about the AspectJ aspect-oriented extension to the Java language in "Improve modularity with aspect-oriented programming" by Nicholas Lesiecki (developerWorks, January 2002).

  • Find out how AOP can add teeth to binding agreements between components in "Contract enforcement with AOP" by Filippo Diotalevi (developerWorks, July 2004).

  • Are your components suffering from a lack of flexibility? Then Andrew Glover's "AOP banishes the tight-coupling blues" (developerWorks, February 2004) may be just what you need to help shake out the kinks.

  • If you're looking for help in understanding and maintaining existing systems, check out "Use AOP to maintain legacy Java applications" by Abhijit Belapurkar (developerWorks, March 2004) to see how AOP can help.

  • "A look at aspect-oriented programming" by Gary Pollice (developerWorks, January 2004) looks beyond the details of AOP to consider its impact on the software development process and the acceptance barriers faced by AOP.

  • If you're already knowledgeable on AOP and want to move beyond the basics, check out the AOP@Work series, penned by six AOP experts.

  • The Apache Axis framework provides a widely used open source implementation of the JAX-RPC Java Web services standard.

  • Check out the author's Web Service Performance comparison to see how different toolkits and styles perform, both relative to other Web services toolkits and to Java RMI.

  • Read the entire Java programming dynamics series by author Dennis Sosnoski, which takes you on a tour of the Java class structure, reflection, and classworking.

  • To learn more about Java technology, visit the developerWorks Java zone. You'll find technical documentation, how-to articles, education, downloads, product information, and more.

  • Visit the New to Java technology site for the latest resources to help you get started with Java programming.

  • Get involved in the developerWorks community by participating in developerWorks blogs.

  • Browse for books on these and other technical topics.

About the author

Author photo

Dennis Sosnoski is the founder and lead consultant of Seattle-area Java technology consulting company Sosnoski Software Solutions, Inc., specialists in XML and Web services training and consulting. His professional software development experience spans over 30 years, with the last several years focused on server-side XML and Java technologies. Dennis is a frequent speaker at conferences nationwide. He's also the lead developer of the open source JiBX XML Data Binding framework built around the Java classworking technology.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology
ArticleID=57772
ArticleTitle=Classworking toolkit: Performance tracing with aspects
publish-date=04052005
author1-email=dms@sosnoski.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers