Last month I showed how you could use the AspectWerkz framework to implement a basic performance tracing aspect. This month I'm going to take this aspect further, first adding some convenience features and then applying it in the analysis of a complex real world body of code -- the Apache Axis Web services framework. You'll see both the benefits and the limitations of my aspect-oriented approach to performance tracing, and end up with what I see as the general usefulness of aspect-oriented programming (AOP) to logging and tracing in general.
Before I get down and dirty with Axis, there's one other issue to cover. Both in the performance tracing aspect from last month and in my earlier articles on Java programming dynamics (see Resources) I've used method timing information to look at application performance issues. This approach is one I've personally found very convenient on a number of occasions, but if I'm going to be demonstrating it as much as I have, some side issues deserve to be discussed. I'll take care of those issues in the first part of this column.
The main purposes for my performance tracing aspect is to monitor the execution of a program and give a top-down view of where the execution time is being spent (as opposed to the bottom-up view you generally get with a profiler). To really apply it effectively is going to require some understanding of how time is being measured.
The Java programming language has had one timing method available as part of the standard API since
the earliest releases. This method is java.lang.System.currentTimeMillis(), which returns the
elapsed time in milliseconds since midnight, January 1, 1970 UTC. This method
is very useful, especially because it's available on every version of the
Java platform. I used it in my initial aspect demonstration code last month for
this reason. However, currentTimeMillis() does have
some drawbacks.
One issue with currentTimeMillis() is that even
though the time value is returned as a count of milliseconds, there are no
guarantees about the granularity or accuracy of the count returned. On some
systems the time is a true count of milliseconds, changing approximately every
millisecond. On other systems the time may only change at intervals of 10 or 20
milliseconds, and the accuracy may vary from one change to the next.
Another limitation is just the basic timescale inherent in the method definition. Back when the Java platform first came into use, a millisecond was still a fairly short time interval (especially for an interpreted language). Today, with CPU clock speeds roughly a hundred times faster than when the Java platform was first released, a millisecond has become a relatively long time span.
Beyond these issues with currentTimeMillis(),
there's also another concern with using a timer method of any form for
monitoring program execution. The concern is that the timer method itself --
which probably ends up needing to go down to the operating system level to get
access to a timer -- adds some overhead, and if the timer method overhead is too
large (or the method is called too often), it's going to seriously distort the
timing measurements. To make intelligent use of my performance tracing
aspect, I need to have at least a rough idea of how much overhead is added by the
timer method call.
How much do I need to be concerned about using currentTimeMillis()? To find out, I wrote a program to
perform some tests. This program, shown in Listing 1, takes a
very simple approach to checking the granularity, accuracy, and overhead of
currentTimeMillis() by repeatedly calling the method
until the return value changes. The magnitude of the changes tells me the
granularity of the timer; the number of calls between changes lets me derive the
overhead per call; and the consistency of both the magnitude of the changes and
the number of calls between changes gives me a check on the accuracy.
Listing 1. currentTimeMillis() granularity
public class MilliGran
{
// round value to a single digit after decimal point
private static String roundOneDecimal(double value) {
int whole = (int)value;
return whole + "." + (int)((value-whole)*10.0);
}
// print average and deviation with lead text
private static void print(String lead, double sum,
double ssqr, int reps) {
double mean = sum / reps;
double stddev = Math.sqrt(ssqr/reps-mean*mean);
System.out.println(lead + roundOneDecimal(mean) +
", with standard deviation " + roundOneDecimal(stddev));
}
public static void main(String[] argv) {
// get control values from command line
int reps = Integer.parseInt(argv[0]);
int sets = Integer.parseInt(argv[1]);
// loop until time rolls over
long start = System.currentTimeMillis();
long last;
while ((last = System.currentTimeMillis()) == start);
// loop for count of sets specified
int[] diffs = new int[reps];
for (int i = 0; i < sets; i++) {
double lsum = 0.0;
double lsumsq = 0.0;
double dsum = 0.0;
double dsumsq = 0.0;
for (int j = 0; j < reps; j++) {
// count loops until result changes
int loop = 0;
last = System.currentTimeMillis();
long now;
do {
loop++;
now = System.currentTimeMillis();
} while (now == last);
// accumulate statistics information
lsum += loop;
lsumsq += loop*loop;
int diff = (int)(now-last);
dsum += diff;
dsumsq += diff*diff;
diffs[j] = diff;
}
// print results for set
System.out.println("Set " + i + " results:");
print(" Average time change ", dsum, dsumsq, reps);
print(" Average calls before change ", lsum, lsumsq, reps);
System.out.println(" Average time per call " +
roundOneDecimal(dsum / lsum * 1000000.0) + " nanoseconds");
System.out.print(" Differences:");
for (int j = 0; j < reps; j++) {
System.out.print(" " + diffs[j]);
}
System.out.println();
}
}
}
|
Listing 2 shows the results from running this program
on my notebook computer (a relatively slow 1GHz PIIIm, running Mandrake
Linux 10.0) using JDK 1.4.2. The results show the time changes with currentTimeMillis() are always a single millisecond
over this sample, and the overhead of calling the method averages only about 3
microseconds. Because both these values are consistent (within 1 percent in this
case) it's at least reasonable to assume that the timer value is accurate.
Listing 2. Results from running the currentTimeMillis() test
[dennis@notebook time]$ java -cp . MilliGran 10 4 Set 0 results: Average time change 1.0, with standard deviation 0.0 Average calls before change 332.8, with standard deviation 39.4 Average time per call 3004.8 nanoseconds Differences: 1 1 1 1 1 1 1 1 1 1 Set 1 results: Average time change 1.0, with standard deviation 0.0 Average calls before change 338.6, with standard deviation 38.7 Average time per call 2953.3 nanoseconds Differences: 1 1 1 1 1 1 1 1 1 1 Set 2 results: Average time change 1.0, with standard deviation 0.0 Average calls before change 333.6, with standard deviation 33.5 Average time per call 2996.7 nanoseconds Differences: 1 1 1 1 1 1 1 1 1 1 Set 3 results: Average time change 1.0, with standard deviation 0.0 Average calls before change 336.8, with standard deviation 41.4 Average time per call 2969.1 nanoseconds Differences: 1 1 1 1 1 1 1 1 1 1 |
In JDK 5.0, Sun added an alternative method to provide better timing
information: java.lang.System.nanoTime(). As the
name implies, nanoTime() returns a time value in
nanoseconds, providing a precision a million times greater than the
currentTimeMillis() method. Even though most of today's systems
will not provide times with nanosecond resolution, the actual values should at
least be much better than can be obtained with the older method.
It's easy to modify the Listing 1 code to use the
nanoTime() method instead of currentTimeMillis(). I won't show the modified code here
(though it's included in the code download from the Resources section),
but Listing 3 shows the results from running the modified program, this time using JDK 5.0:
Listing 3. Results from running the nanoTime() test
[dennis@notebook time]$ java -cp . NanoGran 10 4 Set 0 results: Average time change 2800.0, with standard deviation 400.0 Average calls before change 1.0, with standard deviation 0.0 Average time per call 2800.0 nanoseconds Differences: 3000 3000 3000 3000 2000 2000 3000 3000 3000 3000 Set 1 results: Average time change 2900.0, with standard deviation 300.0 Average calls before change 1.0, with standard deviation 0.0 Average time per call 2900.0 nanoseconds Differences: 3000 3000 3000 3000 3000 3000 2000 3000 3000 3000 Set 2 results: Average time change 2700.0, with standard deviation 458.2 Average calls before change 1.0, with standard deviation 0.0 Average time per call 2700.0 nanoseconds Differences: 2000 3000 3000 2000 3000 3000 3000 3000 3000 2000 Set 3 results: Average time change 2900.0, with standard deviation 300.0 Average calls before change 1.0, with standard deviation 0.0 Average time per call 2900.0 nanoseconds Differences: 3000 3000 3000 3000 2000 3000 3000 3000 3000 3000 |
As you can see in the Listing 3 results, the nanoTime() method runs slightly faster than currentTimeMillis(). But more important, it returns a
different time value each time it's called, with apparent granularity of 1000
nanoseconds (or 1 microsecond). How can I tell the granularity? While I can't be
certain based on these results, the fact that the differences between the
result values returned by successive calls to nanoTime() is always either 2 or 3 microseconds is a pretty
good indication.
It's easy to change the performance tracing aspect from the last column to
use nanoTime(). While I'm at it, I'll make some other
changes to improve the usability of the code and make it more flexible for my
testing. Listing 4 gives the modified code in its entirety:
Listing 4. Performance tracing aspect code
public class TraceAspect
{
private static final long DEFAULT_MINIMUM = 1000000;
private final long m_minimumNanoseconds;
private final int m_maximumNesting;
private final String m_triggerClass;
private final String m_triggerName;
private int m_matchDepth;
private int m_nestingDepth;
private ArrayList m_pendingPrints = new ArrayList();
public TraceAspect(AspectContext context) {
// set minimum time value for tracing a method
String value = context.getParameter("minimum-time");
m_minimumNanoseconds = (value == null) ?
DEFAULT_MINIMUM : Long.parseLong(value);
// set maximum depth to be printed
value = context.getParameter("maximum-depth");
m_maximumNesting = (value == null) ?
Integer.MAX_VALUE : Integer.parseInt(value);
// set trigger class and method name if supplied
value = context.getParameter("trigger-method");
if (value == null) {
m_matchDepth = 1;
m_triggerClass = null;
m_triggerName = null;
} else {
int split = value.lastIndexOf('.');
m_triggerClass = value.substring(0, split);
m_triggerName = value.substring(split+1);
}
}
private void indent(int depth) {
for (int i = 0; i < depth; i++) {
System.out.print((i % 2 == 0) ? ' ' : '-');
}
}
private void indent() {
indent(m_nestingDepth);
}
private void printFullMethod(JoinPoint call) {
MethodSignature signature = (MethodSignature)call.getSignature();
System.out.print(signature.getDeclaringType().getName());
System.out.print('.');
System.out.print(signature.getName());
}
public Object trace(JoinPoint join) throws Throwable {
// check for match on trigger
boolean match = false;
MethodSignature signature = (MethodSignature)join.getSignature();
if (signature.getName().equals(m_triggerName)) {
if (signature.getDeclaringType().getName().equals(m_triggerClass)) {
match = true;
m_matchDepth++;
}
}
// check if execution is to be processed
if (m_matchDepth == 0 || m_nestingDepth > m_maximumNesting) {
// skip any timing or printing
return join.proceed();
} else {
// add joinpoint to list pending print
m_pendingPrints.add(join);
m_nestingDepth++;
// time joinpoint execution
long start = System.nanoTime();
Object result = null;
Throwable thrown = null;
try {
result = join.proceed();
} catch (Throwable t) {
thrown = t;
} finally {
// adjust depths
m_nestingDepth--;
if (match) {
m_matchDepth--;
}
// remove joinpoint if still on pending list
// (will always be last in list)
boolean entered = true;
if (m_pendingPrints.size() > 0) {
m_pendingPrints.remove(m_pendingPrints.size()-1);
entered = false;
}
// check if execution time above cutoff
long time = System.nanoTime() - start;
if (time >= m_minimumNanoseconds || thrown != null) {
// print all unprinted "entering" lines
int count = m_pendingPrints.size();
for (int i = 0; i < count; i++) {
// print entering information for nesting call
// (nesting depth based on start of pendings)
indent(m_nestingDepth-count+i);
JoinPoint prior = (JoinPoint)m_pendingPrints.get(i);
System.out.print('>');
printFullMethod(prior);
System.out.println();
}
// clear all pendings now that they're printed
m_pendingPrints.clear();
// print exiting information including time
indent();
System.out.print(entered ? '<' : '*');
printFullMethod(join);
System.out.print(" took ");
long tenthmillis = (time+50000)/100000;
System.out.print(tenthmillis / 10);
System.out.print('.');
System.out.print(tenthmillis % 10);
System.out.print(" ms.");
// append exception information
if (thrown != null) {
System.out.println(" threw " +
thrown.getClass().getName() +
": " + thrown.getMessage());
throw thrown;
} else {
System.out.println();
}
}
}
return result;
}
}
}
|
The basic principles are still the same as
last month, but this code
adds a few bells and a couple of whistles. Besides using nanoTime(), I've also added three optional configuration
parameters to the aspect definition. These parameters control:
- The minimum measured time for a method call to be included in the trace output
- The maximum depth of nesting to be included in the output
- A trigger method used to turn on the trace so that only the calls made by this method (directly or indirectly) will be included in the output
The added constructor code uses a signature defined by AspectWerkz 2.0 to get access to context information, which includes any parameter definitions for the aspect. I'll demonstrate how these parameters are defined in the XML configuration files as they come into use.
At the beginning of the actual trace() method, I've added
logic to test for a match on the trigger method name and class, and to just
directly execute the joinpoint method call without timing if I know the call is
not going to be printed (either because the trigger method isn't in the current
nesting of calls, or because we've gotten too deeply nested for the calls to be
of interest). I've also added throwable handling to the basic timing code, so
that when a method exits through a throw rather than a normal return the basic thrown
information will be included in the trace.
Finally, I changed the formatting of the trace output to make it a little more readable. Instead of just using spaces for indenting, this code will alternate spaces with hyphens, and when a method call is at the deepest level of nesting it'll be printed with a leading asterisk rather than the less-than sign (<) used to show the exit from a nesting level.
With the new and improved performance tracing aspect of Listing 4 set to go, all I need is a target application to examine under my aspect microscope. I've chosen the Apache Axis Web services framework, running a client application to avoid complicating everything with server code. Because my main line of work is Web services, Axis has a natural appeal. I've spent at least a couple of weeks going through the Axis code previously, so I've got a decent familiarity with the structure -- but I also remember how painful it was when I first tried to track down the often convoluted paths taken by the code.
For readers who aren't working with Web services, here's a quick overview. Axis is an implementation of the JAX-RPC standard for Java Web services. It supports both client applications (which originate requests in the form of XML messages) and server applications (which generate XML response messages for the received requests), with most of the Axis code shared by the two sides. Given the nature of a Web services message exchange, it's pretty clear how the framework has to function at a high level. On the client side, this point could be summarized as follows: Accept the data for a request from the user code, combine that data with configuration information to generate the request XML, send the request and wait for a response, extract the data from a response, and finally deliver that data to the user code. It's the details of how this processing actually occurs within the Axis framework that tend to be very confusing.
I've got an existing Axis client application I've used in prior Web services performance tests (see Resources for a link), so for this column I'll look at using the performance tracing aspect with that client application. I'll also run the corresponding Axis server application so the client has something to generate responses, but won't include that in the performance trace. The actual purpose of the application is to retrieve information from a database of earthquakes on the server.
For a first look at Axis execution, I want to see a basic time distribution that gives me an idea which parts of the code are taking the longest to execute (along with the paths that led to those parts, of course). To accomplish this task, I'll use a configuration for my aspect that tells it to intercept every method call within both my own application code and the Axis code, but only to go to a maximum nesting depth of seven -- that way I won't be overwhelmed by the amount of detail. Listing 5 gives the AspectWerkz configuration file I'll use:
Listing 5. First look configuration
<aspectwerkz>
<system id="AspectWerkzExample">
<package name="com.sosnoski.aspectwerkz">
<aspect class="TraceAspect">
<pointcut name="traceexecution">
execution(* org.apache.axis..*(..)) || execution(* com.sosnoski.seismic..*(..))
</pointcut>
<advice name="trace" type="around" bind-to="traceexecution"/>
<param name="maximum-depth" value="7"/>
</aspect>
</package>
</system>
</aspectwerkz>
|
Most of the Listing 5 configuration should look familiar
to readers of my last column. The main differences from the configurations I
used then are that I'm using a more complex expression to define the
pointcut for my aspect, and I've added a parameter value to define the maximum
nesting depth. In my earlier configurations, I used an attribute of the
pointcut element to define my expression; with the longer expression, it's
easier to make it the content of the element. AspectWerkz allows both forms to
be used interchangeably. The pointcut expression says I want to include all
method calls in all classes from packages in both the org.apache.axis and com.sosnoski.seismic hierarchies.
For my testing with Axis, I normally run multiple passes (request-response
exchanges) through the code, so that the first pass will handle initializations
and the following passes can run at full speed. In the tests with my performance
trace aspect, I'll just use two passes, and I'll only show the output from the
second pass. Listing 6 gives this output when the program is
run through AspectWerkz using the Listing 5 configuration.
Note that even though I'm using nanoTime() for timing,
I've converted all the time values to milliseconds to keep
the sizes of the numbers reasonable for printing.
Listing 6. First look performance trace
Running query for date range from 2002-04-02 13:12:53 to 2002-12-15 04:08:40, longitude range from 40.71031 to 160.91571, latitude range from -70.6799 to 22.293274, ->com.sosnoski.seismic.axislit.AxisDocLitClient.runQuery - >com.sosnoski.seismic.axislit.SeismicBindingStub.matchQuakes - ->org.apache.axis.client.Call.invoke - - >org.apache.axis.client.Call.invoke - - ->org.apache.axis.client.Call.invoke - - - *org.apache.axis.client.Call.invoke took 591.4 ms. - - - *org.apache.axis.message.RPCElement.getParams took 1447.0 ms. - - -<org.apache.axis.client.Call.invoke took 2042.0 ms. - - <org.apache.axis.client.Call.invoke took 2042.9 ms. - -<org.apache.axis.client.Call.invoke took 2043.1 ms. - ->org.apache.axis.client.Stub.extractAttachments - - >org.apache.axis.Message.getAttachments - - ->org.apache.axis.attachments.AttachmentsImpl.getAttachmentCount - - - *org.apache.axis.SOAPPart.saveChanges took 155.7 ms. - - -<org.apache.axis.attachments.AttachmentsImpl.getAttachmentCount took 157.5 ms. - - <org.apache.axis.Message.getAttachments took 157.7 ms. - -<org.apache.axis.client.Stub.extractAttachments took 157.9 ms. - <com.sosnoski.seismic.axislit.SeismicBindingStub.matchQuakes took 2202.4 ms. -<com.sosnoski.seismic.axislit.AxisDocLitClient.runQuery took 2202.9 ms. Results from query: Seismic region New Zealand Region has 1 regions and 2 matching quakes Seismic region Bismarck and Solomon Islands has 4 regions and 30 matching quakes Seismic region New Guinea has 9 regions and 55 matching quakes Seismic region Caroline Islands Area has 2 regions and 8 matching quakes Seismic region Guan to Japan has 2 regions and 10 matching quakes Seismic region Taiwan Area has 1 regions and 2 matching quakes Seismic region Philippine Islands has 5 regions and 22 matching quakes Seismic region Bornea - Sulawesi has 10 regions and 37 matching quakes Seismic region Sunda Arc has 12 regions and 49 matching quakes Seismic region Myanmar and Southeast Asia has 1 regions and 2 matching quakes Seismic region India - Xizand - Sichuan - Yunnan has 1 regions and 2 matching quakes Seismic region Indian Ocean has 9 regions and 27 matching quakes Seismic region Africa has 3 regions and 3 matching quakes Seismic region Macquarie Loop has 1 regions and 2 matching quakes Seismic region Andaman Islands to Sumatera has 4 regions and 14 matching quakes -*com.sosnoski.seismic.axislit.AxisDocLitClient.summarize took 1.7 ms. Result match count 265 in 2203 ms. |
My client test program prints results to the console, so these come out
intermixed with the performance trace information in Listing 6. Ignoring the
client program output, the trace information shows me at a
glance where the time is being spent -- more than 90 percent of the total time
was spent in the org.apache.axis.client.Call.invoke() method (the trace
output shown in bold type in Listing 6). Less than 30 percent of that time was spent in a call to another method
with the same name (or in a recursive call to the same method -- I can't really
tell from the trace), while essentially all the remaining time was spent in a
call to the org.apache.axis.message.RPCElement.getParams()
method.
By comparing the total execution time for the request shown in the last line of Listing 6 (2203 milliseconds) with the timing for requests run without going through AspectWerkz, I can gain an idea of how much overhead I've added to the program with my monitoring. In this case, it's about a factor of two -- the times when I run the program directly are typically in the range of 1000 to 1100 milliseconds.
Why is the program running only half as fast with the aspect applied? There are two separate components to the added overhead. The first is the overhead added by AspectWerkz's bytecode manipulations, which allow it to call my aspect anytime one of the target methods is being called. The second is the overhead added by my timing code and by the trace output to the console.
I can see how these two components compare by commenting out everything
except return join.proceed(); in the body of my
trace() method, so that the only overhead is that
added by the method call interceptions. It turns out it runs only about 10
percent faster than the full output shown in Listing 6, so
I know that in this case almost all the overhead is due to the calls to my
aspect.
I'd be able to reduce that overhead if I could be more selective about the pointcut methods -- for instance, to eliminate interceptions of trivial get methods that just return a field value. These trivial get methods probably make up the bulk of the method calls, so eliminating them would likely lower the overhead substantially. Unfortunately there's no consistent way to do that with the Axis code based on method names, which follows the standard Java language idiom of using get-type names for both trivial and complex operations. AspectWerkz could easily distinguish these trivial methods from ones that did at least some amount of processing at the time it's applying the aspect, but there's no way to tell it to exclude these methods from the pointcut. So I'm stuck with the interception overhead unless I want to either reduce the number of classes in the pointcut or list methods individually to be excluded from the pointcut. Neither is a very appealing option, so I'll live with the overhead and see how much useful information I can gain before my approach breaks down.
Referring back to Listing 6, more than 90 percent of the
total time was spent in the org.apache.axis.client.Call.invoke()
method. That seems like a natural target for a more detailed performance trace.
I'll use the Listing 7 configuration to get a more detailed view:
Listing 7. Drill-down configuration
<aspectwerkz>
<system id="AspectWerkzExample">
<package name="com.sosnoski.aspectwerkz">
<aspect class="TraceAspect">
<pointcut name="traceexecution">
execution(* org.apache.axis..*(..)) || execution(* com.sosnoski.seismic..*(..))
</pointcut>
<advice name="trace" type="around" bind-to="traceexecution"/>
<param name="maximum-depth" value="5"/>
<param name="trigger-method" value="org.apache.axis.client.Call.invoke"/>
</aspect>
</package>
</system>
</aspectwerkz>
|
The Listing 7 configuration keeps the same pointcut definition as before, but adds a trigger method parameter for the aspect and reduces the maximum nesting depth to 5. Listing 8 gives the output generated by this trace configuration (without the client program output, which stays the same):
Listing 8. Drill-down trace output
>org.apache.axis.client.Call.invoke >org.apache.axis.client.Call.invoke ->org.apache.axis.client.Call.invoke - >org.apache.axis.client.Call.invoke - ->org.apache.axis.MessageContext.setTargetService - - *org.apache.axis.AxisEngine.getService took 14.9 ms. - -<org.apache.axis.MessageContext.setTargetService took 16.7 ms. - ->org.apache.axis.client.Transport.setupMessageContext - - *org.apache.axis.transport.http.HTTPTransport.setupMessageContextImpl took 2.8 ms. - -<org.apache.axis.client.Transport.setupMessageContext took 3.4 ms. - ->org.apache.axis.message.MessageElement.output - - *org.apache.axis.message.SOAPEnvelope.outputImpl took 5.8 ms. - -<org.apache.axis.message.MessageElement.output took 6.5 ms. - ->org.apache.axis.client.Call.invokeEngine - - *org.apache.axis.client.AxisClient.invoke took 572.3 ms. - -<org.apache.axis.client.Call.invokeEngine took 573.4 ms. - >org.apache.axis.client.Call.invoke took 602.6 ms. - >org.apache.axis.message.RPCElement.getParams - ->org.apache.axis.message.RPCElement.deserialize - - *org.apache.axis.message.MessageElement.publishToHandler took 1447.2 ms. - -<org.apache.axis.message.RPCElement.deserialize took 1449.0 ms. - <org.apache.axis.message.RPCElement.getParams took 1449.4 ms. -<org.apache.axis.client.Call.invoke took 2054.9 ms. <org.apache.axis.client.Call.invoke took 2057.1 ms. <org.apache.axis.client.Call.invoke took 2057.4 ms. |
This more detailed view of the execution time is starting to show some good
details. I can now see that the innermost call to org.apache.axis.client.Call.invoke() is the call that wraps
the request processing, including the call to the server. This call has to wait for
the server response before returning, so clearly there is a lot of processing
going on that's outside what's being covered in the trace -- all the server code
is running in a separate JVM, without AspectWerkz. The real client processing
load is showing up in the org.apache.axis.message.RPCElement.getParams() call, and I
can see that essentially all of that processing time is spent in a call to org.apache.axis.message.MessageElement.publishToHandler().
I'll try drilling down further, into the org.apache.axis.message.MessageElement.publishToHandler() method
identified by the Listing 8 output as the place where most
of the client processing occurs. Listing 9 shows a
configuration to do this. Here I've dropped my client code from the pointcut,
because it's no longer of interest, and I've also lowered the minimum time for a
method call to be included in the trace down to .4 milliseconds (expressed in
nanoseconds). Finally, I changed the trigger method used to activate the trace
to the new method of interest.
Listing 9. Second-level drill-down configuration
<aspectwerkz>
<system id="AspectWerkzExample">
<package name="com.sosnoski.aspectwerkz">
<aspect class="TraceAspect">
<pointcut name="traceexecution"
expression="execution(* org.apache.axis..*(..))"/>
<advice name="trace" type="around" bind-to="traceexecution"/>
<param name="minimum-time" value="400000"/>
<param name="maximum-depth" value="5"/>
<param name="trigger-method" value="org.apache.axis.message.MessageElement.publishToHandler"/>
</aspect>
</package>
</system>
</aspectwerkz>
|
Listing 10 shows a sample of the output generated by the Listing 9 trace configuration. The full output is very long and very repetitive, which is why I've only included a sample. At this point we're looking at the details of the Axis XML message handling, and the times taken by individual method calls are really not very significant -- it's the sheer number of calls made in processing all the individual components of the message that add up.
Listing 10. Second-level drill-down trace output
>org.apache.axis.message.MessageElement.publishToHandler >org.apache.axis.message.SAX2EventRecorder.replay ->org.apache.axis.encoding.DeserializationContext.startElement - *org.apache.axis.message.RPCHandler.onStartChild took 0.7 ms. - >org.apache.axis.encoding.ser.BeanDeserializer.startElement - ->org.apache.axis.encoding.DeserializerImpl.startElement - - *org.apache.axis.encoding.ser.BeanDeserializer.onStartElement took 12.8 ms. - -<org.apache.axis.encoding.DeserializerImpl.startElement took 13.1 ms. - <org.apache.axis.encoding.ser.BeanDeserializer.startElement took 13.1 ms. -<org.apache.axis.encoding.DeserializationContext.startElement took 15.9 ms. ->org.apache.axis.encoding.DeserializationContext.startElement - *org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 0.6 ms. -<org.apache.axis.encoding.DeserializationContext.startElement took 1.0 ms. -*org.apache.axis.encoding.DeserializationContext.startElement took 0.6 ms. -*org.apache.axis.encoding.DeserializationContext.endElement took 0.5 ms. ->org.apache.axis.encoding.DeserializationContext.startElement - *org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 0.5 ms. -<org.apache.axis.encoding.DeserializationContext.startElement took 0.9 ms. ->org.apache.axis.encoding.DeserializationContext.startElement - >org.apache.axis.encoding.ser.BeanDeserializer.onStartChild - -*org.apache.axis.encoding.DeserializationContext.getTypeFromAttributes took 1.3 ms. - <org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 1.9 ms. -<org.apache.axis.encoding.DeserializationContext.startElement took 2.5 ms. -*org.apache.axis.encoding.DeserializationContext.endElement took 0.6 ms. ->org.apache.axis.encoding.DeserializationContext.startElement - *org.apache.axis.encoding.ser.BeanDeserializer.onStartChild took 0.4 ms. -<org.apache.axis.encoding.DeserializationContext.startElement took 0.8 ms. ->org.apache.axis.encoding.DeserializationContext.startElement - >org.apache.axis.encoding.ser.BeanDeserializer.startElement - ->org.apache.axis.encoding.DeserializerImpl.startElement - - *org.apache.axis.encoding.ser.BeanDeserializer.onStartElement took 121.8 ms. - -<org.apache.axis.encoding.DeserializerImpl.startElement took 123.3 ms. - <org.apache.axis.encoding.ser.BeanDeserializer.startElement took 123.3 ms. -<org.apache.axis.encoding.DeserializationContext.startElement took 123.9 ms. ->org.apache.axis.encoding.DeserializationContext.endElement - *org.apache.axis.encoding.DeserializerImpl.endElement took 0.5 ms. -<org.apache.axis.encoding.DeserializationContext.endElement took 0.8 ms. ... |
Listing 10 demonstrates the limits of what I can usefully do with my performance trace aspect. It's great for quickly getting a perspective on the major time sinks in an application's execution, but may not be very useful in actually isolating which code is running slow and can be improved. The latter issue is one that conventional execution profilers address very well, though, so all is not lost. Both performance tracing and execution profilers can be useful tools in analyzing and improving application performance.
This column shows how trace-type aspects can supply useful information about the execution of a program. Can they replace logging statements embedded in the program code? Not completely, in my view, because logging is often used to record specific information of interest to the developer or user that can only be accessed inside the logic of a method (where pointcuts cannot be defined). This information may not even be generated if logging is disabled, for performance reasons. The Axis code I analyzed in this column contains many logging statements providing this type of specific information.
Other types of logging statements consist of simple method entry and exit information. It'd be easy to replace these with an external aspect that could be activated at runtime. This aspect could be made to use the same logging framework as the embedded logging code so that the merged output gave a combined view of the program execution. I'm not going to write an aspect of this type, but hopefully these columns have made it clear how you can use this approach in your application.
Next month I'll look at the ASM bytecode manipulation framework. ASM is an alternative to the Javassist and BCEL frameworks I covered in my earlier articles on classworking. It makes some impressive performance claims, so I'll look into both performance and ease of use, rewriting one of the same examples I used for Javassist and BCEL to instead work with ASM.
| Name | Size | Download method |
|---|---|---|
| j-cwt04055-code.zip | 9 KB | HTTP |
Information about download methods
- Be sure to read all the installments in the Classworking toolkit series by Dennis Sosnoski.
- For more information about AspectWerkz, go straight to the project home page for comprehensive
documentation and links to articles all around the Web.
- Learn about the AspectJ aspect-oriented extension to the Java language in "Improve
modularity with aspect-oriented programming" by Nicholas Lesiecki
(developerWorks, January 2002).
- Find out how AOP can add teeth to binding agreements between components in
"Contract
enforcement with AOP" by Filippo Diotalevi (developerWorks, July 2004).
- Are your components suffering from a lack of flexibility? Then Andrew Glover's "AOP banishes
the tight-coupling blues" (developerWorks, February 2004) may be just
what you need to help shake out the kinks.
- If you're looking for help in understanding and maintaining existing systems,
check out "Use AOP
to maintain legacy Java applications" by Abhijit Belapurkar (developerWorks,
March 2004) to see how AOP can help.
- "A
look at aspect-oriented programming" by Gary Pollice (developerWorks, January
2004) looks beyond the details of AOP to consider its impact on the software
development process and the acceptance barriers faced by AOP.
- If you're already knowledgeable on AOP and want to move beyond the basics, check out the AOP@Work series, penned by six AOP experts.
- The Apache Axis framework provides
a widely used open source implementation of the JAX-RPC Java Web services
standard.
- Check out the author's Web Service Performance
comparison to see how different toolkits and styles perform, both relative to
other Web services toolkits and to Java RMI.
- Read the entire Java
programming dynamics series by author Dennis Sosnoski, which takes you on a tour
of the Java class structure, reflection, and classworking.
- To learn more about Java technology, visit the
developerWorks Java zone. You'll find technical documentation, how-to articles,
education, downloads, product information, and more.
- Visit the New to Java technology site for the latest resources to help you get started with Java programming.
- Get involved in the developerWorks community by participating in
developerWorks blogs.
- Browse for books on these and other technical topics.

Dennis Sosnoski is the founder and lead consultant of Seattle-area Java technology consulting company Sosnoski Software Solutions, Inc., specialists in XML and Web services training and consulting. His professional software development experience spans over 30 years, with the last several years focused on server-side XML and Java technologies. Dennis is a frequent speaker at conferences nationwide. He's also the lead developer of the open source JiBX XML Data Binding framework built around the Java classworking technology.
Comments (Undergoing maintenance)





