Quite often when I write a Java application I'll get a feeling that it's slower than I expected completing a particular operation. In the past I've usually resort to code inspection and printlns to time execution of sections, using trial and error!
In this post, I'll explain to you how I used the Health Center tool to investigate a badly-performing Java app. The problem turned out to be a simple lack of foresight when I wrote one of the methods for the app.
Pre-Reqs
- I'm using an IBM Java install - Java 5 SR8 or above and Java 6 SR1 or above
- I've installed the Health Center agent into the JVM running my app, and I've got the Health Center client installed within the IBM Support Assistant. Installation instructions are available here, and a YouTube™ video can be found here.
- Whenever I start the JVM running my app, I'm using the JVM options suggested in the help documentation for Health Center to enable the agent.
Finding out what's wrong
I want to improve a simple Java app which extracts some statistics from my Twitter account. The particular feature picked out here is to track the number of followers I have, over time. I've been downloading my followers list in xml format a few times a day for quite a while and have ended up with a couple of thousand of these xml files. There are more efficient ways to count followers, but I want to do more processing on the file at the same time so this approach fits for me.
Running the application, which opens each file and scans through each line for the <screen_name> tag to find follower names, I noticed that it was very slow - took about 25 seconds to go through the couple of thousand files. Running the code and looking at the task manager, I could see that the app was using a lot of processor time, which suggested to me that the bottleneck was in processing rather than file loading. I dug out the Health Center to get an idea of what was happening.
Connecting the Health Center client to my application and looking at the Profiling view, I can see the view of hot methods:

Hot method view showing java.util.regex.Pattern.compile() as the hottest method.
From this I can see that java.util.regex.Pattern.compile() is the hottest method - over 20% of the samples had compile() as the currently running method. Given what my application is doing, there should be some regex, but I wasn't expecting it to be far and away the most expensive task. Knowing that this is at the top of my list, and I hadn't expected it to be, there are two ways I can reduce its cost:
- Reduce the amount of processing done within the compile() method
- Reduce the number of times compile() is called
The compile() method is part of java.util.regex.Pattern and I can't change its source, so I'll have to look at reducing how often it's called.
Looking at the Invocation Paths view to see which methods call Pattern.compile():

I can see that my FollowerCount.findScreenName() seems to be the culprit for most of the calls. Here's the source for that method:
private String findScreenName(String line) {
Pattern p = Pattern.compile("\\s*<screen_name>(.*)</screen_name>.*" );
Matcher m = p.matcher(line);
if (m.matches()) {
return m.group(1);
}
return null;
}
This method is being called for every line of every file we try to read. The pattern returned from the Pattern.compile() call will be the same for every invocation, so we can change the code to cache the pattern after the first time:
Pattern p = null;
private String findScreenName(String line) {
if (p == null){
Pattern p = Pattern.compile("\\s*<screen_name>(.*)</screen_name>.*" );
}
Matcher m = p.matcher(line);
if (m.matches()) {
return m.group(1);
}
return null;
}
After making this change, the time is reduced from from 25 seconds to 12 seconds. Running the Health Center again, I can see that I have a new hottest method:

I'm pretty happy with 12 seconds to read in 2000 files. It's tempting to try and optimise further but I've achieved my goal!
Dave Nice