FindBugs, Part 2: Writing custom detectors

How to write custom detectors to find application-specific problems

FindBugs is a static analysis tool that can be extended and customized to meet your team's unique requirements. In the second article of this series, Senior Software Engineer Chris Grindstaff shows you how to create application-specific bug detectors. Don't miss Part 1 of this series to get the first part of the story.

Chris Grindstaff (chris@gstaff.org), Software Engineer, IBM

Chris Grindstaff is a Senior Software Engineer at IBM in Research Triangle Park, North Carolina. Chris wrote his first program at the age of 7, when he convinced his grade school teacher that "typing" sentences would be just as onerous a punishment as writing them by hand. Chris is currently interested in a variety of open-source projects. He has worked extensively with Eclipse and authored several popular Eclipse plug-ins, which may be found on his Web site. You can contact Chris at cgrinds@us.ibm.com or chris@gstaff.org.



25 May 2004

Also available in Japanese

In the first article in this series, I showed you how to set up and execute FindBugs. Now we'll take a look at FindBugs' most powerful feature -- custom bug detectors. First I'll provide a rationale for why custom bug detectors are useful, then I'll walk you through a detailed example.

Writing custom bug detectors

Why would you want to write custom bug detectors? I was confronted with this question when I was asked to examine one team's performance problems. It became evident that the team's home-grown logging framework (like all logging frameworks) had grown over time. Originally the Loggers were called with reckless abandon. Unfortunately, as the team grew, so did the poor performance of their application because they were always creating expensive logging messages -- only to have the logging framework throw them away when it realized that logging was disabled. The standard way to handle this problem is to check first to see if logging is enabled before constructing expensive logging messages. In other words, use a guard clause like Listing 1:

Listing 1. Example of guarded logging
if(Logger.isLogging()) {
    Logger.log("perf", anObjectWithExpensiveToString + anotherExpensiveToString);
}

Don't miss the first part of this series

The first article in this series, "Improve the quality of your code," introduces FindBugs, a static analysis tool that examines your class or JAR files looking for potential problems, and shows you how to use it effectively.

The team decided this practice was the appropriate logging idiom and went about changing the existing code to reflect the new practice. You won't be surprised to hear that as the deadline approached on this very large project, there were a lot of places in the code that were missed. The team needed a better way to identify the places that had been missed. Because this article is about FindBugs, we'll use FindBugs to solve the problem.

The goal is to write a FindBugs detector that will find all the places in the code that are making calls to the logging framework without being wrapped in a guard clause.

When I wrote this detector originally, I broke things down into several discrete steps:

  1. First I wrote a test case with an unguarded log statement.
  2. Next, I browsed the FindBugs source code looking for a detector I thought was similar to the one I wanted to write.
  3. I then created the correctly packaged JAR file (using a build script) so that FindBugs knew how to load the unguarded detector.
  4. I ran the test and implemented code to make the test pass.
  5. Finally, I added more tests and continued the sequence until done.

In particular, when browsing code I examined the subtypes of BytecodeScanningDetector and ByteCodePatternDetector. The scanning detectors are more work to implement, but they are also more general in the sort of problems they can detect. Pattern detectors are a good choice when what you're trying to detect can be expressed as a sequence of bytecode patterns. A good example of this is the BCPMethodReturnCheck detector -- it looks for places where the return type of various methods are suspiciously ignored. BCPMethodReturnCheck is easily described as a sequence of patterns that looks for the invocation of certain methods followed by a POP or POP2 instruction. The vast majority of the detectors are currently written as scanning detectors, although I think that's just because the developers haven't had enough time to move many of them over to ByteCodePatternDetector.

I settled on using FindRunInvocations as an example primarily because it was one of the smallest detectors. It wasn't obvious to me how to implement the detector using a sequence of patterns.

FindBugs utilizes the Byte Code Engineering Library, or BCEL (see Resources), to implement its detectors. All bytecode scanning detectors are based on the visitor pattern, which FindBugs implements. It provides default implementations of these methods that you will override when implementing a custom detector. Check out BetterVisitor and its subclasses for more detail. For our purposes, we'll focus on only two methods -- visit(Code) and sawOpcode(int). As FindBugs analyzes a class, it will call the visit(Code) method when the contents of a method are walked. Likewise, FindBugs invokes the sawOpcode(int) method as it analyzes each opcode within the method body.

With that background, let's take a look at implementations of these methods used to build the unguarded logging detector, as shown in Listing 2:

Listing 2. Unguarded logging detector: visit() method
18      public void visit(Code code) {
19          seenGuardClauseAt = Integer.MIN_VALUE;
20          logBlockStart = 0;
21          logBlockEnd = 0;
22          super.visit(code);
23      }

When reading through the code for the out-of-the-box detectors, one of the things that will jump out is the focus on whether the detector needs to build up state during the analysis. In other words, does the detector need to remember what it sees at the method, class, hierarchy, or whole program level? For example, the Inconsistent Synchronization detector builds up state for the entire program so that it can determine when fields are accessed in an inconsistent manner in terms of synchronization. Our detector needs to maintain state only during the byte code scanning phase, because we're looking for a method-level problem.

The method-specific state that the detector accumulates can be flushed or reset in the visit(Code) method (as shown in Listing 2), because FindBugs invokes this method before scanning the bytecodes of a method. In this case, the detector has three bits of state that it's maintaining:

  • seenGuardClauseAt: the value of the program counter when the logging guard clause is discovered in the analyzed code
  • logBlockStart: the index of the beginning of the guard clause
  • logBlockEnd: the index of the instruction after the end of the guard clause

Two important points about the implementation of the visit(Code) method should be made. The first thing to notice is the call to super.visit(), which is key because the superclass' implementation of this method is responsible for visiting the contents of the method that we want to analyze. If we didn't call the superclass' implementation, then the analyzed method would never be examined.

The second point is that the accumulated state is reset before calling the superclass' implementation, which is important because those variables are used by the next method we'll look at -- the sawOpcode() method. We want to make sure that they're reset before then. Listing 3 shows the implementation of the sawOpcode() method:

Listing 3. Unguarded logging detector: sawOpcode() method
25      public void sawOpcode(int seen) {
26          if ("cbg/app/Logger".equals(classConstant) &&
27                  seen == INVOKESTATIC &&
28                  "isLogging".equals(nameConstant) && "()Z".equals(sigConstant)) {
29              seenGuardClauseAt = PC;
30              return;
31          }
32          if (seen == IFEQ && (PC >= seenGuardClauseAt + 3 && PC < seenGuardClauseAt + 7)) {
33              logBlockStart = branchFallThrough;
34              logBlockEnd = branchTarget;
35          }
36          if (seen == INVOKEVIRTUAL && "log".equals(nameConstant)) {
37              if (PC < logBlockStart || PC >= logBlockEnd) {
38                  bugReporter.reportBug(
39                          new BugInstance("CBG_UNPROTECTED_LOGGING", HIGH_PRIORITY)
40                          .addClassAndMethod(this).addSourceLine(this));
41              }
42          }
43      }

As I mentioned before, when FindBugs is analyzing a method, it will call sawOpcode() for each bytecode instruction contained in the method. This method is doing three things. In fact, the original code was refactored into three methods, but for purposes of this article, I in-lined it to reduce the amount of space. This method does three things:

  1. Determines if the static method Logger.isLogging() is called, and if so what the program counter (PC) is
  2. Determines if an if instruction follows the call made to Logger.isLogging()
  3. Looks for cases where the log() method is called outside the guard clause

Listing 4 shows each of these parts individually and in more detail:

Listing 4. Unguarded logging detector: sawOpcode(), isLogging() called
25      public void sawOpcode(int seen) {
26          if ("cbg/app/Logger".equals(classConstant) &&
27                  seen == INVOKESTATIC &&
28                  "isLogging".equals(nameConstant) && "()Z".equals(sigConstant)) {
29              seenGuardClauseAt = PC;
30              return;
31          }

The classConstant, nameConstant, and sigConstant fields are protected fields that the detector inherits from its superclass. They contain details about the current opcode. When writing your detectors, it's often useful to print the values for them. Browse the BytecodeScanningDetector hierarchy to find more useful fields and methods in the DismantleBytecode class. Another very useful tool to use when writing detectors is the perennial javap. The Java disassembler is a very handy tool for understanding the logic flow and method names for writing detectors. The general approach is to write the pattern that you're looking for (in this case write the guard clause in a Java file), save it, and then compile it. Then use javap -c to look at the disassembled bytecodes and learn how to structure your sawOpcode(int) method. For example, Listing 5 shows the output of javap after being run across the class I was using for my test cases (a method that is correctly using a logging guard clause):

Listing 5. Disassembled example of guard clause with source
public void methodWithLogging_guarded();
  Code:
   0:   invokestatic    #28; //Method cbg/app/Logger.isLogging:()Z
   3:   ifeq    18
   6:   new     #16; //class Logger
   9:   dup
   10:  invokespecial   #17; //Method cbg/app/Logger."<init>":()V
   13:  ldc     #19; //String bob
   15:  invokevirtual   #23; //Method cbg/app/Logger.log:(Ljava/lang/Object;)V
   18:  aload_0
   19:  invokespecial   #31; //Method doWork:()V
   22:  return
 
   corresponds to the Java source code
   public void methodWithLogging_guarded() {
        if (Logger.isLogging()) {
            new Logger().log("bob");
        }
        doWork();
    }

Examining the output of javap helps you understand the control flow of the method and how to form the class, signature, and name constants that you need to specify in the sawOpcode() method. For example, Listing 6 shows the first line of code from javap in Listing 5:

Listing 6. Disassembled method invocation
0:   invokestatic    #28; //Method cbg/app/Logger.isLogging:()Z

If you look closely at lines 26 through 28 of the sawOpcode() method in Listing 4, you'll see that they're describing a way to match what we see in Listing 5 from javap. javap is a useful tool when determining how to match these forms.

Once we've determined that the Logger.isLogging() method has been called, we want to save the value of the program counter, as shown in Listing 7. The program counter is needed to determine whether the if clause comes after the call to Logger.isLogging(), which takes us to the next section of code.

Listing 7. Saving the value of the program counter
32          if (seen == IFEQ && (PC >= seenGuardClauseAt + 3 && PC < seenGuardClauseAt + 7)) {
33              logBlockStart = branchFallThrough;
34              logBlockEnd = branchTarget;
35          }

This chunk of code from Listing 3 is checking that we see an if branch statement anywhere between 3 and 7 bytecodes away from the aforementioned call to Logger.isLogging(). These values were determined by looking at the output of javap and by experimenting. Experimenting, you say? That's right; you sometimes have to resort to trial and error to find the right tradeoff between false positives and useful results. Think of the process as computer engineering with heuristics instead of computer science. Once we've determined that this statement is an if(Logger.isLogging()) statement, we need to figure out the boundaries of the if's block of code. We do that by saving branchFallThrough and branchTarget. The branchFallThrough is the beginning of the if clause, and the branchTarget represents the first line outside the if clause. With this information, we're now able to proceed to the last part of the method, as shown in Listing 8:

Listing 8. Checking for calls to log()
36          if (seen == INVOKEVIRTUAL && "log".equals(nameConstant)) {
37              if (PC < logBlockStart || PC >= logBlockEnd) {
38                  bugReporter.reportBug(
39                          new BugInstance("CBG_UNPROTECTED_LOGGING", HIGH_PRIORITY)
40                          .addClassAndMethod(this).addSourceLine(this));
41              }
42          }

This block of code, again from Listing 3, looks for calls made to the Logger's log() method. When a call to the log() method is found, we check to see if the program counter is outside the if block determined earlier. If it is, we report a bug by creating a new bug instance, specifying the type of bug (which we'll discuss in more detail later) and its priority. It's also convenient to add the class, method, and source line to the bug so the user knows where to go to fix the problem.

After the code is written, you need to create a specially packaged JAR file that FindBugs will recognize as a plug-in JAR. Listing 9 shows the target for the build script I used to create the JAR file and copy it to the correct location:

Listing 9. Build script to package our FindBugs detector
<property name="FindBugs.home" value="C:\apps\FindBugs-0.7.3"></property>
<target name="build">
	<jar destfile="cbgFindbugsPlugin.jar">
		<fileset dir="bin"/>
		<fileset dir="src"/>
		<zipfileset dir="etc" includes="*.xml" prefix=""></zipfileset>
	</jar>
	<copy file="cbgFindbugsPlugin.jar" todir="${FindBugs.home}/plugin" />
</target>

This code creates a JAR file containing the source files, class files, FindBugs.xml, and messages.xml. Listings 10 and 11 shows the contents of the two XML files:

Listing 10. Contents of FindBugs.xml
<FindbugsPlugin>
  <Detector class="cbg.findBugs.FindUnprotectedLogging" speed="fast" />

  <BugPattern abbrev="CBGL" type="CBG_UNPROTECTED_LOGGING" category="PERFORMANCE" />
</FindbugsPlugin>

For each new detector, you add a Detector element and a BugPattern element to the FindBugs.xml file. The Detector element specifies the class used to implement the detector and whether it is a fast or slow detector. The speed attribute is used when viewing the detectors in the UI, as shown in Figure 1. The possible values for the speed attribute are slow, moderate, and fast.

Figure 1. Configure detectors UI
Graphical representation of Configure Detectors

The BugPattern element specifies three attributes. The abbrev attribute defines a short abbreviation for the detector. The abbreviation is used to identify a detected bug when running from the command-line client. Several related detectors may be grouped together by sharing the same abbreviation.

The type attribute is a unique identifier that is used for two purposes. When using the Ant version of FindBugs or the command-line version with the output format set to XML, you use the type attribute to identity the problem. The type attribute is also what you specify in your detector's Java code to create the correct type of bug. Notice the type listed here matches the name used on line 39 of Listing 8.

The category attribute is an enumerated type. It's one of the following:

  • CORRECTNESS: General correctness issues
  • MT_CORRECTNESS: Multithreaded correctness issues
  • MALICIOUS_CODE: A potential vulnerability if exposed to malicious code
  • PERFORMANCE: A performance issue

That's it for the FindBugs.xml file. Listing 11 shows what the messages.xml file contains:

Listing 11. Contents of messages.xml
<MessageCollection>
  <Detector class="cbg.FindBugs.FindUnprotectedLogging">
    <Details>
<![CDATA[
<p> This detector finds logs statements that aren't contained in an if-logging block. 
It is a fast detector.
]]>
    </Details>
  </Detector>

  <BugPattern type="CBG_UNPROTECTED_LOGGING">
    <ShortDescription>Found unprotected logging</ShortDescription>
    <LongDescription>Found unprotected logging in {1}</LongDescription>
    <Details>
<![CDATA[
<p> This method logs without first checking that logging is enabled; for example
... more text omitted...
]]>
    </Details>
  </BugPattern>
  
  <BugCode abbrev="CBGL">Found unprotected logging</BugCode>
</MessageCollection>

The messages.xml file consists of three elements: Detector, BugPattern, and BugCode.

The Detector's class attribute should specify the class name of the detector. The Details element contains a brief HTML description of the detector, and should therefore be contained in a CDATA section. This description is used by the UI, as shown in Figure 2:

Figure 2. FindBugs UI highlighting the unprotected logging detector
Graphical representation of Configure Detectors

The BugPattern element is similar to the BugPattern element defined in FindBugs.xml. The type attribute is required and should match the same unique identifier that you used in FindBugs.xml and in your detector's Java code. BugPattern contains three elements that affect how information about the detector is displayed in the UI: ShortDescription, LongDescription, and Details -- all of which are fairly self explanatory.

ShortDescription is used when View > Full Descriptions is toggled off in the UI. Likewise, LongDescription is used when View > Full Descriptions is toggled on. You can pass information from your bug detector's Java code into the full description by using annotations. In the description, you specify variables using {0} to denote the first annotation, {1} to denote the second annotation, and so on. At runtime, when a bug is discovered, whatever annotations you attach to the bug instance will be substituted into the description. Notice that in line 40 in Listing 8, the class and method annotations were added to the BugInstance. The class annotation is at position 0, and the method annotation is at position 1. For more details, take a look at the various add*() methods on BugInstance.

As before, the Details element should contain an HTML description within a CDATA section. Figure 2 shows an example of the details for our detector. View > Full Descriptions is toggled on.

The BugCode element is used by the UI when using the By Bug Type tab. The text of the element is what appears in the tree as the red node, as shown in Figure 2. The common detectors all share the same abbreviation, so the BugCode element must specify the abbreviation as an attribute of the element.

With the creation of these two XML files, we're now ready to package up the complete JAR. After you've built the JAR and placed it in the FIND_BUGS_HOME\plugin directory, you're ready to test out the new detector.

Application-specific bug detectors

FindBugs can be a useful tool in your arsenal. But, as with all tools, you have to know when to use it. However, static analysis tools should complement your unit/system tests and code reviews.

Besides its utility for improving the quality of your code, FindBugs has many application-specific uses that I encourage you to explore. For example, you could write a set of detectors that find newbie style problems. Or maybe you could write detectors that check that your code follows your team's guidelines. Maybe you're building a framework and need to make sure that all of the classes in a package have a zero-arg constructor or that all fields prefixed with an underscore have a getter but not a setter. Or you could write a set of detectors that verify that your J2EE code is following the appropriate restrictions, such as no Thread or Socket creation.

The team from the unguarded logging example also had problems with catching exceptions. Laudably, they did not simply ignore these exceptions; instead, they asked them to print their stack trace, which is nice when you're building and debugging your application, but not ideal when you deploy it -- especially when there may be thousands of exceptions. (Of course, if your application is throwing thousands of exceptions, you have more serious problems than big log files, but bear with me for illustrative purposes.) The team needed a detector that would find the places in their code where exceptions were caught and asked to print their stack traces. Then they could change the code to pass the exception off to their logging framework instead.

I created an interesting variant of the unguarded logging detector. This detector was used to find all the places in the code where the message to log was being created outside of the guard clause -- also a fairly common problem and one that can be quite expensive if you have fancy toStrings.


Summary

Whether you're new to FindBugs or already familiar with it, I encourage you to experiment with your own application-specific detectors. In the meantime, I hope this article has provided you with a clear example of how custom detectors can be implemented and has inspired you to apply these ideas to your team's unique situation.

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Java technology on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology
ArticleID=10945
ArticleTitle=FindBugs, Part 2: Writing custom detectors
publish-date=05252004