Skip to main content

Automation for the people: Continual refactoring

Using static analysis tools to identify code smells

Paul Duvall, CTO, Stelligent Incorporated
Paul Duvall
Paul Duvall is the CTO of Stelligent Incorporated, an Agile consultancy that helps development teams deliver production-ready software. He is the co-author of the Addison-Wesley Signature Series book Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley Professional, 2007; Jolt Award 2008 winner). He also contributed to the UML 2 Toolkit (Wiley, 2003) and the No Fluff Just Stuff Anthology (Pragmatic Programmers, 2007).

Summary:  Refactoring is a well-accepted practice for improving existing code. Yet, how do you find the code that should be refactored, in a consistent and repeatable manner? This installment of Automation for the people explains how to use static analysis tools to identify code smells to refactor, with examples showing how to improve odiferous code.

View more content in this series

Date:  08 Jul 2008
Level:  Intermediate PDF:  A4 and Letter (106KB)Get Adobe® Reader®
Activity:  8130 views
Comments:  

Over the years, I've seen lots of source code from many projects, ranging from elegant designs to code that appeared to be bundled together with duct tape. I've written new code and maintained other developers' source code. I'd rather write new code, but I enjoy taking some existing code and reducing complexity in a method or extracting duplicate code into a common class. Earlier in my career, many believed that if you weren't writing new code, you weren't being productive. Fortunately, in the late 1990s, Martin Fowler's book Refactoring (see Resources) helped make the practice of improving existing code without changing external behavior — well — cool.

Much of what I cover in this series is about efficiency: how to reduce the redundancy of time-consuming processes and perform them more quickly. I'll do the same for the tasks I cover in this article and discuss how to perform them more effectively.

About this series

As developers, we work to automate processes for end-users; yet, many of us overlook opportunities to automate our own development processes. To that end, Automation for the people is a series of articles dedicated to exploring the practical uses of automating software development processes and teaching you when and how to apply automation successfully.

A typical approach to refactoring is to make small changes to existing code while you're introducing new code or changing a method. The challenges of this technique are that it's often inconsistently applied among developers on a team, and that it can be easy to miss opportunities for refactoring. This is why I advocate using static analysis tools to identify coding violations. With them, you gain visibility into the code base as a whole, and at the class or method level. Fortunately, in Java™ programming, you can choose from an abundance of freely available open source static analysis tools: CheckStyle, PMD, FindBugs, JavaNCSS, JDepend, and many others.

In this article, you'll learn how to:

  • Reduce conditional complexity code smells by measuring cyclomatic complexity using CheckStyle and providing refactorings such as Replace Conditional with Polymorphism

  • Remove duplicated code code smells by assessing code duplication using CheckStyle and providing refactorings such as Pull Up Method

  • Thin large class code smells by counting source lines of code using PMD (or JavaNCSS) and providing refactorings such as Extract Method

  • Wipe out too many imports code smells by determining a class's efferent coupling using CheckStyle (or JDepend) and providing refactorings such as Move Method

I'll use the following general format as we examine each code smell:

  1. Describe the smell that can indicate a problem in the code
  2. Define the measure that can find that smell
  3. Show the tool that measures the code smell
  4. Present the refactoring and, in some cases, the patterns to fix the code smell

In essence, this approach provides a framework for finding and fixing code problems throughout the code base. And you'll be better informed about the riskier parts of the code base prior to making changes. Better yet, I'll show you how you integrate this approach into automated builds.

Does your code smell?

A code smell is simply a hint that something could be wrong. Like patterns, code smells provide a common vocabulary that you can use to identify these types of potential issues quickly. An actual demonstration of code smells in an article is challenging because it could take so many lines of code that it would lengthen the article excessively. So I'll illustrate some smells and let you extrapolate the rest based on your experiences in viewing particularly smelly code.

Conditional complexity

Smell: Conditional complexity

Measure: Cyclomatic complexity

Tool: CheckStyle, JavaNCSS, and PMD

Refactorings: Replace Conditional with Polymorphism, Extract Method

Smell

Conditional complexity can manifest in several different ways in source code. One example of this code smell is to have multiple conditional statements such as if, while, or for statements. Another type of conditional complexity can be in the form of switch statements, as demonstrated in Listing 1:


Listing 1. Using a switch statement to perform conditional behavior
...
switch (beerType) {
  case LAGER:  
    System.out.println("Ingredients are..."); 
	...
	break;
  case BROWNALE:
    System.out.println("Ingredients are..."); 
	...
	break;
  case PORTER  
    System.out.println("Ingredients are..."); 
	...
	break;
  case STOUT:
    System.out.println("Ingredients are..."); 
	...
	break;
  case PALELAGER:
    System.out.println("Ingredients are..."); 
	...
	break;
  ...
  default:
    System.out.println("INVALID."); 
	...
	break;
}
...

switch statements themselves are not a bad thing. But when each statement includes too many choices and too much code, it could be an indication of code that needs to refactored.

Measure

To determine the conditional-complexity code smell, you determine a method's cyclomatic complexity. Cyclomatic complexity is a measure that dates back to 1975 as defined by Thomas McCabe. The Cyclomatic Complexity Number (CCN) measures the number of unique paths in a method. Each method starts with a CCN of 1 regardless of how many paths are in the method. Conditional constructs such as if, switch, while, and for statements are each given a value of 1, along with exception paths. The overall CCN for a method is an indication of its complexity. Many consider a CCN of 10 or more to indicate an overly complex method.

Tool

CheckStyle, JavaNCSS, and PMD are open source tools that measure cyclomatic complexity. Listing 2 shows a snippet of a CheckStyle rules file defined in XML. The CyclomaticComplexity module defines the maximum CCN for a method.


Listing 2. Configuring CheckStyle to find methods with a cyclomatic complexity of 10 or greater
<module name="CyclomaticComplexity">
  <property name="max" value="10"/>
</module>

Using the CheckStyle rules file from Listing 2, the Gant example in Listing 3 demonstrates how to run CheckStyle as part of an automated build. Gant is an automated build tool that provides an expressive programming language with support for build dependencies. Developers write Gant scripts using the power of the Groovy programming language. Because Gant provides full access to Ant's API, anything you can run in Ant can be run from a Gant script. (See the "Build software with Gant" tutorial to learn about Gant.)


Listing 3. Using a Gant script to execute CheckStyle checks
target(findHighCcn:"Finds method with a high cyclomatic complexity number"){
  Ant.mkdir(dir:"target/reports")
  Ant.taskdef(name:"checkstyle", 
    classname:"com.puppycrawl.tools.checkstyle.CheckStyleTask",
    classpathref:"build.classpath")
  Ant.checkstyle(shortFilenames:"true", config:"config/checkstyle/cs_checks.xml", 
    failOnViolation:"false", failureProperty:"checks.failed", classpathref:"libdir") {
    formatter(type:"xml", tofile:"target/reports/checkstyle_report.xml")
    formatter(type:"html", tofile:"target/reports/checkstyle_report.html")
    fileset(dir:"src"){
      include(name:"**/*.java")
    }
  }
}  

The Gant script in Listing 3 creates the CheckStyle report shown in Figure 1. The bottom of the figure indicates the CheckStyle cyclomatic-complexity violation for a method.


Figure 1. CheckStyle report indicating a method failure based on a high CCN
CheckStyle report indicating a method failure based on a high CCN

Refactoring

Figure 2 is a UML representation of the Replace Conditional with Polymorphism refactoring:


Figure 2. Replacing conditional statements with polymorphism
Replacing Conditionals with Polymorphism

See the full figure here.

In Figure 2, I:

  1. Create a Java interface called BeerType
  2. Define a generic showIngredients() method
  3. Create implementing classes for each of the BeerTypes

For brevity, I provided one method implementation per class. Ostensibly, you would have more than one method when creating an interface. Refactorings such as Replace Conditional with Polymorphism and Extract Method (which I'll explain later in the article) help make code much easier to maintain.


Duplicated code

Smell: Duplicated code

Measure: Code duplication

Tool: CheckStyle, PMD

Refactorings: Extract Method, Pull Up Method, Form Template Method, Substitute Algorithm

Smell

Duplicate code can creep into a code base unbeknownst to anyone. Sometimes, it's easier to copy and paste some code than to generalize the behavior into another class. One problem with copy-and-paste is that it forces an awareness of multiple copies of the code and the need to maintain them. A more insidious problem occurs when slight variations to the copied code cause inconsistent behavior, depending on which method is executing the behavior. Listing 4 is an example of code that closes a database connection, with the same code present in two methods:


Listing 4. Duplicate code
public Collection findAllStates(String sql) {
...
  try {
    if (resultSet != null) {
      resultSet.close();
    }
    if (stmt != null) {
      stmt.close();
    }
    if (conn != null) {
     conn.close();
   } 
   catch (SQLException se) {
     throw new RuntimeException(se);
   }
 }
...
}
...
public int create(String sql, Beer beer) {
...
  try {
    if (resultSet != null) {
      resultSet.close();
    }
    if (stmt != null) {
      stmt.close();
    }
    if (conn != null) {
     conn.close();
   } 
   catch (SQLException se) {
     throw new RuntimeException(se);
   }
 }
...
}

I've got a better IDE

Although I've used Gant in this article's examples to run tools for finding certain smells, you can also use your IDE to uncover these problems. The Eclipse IDE has plug-ins for many static analysis tools. But I still recommend that you use an automated build tool so that you can run integration builds in other environments without the use of an IDE.

Measure

The measure for finding duplicated code is to search for code duplication within classes and among other classes in the code base. Duplication among classes is more difficult to assess without the help of a tool. Because of the slight changes that copied code can often undergo, it's important to measure code that is not simply copied verbatim, but also code that is similar.

Tool

PMD's Copy/Paste Detector (CPD) and CheckStyle are two of the open source tools available for finding similar code throughout a Java code base. The CheckStyle configuration file example in Listing 5 demonstrates the use of the StrictDuplicateCode module:


Listing 5. Using CheckStyle to find at least 10 lines of duplicate code
<module name="StrictDuplicateCode">
  <property name="min" value="10"/>
</module>

The min property in Listing 5 sets the minimum number of duplicate lines that CheckStyle will flag for review. In this case, it will only indicate blocks of code that have at least 10 lines that are similar or have been duplicated.

Figure 3 shows the results of the module settings from Listing 5 after the automated build runs:


Figure 3. CheckStyle report indicating excessive code duplication
CheckStyle report indicating excessive code duplication

Refactoring

In Listing 6, I take the duplicated code in Listing 4 and reduce the duplication by using the Pull Up Method refactoring — extracting behavior from a larger method into an abstract class method:


Listing 6. Pull Up Method
...
} finally {
  closeDbConnection(rs, stmt, conn);
}
...

Don't forget to write tests

Whenever you change existing code, you must write corresponding automated tests using a framework such as JUnit. There is a risk in modifying existing code; one way to minimize it is to verify through testing that the behavior works now and in the future.

Duplicated code will happen. I would never recommend to a team to set the unrealistic goal of having no duplication whatsoever. However, an appropriate goal is to ensure that duplicate code within the code base does not increase. By using a static analysis tool such as PMD's CPD or CheckStyle, you can determine areas of high code duplication on a continual basis as part of running an automated build.


Long method (and large class)

Smell: Long method (and large class)

Measure: Source lines of code (SLOC)

Tool: PMD, JavaNCSS, CheckStyle

Refactorings: Extract Method, Replace Temp with Query, Introduce Parameter Object, Preserve Whole Object, Replace Method with Method Object

Smell

A general rule of thumb I try to adhere to is to keep my methods to 20 lines of code or fewer. Of course, there may be exceptions to this rule, but if I've got methods over 20 lines, I want to know about it. Often, there's a correlation between long methods and conditional complexity. And there's a connection between large classes and long methods. I'd love to show the example of the 2,200-line method I found on a project I had to maintain. I printed an entire class containing 25,000 lines of code and asked my colleagues to find the bugs. Let's just say I made my point as I rolled the printed code down the hallway.

The highlighted section in Listing 7 shows a small portion of an example of a long method code smell:


Listing 7. Long method code smell
public void saveLedgerInformation() {
...
try {
  if (ledger.getId() != null && filename == null) {
    getLedgerService().saveLedger(ledger);
  } else {
    accessFiles().decompressFiles(files, filenames);
  } 
  if (!files.get(0).equals(upload)) {
    upload = files.get(0);
    filename = filenames.get(0);
  }
  if (invalidFiles.isUnsupported(filename)) {
    setError(fileName, message.getMessage());
  } else {
  LedgerFile entryFile = accessFiles().add(upload, filename);
  if (fileType != null && FileType.valueOf(fileType) != null) {
    entryFile.setFileType(FileType.valueOf(fileType));
  }
  getFileManagementService().saveLedger(ledger, entryFile);
  if (!FileStatus.OPENED.equals(entryFile.getFileStatus())) {
    getFileManagementService().importLedgerDetails(ledger);
  }
  if (uncompressedFiles.size() > 1) {
    Helper.saveMessage(getText("ledger.file"));
  }
  
  if (user.getLastName() != null) {
    SearchInfo searchInfo = ServiceLocator.getSearchInfo();
    searchInfo.setLedgerInfo(null);
    isValid = false;
    setDefaultValues();
    resetSearchInfo();
    if (searchInfoValid && ledger != null) {
      isValid = true;
    }
  }
  
} catch (InvalidDataFileException e) {
  ResultType result = e.getResultType();
  for (ValidationMessage message : result.getMessages()) {
    setError(fileName, message.getMessage());
  }
  ledger.setEntryFile(null);


} ...

Measure

The SLOC measure has been misused in years past as an indication of productivity. We all know, though, that it's not necessarily true that the more lines you write, the better. However, SLOC can be a useful measure when it comes to complexity. The more lines in a method (or class), the more likely it will be difficult to maintain the code in the future.

Tool

The script in Listing 8 finds SLOC measures for long methods (and, optionally, large classes):


Listing 8. Gant script to identify large classes and methods
target(findLongMethods:"runs static code analysis"){
 Ant.mkdir(dir:"target/reports")
 Ant.taskdef(name:"pmd", classname:"net.sourceforge.pmd.ant.PMDTask",
   classpathref:"build.classpath")
 Ant.pmd(shortFilenames:"true"){
   codeSizeRules.each{ rfile ->
    ruleset(rfile)
   }
   formatter(type:"xml", tofile:"target/reports/pmd_report.xml")
   formatter(type:"html", tofile:"target/reports/pmd_report.html")
   fileset(dir:"src"){
     include(name:"**/*.java")
   }
  }  
}

Once again, I'm using Gant to access the Ant API to run Ant tasks. In Listing 8, I'm calling the PMD static analysis tool to search for long methods in the code base. PMD (along with JavaNCSS and CheckStyle) can also be used to find long methods, large classes, and many other code smells.

Refactoring

Listing 9 shows an example of the Extract Method refactoring to reduce the long method smell in Listing 7. After extracting behavior from the method in Listing 7 to the code in Listing 9, I can call the newly created isUserValid() method from the saveLedgerInformation() method in Listing 7:


Listing 9. Extract Method refactoring
private boolean isUserValid(User user) {
  boolean isValid = false;
  if (user.getLastName() != null) {
    SearchInfo searchInfo = ServiceLocator.getSearchInfo();
    searchInfo.setLedgerInfo(null);
    setDefaultValues();
    resetSearchInfo();
    if (searchInfoValid && ledger != null) {
      isValid = true;
    }
  }
  return isValid;
}

Often, long methods and large classes are indications of other code smells, such as conditional complexity and duplicated code. By finding these methods and classes, you can fix other problems.


Too many imports

Smell: Too many imports

Measure: Efferent coupling (fan-out per class)

Tool: CheckStyle

Refactorings: Move Method, Extract Class

Smell

Too many imports is an indication that a class relies on too many other classes. You'll notice you've got this code smell when a change to one class necessitates making changes to many other classes, because they are so tightly coupled to the class you are changing. The multiple imports in Listing 10 are an example:


Listing 10. Multiple imports in a class
import com.integratebutton.search.SiteQuery;
import com.integratebutton.search.OptionsQuery;
import com.integratebutton.search.UserQuery;
import com.integratebutton.search.VisitsQuery;
import com.integratebutton.search.SiteQuery;
import com.integratebutton.search.DateQuery;
import com.integratebutton.search.EvaluationQuery;
import com.integratebutton.search.RangeQuery
import com.integratebutton.search.BuildingQuery;
import com.integratebutton.search.IPQuery;
import com.integratebutton.search.SiteDTO;
import com.integratebutton.search.UrlParams;
import com.integratebutton.search.SiteUtil;

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collection;
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.log4j.Logger;
...

Measure

A way to find a class with too much responsibility is through the efferent coupling measure, also referred to as fan out complexity. Fan out complexity assigns a 1 to every class the analyzed class is dependent on.

Tool

Listing 11 shows an example of using CheckStyle to set the maximum number of fan out complexity:


Listing 11. Setting a maximum fan out complexity with CheckStyle
<module name="ClassFanOutComplexity">
  <property name="max" value="10"/>
</module>

Refactoring to patterns

The factory method pattern is one of many design patterns that you can implement when applying a refactoring. The factory method is an approach to creating a class without needing to define explicitly the actual class you are creating. This pattern is one technique for making it easier to adhere to interfaces rather than an implementing class. You can apply other design patterns when implementing refactorings based on code smells; see Resources for the link to a book that focuses on this concept.

Refactoring

There are several ways to fix the tight coupling that can occur as a result of too many imports. For code such as Listing 10, these refactorings include the Move Method refactoring: I move methods from the individual *Query classes into a Java interface and define common methods that all Query classes must implement. Then, I use the factory method pattern so that the coupling is with the interface.

By using a Gant automated build script to execute the CheckStyle Ant task, I can search my code base for classes that rely on too many other classes. When I modify code in these classes, I can implement certain refactorings such as Move Method along with certain design patterns to improve the ease of maintenance incrementally.


Refactor ... early and often

Continuous Integration (CI) is the practice of integrating changes often. As it's typically practiced, an automated CI server, run from a separate machine, triggers an automated build with every change applied to a project's version-control repository. To ensure that the build scripts from Listing 3 and Listing 8 are run consistently and with every change to a code base, you'll want to configure a CI server such as Hudson (see Resources). Hudson is distributed as a WAR file that you can drop into any Java Web container.

Because the examples in Listing 3 and Listing 8 use Gant, I'll walk through the steps for configuring the Hudson CI server to run Gant scripts:

  1. From the Hudson dashboard, install the Gant plug-in for Hudson by selecting Manage Hudson, followed by Manage Plugins, and then select the Available tab. From this tab, select the Gant plug-in check box and click the Install button.

  2. Restart the Web container (for example, Tomcat).

  3. Select Manage Hudson, then Configure System. From the Gant installation section, enter a unique name and the location of the Groovy installation on the machine where Hudson is running. Save the changes.

  4. Go back to the Dashboard (by selecting the Hudson link) and select an existing Hudson Job, select Configure, click the Add build step button, and select the Invoke Gant script option.

With Hudson configured to run the automated build scripts written in Gant, you can get rapid feedback on measures that are related to code smells such as long method and conditional complexity as soon as they are introduced into the code base.


Other smells and refactorings

Not all smells have correlating measures. However, static analysis tools can uncover other smells than the ones I've demonstrated. Table 1 lists examples of other code smells, tools, and possible refactorings:


Table 1. Other smells and refactorings
SmellTool(s)Refactoring
Dead codePMDRemove Code
Temporary fieldPMDInline Temp
Inconsistent/uncommunicative namesCheckStyle, PMDRename Method, Rename Field
Long parameter listPMDReplace Parameter with Method, Preserve Whole Object, Introduce Parameter Object

This article provides a pattern of correlating a code smell to a measure that can be configured to be flagged from an automated static analysis tool. You can apply refactorings with or without the use of certain design patterns. This gives you a framework for consistently finding and fixing code smells in a repeatable manner. I'm confident that the examples in this article will help you use static analysis tools to find other types of code smells than the ones I've demonstrated.


Resources

Learn

Get products and technologies

  • Gant: Download Gant and start building software in a predictable and repeatable manner.

  • CheckStyle: Download CheckStyle to gather metrics and better assess certain code smells.

  • PMD: Download PMD to gather metrics and better assess certain code smells.

  • Hudson: A freely available open source server for Continuous Integration.

Discuss

About the author

Paul Duvall

Paul Duvall is the CTO of Stelligent Incorporated, an Agile consultancy that helps development teams deliver production-ready software. He is the co-author of the Addison-Wesley Signature Series book Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley Professional, 2007; Jolt Award 2008 winner). He also contributed to the UML 2 Toolkit (Wiley, 2003) and the No Fluff Just Stuff Anthology (Pragmatic Programmers, 2007).

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology, Open source
ArticleID=318606
ArticleTitle=Automation for the people: Continual refactoring
publish-date=07082008
author1-email=paul.duvall@stelligent.com
author1-email-cc=jaloi@us.ibm.com

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers