In pursuit of code quality: Tame the chatterbox

Tools and metrics for measuring code verbosity

Andrew Glover (aglover@stelligent.com), President, Stelligent Incorporated
Andrew Glover is the President of Stelligent Incorporated, which helps companies address software quality with effective developer testing strategies and continuous integration techniques that enable teams to monitor code quality early and often. He is the co-author of Java Testing Patterns (Wiley, September 2004).

Summary:  Just seeing a sprawling code block from a distance gives some developers the willies -- and it should! Loquacious code is often the hallmark of complexity, which results in code that is hard to test and maintain. This month, learn three important ways to measure code complexity, based on method length, class length, and intra-class coupling. In this installment of In pursuit of code quality, quality expert Andrew Glover starts out with tips for eyeballing code excess, then shows you how to use tools like PMD and JavaNCSS for more precision when you need it.

Date:  30 Jun 2006
Level:  Intermediate

I'm not ashamed to admit that my gut reaction to seeing a block of complex code is fear and trembling. In fact, I'll go so far as to say that you should tremble a little upon encountering extensive methods and sprawling classes. Casting about for an exit sign in these moments is not only perfectly human, it shows good developer instinct. Overly complex code is hard to test and maintain, which means it usually has a higher incidence of defects.

I've explained in previous articles in this series that cyclomatic complexity tends to be one of the harbingers of sticky code. Testing methods with high cyclomatic complexity values is almost always like stepping into a maze -- no easy exit in sight. Last month, I showed you how to use the Extract Method pattern to refactor your way out of the maze. Pushing the method's complexity into smaller chunks makes the code easier to test and maintain, as shown in Figure 1:


Figure 1. Reducing complexity makes code easier to maintain and test

Cyclomatic complexity isn't the only complexity measurement for spotting high-risk code, however. You can also look out for class length, method length, and intra-class coupling. These measurements are closely related and also easy to spot. This month, I'll explain why they're important and show you how to track them using PMD and JavaNCSS.

Too much code!

It's important to know the difference between simple code and easy code. Simple code isn't necessarily simplistic or easy to write; it's just easy to understand. You can write simple code in C++ just as easily as you can write it in Visual Basic. The quickest way to de-simplify your code in any language, however, is to write too much of it at once.

Think about how this rule applies to methods and classes. Most of us have trouble memorizing credit card numbers for the simple reason that we can only manage about seven pieces of data (plus or minus two) at once. Knowing this, it makes sense that excessive conditionals are challenging to follow, and hence, hard to test and maintain. This same principle applies to blocks of logic.

Any given body of code usually includes grouped statements that work toward the same goal, such as creating a collection and adding items to it. Grouping numerous blocks of logic in one long method can very quickly cloud the overall intention of the method, however, because few people can effectively handle such a large dataset. It is precisely this weakness that creates maintenance problems in a code base. Behemoth methods are havens for defects because very few people can effectively parse them. So long methods not only do too much work, but they require too much work to understand!

Just as long methods tend to confound developers, so do lengthy classes. The same argument applies to code in the aggregate -- long-winded classes are probably doing too much work and have too much responsibility.

What's too much?

What constitutes a long method or class is, of course, somewhat subjective. For a helpful rule of thumb, you could say that a method with more than 100 non-commenting lines of code is way too long. The actual number varies according to whom you talk to, however. For me, the cutoff point is more like 50 lines of code, whereas some developers would say that a method is too long if it requires you to scroll down to see its entire body. It's up to you to define your own cutoff point.

Similarly, you have to use your own good judgment to determine the correct size of a class. A rule of thumb advocated by many is that a class with more than 1,000 lines of code is too big. 500 lines of code is an easier number for others to stomach.


Intra-class coupling

The same pattern of mounting complexity shows up in the relationships one object has with other objects. A class that imports numerous outside dependencies or has a lot of public methods is not only somewhat difficult to understand; its increased burden of responsibility also leads to brittleness.

I'll start with dependencies. If a class imports more than 80 outside classes, excluding the normal Java™ system libraries, then that class is said to have a high degree of efferent coupling, which means that changes to the imported classes could affect the class itself. In the worst case, if the imports are concrete classes and their behavior changes, then the class doing the importing could break! (See Resources for more on efferent coupling.)

Watching the number of imports is good for predicting brittleness, but it can be misleading if entire packages have been imported with the .* notation (for example, com.acme.user.*). For more precision, you can pay attention to the number of unique types a class uses (which is determined by parsing the code itself -- not the import statements). The unique types metric can also be helpful if an application's package structure is laid out so coarsely that many classes sit in just a few packages.

Classes that contain many public methods tend to also have a lot of imports. Such classes usually become central to a code base either as Facades or utility classes. Because of this responsibility (exposed through a large number of public methods), they're said to have high afferent coupling, which results in brittleness in the reverse direction: if any of these classes change, various, seemingly unrelated parts of the application could break.


How complexity correlates

By now, a pattern has emerged suggesting that gluttonous code (long methods, too many public methods, excessive conditionals and imports, etc.) impairs readability, testability, and maintainability. Because this pattern repeats itself in various metrics, they all tend to correlate. For example, long methods generally suffer from high cyclomatic complexity values, as shown in Figure 2:


Figure 2. Long methods correlate to cyclomatic complexity

The correlation doesn't stop there, however. Classes with a plethora of imports have many unique types. These classes are typically pretty big. Big classes usually have long methods, and long methods often have high cyclomatic complexity values. Figure 3 shows how complexity metrics correlate:


Figure 3. How complexity metrics correlate


PMD and JavaNCSS

Spotting garrulous code is easy in PMD and (to a lesser extent) JavaNCSS, and both tools are easily incorporated into build platforms such as Ant and Maven.

You can think of PMD as a rules-based engine that analyzes source code and reports any instance of a rule being violated. PMD currently defines close to 200 rules, with specific ones for method length, class length, unique types, and the number of public methods. You can also define custom rules and modify existing rules (for example, to reflect domain needs).

Customizing PMD

For an example, I'll use PMD's aptly named ExcessiveMethodLength rule for finding long methods. This rule's default length threshold is 100 (meaning that if a scanned method has a length greater than 100 lines, PMD reports a rule violation), but you can lower that threshold if you like.

PMD rules can define properties, which, through excellent foresight on the part of the PMD development team, you can override at run time by using ruleset files. To lower the default value of 100 for the ExcessiveMethodLength rule to 50, you would simply add a properties element to the rule definition and reference the property's name. In Listing 1, I've added a property named minimum to the PMD rule definition:


Listing 1. Customizing the ExcessiveMethodLength rule
<rule ref="rulesets/codesize.xml/ExcessiveMethodLength">
 <properties>
  <property name="minimum" value="50"/>
 </properties>
</rule> 
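
A rule reference like the one in Listing 1 has to sit inside a ruleset file before PMD can load it. The following is a minimal sketch of such a file, not a definitive layout: the ruleset name and description are placeholders invented for this example, and the rulesets/codesize.xml path follows the PMD release discussed here. Later PMD releases reorganized ruleset paths and expect an XML namespace on the ruleset element, so check the documentation for the version you run.

```xml
<?xml version="1.0"?>
<!-- Hypothetical custom ruleset file; name and description are placeholders -->
<ruleset name="Custom size rules">
  <description>Flags methods longer than 50 lines.</description>

  <!-- The rule reference from Listing 1, unchanged -->
  <rule ref="rulesets/codesize.xml/ExcessiveMethodLength">
    <properties>
      <property name="minimum" value="50"/>
    </properties>
  </rule>
</ruleset>
```

Saving this as ./tools/pmd/rules-pmd.xml would match the path that Listing 2 passes to the PMD task.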

Invoking PMD with a custom ruleset file in Ant requires providing a path to the custom file through the rulesetfiles attribute of the PMD task, as shown in Listing 2:


Listing 2. Referencing the custom ruleset file
<pmd rulesetfiles="./tools/pmd/rules-pmd.xml">
 <formatter type="xml" toFile="${defaulttargetdir}/pmd_report.xml"/>
 <formatter type="html" toFile="${defaulttargetdir}/pmd_report.html"/>
 <fileset dir="./src/java">
  <include name="**/*.java"/>
 </fileset>
</pmd> 
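
The pmd element in Listing 2 only works once Ant has been told where the task lives. A minimal sketch of that registration follows, assuming the PMD jars have been copied to a hypothetical ./tools/pmd/lib directory; the path id pmd.classpath is likewise a name made up for this example. The task class, net.sourceforge.pmd.ant.PMDTask, is the one that ships with PMD.

```xml
<!-- Assumed location of the PMD jar and its dependencies -->
<path id="pmd.classpath">
  <fileset dir="./tools/pmd/lib">
    <include name="*.jar"/>
  </fileset>
</path>

<!-- Registers the pmd task so later targets can use it -->
<taskdef name="pmd"
         classname="net.sourceforge.pmd.ant.PMDTask"
         classpathref="pmd.classpath"/>
```

Both elements need to appear in the build file before the target that runs the task.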

PMD reports violations by source file, as you can see in Figure 4. In this case, a few methods were found to have more than 50 lines of source code:


Figure 4. Sample PMD Ant report

For long classes, PMD has the ExcessiveClassLength rule, which defaults to 1,000 lines of code. As with the ExcessiveMethodLength rule, it's easy to override the default value with a more apt value. Additionally, PMD has a rule for counting unique types, dubbed the CouplingBetweenObjects rule. For counting imports, check out the ExcessiveImports rule. Both rules are configurable.
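
As a rough sketch, the three rules just mentioned could be added to a custom ruleset file like the one referenced in Listing 2. The property names (minimum and threshold) are given as I understand them from PMD's rule documentation, and the values are purely illustrative, so treat both as assumptions and verify them against your PMD version.

```xml
<!-- Flag classes longer than 500 lines instead of the default 1,000 -->
<rule ref="rulesets/codesize.xml/ExcessiveClassLength">
  <properties>
    <property name="minimum" value="500"/>
  </properties>
</rule>

<!-- Flag classes that use more than an illustrative number of unique types -->
<rule ref="rulesets/coupling.xml/CouplingBetweenObjects">
  <properties>
    <property name="threshold" value="20"/>
  </properties>
</rule>

<!-- Flag classes with more than an illustrative number of import statements -->
<rule ref="rulesets/coupling.xml/ExcessiveImports">
  <properties>
    <property name="minimum" value="30"/>
  </properties>
</rule>
```

These elements go inside the ruleset element of the custom file, alongside the ExcessiveMethodLength rule from Listing 1.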

Measuring verbosity with JavaNCSS

As opposed to PMD, which defines specific rules for analyzing source code, JavaNCSS analyzes a code base and reports everything relating to the code length, including class sizes, method sizes, and the number of methods found in a class. With JavaNCSS, thresholds don't matter -- it counts every file it finds and reports the values regardless of size. This kind of data may seem pedestrian (and perhaps verbose!) compared to what PMD produces, but that couldn't be further from the truth.

By reporting all file sizes, JavaNCSS makes it possible to understand relative values, which is often difficult with PMD. PMD only reports files that contain violations, so you see data for just a portion of the code base, while JavaNCSS provides code-length data in context, as shown in Figure 5:


Figure 5. Sample JavaNCSS Ant report
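
A report like this can be produced from Ant in much the same way as the PMD report. The sketch below assumes the JavaNCSS jars are in a hypothetical ./tools/javancss/lib directory and reuses the source and output locations from Listing 2; the report file name is made up for this example. The task class and attribute names follow the JavaNCSS Ant task as I understand it, so confirm them against the documentation bundled with your JavaNCSS release.

```xml
<!-- Assumed location of the JavaNCSS jars -->
<path id="javancss.classpath">
  <fileset dir="./tools/javancss/lib">
    <include name="*.jar"/>
  </fileset>
</path>

<taskdef name="javancss"
         classname="javancss.JavancssAntTask"
         classpathref="javancss.classpath"/>

<!-- Counts every Java source file and writes the results as XML -->
<javancss srcdir="./src/java"
          includes="**/*.java"
          generateReport="true"
          outputfile="${defaulttargetdir}/javancss_report.xml"
          format="xml"/>
```

Because no threshold is configured, the output covers every class and method, which is what makes the relative comparison described above possible.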


In conclusion

Greenfield development, where a development team starts out with a blank IDE console and fills it with beautiful, concise code, is a very small slice in the life of a software application. Scores of organizations across the world today are still running applications based on COBOL, which, from a developer's perspective, means wrangling with code written long ago by someone you don't know.

While it's normal to turn nauseous when faced with such a beast, you can only call in sick so many days in a row. At some point, you're going to have to face that massive code block and call it your own. Using complexity metrics for class length, method length, and intra-class coupling (that is, object imports and unique types) is the first step toward understanding what you're up against. Start with some rules of thumb regarding class and method size and then use tools like PMD and JavaNCSS to home in on the details.

You'll learn a tremendous amount the first time you use complexity metrics on your legacy code base, but don't stop there. By continuously monitoring complexity metrics, you can make smarter decisions and lower your risk while extending and maintaining your code base over time.


Resources

Learn

Get products and technologies

  • PMD: Scans Java code for problems.

  • JavaNCSS: A simple utility that measures source code metrics.

Discuss

