Evolutionary architecture and emergent design: Emergent design through metrics

Using metrics and visualizations to find and harvest hidden design in your code

Software metrics can help you find hidden design elements in your code, enabling them to emerge as idiomatic patterns. This installment of Evolutionary architecture and emergent design shows how intelligent use of metrics and visualizations lets you discover important code elements that are obscured by accidental complexity.


Neal Ford, Software Architect / Meme Wrangler, ThoughtWorks Inc.

Neal Ford is a software architect and Meme Wrangler at ThoughtWorks, a global IT consultancy. He also designs and develops applications, instructional materials, magazine articles, courseware, and video/DVD presentations, and he is the author or editor of books spanning a variety of technologies, including the most recent The Productive Programmer. He focuses on designing and building large-scale enterprise applications. He is also an internationally acclaimed speaker at developer conferences worldwide.



04 March 2010 (First published 30 June 2009)


One of the difficulties for emergent design lies in finding idiomatic patterns and other design elements hidden in code. Metrics and visualizations help you identify important parts of your code, allowing you to extract them as first-class design elements. The two metrics I focus on in this article are cyclomatic complexity and afferent coupling. Cyclomatic complexity is a measure of the relative complexity of one method versus another. Afferent coupling represents the count of how many other classes use the current class. You'll learn about some tools for visualizing and understanding both of these metrics and how combining metrics can help you reveal design characteristics.

About this series

This series aims to provide a fresh perspective on the often-discussed but elusive concepts of software architecture and design. Through concrete examples, Neal Ford gives you a solid grounding in the agile practices of evolutionary architecture and emergent design. By deferring important architectural and design decisions until the last responsible moment, you can prevent unnecessary complexity from undermining your software projects.

I covered cyclomatic complexity in "Test-driven design, Part 2," but it has some nuances I didn't discuss there. One complicating factor of cyclomatic-complexity measurements via Java™ tools is the unit of work. Cyclomatic complexity is a method-level measurement, but the unit of work in Java programming is the class. Consequently, cyclomatic-complexity measurements generally come as either the sum or the average of the complexity of all the methods in a class. Both measures are interesting.

For example, the following scenario is possible. Let's say a class has one massively complex method (CC = 40), but also lots of very small methods (such as the get/set method pairs common in Java code). With a tool such as JavaNCSS (see Resources) that reports this metric as the sum of all the methods, the cyclomatic complexity is a high number for the entire class. If you use a tool such as Cobertura — which reports the cyclomatic complexity as an average for the class — this class no longer looks so bad, because the slew of simple methods is amortizing the highly complex one. Because of this unit-of-work mismatch, it makes sense to look at both sum and average measures of cyclomatic complexity. If you consider them independently, noise can creep into the results. Using both numbers mitigates that possibility.
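The scenario above can be worked through in a few lines of Java. This is a minimal sketch, not the way JavaNCSS or Cobertura compute their figures internally: the class name ComplexitySummary and the choice of ten get/set pairs are assumptions made for illustration, and only the CC = 40 method comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

final class ComplexitySummary {

    /** Sum of per-method cyclomatic complexity (a sum-style class figure). */
    static int sum(List<Integer> methodComplexities) {
        int total = 0;
        for (int cc : methodComplexities) {
            total += cc;
        }
        return total;
    }

    /** Average per-method cyclomatic complexity (an average-style class figure). */
    static double average(List<Integer> methodComplexities) {
        if (methodComplexities.isEmpty()) {
            return 0.0;
        }
        return (double) sum(methodComplexities) / methodComplexities.size();
    }

    public static void main(String[] args) {
        // Hypothetical class: one method with CC = 40 plus ten get/set pairs (CC = 1 each).
        List<Integer> methods = new ArrayList<>();
        methods.add(40);
        for (int i = 0; i < 20; i++) {
            methods.add(1);
        }
        System.out.println("sum = " + sum(methods) + ", average = " + average(methods));
    }
}
```

For this hypothetical class the sum is 60 while the average is a little under 3, so the same class looks alarming in one report and harmless in the other.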

Software metrics vs. physical metrics

In software, a metric refers to applying an objective measurement to development artifacts to determine coarse-grained characteristics. Unlike a physical metric (such as a meter stick), most software metrics don't reflect some characteristic in the real world. A cyclomatic-complexity number such as 5 has no units of measurement; it tells you nothing about any physical properties of the code. The number only makes sense when compared to the cyclomatic complexity of other code.

The other metrics of interest for design are the two coupling numbers: efferent and afferent coupling. Efferent coupling measures the number of classes the current class references. It is easy to determine via simple inspection: open the class in question and count the references (in fields and parameters) to other classes. Afferent coupling is harder to determine and much more valuable. It measures how many other classes use the current class. You can use command-line fu to determine this, or one of several tools that understand this metric. One such tool is ckjm, an open source tool for running the Chidamber & Kemerer object-oriented metrics suite (see Resources). Although a bit complicated to get up and running, it provides both cyclomatic complexity (reported as the sum of the cyclomatic complexity of all the methods of a class) and both efferent and afferent coupling numbers.
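To make the two coupling definitions concrete, here is a small sketch that counts both from a map of "class references these classes" entries. It is not how ckjm works (ckjm reads compiled bytecode); the CouplingCounter class and the Order, Invoice, Customer, and Report names are hypothetical, chosen only to show why afferent coupling needs knowledge of the whole code base while efferent coupling needs only the class itself.

```java
import java.util.Map;
import java.util.Set;

final class CouplingCounter {

    /** Efferent coupling: how many distinct other classes the given class references. */
    static int efferent(Map<String, Set<String>> references, String className) {
        int count = 0;
        for (String target : references.getOrDefault(className, Set.of())) {
            if (!target.equals(className)) {
                count++;
            }
        }
        return count;
    }

    /** Afferent coupling: how many distinct other classes reference the given class. */
    static int afferent(Map<String, Set<String>> references, String className) {
        int count = 0;
        for (Map.Entry<String, Set<String>> entry : references.entrySet()) {
            if (!entry.getKey().equals(className) && entry.getValue().contains(className)) {
                count++;
            }
        }
        return count;
    }

    /** Hypothetical dependency data: key = class, value = classes it references. */
    static Map<String, Set<String>> sample() {
        return Map.of(
                "Order", Set.of("Customer", "Invoice"),
                "Invoice", Set.of("Customer"),
                "Report", Set.of("Order", "Customer"),
                "Customer", Set.<String>of());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = sample();
        System.out.println("Order    Ce=" + efferent(refs, "Order") + " Ca=" + afferent(refs, "Order"));
        System.out.println("Customer Ce=" + efferent(refs, "Customer") + " Ca=" + afferent(refs, "Customer"));
    }
}
```

Note that efferent() looks at a single map entry, whereas afferent() has to scan every entry, which mirrors why the second number is harder to get by inspection.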

Once you have those numbers, though, what do they mean, especially in terms of design? The numbers generated as metrics provide a single dimension of information about your code, and the raw numbers themselves frequently don't mean much. You can generate useful information from metrics in two ways. One is to look at how a particular value changes over time and spot trends. Or you can combine metrics to enrich the information density, which is the approach I'll show you in this article.

Metrics and design

I've been torturing the Struts code base in several articles in this series — not because I have a bias against Struts, but because it is a well-known open source project. Trust me: you can get unattractive design characteristics from most code you can find in the world! Having started with Struts, I'll continue using it to illustrate my points.

The output of ckjm is text, which is convertible to XML (and, via various transformations with XSLT, to other formats). Figure 1 shows the combination of several of the ckjm metrics, where WMC (Weighted Methods per Class) is the sum of the cyclomatic complexity of the methods of the class and Ca is the afferent coupling:

Figure 1. ckjm metrics results in a table
Tabular view of ckjm metrics results
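If you would rather post-process the plain-text output yourself than go through XML, a line parser is enough to pull out the two columns used in this article. The sketch below assumes that each ckjm output line is whitespace-separated in the order class name, WMC, DIT, NOC, CBO, RFC, LCOM, Ca, NPM; verify that against the version of ckjm you run, because the column layout is an assumption here and the sample line is made up.

```java
final class CkjmLine {
    final String className;
    final int wmc;
    final int ca;

    private CkjmLine(String className, int wmc, int ca) {
        this.className = className;
        this.wmc = wmc;
        this.ca = ca;
    }

    /**
     * Parses one line of ckjm's plain-text output.
     * ASSUMPTION: columns are "class WMC DIT NOC CBO RFC LCOM Ca NPM",
     * separated by whitespace; adjust the indexes if your version differs.
     */
    static CkjmLine parse(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 8) {
            throw new IllegalArgumentException("Unexpected ckjm line: " + line);
        }
        int wmc = Integer.parseInt(cols[1]); // second column: WMC
        int ca = Integer.parseInt(cols[7]);  // eighth column: Ca
        return new CkjmLine(cols[0], wmc, ca);
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from a real ckjm run.
        CkjmLine row = parse("com.example.Widget 12 1 0 4 30 10 3 9");
        System.out.println(row.className + " WMC=" + row.wmc + " Ca=" + row.ca);
    }
}
```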

Figure 2 shows the same table, sorted by WMC:

Figure 2. ckjm metrics, sorted by WMC
ckjm results, sorted by WMC

Just by looking at this result, you can tell that DoubleListUIBean is the most complex class in the Struts code base. That suggests that it is a good candidate for refactoring to remove some of the complexity and see if you can find some abstractable, repeating patterns. However, the WMC number doesn't tell you whether investing in refactoring this class toward better design is a good use of time. Notice that the Ca for the class is 3. Only three other classes use this class, which suggests it's not worth investing lots of time improving the class's design.

Figure 3 shows the same ckjm results, sorted this time by Ca:

Figure 3. ckjm results, sorted by afferent coupling
ckjm results, sorted by Ca

This combined view shows that the most-used class in Struts is Component (not surprising, given that Struts is a Web framework). Although Component isn't as complex, it's used by 177 other classes, which makes it a good candidate for design improvements. Making the design of Component better has a ripple effect on a large number of other classes.

Figure 3 is best read as a combination of WMC and Ca. Together, the two columns tell you both what's important and what's complex in the code base, in a single view. If you came to this code base with no prior knowledge, this view offers insights into where your effort potentially yields the best results. Although it's not infallible, you now have more information about the code base than you can derive from just looking at reams of code.
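The two sort orders discussed here are easy to reproduce once the metrics are in memory. In the following sketch, only the class names DoubleListUIBean and Component and their Ca values (3 and 177) come from the text; every WMC value, the ExampleHelper row, and the tie-breaking rule are hypothetical placeholders, since the article's figures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RefactoringCandidates {

    /** One row of metrics: class name, WMC, and afferent coupling (Ca). */
    static final class ClassMetrics {
        final String name;
        final int wmc;
        final int ca;

        ClassMetrics(String name, int wmc, int ca) {
            this.name = name;
            this.wmc = wmc;
            this.ca = ca;
        }
    }

    /** Most complex classes first. */
    static List<ClassMetrics> byWmc(List<ClassMetrics> all) {
        List<ClassMetrics> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingInt((ClassMetrics m) -> m.wmc).reversed());
        return sorted;
    }

    /** Most-used classes first; ties broken by complexity (an assumption of this sketch). */
    static List<ClassMetrics> byCa(List<ClassMetrics> all) {
        List<ClassMetrics> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingInt((ClassMetrics m) -> m.ca).reversed()
                .thenComparing(Comparator.comparingInt((ClassMetrics m) -> m.wmc).reversed()));
        return sorted;
    }

    /** Sample rows; the WMC numbers and ExampleHelper are hypothetical. */
    static List<ClassMetrics> sample() {
        List<ClassMetrics> rows = new ArrayList<>();
        rows.add(new ClassMetrics("DoubleListUIBean", 200, 3));
        rows.add(new ClassMetrics("Component", 50, 177));
        rows.add(new ClassMetrics("ExampleHelper", 10, 20));
        return rows;
    }

    public static void main(String[] args) {
        for (ClassMetrics m : byCa(sample())) {
            System.out.println(m.name + " Ca=" + m.ca + " WMC=" + m.wmc);
        }
    }
}
```

Sorting by Ca and reading WMC alongside it keeps the heavily used classes at the top of the list, which is where a design improvement is most likely to pay off.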

Numeric metrics provide insight into your code, but they exist at a pretty low level, providing information about specific classes but not much of a holistic view of a code base. Lots of tools are available now to take metrics to the next level via visualizations.


Metrics visualizations

Visualizations of metrics provide alternate views of specific dimensions, either of single dimensions or aggregations of several dimensions. The Smalltalk community created a huge number of metrics visualizations (and even created a platform, called Moose, to enable these kinds of visualizations; see Resources). Many of the metrics techniques developed by the Smalltalk community migrated to Java programming.

iPlasma and industry standards

Some of the common questions relating to cyclomatic complexity are "How does my code compare to others?" and "What is a good number for a particular class?" The iPlasma project answers these questions (see Resources). iPlasma is a platform, created as a university project in Romania, for quality assessment of object-oriented design. It generates a pyramid, showing key metrics for your project along with comparisons to industry-standard ranges for those numbers.

When you run iPlasma, you point to a source-code directory, and it churns away for a bit, producing a metrics pyramid like the one in Figure 4, which is based on the Struts 2.0.11 code base:

Figure 4. iPlasma metrics pyramid
iPlasma metrics pyramid

This pyramid is packed with information, once you understand how to read it. Each row has a colored number, which is the ratio between the number on this row and the one under it. Table 1 shows what the codes on each row indicate, starting at the top:

Table 1. Understanding the iPlasma pyramid
Code     Description
NDD      Number of direct descendants
HIT      Height of inheritance tree
NOP      Number of packages
NOC      Number of classes
NOM      Number of methods
LOC      Lines of code
CYCLO    Cyclomatic complexity
CALL     Calls per method
FOUT     Fan out (number of other methods called by a given method)

The numbers indicate the ratios; the colors indicate where the ratios fit into the industry-standard ranges (derived from numerous open source projects). Each ratio is either green (within the range), blue (below the range), or red (above the range). For the Struts code base, NDD and CYCLO are above the industry standards for those values, and LOC and NOM are below. The ranges used appear in Table 2:

Table 2. iPlasma industry ranges for metrics
Ratio             Low     Medium   High
CYCLO / Line      0.16    0.20     0.24
LOC / method      7       10       13
NOM / class       4       7        10
NOC / package     6       17       26
CALLS / method    2.01    2.62     3.20
FANOUT / call     0.56    0.62     0.68
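The way a ratio is compared against these ranges can be sketched in a few lines. The Low and High thresholds below are taken from Table 2, but the raw counts are invented and the three-band rule (below Low, above High, otherwise within) is an assumption about how the colors map onto the ranges, not a description of iPlasma's actual algorithm.

```java
final class PyramidRatio {

    enum Band { BELOW_RANGE, WITHIN_RANGE, ABOVE_RANGE }

    /** Ratio of two raw counts, for example CYCLO divided by LOC. */
    static double ratio(double numerator, double denominator) {
        if (denominator == 0) {
            throw new IllegalArgumentException("denominator must not be zero");
        }
        return numerator / denominator;
    }

    /** ASSUMED rule: below Low is blue, above High is red, anything else is green. */
    static Band classify(double ratio, double low, double high) {
        if (ratio < low) {
            return Band.BELOW_RANGE;  // shown in blue
        }
        if (ratio > high) {
            return Band.ABOVE_RANGE;  // shown in red
        }
        return Band.WITHIN_RANGE;     // shown in green
    }

    public static void main(String[] args) {
        // Hypothetical project counts, not the Struts numbers.
        double cyclo = 3000;
        double loc = 10000;
        double nom = 2000;

        // Thresholds from Table 2: CYCLO / Line 0.16..0.24, LOC / method 7..13.
        System.out.println("CYCLO / Line: " + classify(ratio(cyclo, loc), 0.16, 0.24));
        System.out.println("LOC / method: " + classify(ratio(loc, nom), 7, 13));
    }
}
```

With these made-up counts, cyclomatic complexity per line lands above the range while lines per method land below it, the same kind of mixed picture the pyramid gives for a real code base.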

iPlasma also generates advice based on the pyramid, shown in the iPlasma display immediately below the pyramid. Figure 5 shows the advice for Struts:

Figure 5. iPlasma advice
iPlasma advice

The numbers iPlasma generates serve a couple of purposes. First, they allow you to compare your code base to others along several dimensions. Second, these numbers indicate places where you might want to expend effort to improve code hygiene and design. For example, for Struts, iPlasma indicates that the depth of the inheritance tree is pretty high and that the methods tend to be too complex. However, you must understand these numbers in context. A Web framework like Struts will tend to have a pretty elaborate hierarchy, meaning that the NDD number probably doesn't warrant concern. The cyclomatic-complexity (CYCLO) number, though, has nothing to do with the context: it is too high and indicates a design smell at the method level.

For comparison purposes, Figure 6 shows an iPlasma pyramid for the Vuze project, an open source BitTorrent client written in the Java language (see Resources):

Figure 6. iPlasma pyramid for Vuze
iPlasma pyramid for Vuze

Vuze is a large project (more than 500,000 lines of code), with potential design issues around the depth of inheritance tree, the number of methods in each class, the number of lines of code per method, and the number of calls per method.

Dependencies

Emergent design requires visibility into relationships and other high-level abstractions in your code. Trying to see these high-level concepts from source code calls to mind the famous image of blindfolded people trying to understand an elephant solely by touch. Each part of the elephant seems like something else, but touch is too localized to allow a holistic view.

Determining dependencies between classes and objects suffers from the same kind of localization problem. Tools like iPlasma allow you to see summaries of the overall characteristics of the code, but they don't tell you specific sites to investigate. Fortunately, other tools can help you see the elephant in several different lights.

The Smalltalk community created a tool called CodeCrawler (see Resources), based on the Moose platform, which shows a graphical representation of code, with dimensions for class size, method length, and some other metrics. It is possible to get CodeCrawler to work with Java code, but it's daunting. Fortunately, you don't have to fight that battle because the X-Ray project already has (see Resources).

X-Ray is an Eclipse plug-in that produces several visualizations to help you see your code's overall structure, including pie views of dependencies between classes, as shown for Struts in Figure 7:

Figure 7. X-Ray visualizations for class dependencies
X-Ray class dependencies view

Each element along the edge of the circle is a class, and the lines indicate the dependencies between the classes. The boldness of a line indicates the strength of the dependency. Clicking a class shows information about it, and double-clicking it opens the class in the Eclipse editor. At this scale, the view includes too much information to be useful. Fortunately, you can zoom in to see the individual lines. The bold lines indicate strong dependencies between classes (efferent coupling), which might indicate a design flaw when two classes are too intimately related to each other.

X-Ray also includes a similar view for package dependencies, shown in Figure 8:

Figure 8. X-Ray view of package dependencies
X-Ray view of package dependencies

Overall structure

One other X-Ray view shows a useful code visualization, also based on CodeCrawler. The system-complexity view shows your code base as a graph in which the inheritance hierarchy appears as a top-down tree view, the size of the box indicates the number of lines of code in the class, and the box's width indicates the number of methods. Figure 9 shows the system-complexity visualization:

Figure 9. X-Ray system-complexity view
X-Ray system-complexity view

This view also shows outgoing calls (efferent coupling) as pink lines and incoming calls (afferent coupling) as red lines. As in the previous visualizations, clicking on one of the boxes takes you to the class in Eclipse. This view of your code provides a unique perspective hard to achieve by looking at code. Finding design flaws around particular aspects becomes easier if you can quickly filter along certain dimensions, narrowing the parts of your code you should investigate further.


Summary

X-Ray and iPlasma represent just a small set of the types of visualizations available for Java code. Their judicious use allows you to narrow your focus quickly to aspects of your design that hide in the swamp of your project's code. Finding idiomatic patterns is one of the key enablers for emergent design, and tools that make it easy to see patterns (both good and bad) greatly reduce the investigatory effort, leaving more time for refactoring your code to make it better.

Resources

Learn

Get products and technologies

Discuss
