
Measuring up

Gary Pollice, Worcester Polytechnic Institute
Gary Pollice is a professor of practice at Worcester Polytechnic Institute, in Worcester, Massachusetts. He teaches software engineering, design, testing, and other computer science courses, and also directs student projects. Before entering the academic world, he spent more than thirty-five years developing various kinds of software, from business applications to compilers and tools. His last industry job was with IBM Rational Software, where he was known as "the RUP Curmudgeon" and was also a member of the original Rational Suite team. He is the primary author of Software Development for Small Teams: A RUP-Centric Approach, published by Addison-Wesley in 2004. He holds a B.A. in mathematics and an M.S. in computer science.

Summary: From The Rational Edge: Software developers are notoriously bad at planning projects because they can't estimate schedules or cost reliably, and they fail to evaluate what happened at the end of a project. This article discusses how software developers can measure their work and arrive at more accurate estimates for planning purposes.

Date:  13 Aug 2004
Level:  Introductory

When your project manager asks you how long it will take to deliver defect-free results for a particular development task, how confident are you of your answer? As software developers, we are notoriously bad at planning our projects. We don't estimate schedules or cost reliably. At the end of our projects, we typically fail to evaluate how we did, so it's hard to learn from the past and improve future performance.

This month, I discuss how you, as a software developer, can measure your work; I describe what to measure and the benefits of doing so. Next month, I'll step up a level to look at project and organization metrics.

Computer science is different from software engineering. Compare these definitions:

  • Computer science. "The systematic study of computing systems and computation. The body of knowledge resulting from this discipline contains theories for understanding computing systems and methods; design methodology, algorithms, and tools; methods for the testing of concepts; methods of analysis and verification; and knowledge representation and implementation."1

  • Software engineering. "(1) The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software: that is, the application of engineering to software. (2) The study of approaches as in (1)."2

Computer science forms a basis for software engineering, but software engineering is broader in scope and less concerned with theoretical results. In short, software engineering is like other engineering disciplines: It focuses on applying science and technology to practical problems.

Engineering relies on observation, measurement, and calculation. For example, an engineer might ask how long it takes to build a certain type of bridge, or how much stress a ceiling beam can bear. Established engineering disciplines use a rich set of measures and metrics to predict how long a project will take and how much it will cost, and to evaluate how well the project succeeded.

By contrast, software engineering practice needs more work. As industry practitioners, we do a poor job of gathering the data necessary to help us plan effectively and deliver products of high quality. Here are a few of the reasons (excuses, really) for our poor performance:

  • Software is unique.

  • We never really do the same thing twice.

  • Software engineering is a knowledge-based discipline that often requires innovation and experimentation. How can you plan for that?

  • Technology changes so rapidly that any metrics based on one set of technologies may not apply to a different set.

Instead of apologizing for why we don't, or can't, measure, I plan to show you what we can do. In this article, I discuss what we can measure and how our observations can help us be better software engineers.

Measures, measurements, metrics, and indicators

The terms measure, measurement, and metric are often used interchangeably. However, for software engineering purposes, these terms have distinct definitions:

  • A measure (noun) is a quantitative indication of an attribute, such as its size, dimension, or capacity. An example of a measure is the number of defects found in a program. The measure is the raw number.

  • Measurement is the act of determining a measure. To perform a measurement, we measure (verb) the number of defects in the code to give us the defect measure (noun).

  • A metric is a quantitative indication of the degree to which a system or component possesses an attribute. So a metric might be the number of defects per thousand lines of code, which is the defect density metric. In other words, a metric is described in terms of the relationship between measures.

We also need to consider an indicator, a metric or combination of metrics that provides insight into our process or product. An indicator is often expressed as a trend, such as the defect density per month. A decrease in defect density is a good indicator of software quality.3
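To make the three terms concrete, here is a minimal Python sketch. The monthly defect counts and sizes are made-up illustration values, and the class and function names are hypothetical. It shows a raw measure (defects found, KLOC) turning into a metric (defects per KLOC), and a series of metric values turning into an indicator (the month-to-month change).

```python
from dataclasses import dataclass


@dataclass
class MonthlyMeasures:
    """Raw measures for one month: plain counts, not yet related to each other."""
    month: str
    defects_found: int   # measure: number of defects found
    kloc: float          # measure: size in thousands of lines of code


def defect_density(m: MonthlyMeasures) -> float:
    """Metric: a relationship between two measures (defects per KLOC)."""
    return m.defects_found / m.kloc


def density_trend(history: list[MonthlyMeasures]) -> list[float]:
    """Indicator: month-to-month change in defect density.

    A negative value means density dropped since the previous month.
    """
    densities = [defect_density(m) for m in history]
    return [later - earlier for earlier, later in zip(densities, densities[1:])]


# Hypothetical data, for illustration only.
history = [
    MonthlyMeasures("Jan", defects_found=24, kloc=2.0),
    MonthlyMeasures("Feb", defects_found=18, kloc=2.0),
    MonthlyMeasures("Mar", defects_found=10, kloc=2.5),
]

for m in history:
    print(m.month, defect_density(m))
print(density_trend(history))
```

As note 3 warns, a falling trend like this one is only an indicator. It could reflect better code, or it could just reflect less testing.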


Why measure your personal work?

You might wonder why it's worth the effort to measure your work. Capturing data takes time, and it may feel like time you can't afford. You probably have a good idea about your strengths and weaknesses as a software developer. However, are you really confident about your current knowledge, the quality of your deliverables, and the speed at which you work? Can you accurately predict the effort required for your next assignment?

Measuring your work is about becoming more accurate and confident in your estimates. It's about improving the quality of your work. It's about knowing yourself better. When you understand your strengths, you can capitalize on them. When you recognize your weak areas, you can work to improve your skills.

To illustrate these points, and to show how intuition alone doesn't help us, I'll relate two stories from a tutorial I attended on the Personal Software Process℠ (PSP), given by Watts Humphrey of the Software Engineering Institute. Watts presented an overview of how PSP helps improve personal productivity and increase the quality of our products.

At several points during the tutorial, someone would question the usefulness of a practice Watts proposed. An interesting dialog ensued:

Attendee: I doubt if that practice is really useful.
Watts: Well, my data shows that it is effective and useful.
Attendee: But it doesn't work because ...
Watts: Show me your data to back that up.

At this point, the dialog ended. Watts Humphrey is an engineer. He focuses on data, not theories. He measures things and sees what works. Now, at that time, he gathered much of his data from classroom experiences, but it was data nonetheless. He also knew that none of us in the session had data to back up our feelings. So he "won" these exchanges by default.

Another incident occurred during a break. In the mid-1980s I had worked in a research and advanced development group. This group was unique in part because of its intense focus on quality. We worked hard to achieve zero defects in everything we did. Our practice was to ask the rest of the company to use our completed work. The first person to find each defect was entitled to lunch, courtesy of the software's author.

My first large project was to write a back end for our compiler. I was afraid that when I released it, I would lose my mortgage money buying lunches for people who discovered defects in my code. My fears were unfounded. In the first year, only three defects were reported. By the way, that was about average for our group. We tested, reviewed, and tested again before we released something. We had a personal stake in the product.

I thought I'd impress Watts with how well I had done. He'd surely be glad to hear that quality was not a dead issue in the real world. When I proudly told him that only three defects were reported in the first year, he replied: "How many defects did you remove before you released?" What could I say? I had no idea how many defects I had removed. I knew there were a lot. The point of Watts' inquiry was not to tell me I had done a poor job. Rather, he was inviting me to examine why I was able to deliver high-quality software and encouraging me to search for areas in which I could improve.

I returned to my job determined to use metrics to help me improve my work. Since then, whenever I've taken the time to measure my own work, I find that my estimates are more accurate, and I end up delivering higher quality work. When I don't measure, I slip backwards. I think it's like Weight Watchers. When you are on the program, you have a certain goal, and you can track your progress with data that you collect. When you go off the program, you know what you have to do, but you tend to fall into your old bad habits bit by bit.

The bottom line is that if you measure yourself, you can see where you are, where you're slipping, and where you're improving. Measurement is a tool for improving your software self-image.


What should you measure?

You can probably think of a lot of things that you do or produce that are candidates for measuring. Watts Humphrey advises that if you measure time, size, and defects, you can derive all of the metrics you need. So, let's talk about each of these from a typical software developer's point of view.

Time

Time is pretty simple, isn't it? Just measure how long it takes you to do a task. But what tasks should you measure by the clock, and should you group them? I suggest starting with your development process. If you use PSP, you will probably measure your planning,4 design, coding, testing, and compiling time. There's also a phase for postmortem activities in which you reflect upon your process and adjust it accordingly. If you do code reviews, you might record the amount of time you spend there, too.

If you use a process such as an instance of the Rational Unified Process®, or RUP®, I recommend that you record the time you spend in different workflows and workflow details for the roles you fill. If you use an agile process such as Extreme Programming (XP), you can time your programming, estimating, testing, and so on.

The decision about which groups to assign to your activities is not that important. The categories should make sense to you and the work you do. It is important that you time as accurately as possible and that you are consistent. I have found that measuring time in minutes is the easiest way to record my time data.
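A time log needs little more than a phase name plus start and stop times. The sketch below shows one hypothetical way to keep such a log in Python and total it in minutes per phase. The tuple layout, the phase names, and the choice to subtract interruption minutes are illustrative assumptions, not the format of any particular tool.

```python
from collections import defaultdict
from datetime import datetime


def minutes_between(start: str, stop: str) -> int:
    """Whole minutes between two 'HH:MM' timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def minutes_by_phase(log):
    """Sum net minutes per phase.

    Each log entry is (phase, start, stop, interruption_minutes).
    """
    totals = defaultdict(int)
    for phase, start, stop, interruptions in log:
        totals[phase] += minutes_between(start, stop) - interruptions
    return dict(totals)


# Hypothetical log for one task.
log = [
    ("planning", "09:00", "09:20", 0),
    ("design",   "09:20", "10:05", 5),
    ("code",     "10:05", "11:30", 10),
    ("test",     "13:00", "13:40", 0),
    ("code",     "13:40", "14:00", 0),
]

print(minutes_by_phase(log))
```

Recording whole minutes keeps the arithmetic trivial. Logging every entry the same way matters more than the exact categories.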

Figure 1 illustrates an example of my time data for a project I'm currently working on. The columns represent my planned time, the time I've actually spent so far, and how much time I've spent in each phase of PSP to date on all of my completed projects. Later in this article, I show how I get and present the data.

Figure 1: Time recording

Figure 1 shows no planned or spent time for compiling. I use the Eclipse development platform for my work, and the compilation occurs incrementally. So I'm not able to easily measure compile time. That's fine as long as I record the information in a consistent way. Even though I don't count the compile time, I do record defects the compiler finds.

Size

Size can be a controversial measurement. How can you measure the size of your software? Do you measure the number of bits in the final product? The number of lines of code (LOC), function points, use case points, or something else? There are so many choices. Again, my advice is to take a simple approach and measure consistently. None of these measures is perfect, and many of them are hard to determine.

I use LOC to measure the size of my work.5 This is fairly simple and reasonably accurate if you have a coding standard and use software to do the counting. You are gathering data for your own use, so you can count however you like. For example, consider Code Sample 1 below. How many lines of code are there? The answer can be anywhere from five to eighteen, depending on how you count. It doesn't matter, as long as you use the same method to count each different piece of code.

Code sample 1: How many lines of code?

An easy way to measure the size is to use a code formatter that organizes the code according to your coding standards, and then use a program that counts the LOC for you. There are many formatters and LOC counters available from the open source community (look in http://www.sourceforge.net). Your IDE may also have some of these features built in.
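To show how much the counting rule matters, the sketch below counts a small Java fragment under four rules. The fragment is a stand-in written for this example, not Code Sample 1. The counter is deliberately simplistic: it treats any line starting with `//`, `/*`, or `*` as comment-only, so it illustrates choosing a standard and is not a production LOC tool.

```python
SAMPLE = """\
/** Returns the larger value. */
public int max(int a, int b)
{
    // compare the two values
    if (a > b)
    {
        return a;
    }

    return b;
}
"""


def is_comment_only(stripped: str) -> bool:
    """Simplified check for a line that holds only a comment."""
    return stripped.startswith(("//", "/*", "*"))


def count_loc(source: str) -> dict[str, int]:
    """Count the same source under several hypothetical counting standards."""
    lines = source.splitlines()
    stripped = [line.strip() for line in lines]
    non_blank = [s for s in stripped if s]
    code = [s for s in non_blank if not is_comment_only(s)]
    statements = [s for s in code if s not in ("{", "}")]
    return {
        "physical": len(lines),             # every line
        "non_blank": len(non_blank),        # skip blank lines
        "non_comment": len(code),           # also skip comment-only lines
        "no_lone_braces": len(statements),  # also skip lines holding only a brace
    }


print(count_loc(SAMPLE))
```

The same eleven-line fragment counts as 4, 8, 10, or 11 lines depending on the rule. That is why applying one rule consistently matters more than which rule you pick.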

Once you establish a good counting standard, you still need to decide what to count. There are a lot of choices. Do you count deleted lines? What about modified lines? How easy is it to determine which is which? Then, there's the issue of code that is generated by another tool, such as your IDE. Do you reuse other code? If so, should it be treated the same as code you develop?

There are no easy answers to these questions. The tool I use has a line counter that lets me count new code as well as changes to a code baseline. Figure 2 shows a sample report on new code I wrote, and Figure 3 shows a sample report on code I added to the base.
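Counting changes against a baseline means comparing two versions of a file. One possible approach uses Python's standard `difflib` module to classify differences as added, deleted, or modified lines. The rule for "modified" is an assumption chosen for illustration and may not match how the author's tool counts: a replaced block counts its paired lines as modified and any excess as added or deleted.

```python
import difflib


def count_changes(base: list[str], new: list[str]) -> dict[str, int]:
    """Classify line differences between a baseline and a new version."""
    counts = {"added": 0, "deleted": 0, "modified": 0}
    matcher = difflib.SequenceMatcher(a=base, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_len, new_len = i2 - i1, j2 - j1
        if tag == "insert":
            counts["added"] += new_len
        elif tag == "delete":
            counts["deleted"] += old_len
        elif tag == "replace":
            # Pair old and new lines; leftovers count as pure additions/deletions.
            paired = min(old_len, new_len)
            counts["modified"] += paired
            counts["added"] += new_len - paired
            counts["deleted"] += old_len - paired
        # "equal" blocks are unchanged base code and are not counted here.
    return counts


base = ["int total = 0;", "for (int x : xs) {", "    total += x;", "}", "return total;"]
new = ["long total = 0;", "for (int x : xs) {", "    total += x;", "}", "log(total);", "return total;"]
print(count_changes(base, new))
```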

Figure 2: Size record for new code

Figure 3: Add-on code measurement

Size can be as important as time for estimating project length. If you keep a record of both, you will find out whether you are better at estimating how long a task will take or how big the result will be. If you are better at estimating size, you can derive a good time estimate from your size estimate and your historical data.
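A common way to turn a size estimate into a time estimate is to fit a straight line to your historical (size, time) pairs and read the time off that line. PSP's PROBE estimating method uses linear regression in a similar spirit. The sketch below is a plain least-squares fit; the historical numbers are invented, the function names are hypothetical, and the three-point minimum is an illustrative guard rather than a PSP rule.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    if n < 3:
        # Two points always fit a line exactly, so they tell us nothing.
        raise ValueError("need at least three data points for a meaningful fit")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def estimate_minutes(estimated_loc: float, history_loc, history_minutes) -> float:
    """Predict time from an estimated size using the historical fit."""
    b0, b1 = fit_line(history_loc, history_minutes)
    return b0 + b1 * estimated_loc


# Hypothetical history: size in LOC and actual time in minutes.
history_loc = [100, 200, 300, 400]
history_minutes = [150, 260, 330, 460]
print(estimate_minutes(250, history_loc, history_minutes))  # 300.0
```

Before trusting a fit like this, check how well your history actually correlates, as discussed in "Analyzing the results."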

Remember that if you use LOC as your metric and you work with different programming languages, your metrics will vary, especially if the languages are dissimilar. For example, the LOC required to write a program in a language such as Scheme is different from the LOC for the same program written in Java or C++. If you use a lot of different programming languages in your work, then I suggest you not use LOC to measure size.

Defects

In addition to time and size, you need to measure defects, both where you introduce them and where you remove them. There are many definitions of defect, and people often use other terms, such as fault, error, or bug. Let's agree that when you develop software, you make errors that cause defects to be injected into the software. At some point, the defects are found, through some sort of observation, and removed.

Recording defect data serves several purposes. First, it gives you an idea of your defect injection rate, that is, the number of defects you introduce into your product per unit of size (for example, per thousand lines of code). Count every defect: even if you remove one immediately, you still made the error. Suppose you have kept records for a while and know that your defect injection rate is about 30/KLOC6, but your current work is showing only 5/KLOC. You need to figure out why. Maybe a lot of the code was generated by a tool, or you've taken quality pills to reduce your errors. Or maybe you haven't done much testing yet. Something is responsible for the drastic drop, and until you find out what it is, you probably shouldn't release the product.
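The comparison in this paragraph is simple arithmetic, but writing it down makes the warning sign explicit. The sketch flags work whose current rate falls below half the historical rate. That threshold, the names, and the numbers (chosen to match the 30/KLOC and 5/KLOC in the text) are illustrative choices, not a PSP rule.

```python
def defects_per_kloc(defects: int, loc: int) -> float:
    """Defect injection rate: defects injected per thousand lines of code."""
    return defects * 1000 / loc


def suspiciously_low(current_rate: float, historical_rate: float,
                     ratio: float = 0.5) -> bool:
    """True if the current rate is below `ratio` times the historical rate."""
    return current_rate < ratio * historical_rate


historical = 30.0                    # defects/KLOC from past records
current = defects_per_kloc(6, 1200)  # 6 defects logged in 1,200 new LOC
print(current, suspiciously_low(current, historical))  # 5.0 True
```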

If you group your defects by type or category, you can see the distribution of your most common mistakes. This information can help guide your inspections and reviews. If you know, for example, that you tend to be off by one on loop control counters, you might spend a little more time coding loops, or at least review them carefully before you check the code into your production system.
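Tallying defects by type takes only a few lines. The type labels below are placeholders for whatever categories you settle on (PSP supplies its own standard list), and the log is invented for the example.

```python
from collections import Counter

# Hypothetical defect log: one type label per recorded defect.
defect_types = [
    "function", "assignment", "function", "syntax",
    "checking", "function", "assignment", "function",
]

distribution = Counter(defect_types)
total = sum(distribution.values())
for defect_type, count in distribution.most_common():
    # Most frequent types first, with their share of all defects.
    print(f"{defect_type:<12}{count:>3}  {count / total:.0%}")
```

Sorting by frequency puts the categories most worth targeting in your reviews at the top of the list.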

Knowing when you inject the error and when you discover and remove it provides you with valuable information about your testing process. You want to remove defects as soon as possible after they've been injected. A well-known maxim of software engineering is that the longer a defect goes unfixed, the more costly it is to repair. If the majority of your defects are discovered either late in your development process or after you deliver the product, you probably need to revise your process.

PSP suggests a small set of defect types that covers most of the defects you will encounter. Measuring defects is easy: you record the phase in which a defect was injected, the phase in which you removed it, and how long it took to fix. Figure 4 shows the defect injection and removal data for one of my projects. Because I use a test-first approach for much of my development, I injected a significant number of defects during my test phase.
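A defect record needs only these three facts. The sketch below stores them in a small Python dataclass, counts defects by (injected, removed) phase pair, and reports defects removed more than one phase after injection. That is one rough way to check the "remove defects early" point made above. The phase list and the one-phase lag rule are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical phase order used to measure how late a defect was removed.
PHASES = ["design", "code", "compile", "test", "after release"]


@dataclass
class Defect:
    injected: str      # phase in which the defect was injected
    removed: str       # phase in which it was found and fixed
    fix_minutes: int   # time spent fixing it


def removal_lag(d: Defect) -> int:
    """Number of phases between injection and removal (0 = same phase)."""
    return PHASES.index(d.removed) - PHASES.index(d.injected)


# Hypothetical defect log.
log = [
    Defect("design", "code", 10),
    Defect("code", "code", 2),
    Defect("code", "test", 25),
    Defect("design", "after release", 90),
    Defect("test", "test", 5),
]

matrix = Counter((d.injected, d.removed) for d in log)
late = [d for d in log if removal_lag(d) > 1]
print(matrix)
print(f"{len(late)} of {len(log)} defects removed more than one phase late, "
      f"{sum(d.fix_minutes for d in late)} fix minutes")
```

In this invented log, the two late defects account for most of the fix time. That is the pattern that would suggest revising your process.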

Figure 4: PSP defect data



Analyzing the results

Data is just data. You need to work with it to turn it into information. In some cases, you analyze data by consolidating and reporting, as shown in the previous figures. We can gain even more insight by performing statistical analysis on the data we collect.

Statistical analysis can help us determine whether our data is significant and how well different collections of data correlate with each other. For example, is there a relationship between the amount of time we spend on software development and the size of the final result? After completing two projects, we can see an excellent correlation between the two, as shown in Figure 5. However, that correlation is not useful. A straight line always passes exactly through any two points, so two data points will always appear perfectly correlated and are insufficient to be statistically significant.
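The two-project trap is easy to demonstrate. For any two points with distinct x values and distinct y values, Pearson's correlation coefficient r is always +1 or -1. The sketch below computes r directly from its definition; the data are made up.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Any two projects, however unrelated, give |r| == 1.
print(pearson_r([120, 900], [300, 410]))  # 1.0
print(pearson_r([120, 900], [410, 300]))  # -1.0
```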

Figure 5: Data from two projects

After you've measured several projects, you have enough data to help you predict your performance. Figure 6 shows a chart from eighteen of my tasks, comparing the actual time I spent on each task against the time I planned to spend. The value R² is the square of the correlation coefficient r. If R² is greater than 0.5 (ideally 0.75 or more), the data is useful for estimating and planning purposes; the closer it is to 1.0, the stronger the observed correlation. In this case, R² is 0.86, showing that I'm pretty good at predicting the actual time it will take me to complete a task. Some people are better than others at estimating, and it's taken me some time to get good at it. Experience helps, but having the PSP data to back me up has vastly improved my estimating ability.

Figure 6: Correlation between planned and actual time

You can perform many types of statistical analyses on your data and choose from among many software programs to help you. For example, the analysis in Figure 6 was done using Microsoft Excel. The key is to ensure that you consistently record and enter the data.

Determine what questions you want answered from your data and then select the analysis that is right for you. Watts' book in the Further Reading section of this article provides a good starting point.


Not all projects are equal

Be careful when you collect data from your software development work. Earlier, I mentioned that using different programming languages can affect the LOC measurement. Another difference can arise because of the type of work you are doing. It takes longer to produce a line of code when you are enhancing software, especially if you are unfamiliar with it. It takes longer to develop software when you are using technology that is unfamiliar to you, or when you're working in a domain that you've never worked in before.

If you record all of your data in one large group, without the ability to sort out similar projects, it will be much less valuable than if you identify a few categories of work that you are likely to do and keep separate data for each. I have separate data repositories for my new software development projects, my maintenance and enhancement work on other people's code, and my Eclipse plug-in work. Each of these affects my overall performance differently.
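Keeping separate data per category can be as simple as tagging each completed task and grouping before you compute anything. The category labels mirror the three the author mentions. The task records, the tuple layout, and the minutes-per-LOC productivity metric are hypothetical choices for the sketch.

```python
from collections import defaultdict

# Hypothetical completed-task records: (category, LOC, minutes).
tasks = [
    ("new development", 400, 480),
    ("maintenance", 120, 360),
    ("eclipse plug-in", 250, 500),
    ("new development", 300, 330),
    ("maintenance", 80, 200),
]


def minutes_per_loc_by_category(records):
    """Aggregate LOC and minutes per category, then divide."""
    loc = defaultdict(int)
    minutes = defaultdict(int)
    for category, size, time in records:
        loc[category] += size
        minutes[category] += time
    return {c: minutes[c] / loc[c] for c in loc}


for category, rate in minutes_per_loc_by_category(tasks).items():
    print(f"{category:<16}{rate:.2f} min/LOC")
```

In this invented data, the maintenance rate is more than double the new-development rate. Pooling all five tasks would blur that difference into a single misleading average.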


Final thoughts

Recording and analyzing data can be tedious, but tools can help you. A book I co-authored, Software Development for Small Teams, describes one such tool for capturing PSP data. That tool is primitive and not ready for wide distribution. More recently, I have found another tool that meets my requirements; it's extensible and open source. The tool is the Process Dashboard and is available from http://processdash.sourceforge.net/. I recommend it if you want to start measuring your personal process.

Figures 1 through 4 were taken from the Process Dashboard reports. Capturing the raw data is simple. A small Dashboard control panel sits at the top of my computer display, and I simply click to start or stop timing, change projects or activities, or display a defect reporting form. The Dashboard control panel is shown in Figure 7.

Figure 7: Process Dashboard control panel

Process Dashboard offers both ease of use and an extensive set of built-in analyses and reports. It's worth a try if you want to start a personal measurement program without a lot of pain.

To improve your engineering capabilities, think about capturing your personal data. You don't have to share it with anyone. It's yours, and it's a valuable resource. After you've measured your work for a while, think how differently you'll feel the next time your project manager asks: "How long will it take you to do this task?" You will be able to answer with greater confidence. More important, think how your manager will feel when she realizes that your estimates are pretty good.


Further reading

Watts S. Humphrey, A Discipline for Software Engineering. Addison-Wesley, 1995, ISBN 0201546108.

Norman E. Fenton and Shari Lawrence Pfleeger, Software Metrics: A Rigorous and Practical Approach. Brooks Cole, 1998, ISBN 0534954251.

Lloyd Jaisingh, Statistics for the Utterly Confused. McGraw-Hill, 2000, ISBN 0071350055.

Freedman, Pisani, and Purves, Statistics, Third Edition. W.W. Norton & Company, Inc., 1998, ISBN 0393970833.

Pollice, Augustine, Lowe, and Madhur, Software Development for Small Teams: A RUP-Centric Approach. Addison-Wesley, 2003, ISBN 0321199502.


Notes

1 http://www.hpcc.gov/pubs/blue94/section.6.html.

2 IEEE Standards Collection: Software Engineering, IEEE Standard 610.12-1990, IEEE, 1993.

3 Of course, declining defect density might mean that our testers have all gone on vacation. No single metric or indicator is sufficient for most purposes.

4 Planning for PSP includes the time to develop and refine requirements.

5 Of course, that's only for my software tasks. As I write this article, I'm measuring the number of words, as counted by the software I'm using.

6 KLOC = thousand lines of code.

