Joyful research: Finding meaningful metrics
from The Rational Edge: Professor Gary Pollice talks about his research project at Worcester Polytechnic Institute to measure the attributes of software process, people, tools, and products, and to formulate meaningful metrics. The project focuses on developing a product called Tracer, an Eclipse-based metrics workbench.
In my position as professor of practice, my main jobs are to teach classes and advise projects. Any research I do is icing on the cake, so I can pick and choose projects for pure enjoyment. At this stage of my career, this is about the most perfect job I could have -- except perhaps teaching sailing in the Virgin Islands.
In my February column I told you about research projects my colleagues are working on. This month I want to dig a little deeper into one of my own projects: Tracer -- a project to develop a relationship management tool for collecting and evaluating software project metrics.
What exactly is research?
Studious inquiry or examination; especially: investigation or experimentation aimed at the discovery and interpretation of facts, revision of accepted theories or laws in the light of new facts, or practical application of such new or revised theories or laws.
Whether the researcher is a scientist or medical researcher trying to discover new ways to combat disease, or an historian trying to determine what really happened in the last days of Pompeii, he or she is motivated to extend the body of knowledge about a particular area of concern.
Although few of us will make breakthrough discoveries in our research, the small advances we provide will still be exciting. In high school, I spent almost a week trying to disprove Cavalieri's theorem in solid geometry.2 I still remember the thrill I felt when one of my favorite teachers sat me down, explained where I went wrong, and congratulated me on working so hard on the problem. It didn't matter that I didn't get the results I wanted. I simply felt great about having moved my boundaries. Now, when I attack a new problem, I still get the same kind of rush, even if others are also working on it or have already solved it. To me, the thrill of research lies in the process of investigation.
Research focus: Metrics and empirical studies
Meaningful metrics help us answer questions. They enable us to evaluate and predict. For example, we might need to evaluate the effect of using a new development tool on software quality (we'll leave the definition of quality fuzzy for this discussion). Or we might want to gather data from past projects, or even for specific teams or individuals, to predict how well they might perform on a similar, upcoming project.
Formulating meaningful metrics requires conducting empirical studies -- either experiments or case studies. Experiments give us more control, but they are harder to conduct and more expensive than case studies. Case studies, on the other hand, usually yield more accessible data.
Metrics and relationship management
In software engineering, we discuss traceability with respect to requirements management. If your system is large enough to have different types of requirements -- features, use cases, test requirements, and so on -- then it is useful to have traceability among these requirements, so you can determine whether you have taken all of these requirements into account as the system develops. Traceability also lets you perform impact analysis. For example, if one of your use cases changes, then you might have to change your test requirements that trace to (or from, depending upon how you represent the relationship) the use case. In other words, by considering the traceability relationships, you can determine the potential impact of a requirement change.
Typically we represent traceability via a table. That might be a simple spreadsheet. Or, if you have a requirements management tool, such as IBM® Rational® RequisitePro®, you can create traceability tables and indicate change impacts automatically.
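A traceability table can be sketched as a simple mapping from one artifact to the artifacts that trace from it. The following is a minimal illustration of the idea only, not how any particular tool implements it; the use-case and test-requirement names are hypothetical:

```python
# Hypothetical traceability map: each use case traces to the test
# requirements derived from it. All names are invented for illustration.
traceability = {
    "UC-01 Place order": ["TR-01", "TR-02"],
    "UC-02 Cancel order": ["TR-03"],
}

def impacted_tests(use_case):
    """Impact analysis: return the test requirements that may need to
    change when the given use case changes."""
    return traceability.get(use_case, [])

print(impacted_tests("UC-01 Place order"))  # ['TR-01', 'TR-02']
```

Even this toy version supports the two uses described above: checking that every use case has tests tracing from it, and listing what a change might affect.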
Traceability is just one type of relationship among requirements and other software project artifacts; it is not the only one. When you fix a defect, there is a relationship between the defect report and other artifacts that will change because of the fix. Although you could model this as a traceability relationship, the model might be semantically weak. The situation is analogous to depicting all relationships in a UML model as dependencies. UML provides additional semantics for associations that let you express more than a dependency.
A relation t can be written in several ways: as a set of ordered pairs (a, b) in t, in infix form a t b, or in function-like form t(a) = b. However, the last form is usually reserved for functions, and there is no guarantee that a relation is a function.4
Most texts on discrete mathematics or set theory have readable sections on relations, of which there are many types, each with different properties. A relation R is symmetric if, whenever (a, b) is in R, (b, a) is also in R. This is useful for modeling bi-directional relationships. Other common properties of relations are reflexivity and transitivity.
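These properties can be checked mechanically. Here is a minimal sketch that represents a relation as a set of ordered pairs, using only the standard mathematical definitions:

```python
def is_symmetric(relation):
    """R is symmetric if (a, b) in R implies (b, a) in R."""
    return all((b, a) in relation for (a, b) in relation)

def is_reflexive(relation, domain):
    """R is reflexive over a domain if every element relates to itself."""
    return all((a, a) in relation for a in domain)

def is_transitive(relation):
    """R is transitive if (a, b) and (b, c) in R imply (a, c) in R."""
    return all((a, d) in relation
               for (a, b) in relation
               for (c, d) in relation
               if b == c)

# A bi-directional link between two artifacts is a symmetric relation.
links = {("design doc", "test plan"), ("test plan", "design doc")}
print(is_symmetric(links))  # True
```

Representing relations this explicitly is exactly what makes them measurable: once a relationship is a set of pairs, we can count, filter, and compute over it.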
We can think about representing relationships between artifacts as mathematical relations, although we might want to add semantic information -- attributes -- to help us describe specialized relationships. That way, we can be more specific than simply saying that artifacts "trace" to each other.
Some artifacts we produce during a software development project are critical for project success, whereas others are either optional or even useless. Understanding the relationships between requirements can help us understand the effort required to create and maintain corresponding artifacts, and then to make informed decisions about whether their value will be worth the investment.
We encounter a couple of issues if we try to measure the attributes of relationships between artifacts as a project progresses. First, we don't really know in advance what the relationships are. We can make assumptions, but from a research perspective it is desirable to iteratively zero in on a set that makes sense. We can do this by selecting an initial set, gathering and analyzing meaningful data, and then adjusting and repeating until our results show stability.
The second issue is that we don't always know which attributes to measure or how to measure them. We might begin measuring an attribute and find that our original assumptions about the accuracy and precision of our measure are incorrect. Consider measuring the size of a software component. There are many ways to do this. A simple measure is lines of code (LOC), but this varies with the coding style and may not measure the amount of code actually written. Today, many systems include code generated by the development environment. Of course, it is tempting to use LOC because it is very easy to compute; more complex measures of size are much harder to compute.
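To see why even an "easy" measure like LOC is slippery, consider a toy counter whose result depends on what we decide to skip. Two reasonable conventions give different sizes for the same source:

```python
def count_loc(source, skip_blank=True, skip_comments=True):
    """Count lines of code. The result varies with what we choose to
    skip, which is one reason LOC is an imprecise size measure."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if skip_blank and not stripped:
            continue
        if skip_comments and stripped.startswith("#"):
            continue
        count += 1
    return count

sample = "x = 1\n\n# a comment\ny = x + 1\n"
print(count_loc(sample))                       # 2
print(count_loc(sample, skip_comments=False))  # 3
```

And this sketch does not even touch the harder problems mentioned above, such as generated code or differences in coding style.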
The Tracer Project
There are two things worth mentioning about these desires. First, they aren't really new. When planning the first version of Rational Suite, we conceived of a project view that would allow users to see relationships among all the different types of artifacts the tool suite manages. The view would support impact analyses for changes as well as readiness assessments -- to determine whether the software under construction was ready to ship. Unfortunately, this was neither an easy thing to accomplish nor a main feature of the suite, so it was scoped out of the project.
The second point about my desires is that linking so many diverse types of possible artifacts with relationships is a hard problem. Most projects use a variety of tools to develop their software, usually not all from the same vendor; artifacts associated with one tool are not always accessible in others. One of my research project's objectives is to eliminate this problem by abstracting the artifacts to the point where we can deal with any type of artifact produced during a project in a uniform manner. Another way of saying this is that we want any discrete item in our project's universe to be a possible artifact.
Basically, the project focuses on developing a product called Tracer, an Eclipse plug-in that allows us to define relationships between artifacts, perform calculations on their attributes and relationships, and view the results. It then lets us test new metrics and theories about the ability to predict or assess software projects and products. In short, Tracer is a metrics workbench.
Second, and more important, Eclipse has an extremely well-designed underlying framework that facilitates extensions to existing functionality and available features -- which are many. The Eclipse user interface in Figure 1 shows several possible candidates for artifacts. File-level artifacts, such as source files, documents, and so on, are shown in the Package Explorer view at the upper left. More detailed items -- parts of the class being edited -- are in the right pane. These consist of fields, methods, subclasses, and so on. The granularity can be quite small. As contributors add other plug-ins and tools to the Eclipse environment, we will see more possibilities for standard artifacts; and we can create artifacts of our own and add them to the environment.
Figure 1: Typical Eclipse user interface
Eclipse gives Tracer many more possibilities than would a closed platform and makes it easier to add others, because all Eclipse plug-ins conform to an implementation standard.
Another approach has been proposed to address the diversity of artifact types: Use a service-oriented architecture (SOA) for gathering artifact information. This seems to be a reasonable approach, and one that can be integrated with our approach in the future.
It's useful to have metrics that help determine how well a system is designed. From an object-oriented perspective, some such metrics have been around for a while, including the Chidamber-Kemerer metrics, described in 1994.5 These measure certain attributes of the code, such as lack of cohesion in the methods, depth of the inheritance tree, and so on. One can analyze the values of these metrics to make a statement about the code's structural design.
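As a small illustration, one Chidamber-Kemerer metric, depth of inheritance tree (DIT), can be computed directly from a class hierarchy. The classes below are invented for the example, and Python's built-in `object` root stands in for the top of the hierarchy:

```python
# Hypothetical class hierarchy, invented for illustration.
class Component: pass
class Widget(Component): pass
class Button(Widget): pass

def depth_of_inheritance(cls):
    """DIT: length of the longest inheritance path from the class to
    the root of the hierarchy (object, in Python)."""
    if cls is object:
        return 0
    return 1 + max(depth_of_inheritance(base) for base in cls.__bases__)

print(depth_of_inheritance(Button))  # 3 (Button -> Widget -> Component -> object)
```

Deeper trees generally mean more inherited behavior to understand, which is why DIT is read as one indicator of structural design.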
Another approach might be to look at relations between the number of source files that are changed for each defect fix. We could define a relation "causes change to" as a relation from a single defect report to some number of source modules. If our defect tracking tool uses Eclipse, then establishing and maintaining the relations would be rather simple. Over time, we would get the average number of modules that change per defect, look at the values calculated, and use some set of criteria to determine how well the system is designed. This could involve determining design pattern usage, naming, documentation, or other possible properties. Finally, we would gather the data for several projects.
The goal of such an effort would be to formulate a hypothesis, such as: "There is a significant correlation between the average span of change per defect and system quality." We might test this hypothesis on our data and find that well-designed systems have an average span of less than n module changes per defect, for some n. If we can empirically validate the hypothesis for a reasonable number of projects, then we might apply it to determine when to consider rewriting a system instead of revising it. That is, when does the code become so intertwined and difficult to modify that it might be worth starting from scratch?
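The "causes change to" relation and the average span of change per defect can be sketched as follows; the defect IDs and module names are fabricated purely for illustration:

```python
# Hypothetical "causes change to" relation: pairs of (defect id,
# source module changed by its fix). All data is invented.
causes_change_to = [
    ("DEF-1", "order.py"), ("DEF-1", "cart.py"),
    ("DEF-2", "order.py"),
    ("DEF-3", "ui.py"), ("DEF-3", "cart.py"), ("DEF-3", "order.py"),
]

def average_change_span(relation):
    """Average number of distinct modules changed per defect fix."""
    modules_per_defect = {}
    for defect, module in relation:
        modules_per_defect.setdefault(defect, set()).add(module)
    return sum(len(m) for m in modules_per_defect.values()) / len(modules_per_defect)

print(average_change_span(causes_change_to))  # 2.0
```

With real project data collected over time, this single number could then be compared against a threshold n of the kind hypothesized above.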
Clearly, this is only one example of how we might use Tracer. The basic need is a way to quickly formulate our hypotheses and then collect the data to validate them. Many applications should grow out of our research.
The road ahead
I've also been working on a formal mathematical relationship model that will let us precisely define the different relations and metrics that we want to test. Once that model is complete, we can begin to formulate different models of artifacts and relations and derive applicable metrics. Then we can implement the model within the Tracer framework.
As with most research projects, part of the work thus far has been reviewing past research on traceability relations and papers about formal descriptions of traceability, including those shown on our bibliography page.6 To follow our progress, visit the Tracer project's wiki at http://www.cs.wpi.edu/cgi-bin/gpollice/tracewiki/wiki.pl?Home_Page.
This project is exciting and fun, and it may even produce some useful results. Whether or not we achieve the results we want, the joy is in the journey. And if this works, I may even go back and take another crack at Cavalieri's theorem.
2 Eric W. Weisstein, "Cavalieri's Principle." From MathWorld -- A Wolfram Web Resource. http://mathworld.wolfram.com/CavalierisPrinciple.html.
3 A metric is the measure of how much an entity exhibits a certain quality or attribute. A measure is a value that expresses a simple attribute. We can measure the size of a program in terms of lines of code and the number of defects found in a module. However, we often need to combine measures to get meaningful metrics. A metric of defects per thousand lines of code (KLOC) might be a good way to determine whether our quality assurance process is effective.
4 A function is a relation in which each independent element corresponds to exactly one dependent element. In the example here, the feature requirements are the independent elements and the use cases are the dependent elements. Since a feature might trace to more than one use case, the relation t is not a function.
5 Shyam R. Chidamber and Chris F. Kemerer, "A Metrics Suite for Object Oriented Design." IEEE Transactions on Software Engineering, vol. 20, no. 6, June 1994.