All but the most trivial of programs manipulate some types of data. Static type systems provide a way to ensure that a program doesn't manipulate data of a given type inappropriately. One of the advantages of the Java language is that it is strongly and statically typed, so that many type errors are caught by the compiler before the program is ever run. As developers, we can use this type system to produce more robust and bug-free code. Often, though, the type system is not used to its full potential.
Many programs make less use of the static type system than they could, instead relying on special fields to contain tags that distinguish the types of data.
By relying on these special fields to distinguish types of data, such programs forgo the very protection that the type system was designed to give them. When one of these tags mislabels its data, the result is a bug pattern that I call the Impostor Type.
One common symptom of an impostor type bug is that many conceptually distinct types of data are all treated in the same (and incorrect) manner. Another common symptom is that data doesn't match any of the designated types.
As a rule of thumb, suspect this bug pattern whenever there is a mismatch between the conceptual type of data and the way it is handled by your program.
To illustrate how easily bugs of this pattern can be introduced, let's consider a simple example. Suppose we want to manipulate various Euclidean forms, such as circles, squares, etc. These forms will have no position, but they will have a scale, so that it will be possible to compute their area.
public class Form {
    String shape;
    double scale;
    public Form(String _shape, double _scale) {
        this.shape = _shape;
        this.scale = _scale;
    }
    public double getArea() {
        if (shape.equals("square")) {
            return scale * scale;
        }
        else if (shape.equals("circle")) {
            return Math.PI * scale * scale;
        }
        else { // shape.equals("triangle"), an equilateral triangle
            return scale * (scale * Math.sqrt(3) / 4);
        }
    }
}
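For reference, the three branches of getArea() compute the following areas, writing s for scale. Reading scale as the side length of the square and the triangle and as the radius of the circle is our interpretation of the code; the text does not say which dimension scale denotes.

```latex
\begin{aligned}
A_{\text{square}}   &= s^2 \\
A_{\text{circle}}   &= \pi s^2 \\
A_{\text{triangle}} &= \frac{\sqrt{3}}{4}\, s^2 \qquad \text{(equilateral triangle with side } s\text{)}
\end{aligned}
```

The triangle branch, scale * (scale * Math.sqrt(3) / 4), is exactly the last line: s times (s times the square root of 3 over 4).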
There are serious disadvantages to implementing forms in this way, even though you see it done often.
One of the most glaring disadvantages is that this approach is not very extensible. If we wanted to introduce a new shape for our forms (such as "pentagon"), we'd have to go in and modify the source code for the getArea() method. But extensibility is a separate concern; in this article, we'll focus on the implementation's susceptibility to errors. I'll come back to the issue of extensibility in a future article.
Consider what would happen if, in some other part of the program, we constructed a new Form object as follows:
Form f = new Form("sqaure", 2);
Of course, "square" has been misspelled. But, as far as the compiler is concerned, this is perfectly valid code.
Now consider what will happen when we try to, say, call getArea() on our new Form object. Because the shape of the Form won't match any of the tests in the if-then-else block, its area will be computed in the else clause, as if it were a triangle!
There will be no error signaled. Indeed, in many circumstances, the return value will appear to be a perfectly reasonable number. Even if we put in some redundancy and check that the implied condition in the else clause holds (with, say, an assertion), the error won't be found until the code is run.
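Both failure modes are easy to reproduce. The sketch below copies the tag-based Form so that it compiles on its own; the driver ImpostorDemo and the variant CheckedForm are hypothetical names introduced only for illustration. CheckedForm adds the redundant check as a thrown IllegalStateException rather than a Java assert, so it fires even when assertions are disabled, yet the misspelled tag is still reported only when getArea() actually executes.

```java
// Hypothetical driver; Form is the tag-based class from above, copied here.
public class ImpostorDemo {
    public static void main(String[] args) {
        Form typo = new Form("sqaure", 2); // compiles: the tag is just a String
        // Falls through to the else clause and silently returns a triangle's area.
        System.out.println(typo.getArea()); // about 1.73, not 4.0

        CheckedForm checked = new CheckedForm("sqaure", 2); // also compiles
        try {
            checked.getArea();
        } catch (IllegalStateException e) {
            // The redundant check notices the typo, but only at run time.
            System.out.println("run-time failure: " + e.getMessage());
        }
    }
}

class Form {
    String shape;
    double scale;
    public Form(String _shape, double _scale) {
        this.shape = _shape;
        this.scale = _scale;
    }
    public double getArea() {
        if (shape.equals("square")) {
            return scale * scale;
        } else if (shape.equals("circle")) {
            return Math.PI * scale * scale;
        } else { // assumed to be an equilateral triangle
            return scale * (scale * Math.sqrt(3) / 4);
        }
    }
}

// Hypothetical variant that tests the triangle case explicitly
// and rejects any tag it does not recognize.
class CheckedForm {
    String shape;
    double scale;
    public CheckedForm(String _shape, double _scale) {
        this.shape = _shape;
        this.scale = _scale;
    }
    public double getArea() {
        if (shape.equals("square")) {
            return scale * scale;
        } else if (shape.equals("circle")) {
            return Math.PI * scale * scale;
        } else if (shape.equals("triangle")) {
            return scale * (scale * Math.sqrt(3) / 4);
        } else {
            throw new IllegalStateException("unknown shape tag: " + shape);
        }
    }
}
```

Either way, the compiler is no help: the check merely moves the symptom from a wrong number to an exception, and both surface only on the code path that happens to run.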
Many other similar bugs might occur with the above code. A clause might be accidentally left out of the if-then-else block, causing all Forms of the type corresponding to that clause to be handled improperly. Additionally, because the impostor type is just a String in a field, it might be modified, either accidentally or maliciously.
Either way, such modifications could wreak all sorts of havoc.
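The last point is easy to demonstrate. In the sketch below, RelabelDemo is a hypothetical name and Form is the tag-based class copied from above. Because shape is neither private nor final, any class in the same package can reassign it after construction, and the object silently changes what it claims to be.

```java
// Hypothetical driver; Form is the tag-based class from above, copied here.
public class RelabelDemo {
    public static void main(String[] args) {
        Form f = new Form("square", 2);
        System.out.println(f.getArea()); // 4.0

        // shape is package-private and non-final, so this assignment compiles.
        f.shape = "circle";
        System.out.println(f.getArea()); // now about 12.57 (pi * 2 * 2)
    }
}

class Form {
    String shape;
    double scale;
    public Form(String _shape, double _scale) {
        this.shape = _shape;
        this.scale = _scale;
    }
    public double getArea() {
        if (shape.equals("square")) {
            return scale * scale;
        } else if (shape.equals("circle")) {
            return Math.PI * scale * scale;
        } else { // assumed to be an equilateral triangle
            return scale * (scale * Math.sqrt(3) / 4);
        }
    }
}
```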
As you might have guessed, I suggest avoiding bugs of this type by using the type system to weed them out during static checking. Consider this alternative implementation:
public abstract class Form {
    double scale;
    public Form(double _scale) {
        this.scale = _scale;
    }
    public abstract double getArea();
}
class Square extends Form {
    public Square(double _scale) {
        super(_scale);
    }
    public double getArea() {
        return scale * scale;
    }
}
class Circle extends Form {
    public Circle(double _scale) {
        super(_scale);
    }
    public double getArea() {
        return Math.PI * scale * scale;
    }
}
class Triangle extends Form {
    public Triangle(double _scale) {
        super(_scale);
    }
    public double getArea() {
        return scale * (scale * Math.sqrt(3) / 4);
    }
}
Now consider what would happen if we were to mistype "Sqaure" when creating a new Form. The compiler would signal an error, telling us that class Sqaure could not be found. The code would never even have a chance to run.
Similarly, the compiler would not allow us to forget to define getArea() for any of our subclasses. And, of course, it would be impossible for any object to change the type of a Form.
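A short usage sketch of the subclass-based version follows. The driver class SubclassDemo is a hypothetical name; the Form hierarchy is the one above, with Form declared without public so that everything fits in a single source file. Each object carries its real type, so dynamic dispatch selects the right getArea() with no if-then-else chain, and the earlier misspelling becomes a compile-time error.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical driver for the subclass-based Form hierarchy.
public class SubclassDemo {
    public static void main(String[] args) {
        List<Form> forms = Arrays.asList(new Square(2), new Circle(1), new Triangle(2));
        for (Form f : forms) {
            // Dynamic dispatch picks each subclass's own getArea(); no tag is consulted.
            System.out.println(f.getClass().getSimpleName() + ": " + f.getArea());
        }
        // Form typo = new Sqaure(2); // rejected by the compiler: cannot find symbol
    }
}

// Same hierarchy as above; Form is not public so the file compiles as SubclassDemo.java.
abstract class Form {
    double scale;
    public Form(double _scale) {
        this.scale = _scale;
    }
    public abstract double getArea();
}
class Square extends Form {
    public Square(double _scale) {
        super(_scale);
    }
    public double getArea() {
        return scale * scale;
    }
}
class Circle extends Form {
    public Circle(double _scale) {
        super(_scale);
    }
    public double getArea() {
        return Math.PI * scale * scale;
    }
}
class Triangle extends Form {
    public Triangle(double _scale) {
        super(_scale);
    }
    public double getArea() {
        return scale * (scale * Math.sqrt(3) / 4);
    }
}
```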
Before leaving this topic, I'd like to discuss one more possible implementation, a kind of cross between the two implementations I've discussed.
In this case, no impostor types are used, but the code has many of the same susceptibilities as if they were. In fact, this implementation is worse than implementing getArea() separately for each type.
public abstract class Form {
    double scale;
    public Form(double _scale) {
        this.scale = _scale;
    }
    public double getArea() {
        if (this instanceof Square) {
            return scale * scale;
        }
        else if (this instanceof Circle) {
            return Math.PI * scale * scale;
        }
        else { // this instanceof Triangle
            return scale * (scale * Math.sqrt(3) / 4);
        }
    }
}
class Square extends Form {
    public Square(double _scale) {
        super(_scale);
    }
}
class Circle extends Form {
    public Circle(double _scale) {
        super(_scale);
    }
}
class Triangle extends Form {
    public Triangle(double _scale) {
        super(_scale);
    }
}
Although the compiler would still catch type misspellings, and the types of objects could not be changed, we are again using an if-then-else block to dispatch on the appropriate type. Therefore, we are again susceptible to mismatches between the instanceof checks in the if-then-else block and the set of types we are operating on.
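To make the remaining risk concrete, suppose a Hexagon subclass (hypothetical, introduced only for this sketch) is later added to the instanceof-based hierarchy without anyone updating getArea(). The compiler accepts it, because Form still provides a concrete getArea(), and hexagons are quietly measured as triangles.

```java
// Hypothetical driver; Form, Square, Circle, and Triangle follow the instanceof version above.
public class InstanceofDemo {
    public static void main(String[] args) {
        Form hex = new Hexagon(2);
        // Hexagon matches none of the instanceof tests, so it falls into the
        // triangle branch; no compile-time or run-time error is reported.
        System.out.println(hex.getArea()); // about 1.73, the triangle formula
    }
}

abstract class Form {
    double scale;
    public Form(double _scale) {
        this.scale = _scale;
    }
    public double getArea() {
        if (this instanceof Square) {
            return scale * scale;
        } else if (this instanceof Circle) {
            return Math.PI * scale * scale;
        } else { // this instanceof Triangle (or so we assume)
            return scale * (scale * Math.sqrt(3) / 4);
        }
    }
}
class Square extends Form {
    public Square(double _scale) { super(_scale); }
}
class Circle extends Form {
    public Circle(double _scale) { super(_scale); }
}
class Triangle extends Form {
    public Triangle(double _scale) { super(_scale); }
}
// Hypothetical new shape added without touching getArea().
class Hexagon extends Form {
    public Hexagon(double _scale) { super(_scale); }
}
```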
I should also mention that, like the first implementation, this implementation is not as extensible as the second.
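By contrast, adding the pentagon mentioned earlier to the subclass-based version only requires a new class; Form, Square, Circle, and Triangle stay untouched. This sketch assumes, as the existing square and triangle code does, that scale is the side length, and uses the standard area formula for a regular pentagon with side s: (s * s / 4) * sqrt(5 * (5 + 2 * sqrt(5))). PentagonDemo and Pentagon are hypothetical names.

```java
// Hypothetical driver showing the subclass-based hierarchy extended with one new class.
public class PentagonDemo {
    public static void main(String[] args) {
        Form p = new Pentagon(1);
        System.out.println(p.getArea()); // about 1.7205 for side length 1
    }
}

// The abstract base class from the second implementation, unchanged.
abstract class Form {
    double scale;
    public Form(double _scale) {
        this.scale = _scale;
    }
    public abstract double getArea();
}

// Hypothetical new shape: no existing class needs to be edited,
// and the compiler insists that getArea() be defined.
class Pentagon extends Form {
    public Pentagon(double _scale) {
        super(_scale);
    }
    // Regular pentagon with side length s: (s^2 / 4) * sqrt(5 * (5 + 2 * sqrt(5)))
    public double getArea() {
        return (scale * scale / 4) * Math.sqrt(5 * (5 + 2 * Math.sqrt(5)));
    }
}
```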
So, in a nutshell, here is our latest bug pattern:
- Pattern: Impostor Type
- Symptoms: A program that treats data of conceptually distinct types in the same way, or doesn't recognize certain types of data.
- Cause: The program uses fields with tags in lieu of separate classes for the various types of data.
- Cures and preventions: Divide conceptually distinct types of data into separate classes whenever possible.
The important point is that the language offers you the best resources for avoiding this type of error -- just remember to use them.
- The JUnit home page provides links to many interesting articles discussing program testing methods, as well as the latest version of JUnit.
- If you like JUnit, check out the entire set of xUnit testing tools for many different languages.
- I'd be remiss if I didn't mention that the xUnit suite of tools is designed for use with Extreme Programming, a new and powerful way of developing clean, robust software quickly.
- "The UML Profile for Framework Architectures" (PDF slide show) highlights a detailed case study of JUnit.
- Although not directly related to this discussion, I recommend checking out Martin Fowler's article discussing the role of UML and design in Extreme Programming.
- Take the "Java debugging" tutorial (developerWorks, February 2001) for help with general debugging techniques.
- New to Java development or looking to brush up on your Java programming skills? Take this comprehensive tutorial, "Introduction to Java programming."
- Read all of Eric's Diagnosing Java Code articles, many of which focus on bug patterns.
- Find more Java resources on the developerWorks Java technology zone.
Eric Allen has an A.B. in computer science and mathematics from Cornell University. He is a Ph.D. candidate on the Java programming languages team at Rice University. His research concerns the development of semantic models and static analysis tools for the Java language, both at the source and bytecode levels. Currently, he is implementing a source-to-bytecode compiler for the NextGen programming language, an extension of the Java language with generic run-time types. Contact Eric at eallen@cs.rice.edu.