
Diagnosing Java code: Walking the specification tightrope

Why well-defined specifications are critical to software systems

Eric Allen (eallen@cs.rice.edu), Ph.D. candidate, Java programming languages team, Rice University
Eric Allen has a bachelor's degree in computer science and mathematics from Cornell University and is a PhD candidate in the Java programming languages team at Rice University. Before returning to Rice to finish his degree, Eric was the lead Java software developer at Cycorp, Inc. He has also moderated the Java Beginner discussion forum at JavaWorld. His research concerns the development of semantic models and static analysis tools for the Java language, both at the source and bytecode levels. Eric has also helped in the development of Rice's compiler for the NextGen programming language, an extension of the Java language with generic run-time types. Contact Eric at eallen@cs.rice.edu.

Summary:  Program specifications are a critical but time-consuming part of any software project. In this installment of Diagnosing Java code, author Eric Allen discusses why a well-defined specification for your code is necessary and explores traditional software engineering approaches as well as extreme programming methods, comparing the advantages and disadvantages. Eric also discusses ways in which you don't want to define a program specification, and explains why. After reading this article, you'll understand how to weigh the costs and benefits of maintaining precise design specs.


Date:  01 Feb 2002
Level:  Introductory

Program specifications are critical to building reliable software. Without well-defined specifications, it's difficult to diagnose misbehavior of a software system. But program specifications for many software systems are poorly defined. And what's worse, many don't exist at all.

Intuitively, a program specification is a description of the behavior of a program. It can take many forms, but one thread runs through all instances regardless of the form: it is essential to have some type of system specification because this is how you determine whether the system is behaving correctly.

A specification may be formalized or loosely defined, depending on the stability and criticality of the system under development, as well as the ease with which the system can be modified after deployment.

We'll begin by discussing why specifications are important, why they are often ignored, and how the situation can be improved.

Balancing the costs and benefits of precision

In the world of microprocessor design, systems are deployed on a massive scale in applications varying from personal computers to mission-critical medical and military systems. There is one natural, unbreakable rule in this world: modifying chip designs after deployment is extremely expensive.

Therefore, it's no surprise that the specifications of microprocessors are typically formalized. A formal specification has a tremendous advantage in that it can be interpreted and analyzed automatically. In the case of microprocessors, many aspects of the design can be automatically proven correct.

The software analogy: Programming languages

In the software world, the artifacts most similar to microprocessors in terms of their deployment and criticality are programming languages. A popular programming language may be used to write countless programs in systems at all levels of criticality.

As with chips, the cost of modifying a language design after people are using it can be quite high because the existing programs have to be modified and recompiled. Therefore, the specifications for programming languages, as compared with other software systems, are often quite formal.

This formality is especially important in the case of syntax. Virtually all modern programming languages have a formally specified syntax. Most parsers are constructed through the use of automatic parser generators that read in such grammars and produce full-fledged parsers as output.
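To make the link between a formal grammar and a parser concrete, here is a minimal sketch. It assumes a tiny, hypothetical expression grammar (not taken from any real language), and the parser is written by hand rather than produced by a parser generator. The point is only that each grammar rule maps mechanically onto one method, which is what makes automatic generation feasible.

```java
// Grammar (hypothetical, for illustration only):
//   expr   ::= term   ( '+' term )*
//   term   ::= factor ( '*' factor )*
//   factor ::= DIGIT+ | '(' expr ')'
final class TinyExprParser {
    private final String input;
    private int pos = 0;

    private TinyExprParser(String input) {
        this.input = input;
    }

    /** Parses and evaluates the input, rejecting anything the grammar does not derive. */
    static int evaluate(String input) {
        TinyExprParser parser = new TinyExprParser(input);
        int value = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expr ::= term ( '+' term )*
    private int expr() {
        int value = term();
        while (peek() == '+') {
            pos++;
            value += term();
        }
        return value;
    }

    // term ::= factor ( '*' factor )*
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor ::= DIGIT+ | '(' expr ')'
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }
}
```

Because the grammar settles precedence and grouping, there is no room for two implementations to disagree about what "2+3*4" means. Whitespace is not part of this toy grammar, so the parser rejects it.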

Unfortunately, language semantics tend not to be specified so rigorously. It is not because such rigor is impossible.

Languages like ML have a formalized semantics and, as a result, many theorems have been proven about them, verifying certain aspects of their correctness (such as the soundness of their type systems). But languages like ML are the exception. We can identify two reasons why this is the case.

First, because it is much less tractable to prove properties about a programming language specification than about a hardware design, there is less demand for a formal specification. Instead, many languages are specified in prose. These prose specifications are sufficient for most of the people who will actually use them, such as compiler writers. In fact, compiler writers often revel in a less formal specification because it gives them more room in which to optimize a program. The other, occasional, users of a language specification are the programmers, and most of them greatly appreciate an informal specification that they can easily understand.

The second reason is that many languages are developed as "hobbies" by sole developers who may not specialize in the area of programming languages. Unfortunately, these developers are not always aware of the formalisms developed for specifying programming language semantics.

Examples of the costs of ambiguity

Nevertheless, the costs of ambiguity or inconsistency in a language specification can be quite high, leading to decreased portability, reliability, or even to a security hole. By looking at some of the languages in popular use today, we can see how the relative degree of precision in their specifications has affected them.

The C++ language specification has many ambiguities, even at the syntactic level. Additionally, many parts of the specification have been made implementation-dependent. The result: often C++ programs don't behave as intended on more than one platform.

The Python language specification leaves many details implementation-dependent or undefined. As a result, implementations such as Jython and CPython face formidable challenges to providing identical behaviors relative to each other. This problem would be much worse if it weren't for the relative simplicity of the Python language (and I mean that in a good way).

Although no formal specification (akin to that of ML) exists for the Java language, a great deal of care was put into the development of a precise informal specification. The language is typically compiled to bytecode for the JVM, which itself is well specified (although some ambiguities in that specification have been discovered by formal analysis). Additionally, the core Java APIs are specified as part of the Java platform rather than left to individual implementations. This results in an unprecedented level of portability for Java code.
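One small illustration of what a precise specification buys: the Java Language Specification requires that the operands of a binary operator be evaluated left to right, whereas C++ leaves the corresponding order unspecified. The sketch below uses a hypothetical helper, operand, that records the order in which it is called. On any conforming Java implementation the recorded order is the same.

```java
final class EvaluationOrderDemo {
    private static final StringBuilder trace = new StringBuilder();

    // Hypothetical helper: records that it was called, then returns its value.
    private static int operand(String name, int value) {
        trace.append(name);
        return value;
    }

    /** Returns the order in which the operands were evaluated, plus the result. */
    static String traceOfSum() {
        trace.setLength(0);
        // Operands are evaluated left to right (a, b, c) even though
        // the multiplication is performed before the addition.
        int sum = operand("a", 1) + operand("b", 2) * operand("c", 3);
        return trace + "=" + sum;
    }

    public static void main(String[] args) {
        System.out.println(traceOfSum()); // prints "abc=7"
    }
}
```

A program that depends on this order is portable in Java precisely because the specification commits to it.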

The conclusion we can draw from this is that it helps to have as precise a specification as possible. But even in the world of programming languages, where problems in a specification are most costly, such precision is rare, partly because producing a precise specification up front is expensive.

Companies have found that it's more cost-effective to ship early and flesh out the details of a specification later (or, more likely, never). Admittedly, for applications with shorter life cycles and a narrower deployment path, precise up-front specification is just too expensive. A development team may not finish formally specifying its system before its competitors have already shipped theirs.

Additionally, large specifications are rarely updated as customer requirements change, and are therefore ignored. But if up-front specification is too costly, what approach should a development team take in specifying their software?

Before we answer that question, let's consider one approach that is often taken, but is really the worst of all possible worlds.


Why implementations are not specifications

In contrast to the above approaches, a great deal of software is implemented without a discernible specification. If and when the software is completed, the implementation is then presented as the specification.

In other words, whatever behavior the software exhibits is said to be the specified behavior.

Some might argue that this is a good thing, since it prevents the developers from wasting their time working out some sort of formal plan that is bound to change. But, while it is true that project specifications often change, an implementation makes for a lousy specification in many respects. Following are a few of the reasons:

  • Implementations contain arbitrary choices.
  • If the implementation is the specification, every behavior is intended by definition, so there are no bugs.

Many of the choices made in an implementation are arbitrary. Thus, any future attempt to implement on another platform has nothing to go on but an existing implementation. The developers will have to wade through numerous implementation details to determine the behavior that the implementation entails. It is much easier to determine such behavior when it is specified at a higher level of abstraction.
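As a minimal sketch of an arbitrary choice, consider a hypothetical requirement: "return each tag exactly once." The two methods below both satisfy it, yet they order their results differently. Someone handed only one of them as "the specification" cannot tell whether the ordering is required behavior or an accident of the collection class that happened to be used. All names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

final class DistinctTags {
    // Keeps the order in which tags were first seen.
    static List<String> firstSeenOrder(List<String> tags) {
        return new ArrayList<>(new LinkedHashSet<>(tags));
    }

    // Sorts the tags, simply because a TreeSet was convenient.
    static List<String> sortedOrder(List<String> tags) {
        return new ArrayList<>(new TreeSet<>(tags));
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("b", "a", "b");
        System.out.println(firstSeenOrder(tags)); // prints [b, a]
        System.out.println(sortedOrder(tags));    // prints [a, b]
    }
}
```

A one-line statement at a higher level of abstraction ("the order of the result is unspecified") settles the question that the code alone cannot.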

Also, if an implementation is literally taken as its own specification, then it is impossible to identify any behavior of that implementation as a bug. How many times have we seen software companies identify a deleterious or annoying behavior as "normal" just because it is an unforeseen effect of not developing a specification?


A cost-effective specification

A third, obvious reason why an implementation cannot effectively serve as the specification for the initial developers is that no such implementation yet exists. These developers must rely on some model of behavior for the system they're creating, so the source of that model could serve as the software's specification.

This point sheds some light on what sort of a specification a developer might use with reasonable cost. While it's true that in order to determine how to implement a feature, a developer must have some mental model of what that feature is, he needn't have a mental model of the entire application.

In other words, specifications can be developed a piece at a time. Not only does this make them more tractable, it also allows them to be modified more efficiently as the customers' demands change.

In Extreme Programming (XP), the required functionality of a system is specified incrementally through the use of stories. Each story briefly describes one aspect of the system's behavior. For example, here is a story that might be included in the specification of a Java IDE:

As the user types words into the editor, occurrences of Java keywords are automatically colored blue. String and character literals are colored green, and comments are colored red.

Believe it or not, that's actually a hard story to get right for the Java language, partly because of some peculiar properties of block comments. Depending on the velocity of the development team, it may be advisable to break that story up into two or more smaller ones.
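The following sketch hints at where the difficulty in the coloring story lies. It is a deliberately simplified classifier, not a real IDE component. The keyword set is a tiny hypothetical subset, and the choice of which block-comment properties to highlight is my own reading rather than something the story spells out. The method returns one marker per input character: K for keyword, S for string or character literal, C for comment, and '.' for everything else.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TinyHighlighter {
    // A tiny, hypothetical subset of the Java keywords.
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("class", "int", "return", "if", "else"));

    static String classify(String src) {
        char[] marks = new char[src.length()];
        Arrays.fill(marks, '.');
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int end;
            if (src.startsWith("/*", i)) {
                // Search from i + 2 so that "/*/" is not mistaken for a closed comment.
                int close = src.indexOf("*/", i + 2);
                // An unterminated block comment swallows the rest of the text.
                end = close < 0 ? src.length() : close + 2;
                Arrays.fill(marks, i, end, 'C');
            } else if (src.startsWith("//", i)) {
                int newline = src.indexOf('\n', i);
                end = newline < 0 ? src.length() : newline;
                Arrays.fill(marks, i, end, 'C');
            } else if (c == '"' || c == '\'') {
                end = i + 1;
                while (end < src.length() && src.charAt(end) != c && src.charAt(end) != '\n') {
                    if (src.charAt(end) == '\\') {
                        end++; // skip the escaped character
                    }
                    end++;
                }
                if (end < src.length() && src.charAt(end) == c) {
                    end++; // include the closing quote
                }
                end = Math.min(end, src.length());
                Arrays.fill(marks, i, end, 'S');
            } else if (Character.isJavaIdentifierStart(c)) {
                end = i + 1;
                while (end < src.length() && Character.isJavaIdentifierPart(src.charAt(end))) {
                    end++;
                }
                if (KEYWORDS.contains(src.substring(i, end))) {
                    Arrays.fill(marks, i, end, 'K');
                }
            } else {
                end = i + 1;
            }
            i = end;
        }
        return new String(marks);
    }
}
```

Note how typing the two characters "/*" anywhere outside a literal or comment changes the classification of everything up to the next "*/", or to the end of the text if there is none. A keyword inside a comment must not be colored, and a "/*" inside a string literal must not start a comment.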

But notice that this story is small, written in simple, clear language. That makes it easy to split up when necessary and it prevents coupling between the parts of a specification.

Additionally, because the stories are small, they can be updated and new ones added without completely overhauling the specification. For this reason, stories work particularly well in an industrial setting in which the requirements for the final product often change while it is being implemented.


No, unit tests aren't a specification either

Before we leave the topic of specifications, there is one last issue I'd like to discuss concerning unit tests. In XP, unit tests are a way of life. Programmers start writing them before they write any of the implementation at all, and they continue writing more unit tests for each new aspect of the functionality.

A rigorous suite of unit tests laid over a software project provides two tremendous advantages:

  • They serve as documentation.
  • They expedite the process of refactoring.

The unit tests, like static types, are an executable form of documentation. Because the unit tests ideally cover every aspect of the implementation, and because they invoke the functionality in simple ways to make sure it is working, it is easy for a programmer who is joining a project or starting to maintain some code to read through the unit tests to determine what the various functional components do.
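Here is a minimal sketch of tests that read as documentation. To stay self-contained it uses plain Java with a small check helper instead of JUnit, and both the class under test (BoundedCounter) and its behavior are hypothetical. A JUnit version would have the same shape.

```java
final class BoundedCounter {
    private final int max;
    private int value = 0;

    BoundedCounter(int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must not be negative");
        }
        this.max = max;
    }

    void increment() {
        if (value < max) {
            value++;
        }
    }

    int value() {
        return value;
    }
}

final class BoundedCounterTest {
    // Each test name states one fact a newcomer can rely on.
    static void testStartsAtZero() {
        check(new BoundedCounter(3).value() == 0, "a new counter starts at zero");
    }

    static void testIncrementStopsAtMax() {
        BoundedCounter counter = new BoundedCounter(2);
        for (int i = 0; i < 5; i++) {
            counter.increment();
        }
        check(counter.value() == 2, "incrementing past max leaves the value at max");
    }

    private static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("Failed: " + description);
        }
    }

    public static void main(String[] args) {
        testStartsAtZero();
        testIncrementStopsAtMax();
        System.out.println("All tests passed");
    }
}
```

A reader who has never seen BoundedCounter can learn from the two tests that it starts at zero and saturates rather than overflowing or throwing.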

Many people, when first hearing of the idea that a unit test can be documentation, are skeptical: "How can you write documentation for a program in the same language that the program is written in?" But this question misses the point of code documentation.

Code should never be documented to explain what the code is doing. The code itself already does that. Instead, documentation should explain why a block of code is doing what it does. Anyone reading the code should be familiar already with the language the code is written in -- if they are not, then documentation in any language is unlikely to help.

But it is not always clear how a block of code interacts with the rest of a program, and that's what documentation is for. Because the reader of the code is (or should be) familiar with the language it is written in, then it is perfectly valid to explain the intention behind the code in the same language as the code.

Also, unit tests expedite the process of refactoring. When a suite of unit tests can be run over the code at any time to determine if any of the functionality has been broken, programmers can refactor the code with much more confidence than they would have otherwise. The vast majority of bugs introduced can be immediately detected.

For these reasons, unit tests are a huge win when trying to write robust software. In fact, because the unit tests serve as a form of documentation and can be automatically enforced (by including them in the build process), it is tempting to suggest that the tests themselves should be used as the specification for a system.

It is reasonable for tests to form part of a specification, in the sense that we may require any valid implementation to pass all of the system-level tests. But using the unit tests to form the whole specification has some serious disadvantages.

For one thing, the set of tests over a system is inevitably incomplete. No matter how many tests we specify over a system, there will always be more inputs and states of the system than we could ever represent. We could interpret the tests as specifying their "most reasonable" extension, but such an extension will often be ambiguous.
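The incompleteness is easy to demonstrate. In the sketch below, a small hypothetical test suite for an absolute-value function is passed by two different implementations, one of which is plainly wrong on inputs the suite never tries. All names are illustrative assumptions.

```java
import java.util.function.IntUnaryOperator;

final class AbsSuite {
    /** A small, hypothetical test suite for an absolute-value function. */
    static boolean passes(IntUnaryOperator abs) {
        return abs.applyAsInt(0) == 0
            && abs.applyAsInt(5) == 5
            && abs.applyAsInt(-3) == 3;
    }

    // The implementation most readers would expect.
    static int straightforward(int x) {
        return x < 0 ? -x : x;
    }

    // Written only to satisfy the suite above; wrong for every other negative input.
    static int overfitted(int x) {
        return x == -3 ? 3 : x;
    }

    public static void main(String[] args) {
        System.out.println(passes(AbsSuite::straightforward)); // prints true
        System.out.println(passes(AbsSuite::overfitted));      // prints true
        System.out.println(overfitted(-4));                    // prints -4
    }
}
```

Even the "most reasonable" extension leaves questions open. For example, straightforward(Integer.MIN_VALUE) returns a negative number because negating that value overflows an int, and nothing in the suite says whether that result is acceptable.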

Additionally, unit tests by their very nature enforce properties of a particular implementation. There will always be more ways than one that a system could be implemented. So, using unit tests as a specification has many of the same disadvantages as using a particular implementation as a specification.

Therefore, we are best advised to view tests as an augmentation to a specification, but not as the entirety of the specification.


Benefits can outweigh the costs

I hope you've reached the same conclusion I have: it is necessary to have as precise a specification as possible when designing software systems, and the Extreme Programming model of building the specification modularly, one story at a time, can balance the costs and benefits of crafting it.

I also hope I've sufficiently demonstrated the pitfalls of mistaking the implementation for the specification, as well as the problem with relying too heavily on unit tests, wonderful as they are, to serve as the whole of your specification.


Resources

  • Check out the Extreme Programming site for a summary of the ideas behind XP.
  • Roy Miller and Chris Collins offer a cogent explanation of Extreme Programming in their article "XP distilled" (developerWorks, March 2001).
  • Software migration is a process of taking an application developed in one language or platform and moving it to another language or platform. "Migrating to IBM WebSphere Application Server" is a multi-part article that addresses designing for change. The first installment, "Designing Software for Change" (WebSphere Developer Domain, July 2001), discusses some software development best practices to mitigate the expense of code migration. The second installment, "Stages of Migration" (WebSphere Developer Domain, October 2001), presents a general plan of attack for effecting migration.
  • "The Go-ForIt Chronicles: Memoirs of eXtreme DragonSlayers" tells the story of the DragonSlayers, a group of top IBM technical consultants who set out to slay a formidable dragon -- a full-blown Web application with all the bells and whistles. They designed, developed, and tested an application that incorporated the latest technologies and used IBM tools and products.
  • Download a copy of Jython, an implementation of the dynamic, object-oriented Python that is written in and seamlessly integrates with the Java language and platform.
  • Retrieve your own version of the Extended ML specification here, in PDF format; Extended ML is a framework for specification and formal development of Standard ML programs.
  • The official JUnit Web site provides a wealth of resources for developers implementing unit tests in the Java language.
  • Read all of Eric Allen's Diagnosing Java code columns.
  • Find more Java resources, including many articles on the topic of Extreme Programming, at the developerWorks Java technology zone.

