Java theory and practice: Enable initialization atomicity

The self-return idiom makes for more usable API design

Brian Goetz (brian@quiotix.com), Principal Consultant, Quiotix
Brian Goetz has been a professional software developer for over 18 years. He is a Principal Consultant at Quiotix, a software development and consulting firm located in Los Altos, California, and he serves on several JCP Expert Groups. See Brian's published and upcoming articles in popular industry publications.

Summary:  Decisions made during API design can affect the API's usability. In designing an API, you need to put yourself in your users' shoes, imagining how the API might be used, and try to make the common use cases convenient. This month, columnist Brian Goetz discusses an API design technique, the self-return idiom, that can make life easier for users of your API in certain circumstances.

Date:  27 Apr 2005
Level:  Intermediate

In 1985 I had a summer job developing mainframe-based business applications in APL. For those who don't remember APL, it was a very terse language, requiring a special keyboard with all sorts of weird symbols, but it offered some extremely powerful mechanisms for manipulating data stored in arrays. As an example, Conway's "Life" cellular automaton can be implemented in 30-40 characters of APL code, and a program to find all prime numbers below a given number could be written in 20 characters. (To the uninitiated, such an APL program pretty much looks like line noise.) APL jocks liked to joke that any program could be written in one line of APL. (Reading such a program, on the other hand, was not as easy.)

Macho competition and obfuscated programming contests aside, in what situations is it valuable, from a software engineering perspective, to be able to perform an arbitrary sequence of operations on an object in a single expression? In the Java™ language, several cases exist where being able to instantiate and initialize a complex object in a single expression can improve the code's readability (field initializers, method parameters). There is even one situation where it is downright inconvenient if instantiation and initialization cannot be completed in a single expression (using constructor arguments to instantiate a new object and then passing the new object to a super() or this() constructor).
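
To make the super() case concrete, here is a minimal sketch; the Base and Derived classes are hypothetical. HashMap.put() returns the previous value rather than the map, so a populated map cannot be built in one expression, and because the call to super() must be the first statement of a constructor, the construction has to be pushed into a static helper method:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class that wants a fully populated map at construction time
class Base {
    protected final Map<String, Integer> limits;

    Base(Map<String, Integer> limits) { this.limits = limits; }
}

class Derived extends Base {
    Derived(int maxUsers, int maxSessions) {
        // super(...) must come first, so the multi-statement map construction
        // cannot appear inline here; it lives in a static helper instead
        super(buildLimits(maxUsers, maxSessions));
    }

    // Needed only because put() does not return the map, so calls cannot chain
    private static Map<String, Integer> buildLimits(int maxUsers, int maxSessions) {
        Map<String, Integer> m = new HashMap<String, Integer>();
        m.put("maxUsers", maxUsers);
        m.put("maxSessions", maxSessions);
        return Collections.unmodifiableMap(m);
    }
}
```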

Mutability: Yes, no, and sometimes

Some objects are immutable (meaning that once constructed, their state does not change), whereas others are mutable. For some immutable objects, immutability is guaranteed by the class's implementation (as with the String class); for others, immutability is simply assumed by convention, specification, or documentation. (The virtues of immutable objects -- simplicity, safety, thread-safety -- have been extolled in this column and elsewhere, so I won't belabor them here.) Whether immutability is enforced or simply a convention for a given object, the behavior is the same -- once the object is initialized, its state is not modified again.

Some entities can be sensibly modeled as either mutable or immutable objects. The String class is immutable, but that was simply one sensible way to implement a string class. (The C++ STL implements a mutable string class, which is another sensible way to implement String.) Other objects can only be mutable -- for example, it would make no sense for a counter to be immutable. Strictly defined, a mutable object is any object whose state can be changed in an observable way after construction. But by defining mutability only in terms of whether state might ever change, we can miss out on some important object lifecycle distinctions.

Mutability lifecycles

Some objects are mutated throughout their lifecycle -- such as counters or other status-holding objects. Others are initialized to a desired state, perhaps through a series of calls to setters or other mutative methods, and then not modified again until they are garbage collected. Strictly speaking, such objects are mutable, because their state was not set entirely in the constructor and can be mutated, but from the perspective of the program that uses them, they might as well be immutable. Take, for instance, a Properties object that is used by an application to hold the contents of a properties file for configuration purposes. Early in the application, the Properties object is instantiated and loaded with values from the file, but thereafter, it is not modified again. The lifecycle of this Properties object has two phases -- a phase where it is being initialized (treated as mutable) and a phase where it is being used (treated as immutable). Typically, a Properties object used in this manner is not published to the rest of the application until the first phase is complete. So, from the perspective of the rest of the application, the Properties object might as well be immutable.
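
A minimal sketch of the two-phase Properties lifecycle described above, assuming a hypothetical AppConfig holder class; a StringReader stands in for the properties file. The object is mutated only inside load(), and it is published through a final field only after loading completes:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for application configuration
class AppConfig {
    // Phase 1 (mutable): populate a Properties object that no other code can see yet
    private static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // stand-in for reading a real properties file
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    // Phase 2 (treated as immutable): published only after loading has completed
    static final Properties CONFIG = load("db.host=localhost\ndb.port=5432\n");
}
```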

This phase-change behavior is quite common. Some classes, such as SimpleDateFormat, tend to be used in such a two-phase manner almost exclusively -- once the formatting options are set, the formatter may be used many times, but the settings tend not to be changed after the formatter is "fully initialized." Other classes, such as Properties or HashMap, are sometimes used in this two-phase manner, but frequently treated as fully mutable objects as well. No established name exists to refer to objects that have this two-phase lifecycle, so I'm going to make one up: immutable-once-initialized (IOI). IOI objects generally have a lifecycle that goes something like: construct-modify-modify-modify-publish-use-use-use.
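
For SimpleDateFormat, the construct-modify-publish-use lifecycle might look like the following sketch (TimestampFormatter and newUtcFormat() are hypothetical names chosen for illustration). Neither setTimeZone() nor setLenient() returns the formatter, so configuration takes several statements and a local variable:

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical factory for a fully configured formatter
class TimestampFormatter {
    static SimpleDateFormat newUtcFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US); // construct
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));                               // modify
        fmt.setLenient(false);                                                      // modify
        return fmt;                                                                 // publish
    }
}
```

After newUtcFormat() returns, callers only call format() or parse() on the result (the use-use-use phase). SimpleDateFormat is not synchronized, so this sketch assumes each formatter is used from a single thread.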


The self-return idiom

API designers can make life easier for programmers by anticipating when an object might be used in an IOI manner and, in those situations, using the "self-return idiom" to make initialization easier. The self-return idiom has mutator methods (such as setXyz() or appendFoo()) return the this reference after performing their action. The StringBuffer class illustrates the idiom: all of its append() methods return a reference to the StringBuffer itself after updating the state of the internal buffer, as shown in Listing 1:


Listing 1. Self-return idiom in StringBuffer.append()

public StringBuffer append(String str) {
    // append str to the internal buffer
    return this;
}

The benefit of the self-return idiom is that it enables you to chain multiple calls together, rather than writing them each out as separate statements:

stringBuffer.append("a=").append(a)
    .append("; b=").append(b);

This code is more readable and more compact than the alternative, which involves four statements. Using the self-return idiom generally has little negative effect on the API design, as many mutative methods (setters, add() and append()) do not return a value anyway, but it can make life a lot easier for your callers. It's too bad more classes don't follow StringBuffer's lead -- it could make some classes a lot more convenient to use.
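
For comparison, here is a small sketch (ChainDemo is a hypothetical class) that puts the four-statement alternative next to the chained form. Both build the same string, but only the chained version is a single expression that can be returned or used in an initializer directly:

```java
// Hypothetical class contrasting separate statements with chained calls
class ChainDemo {
    // Four separate statements against a local variable
    static String separate(int a, int b) {
        StringBuffer sb = new StringBuffer();
        sb.append("a=");
        sb.append(a);
        sb.append("; b=");
        sb.append(b);
        return sb.toString();
    }

    // One expression: each append() returns the same StringBuffer, so the calls chain
    static String chained(int a, int b) {
        return new StringBuffer().append("a=").append(a).append("; b=").append(b).toString();
    }
}
```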

Static initialization

How many times have you wanted to statically initialize a Set with several known values, without intending your program to modify the Set afterward? With the existing Collections classes, this requires a static initializer block, so the collection is populated in a different place from where it is declared. Let's say you wanted to pre-initialize a Set with some regular expression patterns you are going to search for in a document. Using the existing API, you would have to do it as in Listing 2:


Listing 2. Statically initializing a Set

private static Set<Pattern> patternSet = new HashSet<Pattern>();
static {
    // "\\b" in Java source yields the regex word boundary \b
    patternSet.add(Pattern.compile("\\b(roast beef)\\b"));
    patternSet.add(Pattern.compile("\\b(on rye)\\b"));
    patternSet.add(Pattern.compile("\\b(with mustard)\\b"));
}

Granted, writing a static initializer and putting it near the object's declaration is not a terrible hardship, but it is somewhat annoying, and the more that initialization and declaration are separated, the greater the chance that future modifications will subvert an intended invariant. Further, if you want to make patternSet immutable from the perspective of your program (a good practice, because it prevents subtle coding errors), you would have to instantiate a temporary Set in the static initializer block, wrap it with Collections.unmodifiableSet(), and then stuff the wrapped set back into patternSet.
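
The workaround just described might look like the following sketch (the Patterns holder class is hypothetical). Only the temporary set is ever mutated, and the final field is assigned exactly once, at the end of the static initializer:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical holder class for the search patterns
class Patterns {
    static final Set<Pattern> patternSet;

    static {
        // Build in a temporary, then publish an unmodifiable view
        Set<Pattern> temp = new HashSet<Pattern>();
        temp.add(Pattern.compile("\\b(roast beef)\\b"));
        temp.add(Pattern.compile("\\b(on rye)\\b"));
        temp.add(Pattern.compile("\\b(with mustard)\\b"));
        patternSet = Collections.unmodifiableSet(temp);
    }
}
```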

In this case, it would have been nice if the Collections classes used the self-return idiom, because then we could have constructed and initialized the Set all in one place. But we can still build an adapter that does what we want. Listing 3 shows an adapter class that simplifies the process of initializing a Set:


Listing 3. Set adapter class that adds a self-returning append() method

public class SetAdapter<T> implements Set<T> {
    private final Set<T> s; 
    public SetAdapter(Set<T> s) { this.s = s; }

    // returns the adapter type (not Set<T>) so that append() calls can chain
    public SetAdapter<T> append(T t) { s.add(t); return this; }
    public Set<T> unmodifiableSet() { return Collections.unmodifiableSet(s); }

    // delegate other Set methods (add(), contains(), iterator(), size(), ...) to s
}

Now, using SetAdapter, we can initialize the set of patterns more easily, without separating the initialization of the set from the declaration of the variable. As an added bonus, we can easily "close" the set by having the last call wrap it in an unmodifiable wrapper, and still make the patternSet variable final without introducing temporary variables. The only loss of transparency is that we cannot simply change add() to return the adapter, because Set.add() is declared to return a boolean; we have to give our self-returning mutators a different name, such as append(). Listing 4 shows patternSet initialized inline with SetAdapter instead of with a static initializer block:


Listing 4. patternSet initialized inline with SetAdapter

private static final Set<Pattern> patternSet 
    = new SetAdapter<Pattern>(new HashSet<Pattern>())
          .append(Pattern.compile("\\b(roast beef)\\b"))
          .append(Pattern.compile("\\b(on rye)\\b"))
          .append(Pattern.compile("\\b(with mustard)\\b"))
          .unmodifiableSet();
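
Listing 3 leaves the delegation as a comment. One way to fill it in, sketched here as a compact variant rather than a full implements-Set delegation, is to extend java.util.AbstractSet, which implements most Set operations in terms of iterator() and size(). This variant delegates add(), contains(), iterator(), and size() to the wrapped set:

```java
import java.util.AbstractSet;
import java.util.Collections;
import java.util.Iterator;
import java.util.Set;

// Compact variant of SetAdapter built on AbstractSet
class SetAdapter<T> extends AbstractSet<T> {
    private final Set<T> s;

    public SetAdapter(Set<T> s) { this.s = s; }

    // Self-returning mutator: adds t, then returns this adapter for chaining
    public SetAdapter<T> append(T t) {
        s.add(t);
        return this;
    }

    // Ends the chain with a read-only view of the underlying set
    public Set<T> unmodifiableSet() { return Collections.unmodifiableSet(s); }

    // Remaining operations delegate to the wrapped set
    @Override public boolean add(T t) { return s.add(t); }
    @Override public boolean contains(Object o) { return s.contains(o); }
    @Override public Iterator<T> iterator() { return s.iterator(); }
    @Override public int size() { return s.size(); }
}
```

Chained use then reads like Listing 4, for example new SetAdapter<String>(new HashSet<String>()).append("roast beef").append("on rye").unmodifiableSet().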

Instantiating DOM documents

If the designers of the DOM API had understood this concept, building representations of XML documents would be a lot easier. (Sure, criticizing the DOM APIs is a bit like shooting fish in a barrel.) Suppose we want to build the following XML document, representing an article and its embedded links:

<article title="Flossing Penguins - A Dentist's Journey to the Pole"
       author="Jeremy Stringfellow, DMD"
       url="http://www.penguinfloss.com/travel/stringfellow.html">
   <link anchor="Glide Floss" url="http://www.crest.com/glide/index.jsp" />
   <link anchor="Antarctica Facts" url="http://www.cia.gov/cia/publications/factbook/geos/ay.html" />
</article>

Constructing this document with DOM would be an exercise in annoyance, involving many temporary variables. We must create a document, an article element, and two link elements, add the attributes to them, and attach the elements to their parents. Unfortunately, each of these operations must be a separate statement, as shown in Listing 5:


Listing 5. Instantiating the DOM Element

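// documentFactory is a javax.xml.parsers.DocumentBuilder;
// article, link, and anotherLink hold the data to be serialized
Document document = documentFactory.newDocument();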
Element articleElement = document.createElement("article");
articleElement.setAttribute("title", article.getTitle());
articleElement.setAttribute("author", article.getAuthor());
articleElement.setAttribute("url", article.getURL());
        
Element linkElement = document.createElement("link");
linkElement.setAttribute("anchor", link.getAnchor());
linkElement.setAttribute("url", link.getURL());
articleElement.appendChild(linkElement);
        
linkElement = document.createElement("link");
linkElement.setAttribute("anchor", anotherLink.getAnchor());
linkElement.setAttribute("url", anotherLink.getURL());
articleElement.appendChild(linkElement);
        
document.appendChild(articleElement);

Now, suppose the DOM classes supported the self-return idiom for setAttribute() and appendChild(). Each element could be created complete in a single expression, and several temporaries could be eliminated. As a bonus, it is even possible to make the structure of the code look like the structure of the resulting document, as shown in Listing 6:


Listing 6. Instantiating the DOM Element with a fictitious, self-returning DOM API

document.appendChild(
    document.createElement("article")
        .setAttribute("title", article.getTitle())
        .setAttribute("author", article.getAuthor())
        .setAttribute("url", article.getURL())
        .appendChild(
            document.createElement("link")
                .setAttribute("anchor", link.getAnchor())
                .setAttribute("url", link.getURL()))
        .appendChild(
            document.createElement("link")
                .setAttribute("anchor", anotherLink.getAnchor())
                .setAttribute("url", anotherLink.getURL())));

While there isn't all that much less code here, if the API worked this way, entire DOM Elements could be instantiated and initialized in a single expression, which would make it slightly easier (and clearer) to write methods that return DOM Elements, or to initialize DOM Elements in variable initializers. And note that without the self-return idiom, it is impossible to pass a complete DOM Element to a super() or this() constructor without writing a helper function: the DOM API offers no way to build an element in a single expression, and a call to super() or this() must be the first statement in a constructor.
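
Absent such an API, a small self-returning wrapper is one way to get the shape of Listing 6 on top of the real DOM. The El class and DomDemo.build() below are hypothetical helpers, not part of DOM; only createElement(), setAttribute(), and appendChild() are real org.w3c.dom calls, and the Document comes from javax.xml.parsers.DocumentBuilderFactory:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical self-returning wrapper around a DOM Element
class El {
    private final Element e;

    El(Document doc, String tagName) { e = doc.createElement(tagName); }

    // Sets an attribute, then returns this wrapper so calls can chain
    El attr(String name, String value) {
        e.setAttribute(name, value);
        return this;
    }

    // Appends the element wrapped by another El, then returns this wrapper
    El child(El c) {
        e.appendChild(c.e);
        return this;
    }

    Element element() { return e; }
}

class DomDemo {
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
        // The nesting of the code mirrors the nesting of the resulting XML
        doc.appendChild(
            new El(doc, "article")
                .attr("title", "Flossing Penguins - A Dentist's Journey to the Pole")
                .attr("author", "Jeremy Stringfellow, DMD")
                .attr("url", "http://www.penguinfloss.com/travel/stringfellow.html")
                .child(new El(doc, "link")
                    .attr("anchor", "Glide Floss")
                    .attr("url", "http://www.crest.com/glide/index.jsp"))
                .child(new El(doc, "link")
                    .attr("anchor", "Antarctica Facts")
                    .attr("url", "http://www.cia.gov/cia/publications/factbook/geos/ay.html"))
                .element());
        return doc;
    }
}
```

Because new El(...)...element() is a single expression, it could also be returned from a method or passed as a super() argument; the cost is the extra wrapper class, which is the helper the DOM API itself does not provide.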


Harnessing laziness

The self-return idiom can sometimes improve the readability of code, and it enables you to completely initialize a logical entity in a single expression, meaning that you can eliminate temporary variables and helper functions when initializing fields or passing arguments to super() and this() constructors. But the self-return idiom has another benefit, which comes from co-opting laziness. By reducing the amount of work involved in using a given API, you increase the likelihood that the API will be used properly and effectively. API designers often give this aspect too little consideration: in many situations a developer faces a choice between "doing it right" and "doing it well enough." API designers should encourage developers to do things right by making APIs so easy to use that laziness does not discourage developers from using them. (The DOM API designers clearly did not learn this lesson.)

When designing an API, you should look for use cases where objects might be created in an IOI manner, and provide appropriate methods for building such objects easily, preferably giving users the opportunity to build them in a single expression so that they can be used in initializers or superclass constructor arguments. Similarly, think about the expected role of setters. Are they truly accessors for updating the state of mutable objects, or are they likely to be used as part of an extended construction process, as in SimpleDateFormat? If the latter, it costs nothing to have them return the this reference.
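
As a sketch of that advice, here is a hypothetical ConnectionConfig class whose setters return this. Because the whole configuration is a single expression, a subclass can pass it straight to super() with no static helper (Client and SecureClient are likewise hypothetical):

```java
// Hypothetical configuration object whose setters return this
class ConnectionConfig {
    private String host = "localhost";
    private int port = 80;
    private int timeoutMillis = 1000;

    ConnectionConfig setHost(String host) { this.host = host; return this; }
    ConnectionConfig setPort(int port) { this.port = port; return this; }
    ConnectionConfig setTimeoutMillis(int millis) { this.timeoutMillis = millis; return this; }

    String getHost() { return host; }
    int getPort() { return port; }
    int getTimeoutMillis() { return timeoutMillis; }
}

// Hypothetical base class that takes its configuration in the constructor
class Client {
    protected final ConnectionConfig config;

    Client(ConnectionConfig config) { this.config = config; }
}

class SecureClient extends Client {
    SecureClient(String host) {
        // Created and fully configured in one expression, so it can be
        // the argument to super() without a static helper method
        super(new ConnectionConfig().setHost(host).setPort(443).setTimeoutMillis(5000));
    }
}
```

The same chained expression would work equally well as a static final field initializer.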

On the subject of laziness, how often do you give your classes a useful toString() implementation before being forced to for debugging? Writing a good toString() is certainly not hard, but laziness often prevents these methods from being written, or from being updated when fields are added to a class. Truth be told, they are annoying to write and modify, involving long string concatenations.

Listing 7 shows a simple utility class, which is a crutch for writing toString() implementations. It is a trivial class, built atop StringBuffer, which allows you to build up the toString() value, appending state variables as you go, in a single expression, using the self-return idiom.


Listing 7. ToString class

public class ToString {
    private final StringBuffer sb = new StringBuffer();

    public ToString(String title) { sb.append(title).append(" "); }
    public ToString(Object o) { this(o.getClass().getName()); }

    public ToString add(String name, String value) {
        sb.append(name).append("=\"").append(value).append("\" ");
        return this;
    }

    public ToString add(String name, Object value) {
        return add(name, value == null? "null" : value.toString());
    }

    public ToString add(String name, int value) {
        sb.append(name).append("=").append(value).append(" ");
        return this;
    }
    // name-value versions for other primitive types  

    public ToString addGroup(String name, String value) {
        sb.append(name).append("={").append(value).append("} ");
        return this;
    }

    public ToString add(String name, String[] value) {
        sb.append(name).append("=[");
        for (int i = 0; i < value.length; i++) 
            sb.append("\"").append(value[i]).append("\" ");
        sb.append("] ");
        return this;
    }

    public String toString() {
        return sb.toString();
    }
}

While the resulting toString() code is again not all that different from the by-hand implementation, it is slightly easier to read and edit (especially with IDEs such as Eclipse). The benefit is not that you save a few seconds writing toString() -- it is that laziness is less likely to inhibit the creation of a toString() method at all. Listing 8 shows a typical toString() method using both the by-hand approach and the ToString approach. (The ToString class can also be used independently of the toString() method for producing informative, structured strings to write as log messages.)


Listing 8. toString() using the by-hand and ToString approach

// by hand
public String toString() {
    return "Address " 
        + "streetAddress=" + streetAddress + " "
        + "city=" + city + " " + "state=" + state + " "
        + "zipCode=" + zipCode + " ";
}


// with ToString
public String toString() {
    return new ToString("Address")
        .add("streetAddress", streetAddress)
        .add("city", city).add("state", state)
        .add("zipCode", zipCode)
        .toString();
}
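
To see exactly what the ToString version produces, here is a self-contained sketch that copies only the String, Object, and int overloads from Listing 7 and assumes a hypothetical Address class whose zipCode is an int:

```java
// Trimmed copy of the ToString helper: String, Object, and int overloads only
class ToString {
    private final StringBuffer sb = new StringBuffer();

    ToString(String title) { sb.append(title).append(" "); }

    ToString add(String name, String value) {
        sb.append(name).append("=\"").append(value).append("\" ");
        return this;
    }

    ToString add(String name, Object value) {
        return add(name, value == null ? "null" : value.toString());
    }

    ToString add(String name, int value) {
        sb.append(name).append("=").append(value).append(" ");
        return this;
    }

    public String toString() { return sb.toString(); }
}

// Hypothetical value class using the ToString helper
class Address {
    private final String streetAddress, city, state;
    private final int zipCode;

    Address(String streetAddress, String city, String state, int zipCode) {
        this.streetAddress = streetAddress;
        this.city = city;
        this.state = state;
        this.zipCode = zipCode;
    }

    public String toString() {
        return new ToString("Address")
            .add("streetAddress", streetAddress)
            .add("city", city).add("state", state)
            .add("zipCode", zipCode)
            .toString();
    }
}
```

With this sketch, new Address("1 Main St", "Springfield", "IL", 62701).toString() yields Address streetAddress="1 Main St" city="Springfield" state="IL" zipCode=62701 followed by a trailing space, and the same class can build a log line such as new ToString("login").add("user", "alice").add("attempts", 3).toString().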


Conclusion

The self-return idiom is not a particularly deep or revolutionary technique, but it can offer an incremental improvement to the usability of an API. When objects are used in an immutable-once-initialized manner, it is extremely convenient to be able to declare and initialize them in a single statement or expression. While it is possible to work around the limitations of a class that does not permit initialization atomicity through initializer blocks and helper functions, the readability of the code can suffer, and often needlessly so. When writing APIs, think about the likely mutability lifecycle of the object, and consider whether the self-return idiom might make life easier for your callers.

