Language designer's notebook: Package deals

When language features invite their friends

When a significant new feature is added to a language, it is quite common that the new feature necessitates, or at least encourages, the addition of other new features as well — for better or worse. In this installment of Language designer's notebook, Brian Goetz discusses how language features invite their friends with them.


Brian Goetz, Java Language Architect, Oracle

Brian Goetz is the Java Language Architect at Oracle and a veteran contributor to developerWorks. Brian's writings include the Java theory and practice column series published here from 2002 to 2008, and the definitive work on Java concurrency, Java Concurrency in Practice (Addison-Wesley, 2006).



25 October 2011


About this series

Every Java™ developer probably has a few ideas about how the Java language could be improved. In this series, Java Language Architect Brian Goetz explores some of the language design issues that have presented challenges for the evolution of the Java language in Java SE 7, Java SE 8, and beyond.

Some language features, such as the new "binary integer literals" feature in Java SE 7, stand well enough by themselves. But many major language features end up needing additional features to make them work well, or to work around interactions with existing features. This is potentially a problem, because adding big language features is already risky, and dragging additional features along for the ride only ups the risk.

Increased pressure for convenience features

One vector by which one language feature invites another is when a new feature increases the pressure to add an unrelated "convenience feature." Consider autoboxing, added in Java SE 5. Java had primitive types (for example, int) and object "box" wrapper classes (for example, Integer) for those types from day one, along with methods for converting between them. The autoboxing feature — an implicit conversion between a primitive type and its corresponding wrapper class — is one that could have been added at any time since the beginning, and there were some calls for it prior to Java SE 5. But it was the addition of generics, and the attendant generification of collections, that finally generated enough pressure to add this convenience feature. The situation in which you needed to convert between primitives and their wrappers had existed before, but the generic collections made this situation far more common, because it was now convenient to create collections whose keys or values were boxed primitives. What had been a minor inconvenience before became a major inconvenience, and the pressure to add autoboxing increased.
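To make the pressure concrete, here is a minimal sketch contrasting the explicit conversions with what autoboxing lets us write. The class and method names are invented for illustration; only the java.util collection types and the Integer methods are real.

```java
import java.util.ArrayList;
import java.util.List;

class AutoboxingSketch {
    // Without autoboxing: every int <-> Integer conversion is spelled out.
    static List<Integer> explicitList() {
        List<Integer> list = new ArrayList<Integer>();
        list.add(Integer.valueOf(1));
        list.add(Integer.valueOf(2));
        return list;
    }

    static int sumExplicit(List<Integer> values) {
        int total = 0;
        for (Integer boxed : values) {
            total += boxed.intValue();   // explicit unboxing
        }
        return total;
    }

    // With autoboxing: the compiler inserts the same conversions for us.
    static List<Integer> autoboxedList() {
        List<Integer> list = new ArrayList<>();
        list.add(1);                     // int boxed to Integer
        list.add(2);
        return list;
    }

    static int sumAutoboxed(List<Integer> values) {
        int total = 0;
        for (int value : values) {       // Integer unboxed to int
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumExplicit(explicitList()));
        System.out.println(sumAutoboxed(autoboxedList()));
    }
}
```

Both pairs of methods do the same work; the second pair simply leaves the conversions to the compiler, which is why generic collections of boxed primitives made the feature so attractive.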

A similar example is the relationship between enums and static imports, also added in Java SE 5. Static imports were another convenience feature waiting to happen; having to say Math.PI instead of just PI was always annoying. However, it was the addition of the enum feature in Java SE 5 that generated increased pressure to justify static imports. Enums made it easier to create named structured constants — such as Color.RED, Color.BLUE, and so on — and when you make something easier, you get more of it. Whereas before only a few static system-defined constants were available, enums opened the door to user-created static constants — making the annoyance of needing to qualify the name on every use (instead of just saying RED or BLUE) that much greater. So although static import was a stand-alone feature that could have been added long before enums were added, enums created sufficient additional pressure to justify adding static imports.
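A small sketch of the difference, using constants that ship with the JDK rather than a user-defined Color enum: Math.PI, and the TimeUnit enum from java.util.concurrent. The class and method names here are made up for the example.

```java
import static java.lang.Math.PI;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.SECONDS;

import java.util.concurrent.TimeUnit;

class StaticImportSketch {
    // Static import of a plain constant: PI instead of Math.PI.
    static double circleArea(double radius) {
        return PI * radius * radius;
    }

    // Qualified use of an enum constant.
    static long toMillisQualified(long seconds) {
        return TimeUnit.MILLISECONDS.convert(seconds, TimeUnit.SECONDS);
    }

    // The same call with statically imported enum constants.
    static long toMillisUnqualified(long seconds) {
        return MILLISECONDS.convert(seconds, SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(circleArea(1.0));
        System.out.println(toMillisQualified(2));
        System.out.println(toMillisUnqualified(2));
    }
}
```

The more enum constants a program defines and uses, the more often the qualified form shows up, which is the pressure described above.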

Often this is not a problem, but these convenience features can have their dark sides. For example, autoboxing interacts particularly poorly with the ternary conditional operator, and it can cause NullPointerExceptions to be thrown from code that does not appear to deal with object references at all.
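The following sketch shows one way this can happen. The names are hypothetical; the behavior follows from the language rule that a conditional expression with one int operand and one Integer operand has type int, so the Integer operand is unboxed when it is selected.

```java
class TernaryUnboxingSketch {
    // Looks harmless, but the conditional has type int because one operand
    // is the int literal 0. When 'value' is selected it is unboxed first,
    // so a null 'value' throws NullPointerException.
    static Integer pick(boolean useFallback, Integer value) {
        return useFallback ? 0 : value;
    }

    // Both operands are Integer here, so no unboxing takes place
    // and a null 'value' is returned as null.
    static Integer pickSafely(boolean useFallback, Integer value) {
        return useFallback ? Integer.valueOf(0) : value;
    }

    static boolean pickThrowsOnNull() {
        try {
            pick(false, null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(pickThrowsOnNull());
        System.out.println(pickSafely(false, null));
    }
}
```

Nothing in pick() dereferences an object explicitly, which is what makes this interaction surprising.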


LinQ: Six features for the price of one

Perhaps the best example of a language feature that invited a whole pile of friends to move in was the LinQ (Language-Integrated Query) facility, the central improvement added in C# 3.0 and .NET Framework 3.5. LinQ enables developers to embed object-valued queries directly in their code. These queries can operate not only against databases, but also against other data providers such as XML documents or in-memory collections. The idea of embedding a query language into a general-purpose programming language seems straightforward, but once you start digging, you discover that a lot of additional features are required to pull it off.

This C# code illustrates a typical LinQ query against a collection. It takes a collection of Person objects and selects and prints the first and last names of people younger than 18:

var results = 
    from p in people
    where p.Age < 18
    select new {p.FirstName, p.LastName};   

foreach (var r in results) {           
    Console.WriteLine(r.FirstName + " " + r.LastName); 
}

A lot of things must come together for this to work. What is the result type of this query? It is not a collection of Person objects, because the query asks only for the first- and last-name properties. Instead, it is a collection of a class with properties only for the first and last name; the compiler generates this class based on the selected fields of the query. For this, .NET introduced anonymous classes; otherwise, the developer would have to create a new named class type for the result of nearly every query.

LinQ also requires support for lambda expressions (closures), though this is not obvious from the previous query example. The implementation strategy for LinQ involves rewriting the query into API calls against a provider. (This provider API is how queries can work against disparate data sources.) The compiler rewrites the query as:

var results = 
    people.Where(p => p.Age < 18)
          .Select(p => new {p.FirstName, p.LastName});

The Where() method takes a predicate that determines whether the given element should be selected, and it produces a stream of elements that pass the filter. Then, for each selected element, the Select() method maps the element to a new instance of an anonymous class containing only properties for the first and last name.
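For readers more at home in Java, the same filter-then-map shape can be sketched with the java.util.stream API that eventually shipped in Java SE 8. This is only a rough analogue, not how LinQ is implemented; the Person class is a stand-in invented for the example, and because Java has no anonymous classes with inferred properties in this position, the sketch projects each person to a plain String.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QueryPipelineSketch {
    // Hypothetical Person type, defined only for this example.
    static final class Person {
        final String firstName;
        final String lastName;
        final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    static List<String> namesOfMinors(List<Person> people) {
        return people.stream()
                .filter(p -> p.age < 18)                     // plays the role of Where()
                .map(p -> p.firstName + " " + p.lastName)    // plays the role of Select()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ann", "Lee", 12),
                new Person("Bob", "Ray", 40),
                new Person("Cy", "Fox", 17));
        for (String name : namesOfMinors(people)) {
            System.out.println(name);
        }
    }
}
```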

But we're not done yet. If the data provider is an SQL database, the WHERE clause must be applied to each record. One way to do that would be to pull all the records from the database into the application, and then test the Age property of each. But this is likely to be very inefficient — we would rather the WHERE clause be evaluated closer to the data. The sensible thing to do is to send the WHERE clause to the database, but this means that we would have to translate the predicate p.Age < 18 into SQL and send that to the database instead.

The LinQ solution for this problem is expression trees, a reflection-like mechanism whereby you can reflect over not only a class's members, but also a method's code. This allows the SQL LinQ provider to analyze the closure passed to the Where() method and translate it into SQL.

Translating queries into calls to an API also required the addition of extension methods. The Where() method is being invoked on a collection object, but it is not a member of the collection framework. Instead, it is a static method defined by the LinQ subsystem and injected into IEnumerable (the .NET equivalent of Java's Iterable). Otherwise, one would not have been able to express LinQ queries against collections as easily.

Finally, we need implicitly typed variables, so that we can assign the query to a variable without having to declare its type explicitly. Because the result of the query is an IEnumerable of some anonymous type, the result's type is known by the compiler but not denotable in C#. (Alternately, for some queries, the result type is denotable but inconveniently verbose to write down.) This is why C# supports declaring variables with var, which lets the compiler figure out the type instead of demanding the user spell it out — not so much because it wants to encourage the programmer to be lazy, but because sometimes there's no way to write down the type.
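Java gained a comparable local-variable var in Java SE 10, well after this article was written, and it can be used to sketch the same point: the type of an anonymous class is known to the compiler but cannot be written down. The example below assumes a Java 10 or later compiler; the class name and field values are invented.

```java
class NonDenotableTypeSketch {
    static String describe() {
        // The anonymous class has no name we could write in a declaration,
        // but 'var' lets the compiler use its exact type.
        var name = new Object() {
            final String first = "Ada";
            final String last = "Lovelace";
        };
        // Declaring 'name' as Object instead would make the two field
        // accesses below fail to compile.
        return name.first + " " + name.last;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```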

This was a long way to go! What started out as a simple-sounding goal — embedding queries in a general-purpose language — ended up requiring anonymous classes, implicitly typed variables, closures, extension methods, and a reflection mechanism for expressions. Each of these features was critical to one of the key goals for LinQ.

As a user, one might conclude that this is a wonderful deal: six features for the price of one. But adding new language features to an existing language always has a cost. When you add language feature B to support language feature A, there's no requirement that B only be used with A. B might not be so desirable by itself, or it might interact badly with other language features. Clearly, there was a specific goal for adding A, such as making the language safer or more expressive. But to evaluate success, you need to evaluate that goal not relative to A by itself, but to the resulting new language — including all the friends that A invited along with it. If that's not the language you wanted to end up with, you might need to rethink the initial feature.


Lambda expressions in Java

The central language improvement for Java SE 8 is lambda expressions, or closures. But just as with LinQ in .NET, lambda expressions need to drag a number of additional features along with them — including SAM conversion, enhanced type inference, method references, and extension methods — in order to give users the full benefit.

Because lambda expressions — expressions representing functions — are a new sort of value in Java, we need a way to write down their type. Early proposals for lambda expressions in Java called for adding function types to the type system, such as "function from int to int." Function types are indeed the natural way to represent the type of a lambda expression, but unfortunately they interact badly with an existing language "feature": erasure. Because the natural way to represent a function type in the underlying bytecode would be to use generics, primitive types in function signatures would be boxed, and one would not be able to overload multiple methods that took function types, even if their arguments were completely different. Function types might be the natural way to express the type of a lambda, but erased function types are not.
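The erasure problem can be sketched with ordinary generics standing in for the proposed function types (which never shipped, so there is no real syntax to show). The class name is made up, and the commented-out overloads illustrate the kind of declaration the compiler rejects.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {
    // At run time, List<String> and List<Integer> are the same class:
    // the type arguments have been erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // For the same reason, these two overloads cannot coexist. After erasure
    // both have the signature apply(java.util.function.Function), so the
    // compiler reports a name clash:
    //
    //   static void apply(java.util.function.Function<String, String> f) { }
    //   static void apply(java.util.function.Function<Integer, Integer> f) { }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```

A function type represented through generics would inherit exactly this limitation, along with boxing of any primitive argument or return types.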

So instead of function types, lambda expressions in Java SE 8 will bring along a different friend, SAM conversion. SAM (Single Abstract Method) types are how we have represented functions in the Java language all along: interfaces with one method, like Runnable, Comparator, or ActionListener. If we build APIs with SAM types (many of which already exist in libraries), the compiler can convert between a lambda expression (which is like a function literal) and a SAM type whose argument types, return type, and exception types are compatible with the lambda expression. For example, the following code declares a Comparator<String> that compares strings by length and uses a lambda expression to define the Comparator:

Comparator<String> c
    = (String a, String b) -> a.length() - b.length();

Because the lambda expression has the right argument and return types, the compiler verifies that it can be converted into a Comparator<String> and generates the appropriate code for doing so. This is called SAM conversion.
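As a runnable sketch, the lambda form can be placed next to the anonymous inner class it replaces; both produce a Comparator<String> and can be handed to the same library method. The class and helper names are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

class SamConversionSketch {
    // The pre-lambda way to create an instance of a SAM type.
    static Comparator<String> byLengthAnonymous() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        };
    }

    // SAM conversion: the compiler turns the lambda into a Comparator<String>
    // because its parameters and result fit compare(String, String).
    static Comparator<String> byLengthLambda() {
        return (String a, String b) -> a.length() - b.length();
    }

    // Returns a sorted copy so the caller's array is left untouched.
    static String[] sorted(Comparator<String> order, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sorted(byLengthAnonymous(), "ccc", "a", "bb")));
        System.out.println(Arrays.toString(sorted(byLengthLambda(), "ccc", "a", "bb")));
    }
}
```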

The primary rationale for lambda expressions is to have a way to express code as data, so that code literals can be passed to libraries that will invoke them at convenient times. Another motivation is to reduce the verbosity of inner classes, which are currently the closest way to get this effect. Once you start down the path of eliminating redundant syntactic constructs, you often want to keep going down that path. So lambda expressions bring with them another friend — expanded type inference through target typing. Because the previous lambda expression is being assigned to a Comparator<String>, the explicit declaration that a and b are of type String is redundant — the compiler can usually figure this out for us. By using the type of the assignment context to infer the types of a and b, we can reduce this example to:

Comparator<String> c
    = (a, b) -> a.length() - b.length();
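One way to see target typing at work is to give the same lambda text two different targets. In the sketch below (names invented for illustration), the parameter types and even the resulting interface are decided entirely by the declared type on the left. ToIntBiFunction is a real interface from java.util.function in Java SE 8.

```java
import java.util.Comparator;
import java.util.function.ToIntBiFunction;

class TargetTypingSketch {
    // Target type Comparator<String>: a and b are inferred to be String,
    // and the lambda becomes the body of compare(String, String).
    static final Comparator<String> BY_LENGTH =
            (a, b) -> a.length() - b.length();

    // Identical lambda text, different target type: here it becomes the
    // body of applyAsInt(String, String).
    static final ToIntBiFunction<String, String> LENGTH_GAP =
            (a, b) -> a.length() - b.length();

    public static void main(String[] args) {
        System.out.println(BY_LENGTH.compare("aa", "b"));
        System.out.println(LENGTH_GAP.applyAsInt("a", "bbb"));
    }
}
```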

If we had a collection of Person objects, and we wanted to sort that list by the last name of the Person, we would write this today as:

Collections.sort(people, new Comparator<Person>() {
    public int compare(Person a, Person b) {
        return a.getLastName().compareTo(b.getLastName());
    }
});

Using a lambda expression, we can make this more compact:

Collections.sort(people, (a, b) ->
    a.getLastName().compareTo(b.getLastName()));

This is a big step forward in reducing verbosity but is still not any more abstract — it still forces the user to calculate the comparison function imperatively. With some small changes in the libraries, we can make better use of lambda expressions to separate out the central aspect of sorting — selection of a sort key. Because String is Comparable, the sort method should already know how to do the comparison after the sort key is extracted:

Collections.sortBy(people, p -> p.getLastName());

This is definitely getting better: the code is starting to read more like the problem statement "sort people by last name." But as we strip away the boilerplate, we realize that the idiom used for extracting the sort key (here, the last name) is in itself somewhat convoluted. The lambda expression above does nothing but take its argument and invoke an existing method, getLastName(), on it; any further arguments (in this case, none) would simply be passed through to that method. In this case it doesn't look so bad, because there are no extra arguments that would have to be given names (and those names then repeated), but it would look much nicer just to name the method directly. A related feature, method references, lets us do exactly this: refer to a method by name and treat it as a function-valued datum, just like a lambda expression:

Collections.sortBy(people, Person::getLastName);
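Collections.sortBy is a proposed library method in this article, not one assumed to exist in the JDK, so the sketch below defines a hypothetical sortBy helper with a plausible signature to show how a key-extracting lambda and a method reference are interchangeable. Person is likewise a stand-in class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class SortBySketch {
    // Hypothetical Person type, defined only for this example.
    static final class Person {
        private final String lastName;

        Person(String lastName) {
            this.lastName = lastName;
        }

        String getLastName() {
            return lastName;
        }
    }

    // Hypothetical helper: the caller supplies only the sort key;
    // the comparison itself comes from the key's natural ordering.
    static <T, K extends Comparable<? super K>> void sortBy(
            List<T> list, Function<? super T, ? extends K> keyOf) {
        Collections.sort(list, (a, b) -> keyOf.apply(a).compareTo(keyOf.apply(b)));
    }

    static List<String> sortedLastNames(String... lastNames) {
        List<Person> people = new ArrayList<>();
        for (String lastName : lastNames) {
            people.add(new Person(lastName));
        }
        sortBy(people, p -> p.getLastName());   // lambda form
        sortBy(people, Person::getLastName);    // equivalent method reference
        List<String> result = new ArrayList<>();
        for (Person p : people) {
            result.add(p.getLastName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedLastNames("Steele", "Goetz", "Gosling"));
    }
}
```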

Finally, now that the boilerplate has been stripped away, it is even more obvious than it was before that the sortBy() method really shouldn't be a static method in some utility class, but instead an instance method on the collection. But, one of the unfortunate properties of interfaces is that once we specify them, we cannot add new methods without breaking existing implementations. The final feature being introduced along with lambda expressions is virtual extension methods, which allow us to add new methods to interfaces in a compatible way, by providing an (overridable) default implementation along with the method declaration. This will let us add lambda-friendly (and potentially parallel-friendly) methods like forEach() to List. By adding an extension method to List for sortBy(), our example now looks like:

people.sortBy(Person::getLastName);
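The compatibility argument can be sketched with a small, made-up interface. Roster and FixedRoster are hypothetical; the default keyword is the syntax that virtual extension methods received in Java SE 8 as released.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DefaultMethodSketch {
    // Hypothetical interface standing in for an already-published API.
    interface Roster {
        List<String> names();

        // Added after the interface was published. Because it carries a
        // default body, existing implementations still compile and may
        // override it if they can do better.
        default List<String> sortedNames() {
            List<String> copy = new ArrayList<>(names());
            copy.sort(Comparator.naturalOrder());
            return copy;
        }
    }

    // An "old" implementation that knows nothing about sortedNames().
    static final class FixedRoster implements Roster {
        private final List<String> names;

        FixedRoster(String... names) {
            this.names = Arrays.asList(names);
        }

        @Override
        public List<String> names() {
            return names;
        }
    }

    public static void main(String[] args) {
        Roster roster = new FixedRoster("Steele", "Goetz", "Gosling");
        System.out.println(roster.sortedNames());
    }
}
```

In the released Java SE 8 libraries the method that reached List is sort(Comparator), itself a default method, so the closest shipped equivalent of the line above is people.sort(Comparator.comparing(Person::getLastName)).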

Oddly, our final version doesn't use lambdas at all! But it embodies the central goal of adding lambda expressions to the language — the ability to capture portions of a computation such that they can be passed around as data, enabling us to more richly parameterize library functionality such as sorting. In this particular example, a method reference is a clearer expression of what we mean than a lambda expression, but the idea is the same.

It is entirely possible that lambda expressions could have been added to Java without SAM conversion, type inference, method references, and extension methods. However, it is likely that lack of those features would have eventually become pain points — perhaps without us even realizing the exact source of the pain.


Sometimes it goes the other way

The reasoning for not adding function types to Java can be characterized as not wanting — or maybe not being able to afford — an additional language feature that insisted on coming along for the ride. Although function types would be the natural way to express the type of a lambda expression, and would reduce the need for a lot of nominal types like Predicate and Mapper, the interaction with erasure is just too unpleasant. The obvious response is to get rid of the badly interacting feature, by inviting another friend along — reification. There are pros and cons of adding reified generics to the Java language, but the reality is that it was impractical to add something as huge and far-reaching (affecting the language, compiler, and libraries) as reification at the same time as adding lambda expressions. So given that we couldn't afford to put up its friend, we had to say goodbye — at least for now — to function types as well.


Conclusion

Most big language features don't stand entirely on their own — they almost always need to invite their friends to get the full benefit that they were intended to achieve. When considering adding a feature that has friends, we have to consider carefully whether we really want to invite those friends along for the ride — because we're going to be stuck with them. If we can't live with the feature's friends, we probably have to conclude we don't want the feature either.
