In a recent installment of the Crossing borders series, my friend and colleague Bruce Tate wrote about the power of closures using Ruby to illustrate. At the recent JavaPolis conference in Antwerp, one of the most well-attended sessions was Neal Gafter's "Adding closures to the Java language." And on the public white boards at JavaPolis, where attendees could write their thoughts about anything related to (or unrelated to) Java technology, nearly half the space was dedicated to the closures debate. It seems like everyone in the Java community is talking about closures these days -- even though closures were a well-developed concept more than 20 years before the development of the Java language.
In this article, my goal is to provide you with a 10,000-foot view of the debate over closures in the Java language. This article starts with an overview of closures, details their application, and then summarizes the competing proposals currently on the table.
A closure is a block of code that may contain free (unbound) variables; these variables are not defined in the block of code or in any global context but instead in the environment in which the block of code is defined. The name "closure" comes from the combination of the block of code to execute (which, by virtue of free variables, is not closed with respect to variable reference) with an evaluation environment (the scope) that supplies bindings for the free variables. Varying degrees of closure support can be found in Scheme, Common Lisp, Smalltalk, Groovy, JavaScript, Ruby, and Python, among others.
The value of closures is that they can serve as function objects or anonymous functions, and this has consequences for the type system in that the type system must be able to represent not only data but code as well. Most languages with closures support functions as first-class objects, meaning that functions can be stored in variables, passed as parameters to other functions, and -- most importantly -- created dynamically and returned from functions. Consider this example in Scheme (from SICP 3.3.3), shown in Listing 1:
Listing 1. Example from the Scheme programming language of a function that takes a function as argument and returns a memoized version of that function
(define (memoize f)
(let ((table (make-table)))
(lambda (x)
(let ((previously-computed-result (lookup x table)))
(if (not (null? previously-computed-result))
previously-computed-result
(let ((result (f x)))
(insert! x result table)
result))))))
|
This code defines a function called memoize
that takes as its argument a function f and returns another
function that computes the same function as f but caches
previously computed results in a table so they can be returned more
efficiently. The returned function is created using the lambda
construct, which dynamically creates a new function object. The
identifiers in italics are the ones that are free in the newly defined
function; their values are bound in the environment that creates the
function. For example, the table variable, used to store cached
values, is created when memoize is invoked, and because it is
referenced by the newly created function, is not garbage collected
until the resulting function is collected. When the resulting function
is called with argument x, it first looks to see if it has
already computed f(x). If it has, it returns the known
f(x); otherwise, it computes f(x) and caches it in
the table for future use before returning it.
Closures provide a compact, natural way to create and manipulate parameterized computations. You can think of closure support as providing the ability to treat "blocks of code" as first class objects: pass them around, invoke them, and dynamically create new ones. To support closures fully, a language needs to provide support for manipulating, invoking, and creating functions at run time and for functions to capture the environment in which they were created. Many languages provide a subset of these features, which may enable some, though not all, of the benefit of closures. In the debate over whether to add closures to the Java language, the key question is whether the benefit of the incremental expressiveness is worth the cost of the incremental complexity.
Anonymous classes and function pointers
The C language provides function pointers, which allow you to pass functions as arguments to other functions. However, functions in C cannot have free variables; all variables must be known at compile time, which reduces the expressiveness of function pointers as an abstractive mechanism.
The Java language provides inner classes, which can contain references to fields of their enclosing object. This feature is richer than function pointers because it allows the inner class instance to retain a reference to the environment in which it was created. Indeed, at first glance, inner classes seem to provide most, if not all, of the value of closures; one could easily construct an interface called UnaryFunction and create a memoizing wrapper that can memoize any unary function. But this approach is often unwieldy in practice; it forces all the code that interacts with your function to be written with an awareness of this function "framework."
Anonymous classes let you create objects that capture part of the
environment in which they were defined, but objects and blocks of code
are not the same thing. As an example, consider any repetitive coding
pattern, such as executing a block of code with a Lock held. If we
want to increment a counter with a Lock held, the code looks like
Listing 2 -- frustratingly verbose for such a simple operation:
Listing 2. Canonical idiom for executing a block of code with a lock held
lock.lock();
try {
++counter;
}
finally {
lock.unlock();
}
|
It would be nice to abstract out the lock management
code, which would make the code more compact and less error-prone. As
a first attempt, you might create a withLock() method like Listing 3:
Listing 3. Attempt at abstracting the concept of "do this while holding lock," which suffers from a lack of exception transparency
public static void withLock(Lock lock, Runnable r) {
lock.lock();
try {
r.run();
}
finally {
lock.unlock();
}
}
|
Unfortunately, this approach only takes you part of the way to where you want to be. One of the goals of creating this abstraction was to make the code more compact; unfortunately, the syntax of anonymous inner classes is not very compact, and the calling code will look something like Listing 4:
Listing 4. Client code for the withLock() method in Listing 3
withLock(lock,
new Runnable() {
public void run() {
++counter;
}
});
|
That's still a lot of code just to increment a counter with a lock
held! Further, the abstraction barrier introduced by turning the
block of code that is guarded by a lock into a method invocation
complicates matters severely -- what happens if the guarded block of
code might throw a checked exception? Now we cannot use Runnable as
our task representation; we must create a new representation that
allows exceptions to be thrown across the method
invocation. Unfortunately, generics don't help us very much here
either; while you can create a method that has a generic type
parameter E identifying a checked exception it might throw, this
approach doesn't generalize well to methods that throw more than one
checked exception type (which is why the call() method in Callable is
declared to throw Exception, rather than a type specified by a type
parameter). The approach in Listing 3 is hampered by a lack of
exception transparency. It also suffers from other forms of
nontransparency; statements like return or break would mean something
different in the context of the Runnable in Listing 4 than they would
in the try block in Listing 2.
Ideally, you'd like to be able to have your guarded increment operation look something like Listing 5 and have the code in the block mean the same as it would in the expanded form from Listing 2:
Listing 5. Ideal (but hypothetical) form for client code for Listing 3
withLock(lock,
{ ++counter; });
|
By adding closures to the language, it becomes possible to create
methods that can behave like control flow constructs, such as "execute
this code with a lock held," "operate on this stream and close it when
you're done," or "time how long it takes to execute this block of
code." This strategy has the potential to simplify certain types of
code that repeatedly use particular coding patterns or idioms, such as
the locking idiom in Listing 2. (Another technique that offered
similar expressiveness, to a certain extent, was the C preprocessor;
it is possible to express the withLock() operation as a preprocessor
macro, though macros are harder to compose and less safe than
closures.)
Closures for generic algorithms
Another place where closures offer an opportunity to dramatically simplify code is in the use of generic algorithms. As multiprocessor machines become cheaper, it becomes more important to exploit fine-grained parallelism. Specifying computations using generic algorithms provides a natural opportunity for library implementations to exploit parallelism in the problem space.
For example, consider calculating the sum of the squares of a large collection of numbers. Listing 6 shows one way to perform this calculation, but this approach calculates the results sequentially and might not be the most efficient way to calculate the desired result on large-scale multiprocessor systems:
Listing 6. Calculating the sum of squares sequentially
double sum;
for (Double d : myBigCollection)
sum += d*d;
|
Each loop iteration has two operations: squaring a value and adding that value to a running total. Each of the squaring operations is independent of each other and could be executed in parallel, and instead of executing N addition operations, if the computation is organized properly, the sum can be calculated in log(N) operations.
The operation in Listing 6 is an example of the map-reduce algorithm, where a function is applied to each of a large number of data elements and the result from each application is combined with some sort of combining function. If you imagine you have a map-reduce implementation that takes a data set, a unary function that is applied to each element, and a binary function for combining the results, you could express the sum-of-squares calculation as in Listing 7:
Listing 7. Calculating the sum of squares with MapReduce, allowing for the possibility of parallel execution
Double sumOfSquares = mapReduce(myBigCollection,
new UnaryFunction<Double> {
public Double apply(Double x) {
return x * x;
}
},
new BinaryFunction<Double, Double> {
public Double apply(Double x, Double y) {
return x + y;
}
});
|
The mapReduce() implementation assumed in Listing 7 knows which
operations can be evaluated in parallel and could therefore
parallelize both the function application and combining steps,
resulting in improved throughput on a parallel system. But the code in
Listing 7 is really ugly; it takes many more lines of code to
express the generic algorithm equivalent of the three lines in Listing
6.
Closures give you a way to make the code in Listing 7 more manageable. For example, the closure syntax in Listing 8 does not correspond to any of the current closure proposals for the Java language, but instead it is intended to convey the spirit of how closures support generic algorithms:
Listing 8. Calculating the sum of squares using MapReduce and a hypothetical closure syntax
sumOfSquares = mapReduce(myBigCollection,
function(x) {x * x},
function(x, y) {x + y});
|
The closure-based version in Listing 8 has the best of both worlds: The code is easy to read and write, it is specified at a higher level of abstraction than the sequential loop, and it can be efficiently parallelized by the library.
At least two proposals are on the table for adding closures to the Java language. The first, dubbed "BGGA" (for its authors, Gilad Bracha, Neal Gafter, James Gosling, and Peter von der Ahe), extends the type system to incorporate function types. The second, dubbed "CICE" (for Concise Inner Class Expressions) and supported by Joshua Bloch, Doug Lea, and "Crazy Bob" Lee, has a more modest goal: simplify the creation of anonymous inner class instances. It is likely that a JSR will soon be proposed to consider the form and degree of closure support to be suggested for a future version of the Java language.
The BGGA proposal creates a concept of function types, where a function has a typed argument list, a return type, and a throws clause. In the BGGA proposal, the sum-of-squares code would look like the code in Listing 9:
Listing 9. Calculating the sum of squares using the BGGA closure syntax
sumOfSquares = mapReduce(myBigCollection,
{ Double x => x * x },
{ Double x, Double y => x + y });
|
The code inside the braces to the left of the => symbol identifies the
names and types of the arguments; the code to the right represents the
implementation of the anonymous function being defined. This code can
refer to local variables defined within the block, arguments to the
closure, or variables from the scope in which the closure is created.
In the BGGA proposal, you can declare variables, method arguments, and
method return values that are function types. You can supply a closure in any context where an instance of a single abstract method
class (like Runnable or Callable) is expected; for anonymously typed
closures, an invoke() method is provided so you can call them with a
specified argument list.
One of the primary goals of the BGGA proposal is to allow programmers
to create methods that act like control structures. Accordingly, BGGA
also proposes some syntactic sugar to allow you to call methods that
accept closures as if they were new keywords so that you can create
methods like withLock() or forEach() and invoke them as if they were
control primitives. Listing 10 shows how the withLock() method would
be defined under the BGGA proposal; Listing 11 and Listing 12
show how it would be invoked, using both the standard form and the "control construct" form:
Listing 10. Coding the withLock() method under the BGGA closures proposal
public static <T,throws E extends Exception>
T withLock(Lock lock, {=>T throws E} block) throws E {
lock.lock();
try {
return block.invoke();
} finally {
lock.unlock();
}
}
|
The withLock() method in Listing 10 accepts a lock and a closure. The
return type and throws clause of the closure are generic arguments;
type inference in the compiler generally allows it to be invoked
without specifying the value of T and E, as in Listing 11 and Listing 12:
Listing 11. Invoking withLock()
withLock(lock, {=>
System.out.println("hello");
});
|
Listing 12. Invoking withLock() using the control construct shorthand
withLock(lock) {
System.out.println("hello");
}
|
Like generics, much of the complexity of closures under the BGGA proposal is borne by library creators; using library methods that accept closures is much simpler.
The BGGA proposal also works to repair a number of transparency
failures that are present when trying to use inner class instances to
gain the benefits of closures. For example, the semantics of return,
break, and this are different in a block of code than they are in a
Runnable (or other inner class instance) that represents the same
block of code. These elements of nontransparency can cause confusion
when migrating code to take advantage of generic algorithms.
The CICE proposal is a simpler proposal that addresses the problem
that instantiating inner class instances is too cumbersome. Rather
than create a notion of function types, it simply creates a more
compact syntax for instantiating instances of inner class with a
single abstract method (such as Runnable, Callable, or Comparator).
Listing 13 shows what the sum of squares code would look like under
CICE. It makes explicit the UnaryFunction and
BinaryFunction types used by
mapReduce(). The arguments to mapReduce()
are anonymous classes derived from UnaryFunction and
BinaryFunction; the syntax simply elides much of the
redundancy associated with creating an anonymous instance.
Listing 13. Sum of squares code under the CICE closures proposal
Double sumOfSquares = mapReduce(myBigCollection,
UnaryFunction<Double>(Double x) { return x*x; },
BinaryFunction<Double, Double>(Double x, Double y) { return x+y; });
|
Because the objects representing the functions passed to
mapReduce() are ordinary anonymous class instances, their
bodies can refer to variables defined in the enclosing scope; the
only difference between the approaches in Listing 13 and Listing 7 is
the verbosity of the syntax.
The BGGA proposal adds a powerful new facility to the language. As such, it adds measurable complexity to the syntax and semantics of the language. The CICE proposal, on the other hand, is more modest: it takes a facility already present in the language and makes it easier to use but does not offer any significant new functionality. Closures are indeed a powerful mechanism for abstraction; once they get used to them, most people don't want to give them up. (Ask friends who are experienced Scheme, Smalltalk, or Ruby programmers how they feel about closures, and they'll probably respond by asking you how you feel about breathing.) But languages are organic entities, and adding a new feature to a language that was not anticipated in the original design is fraught with compromise and creates complexity. The issue being debated is not whether closures are a good idea -- because they clearly are -- but whether the benefits of retrofitting the Java language with closures are worth the costs.
Learn
-
BGGA closures proposal: Extending the type system to incorporate function types.
-
Closures for Java: Neal Gafter's presentation on closures from JavaPolis 2006.
-
Neal's blog: More about closures from Neal Gafter.
-
Concise Instance Creation Expressions: Closures without Complexity: CICE closures proposal.
-
Structure and Interpretation of Computer Programs (SICP)
: Source of the example in Listing 1.
-
Closures: More on closures from Wikipedia.
-
MartinFowler.com: Martin Fowler's opinion on why closures are an important language feature.
-
The Java technology zone: Hundreds of articles about every aspect of Java programming.
Discuss
-
developerWorks blogs: Get involved in
the developerWorks community.
Brian Goetz has been a professional software developer for 20 years. He is a senior staff engineer at Sun Microsystems, and he serves on several JCP Expert Groups. Brian's book, Java Concurrency In Practice, was published in May 2006 by Addison-Wesley. See Brian's published and upcoming articles in popular industry publications.





