Java.next: Extension without inheritance, Part 2

Explore Clojure protocols

The Java™ language suffers from intentional limitations in its extension mechanisms, relying primarily on inheritance and interfaces. Groovy, Scala, and Clojure offer many more extension alternatives. This installment further explores Clojure's use of protocols as an extension mechanism.


Neal Ford, Director / Software Architect / Meme Wrangler, ThoughtWorks Inc.

Neal Ford is Director, Software Architect, and Meme Wrangler at ThoughtWorks, a global IT consultancy. He is also the designer and developer of applications, instructional materials, magazine articles, courseware, and video/DVD presentations, and he is the author or editor of books spanning a variety of technologies, including the most recent Presentation Patterns. He focuses on designing and building large-scale enterprise applications. He is also an internationally acclaimed speaker at developer conferences worldwide. Check out his website.



30 July 2013


About this series

The Java legacy will be the platform, not the language. More than 200 languages run on the JVM, and it's inevitable that one of them will eventually supplant the Java language as the best way to program the JVM. This series explores three next-generation JVM languages: Groovy, Scala, and Clojure, comparing and contrasting new capabilities and paradigms, to provide Java developers a glimpse into their own near future.

"Extension without inheritance, Part 1" focuses on mechanisms in Groovy, Scala, and Clojure that add new methods to existing classes — one of the many ways in which the Java.next languages enable extension without inheritance. This article explores how Clojure's protocols extend Java extension capabilities in novel ways and offer elegant solutions to the Expression Problem.

Although this installment primarily concerns extensibility, it also touches on some Clojure features that allow Clojure and Java code to interoperate seamlessly. The two languages are distinctly different at their cores (Java being imperative and object-oriented, and Clojure being functional), yet Clojure implements several conveniences that enable it to work with Java constructs with minimal friction.

Clojure protocols revisited

Protocols are an important part of the Clojure ecosystem. The last installment demonstrates the use of protocols as a way to add methods to existing classes. Protocols also help Clojure to mimic many of the familiar features of object-oriented languages. For example, Clojure mimics object-oriented classes — combinations of data and methods — by binding records and functions together through protocols. To understand the interaction between protocols and records, I must first discuss maps, the core data structure that underlies records in Clojure.

Maps and records

In Clojure, a map is a collection of name-value pairs (a familiar concept in other languages). The read–eval–print loop (REPL) interaction in Listing 1, for example, starts by creating a map that contains information about the Clojure programming language:

Listing 1. Interacting with a Clojure map
user=> (def language {:name "Clojure" :designer "Hickey" })
#'user/language
user=> (get language :name)
"Clojure"
user=> (:name language)
"Clojure"
user=> (:designer language)
"Hickey"

Clojure uses maps heavily, so it includes special syntactic sugar to make interacting with them easier. To retrieve the value that's associated with a key, you can use the familiar (get ) function. But Clojure tries to make such common operations less verbose.

In the Java environment, the source code of the language isn't a native data structure; it must be parsed and translated. In Clojure (and other Lisp variants), the source code representation is a native data structure, such as a list — which helps to explain the language's odd syntax. When the Lisp interpreter reads a list as source code, it tries to translate the first element of the list into something callable, such as a function. So in Listing 1, the (:name language) expression returns the same result as the (get language :name) expression. Clojure offers this syntactic sugar because retrieving items from a map is a common operation.
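The code-as-data idea can be sketched at the REPL. This is only an illustration: the map literal inside the quoted form is my own stand-in for the language map from Listing 1, and the leading quote keeps the list from being evaluated until (eval ) is called explicitly.

```clojure
;; A quoted form is an ordinary list; nothing is evaluated yet.
(def form '(:name {:name "Clojure" :designer "Hickey"}))

(list? form)    ; the source code is a list
(first form)    ; its first element is the keyword :name
(count form)    ; two elements: the keyword and the map

;; Evaluating the list treats its first element as the thing to call.
(eval form)     ; => "Clojure"
```

Because the first element is a keyword, evaluation performs a map lookup, which is the same mechanism behind (:name language) in Listing 1.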

Moreover, in Clojure several constructs can inhabit the function-call slot, thereby extending callability: the ability to be called like a function. Java programs can invoke only methods and built-in language statements. Listing 1 illustrates that map keys, such as :name in (:name language), can act as function calls in Clojure. Maps themselves are also callable; you can use the alternate syntax (language :name) if you think it's more readable. Clojure's extensive callability graph makes the language easier to use, cutting down on repetitive syntax (such as the omnipresent get and set in the world of Java programs).
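As a brief sketch of callability, the three lookup styles can be compared side by side. The :year key and the list of maps at the end are assumptions added for illustration; they do not appear in Listing 1.

```clojure
(def language {:name "Clojure" :designer "Hickey"})

;; Three equivalent lookups: a function, a keyword, and the map itself
;; each occupy the function-call slot.
(get language :name)      ; => "Clojure"
(:name language)          ; => "Clojure"
(language :name)          ; => "Clojure"

;; All three forms accept a fallback value for a missing key.
(get language :year :unknown)   ; => :unknown
(:year language :unknown)       ; => :unknown
(language :year :unknown)       ; => :unknown

;; Because a keyword is callable, it can be passed wherever a
;; function is expected.
(def names (map :name [{:name "Groovy"} {:name "Scala"} {:name "Clojure"}]))
;; names => ("Groovy" "Scala" "Clojure")
```

Passing a bare keyword to (map ) is probably the most common payoff of this design in everyday Clojure code.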

However, maps don't emulate JVM classes in their entirety. Clojure provides other ways to help you model problems that include both data and behavior — and that integrate more seamlessly with the underlying JVM. You can create several constructs, including types and records, that correspond with varying degrees of completeness to similar underlying JVM classes. You create a type — traditionally used to model mechanical structures — with (deftype ). For example, if you needed a data type to hold XML, you'd likely use (deftype MyXMLStructure) for the mechanics of extracting data that is embedded within XML. Records are used idiomatically in Clojure for data — as a record of information that's core to the application's purpose. To support this usage, Clojure automatically includes a slew of interfaces in the underlying record definition that include features such as callability. The REPL interaction in Listing 2 demonstrates a record's underlying classes and superclasses:

Listing 2. Record's underlying classes and superclasses
user=> (defrecord Person [name age postal])
user.Person

user=> (def bob (Person. "Bob" 42 60601))
#'user/bob
user=> (:name bob)
"Bob"
user=> (class bob)
user.Person
user=> (supers (class bob))
#{java.io.Serializable clojure.lang.Counted java.lang.Object 
clojure.lang.IKeywordLookup clojure.lang.IPersistentMap 
clojure.lang.Associative clojure.lang.IMeta 
clojure.lang.IPersistentCollection java.util.Map 
clojure.lang.IRecord clojure.lang.IObj java.lang.Iterable 
clojure.lang.Seqable clojure.lang.ILookup}

In Listing 2, I create a new record named Person, with fields for name, age, and postal code. I can construct a new record of this type by using Clojure's syntactic sugar for constructor calls (using the class name plus a period as the function call). The REPL echoes namespace-qualified names: user.Person for the new class and #'user/bob for the var that holds the instance. (All REPL interactions take place in the user namespace by default.) The callability rules for keywords still hold, so I can access members of the record using the same syntactic sugar as in Listing 1.

When I call the (class ) function, it returns the namespace plus classname that Clojure created (which is interoperable with Java code). I can also list the superclasses and interfaces of the Person record by using (supers ). As the set printed at the end of Listing 2 shows, Clojure implements several interfaces for you, including map-related interfaces such as IPersistentMap and ILookup, which allow Clojure's native syntax for maps, such as (:name bob), to work with classes and objects. The set of automatically included interfaces is one of the key differences between records and types, which get no automatic interface implementations.
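The practical difference between a record and a type can be sketched as follows. Point is a hypothetical type invented for contrast; only Person comes from Listing 2.

```clojure
(defrecord Person [name age postal])
(deftype Point [x y])   ; hypothetical type, for contrast only

(def bob (Person. "Bob" 42 60601))
(def pt  (Point. 1 2))

;; A record participates in Clojure's map abstraction...
(instance? clojure.lang.IPersistentMap bob)   ; => true
(def older-bob (assoc bob :age 43))           ; a new Person; bob is unchanged
(instance? Person older-bob)                  ; => true
(keys bob)                                    ; => (:name :age :postal)

;; ...although the record itself is not a function: the set in
;; Listing 2 contains no clojure.lang.IFn.
(instance? clojure.lang.IFn bob)              ; => false

;; A bare type exposes only its fields, through Java interop.
(instance? clojure.lang.IPersistentMap pt)    ; => false
(.-x pt)                                      ; => 1
(:x pt)                                       ; => nil, no keyword lookup
```

Note that keyword-first lookup, as in (:name bob), works on records, whereas the map-first form shown for plain maps does not, because records do not implement IFn.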


Implementing protocols with records

A Clojure protocol is a named set of named functions and their signatures. The definition in Listing 3 creates a protocol object and a set of polymorphic protocol functions:

Listing 3. A Clojure protocol
(defprotocol AProtocol
  "A doc string for AProtocol abstraction"
  (bar [this a] "optional doc string for bar function")
  (baz [this a] [this a b] 
     "optional doc string for multiple-arity baz function"))

The functions in Listing 3 dispatch on the type of their first argument (traditionally named this to mimic the Java context holder), which makes them polymorphic on that type. All protocol functions must therefore have at least one argument. Traditionally, protocols are named with CamelCase; because each protocol is backed by a Java interface at the JVM level, matching Java naming conventions eases interoperability.

A record can implement a protocol, much like implementing an interface in the Java language. The record must (as a runtime check) implement functions that match the protocol signatures. In Listing 4, I create a record that implements AProtocol:

Listing 4. Implementing a protocol
(defrecord Foo [x y]
   AProtocol
   (bar [this a] (min a x y))
   (baz [this a] (max a x y))
   (baz [this a b] (max a b x y)))

;exercising the record
(def f (Foo. 1 200))
(println (bar f 4))
(println (baz f 12))
(println (baz f 10 2000))

In Listing 4, I create a record named Foo with two fields: x and y. To implement the protocol, I must include functions to match its signature. After I implement the protocol, I can call the functions as regular functions against instances of the object. Within the function definitions, I have access to both the record's internal fields (x and y) and the function parameters.
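To see the polymorphism more directly, here is a minimal sketch in which two records implement one protocol. The Shape protocol and the Rect and Square records are hypothetical names invented for this example.

```clojure
;; Hypothetical protocol and records, invented for this sketch.
(defprotocol Shape
  (area [this] "Area of the shape.")
  (scale [this factor] "A new shape of the same kind, scaled by factor."))

(defrecord Rect [w h]
  Shape
  (area [_] (* w h))
  (scale [_ factor] (Rect. (* w factor) (* h factor))))

(defrecord Square [side]
  Shape
  (area [_] (* side side))
  (scale [_ factor] (Square. (* side factor))))

;; The same function names work on both records; the type of the
;; first argument selects the implementation.
(def shapes [(Rect. 2 3) (Square. 4)])
(def areas (map area shapes))                    ; => (6 16)
(def doubled (map #(area (scale % 2)) shapes))   ; => (24 64)

(satisfies? Shape (Square. 1))   ; => true
(satisfies? Shape "a string")    ; => false
```

The protocol functions are ordinary functions, so they compose with (map ) and anonymous functions like any others.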


Protocol extension options

As a way to extend existing classes and hierarchies cleanly, protocols were designed with the Expression Problem in mind. (See the last installment for the full Expression Problem definition.) Because the extensions are functions (like everything else in Clojure), many of the identity and inheritance problems endemic to object-oriented languages don't appear. Yet this mechanism allows for various useful extensions.

Clojure is a hosted language: It is designed (using protocols) to run on multiple platforms, including .NET and JavaScript (through the ClojureScript compiler). JavaScript needs an environment that can set up, tear down, load, and evaluate code. So ClojureScript defines a BrowserEnv record, which handles lifecycle functions such as setup and teardown for whatever JavaScript environment (browser, REPL, or faked) is appropriate. The record definition for BrowserEnv is in Listing 5:

Listing 5. ClojureScript's BrowserEnv record
(defrecord BrowserEnv []
  repl/IJavaScriptEnv
  (-setup [this]
    (do (require 'cljs.repl.reflect)
        (repl/analyze-source (:src this))
        (comp/with-core-cljs (server/start this))))
  (-evaluate [_ _ _ js] (browser-eval js))
  (-load [this ns url] (load-javascript this ns url))
  (-tear-down [_]
    (do (server/stop)
        (reset! server/state {})
        (reset! browser-state {}))))

The lifecycle methods that are defined in the IJavaScriptEnv protocol give implementers (such as a browser) access to a common interface. The hyphen at the start of each of the function names (for example, (-tear-down )) is a ClojureScript (not Clojure) convention.

Another goal of solutions to the Expression Problem is to be able to add new features to existing hierarchies without recompiling or otherwise "touching" them. In version 1.5, Clojure introduced an advanced collection library called reducers. This library adds automatic concurrent processing for many collection types. To take advantage of reducers, existing types must implement one of the library's methods, coll-fold. Using protocols and the handy extend-protocol macro — which enables you to extend a protocol to multiple types at once — the (coll-fold ) function is magically available across several core types, as in Listing 6:

Listing 6. Reducers attaching (coll-fold ) to multiple types
(extend-protocol CollFold
 nil
 (coll-fold
  [coll n combinef reducef]
  (combinef))

 Object
 (coll-fold
  [coll n combinef reducef]
  ;;can't fold, single reduce
  (reduce reducef (combinef) coll))

 clojure.lang.IPersistentVector
 (coll-fold
  [v n combinef reducef]
  (foldvec v n combinef reducef))

 clojure.lang.PersistentHashMap
 (coll-fold
  [m n combinef reducef]
  (.fold m n combinef reducef fjinvoke fjtask fjfork fjjoin)))

The (extend-protocol ) call in Listing 6 attaches the CollFold protocol (which contains a single method, (coll-fold )) to the nil, Object, IPersistentVector, and PersistentHashMap types. Even nil (Clojure's variant of the Java language's null) works correctly with this library, handling the common edge case of empty collections. The reducers library also attaches to two core collection types, IPersistentVector and PersistentHashMap, to add reducer functionality near the top of those collection hierarchies.
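From the caller's side, the effect of those extensions can be sketched with the library's fold function. The sample collections are my own; the comments relate each call to the corresponding branch of Listing 6.

```clojure
(require '[clojure.core.reducers :as r])

;; Vector: the IPersistentVector extension applies, so fold is free
;; to split larger inputs into chunks.
(r/fold + (vec (range 1 101)))    ; => 5050

;; A reducer transformation layered over a vector remains foldable.
(r/fold + (r/map inc [1 2 3]))    ; => 9

;; List: no specific extension, so the Object fallback performs a
;; single sequential reduce.
(r/fold + '(1 2 3 4))             ; => 10

;; nil: the result is (combinef), which for + is 0.
(r/fold + nil)                    ; => 0
```

The call site is identical in all four cases; only the protocol dispatch differs.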

Clojure uses an elegant set of building blocks to allow simple but powerful extension. Because the language is function-based rather than class-based, some developers struggle with code organization without classes as the main organizing principle. Clojure organizes code in much the same way as the Java language, less one piece. Java has packages, classes, and methods, whereas Clojure has namespaces (which roughly correspond to packages) and functions (which roughly correspond to methods). Clojure protocols also generate native Java interfaces when needed, which developers can use for interoperability. In Clojure, the convention is to define protocols at component boundaries and place similar functions and protocols within a namespace. Clojure lacks classes as an information-hiding mechanism, but you can define namespace-private functions (with the (defn- ) function definition).

Clojure's code organization into namespaces makes clean, centrally located extension possible. Consider the CollFold protocol in Listing 6, which appears in the reducers.clj file in Clojure's source. The protocol, new types, and extensions all live in this file, added with Clojure 1.5. With protocol extension, you can reach back to core types (such as Object) and add reducer functionality, some of which is implemented through namespace-private functions within the reducers namespace. Clojure manages to add significant new behavior to an existing hierarchy with surgical precision, without elaborate hackery, and keeps all the pertinent details in one place.
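The same organizational pattern can be sketched on a small scale: a protocol, a namespace-private helper, and extensions that reach back to existing types, all in one place. Sizable, size-of, and digit-count are hypothetical names invented for this illustration and are not part of Clojure.

```clojure
;; Hypothetical protocol, invented for this sketch.
(defprotocol Sizable
  (size-of [x] "A rough size for x."))

;; Namespace-private helper: usable here, hidden from other namespaces.
(defn- digit-count [n]
  (count (str (Math/abs (long n)))))

(extend-protocol Sizable
  nil
  (size-of [_] 0)

  Object
  (size-of [_] 1)                  ; fallback for any other object

  String
  (size-of [s] (count s))

  Number
  (size-of [n] (digit-count n))

  clojure.lang.IPersistentVector
  (size-of [v] (count v)))

(size-of nil)       ; => 0
(size-of "abc")     ; => 3
(size-of -1234)     ; => 4
(size-of [1 2 3])   ; => 3
(size-of :keyword)  ; => 1, through the Object fallback
```

None of the extended types had to be modified or recompiled, and the more specific extensions take precedence over the Object fallback.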

The (extend-type ) macro is similar to the (extend-protocol ) macro; with the (extend-type ) macro, you can add several protocols to a type simultaneously. Listing 7 shows how ClojureScript adds collection functionality to arrays:

Listing 7. Adding collection functions to JavaScript arrays
(extend-type array
  ICounted
  (-count [a] (alength a))

  IReduce
  (-reduce [col f] (array-reduce col f))
  (-reduce [col f start] (array-reduce col f start)))

In Listing 7, ClojureScript needs JavaScript arrays to respond to Clojure functions such as (count ) and (reduce ). The (extend-type ) macro enables the implementation of multiple protocols in one place. Clojure expects collections to respond to count rather than length, so attaching the ICounted protocol and function adds the appropriate method alias.
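A comparable sketch on the JVM side attaches two protocols to an existing Java class in a single form. The Measured and Summed protocols are hypothetical, and java.util.ArrayList merely stands in for the JavaScript array of Listing 7.

```clojure
;; Hypothetical protocols, invented for this sketch.
(defprotocol Measured
  (size [coll] "Number of elements."))

(defprotocol Summed
  (sum [coll] "Sum of the elements."))

;; One extend-type form attaches both protocols to an existing class.
(extend-type java.util.ArrayList
  Measured
  (size [l] (.size l))

  Summed
  (sum [l] (reduce + 0 l)))

(def nums (java.util.ArrayList. [1 2 3 4]))

(size nums)   ; => 4
(sum nums)    ; => 10
```

Where (extend-protocol ) groups many types under one protocol, (extend-type ) groups many protocols under one type; which to use is largely a matter of how you want the source organized.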

Records aren't required for the reification of protocols. Like anonymous objects in Java, protocols can be reified and used inline, as in Listing 8:

Listing 8. Inline reification of protocols
(let [z 42
      p (reify AProtocol
       (bar [_ a] (min a z))
       (baz [_ a] (max a z)))]
  (println (baz p 12)))

In Listing 8, I use a let block to create two local bindings: z and p, the inline protocol implementation. When I create an anonymous implementation of the protocol, I still have access to the local scope: The use of z as an argument is legal because z is in scope for this let block. In this way, the reified protocol encloses its environment like a closure block. Notice that I don't implement the protocol completely; the (baz [this a b]) arity is missing. Unlike Java interfaces, protocol implementations are optional. Clojure doesn't enforce the protocol at compile time but generates a runtime error if it expects a protocol method that doesn't exist.
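The runtime behavior of a partial implementation can be sketched as follows. The protocol is restated from Listing 3 without doc strings so that the example stands alone; the failure var is my own addition, and it records the error class name rather than assuming a particular one.

```clojure
(defprotocol AProtocol
  (bar [this a])
  (baz [this a] [this a b]))

(def p
  (let [z 42]
    (reify AProtocol
      (bar [_ a] (min a z))
      (baz [_ a] (max a z)))))   ; the three-argument baz is deliberately omitted

(bar p 100)   ; => 42, z is captured from the enclosing let
(baz p 12)    ; => 42

;; Calling the missing arity compiles, but fails when it runs.
(def failure
  (try
    (baz p 12 99)
    (catch Throwable t
      (.getName (class t)))))
```

On the JVM, I would expect failure to name java.lang.AbstractMethodError, which is an Error rather than an Exception; that is why the sketch catches Throwable.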


Conclusion

This Java.next installment investigates how common Java conventions such as classes and interfaces map to structures in Clojure. It also explores the various uses of protocols in Clojure, and how Clojure solves the Expression Problem simply and elegantly, with multiple real-world variations. In the next installment, I conclude the Extension without inheritance miniseries with an exploration of mixins in Groovy.
