Java.next: Concurrency in Clojure

How Clojure abstracts concurrency and shared state

Clojure has the most radical approach to concurrency of all the Java.next languages. This installment delves into some of the many facets of concurrency in Clojure, including the epochal time model and software transactional memory.

Neal Ford, Director / Software Architect / Meme Wrangler, ThoughtWorks

Photo of Neal fordNeal Ford is Director, Software Architect, and Meme Wrangler at ThoughtWorks, a global IT consultancy. He is also the designer and developer of applications, instructional materials, magazine articles, courseware, and video/DVD presentations, and he is the author or editor of books spanning a variety of technologies, including the most recent Presentation Patterns. He focuses on designing and building large-scale enterprise applications. He is also an internationally acclaimed speaker at developer conferences worldwide. Check out his website.



15 April 2014

Also available in Chinese Japanese

About this series

The Java™ legacy will be the platform, not the language. More than 200 languages run on the JVM, and it's inevitable that one of them will eventually supplant the Java language as the best way to program the JVM. This series explores three next-generation JVM languages: Groovy, Scala, and Clojure, comparing and contrasting new capabilities and paradigms, to provide Java developers a glimpse into their own near future.

Among all the Java.next languages, Clojure has the most radical concurrency mechanisms and functionality. While both Groovy and Scala offer a combination of improved abstractions and syntactic sugar for concurrency, Clojure leverages its hard-line stance on immutability to offer behavior that's unique on the JVM. In this Java.next installment, I cover a few of the vast number of concurrency options in Clojure. I start with the fundamental abstraction underlying mutable references in Clojure: the epochal time model.

Epochal time model

Perhaps Clojure's most radical departure from other languages revolves around mutable state and values. A value in Clojure can be any data of interest: the number 42, the map structure {:first-name "Neal :last-name "Ford"}, or some larger data structure like Wikipedia. Basically, the Clojure language treats all values as other languages treat numbers. The number 42 is a value, and you can't redefine it. But you can apply a function to that value that can return another value. For example, (inc 42) returns the value 43.

In Java and other C-based languages, variables hold both identity and value, which is one of the factors that makes concurrency so difficult in the Java language. The language designers created the variable abstraction before the threading abstractions, and the design of variables didn't take into account the added complexity of concurrency. Because variables in Java assume a single thread, cumbersome mechanisms such as synchronization blocks are required to protect variables in a multithreaded environment. Clojure's designer Rich Hickey resuscitated the archaic word complect— defined as "to entwine or weave" — to describe the design flaws in Java variables.

Clojure separates value from reference. In the Clojure world view, data exists as a series of immutable values, as shown in Figure 1.

Figure 1. Values in the epochal time model
Illustration of values in the epochal time model

Figure 1 shows that independent values such as v1 represent data such as 42 or Wikipedia, modeled as boxes. Independent of values are functions, which take values as parameters and produce new values, as shown in Figure 2.

Figure 2. Functions in the epochal time model
Illustration of functions in the epochal time model

Figure 2 shows functions as circles that are independent of values. New values are generated by function calls, using values as parameters and results. The succession of values is held in a reference, which represents the identity of the variable. Over time, this identity might point to different values (because of function application), but the identity never changes, as illustrated by the dotted line in Figure 3.

Figure 3. References in the epochal time model
Illustration of references in the epochal time model

In Figure 3, the entire image represents a single reference's change over time. The dotted line is a reference, which holds a succession of values over its lifetime. You can assign a new immutable value to the reference at some point; what the reference points to can change without changing the reference.

During the lifetime of the reference, one or more observers (other programs, a user interface, anything interested in the value held by the reference) will dereference it to see (and probably act in some way on) the value, as shown in Figure 4.

Figure 4. Dereferencing
Illustration of dereferencing

In Figure 4, observers (represented by the two wedge shapes) can hold onto the reference itself (as signified by the arrows from the dotted-line reference) or can dereference it, retrieving the value (as signified by arrows directly from the value). For example, you might have a function that relies on a database connection passed to you as a parameter, which you in turn pass to a lower-level persistence function. In this case, you hold the reference but never need its value; the persistence function will likely dereference it to retrieve the value to connect to a database.

Notice that the observers in Figure 4 don't coordinate — they don't rely on each other for anything. That structure enables the Clojure runtime to guarantee some useful properties language-wide, such as never allowing readers to block, which makes reads very efficient. If you want to change a reference (that is, have it point to a different value), you use one of Clojure's APIs for updates, which enforces the epochal time model.

The epochal time model underlies updates to references throughout Clojure. Because the runtime controls all updates, it can guard against threading conflicts that developers must contend with in less sophisticated languages.

Clojure has a wide variety of ways to update references, depending on what sort of characteristics you desire. Next, I discuss two, the simple atom and the sophisticated software transactional memory.

Atoms

An atom in Clojure is a reference to an atomic piece of data, no matter how large. You create an atom and initialize it, then apply a mutating function. Here, I create a reference called counter to an atom, initialized to 0. If I want to update the reference to a new value, I can use a function such as (swap! ), which atomically swaps in a new value for the reference:

(def counter  (atom 0))
(swap! counter + 10)

By convention in Clojure, the names of mutating functions include an ending exclamation point. The (swap! ) function accepts the reference, the function to apply (in this case, it's the + operator), and any additional arguments.

Clojure atoms hold data of any size, not just primitive values. For example, I can create an atomic reference around a person map and update it with map functions. Using a (create-person) function (not shown), I create a person record in an atom, then update the reference using (swap! ) and (assoc ), which updates a map association:

(def person (atom (create-person)))
(swap! person assoc :name "John")

Atoms also implement a common optimistic locking pattern with atoms via the (compare-and-set! ) function:

(compare-and-set! a 0 42)
=> false

(compare-and-set! a 1 7)
= true

The (compare-and-set! ) function accepts three parameters: the atom reference, the expected existing value, and the new value. If the value of the atom doesn't match the expected value, the update doesn't occur and the function returns false.

Clojure has a wide variety of mechanisms that follow reference semantics. For example, a promise, which is a different kind of reference, promises to deliver a value later. Here, I create a reference to a promise named number-later. This code doesn't generate any values, just the promise that it will do so eventually. When the (deliver ) function is called, a value is bound to number-later:

(def number-later (promise))
(deliver number-later 42)

Although this example uses the futures library in Clojure, the reference semantics are consistent with simple atoms.

Software transactional memory

No other feature of Clojure gets more attention than software transactional memory (STM), the internal mechanism that enables Clojure to encapsulate concurrency in the way that the Java language encapsulates garbage collection. In other words, you can write high-performance multithreaded Clojure applications and never think about synchronization blocks, deadlocks, threading libraries, and so on.

Clojure encapsulates concurrency by controlling all mutation to references through STM. When you update a reference (the only mutable abstraction), you must do so within a transaction to allow the Clojure runtime to manage the update. Consider the classic banking problem of debiting one account while simultaneously crediting another account. Listing 1 shows a simple Clojure solution.

Listing 1. Banking transactions
(defn transfer
  [from to amount]
  (dosync
   (alter from - amount)
   (alter to + amount)))

In Listing 1, I define a (transfer ) function that accepts three parameters: the from and to accounts — both references — and the amount. I decrement the amount in the from account and add it to the to account, but this operation must occur with the (dosync ) transaction. If I attempt either of the (alter ) calls outside the transaction block, the update fails with an IllegalStateException:

(alter from - 1)
=>> IllegalStateException No transaction running

In Listing 1, the (alter ) function still adheres to the epochal time model but uses STM to ensure either that both operations complete or that neither completes. Toward that end, STM — much like a database server — retries temporarily blocked operations, so it's important that your update functions don't have side effects outside the update. For example, if your function also writes to a log, you might see multiple entries because of retries. STM also gradually increases the priority of pending transactions as they age and displays other behavior that's more common in database engines.

STM is simple to use, but the underlying machinery is sophisticated. True to the name, STM is a transaction system. STM implements the ACI parts of the ACID standard for transactions: All changes are atomic, consistent, and isolated. The durable part of ACID doesn't apply here because STM operates in-memory. Having a high-performance mechanism like STM built into the core of a language is rare; Haskell is the only other mainstream language with a serious STM implementation — not surprising because Haskell, like Clojure, heavily favors immutability. (The .NET ecosystem tried to build an STM manager but eventually gave up because dealing with transactions and mutability became too complex.)


Reducers and number classification

No coverage of parallelism would be complete without a discussion of an alternative implementation of the number classifier problem from the preceding installment. An idiomatic version without parallelism appears in Listing 2.

Listing 2. Number classifier in Clojure
(defn classify [num]
  (let [facts (->> (range 1 (inc num))
           (filter #(= 0 (rem num %))))
   sum (reduce + facts)
      aliquot-sum (- sum num)]
         (cond
         (= aliquot-sum num) :perfect
         (> aliquot-sum num) :abundant
         (< aliquot-sum num) :deficient)))

The version of the classifier in Listing 2 collapses to a single function that returns a Clojure keyword (indicated by a leading colon). The (let ) block enables me to establish local bindings. To determine factors, I use the thread-last operator to filter the range of numbers and make the code more sequential. The calculations for both sum and aliquot-sum are straightforward; the Aliquot sum of a number is the sum of its factors minus the number itself, which makes my comparison code simpler. The last line of the function is a (cond ) statement that evaluates the aliquot-sum against the calculated value, returning the proper keyword enumeration. An interesting facet of this code is the way methods in my previous implementations collapse to simple assignments in this version. When the calculations are simple and terse enough, you need to create fewer functions formally.

Clojure includes a powerful concurrency library known as reducers. (The explanation of how reducers were developed — which includes optimizations to take advantage of the native fork/join facilities of recent JVMs — is a fascinating narrative.) The reducers library provides drop-in replacements for common operations like map, filter, and reduce, enabling the operations to take advantage of multiple threads automatically. For example, replacing standard (map ) with (r/map ) (the r/ is the reducer's namespace) causes your mapping operations to be automatically parallelized by the runtime.

A version of number classifier that takes advantage of reducers appears in Listing 3.

Listing 3. Classifier using the reducers library
(ns pperfect.core
  (:require [clojure.core.reducers :as r]))

(defn classify-with-reducer [num]
  (let [facts (->> (range 1 (inc num))
		   (r/filter #(= 0 (rem num %))))
	sum (r/reduce + facts)
	    aliquot-sum (- sum num)]
               (cond
               (= aliquot-sum num) :perfect
               (> aliquot-sum num) :abundant
               (< aliquot-sum num) :deficient)))

You must look closely to find the differences between Listing 2 and Listing 3. The only differences are the introduction of the reducers namespace and alias and the addition of r/ to both filter and reduce. With those small changes, my filtering and reducing operations can now use threads automatically.


Conclusion

In this installment, I covered a few of the concurrency options in Clojure, a rich subject area. I discussed the core underlying abstraction — the epochal time model — and showed how atoms and STM use this concept. I also illustrated an easy drop-in library for making an existing application use advanced concurrency capabilities such as fork/join.

Numerous other concurrency options exist in Clojure, including simpler parallel functions like pmap (parallel map). Clojure also includes agents — autonomous workers bound to threads in a pool (either system- or user-defined), loosely analogous to Scala's actors. Clojure also encapsulates all modern concurrency advances in the Java language, making it easy to use modern libraries like fork/join.

Perhaps more than any other Clojure feature, the concurrency facilities show the engineering focus of the Clojure ecosystem: taking great advantage of language features to build powerful abstractions. Clojure isn't just trying to create a Lispy version of Java. The designers are fundamentally rethinking core infrastructure and implementation.

In the next installment, I cover Java 8 as a Java.next language.

Resources

Learn

Discuss

  • Get involved in the developerWorks community. Connect with other developerWorks users as you explore the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Java technology on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology, Open source
ArticleID=968146
ArticleTitle=Java.next: Concurrency in Clojure
publish-date=04152014