5 things you didn't know about ...
Multithreaded Java programming
On the subtleties of high-performance threading
While few Java™ developers can afford to ignore multithreaded programming and the Java platform libraries that support it, even fewer have time to study threads in depth. Instead, we learn about threads ad hoc, adding new tips and techniques to our toolboxes as we need them. It's possible to build and run decent applications this way, but you can do better. Understanding the threading idiosyncrasies of the Java compiler and the JVM will help you write more efficient, better performing Java code.
In this installment of the 5 things series, I introduce some of the subtler aspects of multithreaded programming with synchronized methods, volatile variables, and atomic classes. My discussion focuses especially on how some of these constructs interact with the JVM and Java compiler, and how the different interactions could affect Java application performance.
1. Synchronized method or synchronized block?
You may have occasionally pondered whether to synchronize an entire method call or only the portion of the method that needs to be thread safe. In these situations, it is helpful to know that when the Java compiler converts your source code to byte code, it handles synchronized methods and synchronized blocks very differently.
When the JVM executes a synchronized method, the executing thread identifies that the method's method_info structure has the ACC_SYNCHRONIZED flag set, then it automatically acquires the object's lock, calls the method, and releases the lock. If an exception occurs, the thread automatically releases the lock.
Synchronizing a block within a method, on the other hand, bypasses the JVM's built-in support for acquiring an object's lock and handling exceptions, and requires that the functionality be written explicitly in byte code. If you read the byte code for a method with a synchronized block, you will see about a dozen additional operations to manage this functionality. Listing 1 shows calls to generate both a synchronized method and a synchronized block:
Listing 1. Two approaches to synchronization
package com.geekcap;

public class SynchronizationExample {
    private int i;

    public synchronized int synchronizedMethodGet() {
        return i;
    }

    public int synchronizedBlockGet() {
        synchronized( this ) {
            return i;
        }
    }
}
The synchronizedMethodGet() method generates the following byte code:
0: aload_0
1: getfield
2: nop
3: iconst_m1
4: ireturn
And here's the byte code from the synchronizedBlockGet() method:
0: aload_0
1: dup
2: astore_1
3: monitorenter
4: aload_0
5: getfield
6: nop
7: iconst_m1
8: aload_1
9: monitorexit
10: ireturn
11: astore_2
12: aload_1
13: monitorexit
14: aload_2
15: athrow
Creating the synchronized block yielded 16 lines of byte code, whereas the synchronized method required just 5.
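If you want to inspect the byte code yourself, the JDK's javap disassembler will show it. For example (run from the directory that contains the compiled com/geekcap package):

javac com/geekcap/SynchronizationExample.java
javap -c com.geekcap.SynchronizationExample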
2. ThreadLocal variables
If you want to maintain a single instance of a variable for all instances of a class, you use a static member variable. If you want to maintain an instance of a variable on a per-thread basis, you use a thread-local variable. ThreadLocal variables are different from normal variables in that each thread has its own individually initialized instance of the variable, which it accesses via its get() and set() methods.
Let's say you're developing a multithreaded code tracer whose goal is to uniquely identify each thread's path through your code. The challenge is that you need to coordinate multiple methods in multiple classes across multiple threads. Without ThreadLocal, this would be a complex problem: when a thread started executing, it would need to generate a unique token to identify it in the tracer and then pass that unique token to each method in the trace.
With ThreadLocal, things are simpler. The thread initializes the thread-local variable at the start of execution and then accesses it from each method in each class, with assurance that the variable will only host trace information for the currently executing thread. When it's done executing, the thread can pass its thread-specific trace to a management object responsible for maintaining all traces.
Using ThreadLocal makes sense when you need to store variable instances on a per-thread basis.
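To make the tracer idea concrete, here is a minimal sketch; the TraceContext class and its record() and drain() methods are hypothetical helpers invented for illustration, not part of the Java libraries or the original example:

package com.geekcap;

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical tracer: each thread sees only its own token and trace list.
public class TraceContext {

    // The initializer runs once per thread, the first time that thread calls get()
    private static final ThreadLocal<String> TOKEN =
            ThreadLocal.withInitial( () -> UUID.randomUUID().toString() );

    private static final ThreadLocal<List<String>> TRACE =
            ThreadLocal.withInitial( ArrayList::new );

    // Called from any method in any class; no token needs to be passed around
    public static void record( String step ) {
        TRACE.get().add( TOKEN.get() + ": " + step );
    }

    // Called when the thread finishes, to hand its trace to a management object
    public static List<String> drain() {
        List<String> result = new ArrayList<>( TRACE.get() );
        TRACE.remove();   // clean up so pooled threads don't leak old traces
        TOKEN.remove();
        return result;
    }
}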
3. Volatile variables
I estimate that roughly half of all Java developers know that the Java language includes the keyword volatile. Of those, only about 10 percent know what it means, and even fewer know how to use it effectively. In short, marking a variable with the volatile keyword means that its value may be modified by different threads. To fully understand what the volatile keyword does, it's first helpful to understand how threads treat non-volatile variables.
In order to enhance performance, the Java language specification permits the JRE to maintain a local copy of a variable in each thread that references it. You could consider these "thread-local" copies of variables to be similar to a cache, helping the thread avoid checking main memory each time it needs to access the variable's value.
But consider what happens in the following scenario: two threads start, and the first reads variable A as 5 while the second reads variable A as 10. If variable A has changed from 5 to 10, then the first thread will not be aware of the change, so it will have the wrong value for A. If variable A were marked as volatile, however, then any time a thread read the value of A, it would refer back to the master copy of A and read its current value.
If the variables in your applications are not going to change, then a thread-local cache makes sense. Otherwise, it's very helpful to know what the volatile keyword can do for you.
4. Volatile versus synchronized
If a variable is declared as volatile, it means that it is expected to be modified by multiple threads. Naturally, you would expect the JRE to impose some form of synchronization for volatile variables. As luck would have it, the JRE does implicitly provide synchronization when accessing volatile variables, but with one very big caveat: reading a volatile variable is synchronized and writing to a volatile variable is synchronized, but compound read-modify-write operations are not atomic.
What this means is that the following code is not thread safe:
myVolatileVar++;
Under the hood, the previous statement is roughly equivalent to the following (conceptual pseudocode; you cannot actually lock on a primitive variable):

int temp = 0;

synchronized( myVolatileVar ) {
    temp = myVolatileVar;
}

temp++;

synchronized( myVolatileVar ) {
    myVolatileVar = temp;
}
In other words, when a volatile variable is updated in a way that, under the hood, reads the value, modifies it, and then assigns the new value, the result is a non-thread-safe operation performed between two synchronized operations. You can then decide whether to use explicit synchronization or rely on the JRE's support for automatically synchronizing volatile variables. The better approach depends on your use case: if the new value of the volatile variable depends on its current value (as in an increment operation), then you must use synchronization if you want that operation to be thread safe.
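As a hedged sketch (the Counter class is illustrative), a thread-safe counter can combine the two: volatile makes a plain read safe, while the increment is synchronized because it is a read-modify-write operation:

public class Counter {

    private volatile int value;

    // A single volatile read needs no extra locking
    public int get() {
        return value;
    }

    // The increment is read-modify-write, so the method is synchronized
    public synchronized void increment() {
        value++;
    }
}

The next section shows how the atomic classes achieve the same effect without an explicit lock.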
5. Atomic field updaters
When incrementing or decrementing a primitive type in a multithreaded environment, you're far better off using one of the atomic classes found in the java.util.concurrent.atomic package than you would be writing your own synchronized code block. The atomic classes guarantee that certain operations will be performed in a thread-safe manner, such as incrementing and decrementing a value, updating a value, and adding a value. The list of atomic classes includes AtomicInteger, AtomicBoolean, AtomicLong, AtomicIntegerArray, and so forth. The latest additions to the atomic package are the DoubleAccumulator, DoubleAdder, LongAccumulator, and LongAdder classes.
These classes reduce contention by spreading updates across a set of internal variables, and the accumulator variants combine values using a lambda expression that you supply.
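Here is a brief, illustrative sketch (the class and variable names are hypothetical): AtomicInteger handles a simple counter, LongAdder reduces contention on a heavily updated sum, and LongAccumulator combines values with a lambda you supply, in this case a running maximum:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

public class AtomicExamples {
    public static void main( String[] args ) {
        // Classic atomic counter: incrementAndGet() is a single atomic operation
        AtomicInteger hits = new AtomicInteger();
        hits.incrementAndGet();

        // LongAdder spreads updates across internal cells to reduce contention
        LongAdder totalRequests = new LongAdder();
        totalRequests.increment();

        // LongAccumulator applies the supplied lambda (here, Long::max) to combine values
        LongAccumulator maxLatency = new LongAccumulator( Long::max, Long.MIN_VALUE );
        maxLatency.accumulate( 42 );

        System.out.println( hits.get() + " " + totalRequests.sum() + " " + maxLatency.get() );
    }
}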
The challenge of using atomic classes is that all class operations, including get, set, and the family of get-set operations, are rendered atomic. This means that read and write operations that do not modify the value of an atomic variable are synchronized, not just the important read-update-write operations. The workaround, if you want more fine-grained control over the deployment of synchronized code, is to use an atomic field updater.
Using atomic updates
Atomic field updaters like AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, and AtomicReferenceFieldUpdater are basically wrappers applied to a volatile field. Internally, the Java class libraries make use of them. While they are not widely used in application code, there's no reason you can't use them too.
Listing 2 presents an example of a class that uses atomic updates to change the book that someone is reading:
Listing 2. Book class
package com.geeckap.atomicexample;

public class Book {
    private String name;

    public Book() {
    }

    public Book( String name ) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName( String name ) {
        this.name = name;
    }
}
The Book class is just a POJO (plain old Java object) with a single field: name.
Listing 3. MyObject class
package com.geeckap.atomicexample;

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

/**
 * @author shaines
 */
public class MyObject {
    private volatile Book whatImReading;

    private static final AtomicReferenceFieldUpdater<MyObject,Book> updater =
            AtomicReferenceFieldUpdater.newUpdater(
                    MyObject.class, Book.class, "whatImReading" );

    public Book getWhatImReading() {
        return whatImReading;
    }

    public void setWhatImReading( Book whatImReading ) {
        //this.whatImReading = whatImReading;
        updater.compareAndSet( this, this.whatImReading, whatImReading );
    }
}
The MyObject class in Listing 3 exposes its whatImReading property as you would expect, with get and set methods, but the set method does something a little different. Instead of simply assigning its internal Book reference to the specified Book (which would be accomplished using the code that is commented out in Listing 3), it uses an AtomicReferenceFieldUpdater.
AtomicReferenceFieldUpdater
The Javadoc for AtomicReferenceFieldUpdater defines it as follows:
A reflection-based utility that enables atomic updates to designated volatile reference fields of designated classes. This class is designed for use in atomic data structures in which several reference fields of the same node are independently subject to atomic updates.
In Listing 3, the AtomicReferenceFieldUpdater is created by a call to its static newUpdater method, which accepts three parameters:

- The class of the object containing the field (in this case, MyObject)
- The class of the field to be updated atomically (in this case, Book)
- The name of the field to be updated atomically
The real value here is that the getWhatImReading method is executed without synchronization of any kind, whereas setWhatImReading is executed as an atomic operation.
Listing 4 illustrates how to use the setWhatImReading() method and asserts that the value changes correctly:
Listing 4. Test case that exercises the atomic update
package com.geeckap.atomicexample;

import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class AtomicExampleTest {
    private MyObject obj;

    @Before
    public void setUp() {
        obj = new MyObject();
        obj.setWhatImReading( new Book( "Java 2 From Scratch" ) );
    }

    @Test
    public void testUpdate() {
        obj.setWhatImReading( new Book(
                "Pro Java EE 5 Performance Management and Optimization" ) );
        Assert.assertEquals( "Incorrect book name",
                "Pro Java EE 5 Performance Management and Optimization",
                obj.getWhatImReading().getName() );
    }
}
See Related topics to learn more about atomic classes.
Conclusion
Multithreaded programming is always challenging, but as the Java platform has evolved, it has gained support that simplifies some multithreaded programming tasks. In this article, I discussed five things that you may not have known about writing multithreaded applications on the Java platform, including the difference between synchronized methods and synchronized blocks, the value of employing ThreadLocal variables for per-thread storage, the widely misunderstood volatile keyword (including the dangers of relying on volatile for your synchronization needs), and a brief look at the intricacies of atomic classes.
Related topics
- Java Concurrency in Practice (Brian Goetz, et al., Addison-Wesley, 2006): Brian's remarkable ability to distill complex concepts for readers makes this book a must on any Java developer's bookshelf.
- "Java bytecode: Understanding bytecode makes you a better programmer" (Peter Haggar, developerWorks, July 2001): A tutorial introduction to the byways of bytecode, including an earlier example illustrating the difference between synchronized methods and synchronized blocks.
- "Java theory and practice: Going atomic" (Brian Goetz, developerWorks, November 2004): Explains how atomic classes enable the development of highly scalable nonblocking algorithms in the Java language.
- "Java theory and practice: Concurrency made simple
(sort of)" (Brian Goetz, developerWorks, November 2002):
Guides you through the
java.util.concurrent
package. - "5 things you didn't know about ... java.util.concurrent, Part 1" (Ted Neward, developerWorks, May 2010): Get introduced to five concurrent collections classes, which retrofit standard collections classes for your concurrency programming needs.