Java 8 idioms

Functional purity

Understanding streams and mutables in the functional pipeline

Content series:

This content is part # of # in the series: Java 8 idioms

Stay tuned for additional content in this series.


Earlier in this series I introduced you to function composition and the Collection Pipeline pattern. In this article, we'll revisit the benefits and building blocks of functional pipelines. You'll learn more about using java.util.stream to build functional pipelines, and why it is beneficial to maintain functional purity in your pipelines.

Functional pipelines and the Stream API

We use Streams to build functional pipelines in Java™. There are three benefits to using Streams in your functional-style code:

  • A Stream is concise, expressive, elegant, and the code reads like the problem statement.
  • It is lazily evaluated, making it very efficient in your programs.
  • It may be used in parallel.

In this series you've already learned a great deal about the benefits of elegant and concise code. In this article we'll focus on the other two benefits. Efficiency is one of the main benefits you will look for when using functional pipelines, so we'll start there.

Lazy evaluation

The following imperative-style code is quite efficient: it does no more work than is absolutely necessary.

List<Integer> numbers = Arrays.asList(2, 5, 8, 15, 12, 19, 50, 23);
          
Integer result = null;
for(int e : numbers) {
  if(e > 10 && e % 2 == 0) {
    result = e * 2;
    break;
  }
}

if(result != null)
  System.out.println("The value is " + result);
else
  System.out.println("No value found");

The code iterates over the elements in the numbers collection, but only until it finds an element that satisfies both requirements: being greater than 10 and being even. After that first number is found, no other values are processed.

Now let's rewrite the above code using a functional pipeline:

List<Integer> numbers = Arrays.asList(2, 5, 8, 15, 12, 19, 50, 23);
System.out.println(
  numbers.stream()
    .filter(e -> e > 10)
    .filter(e -> e % 2 == 0)
    .map(e -> e * 2)
    .findFirst()
    .map(e -> "The value is " + e)
    .orElse("No value found"));

This functional-style version produces the same result as the imperative version. In the given example, the imperative version doesn't process any value past 12, and neither does the functional version. What is different is how the code processes the given variables.

Stream processing

A Java Stream is fundamentally lazy, kind of like my teenage children. Here's a scenario at my home that may help you understand the behavior of streams.

My wife to my son: "Turn off the TV."

It's like no words were spoken.

She: "Put the trash out."

No muscles were moved.

Again: "Do your homework."

No pencils are picked up.

She: "I'm calling daddy."

The kid springs into action, pressing the off button on the TV remote...

Rather like a teenager, Streams have just two kinds of methods: intermediate and terminal. The latter is the equivalent of the callDaddy() or callMommy() method, depending on the role of each parent in the family.

The Stream accumulates and combines or fuses intermediate operations, and then executes them. But like a teenager, it does only as much as necessary to satisfy the terminal operation. Because the intermediate operations are fused, there is an important distinction to how streams process data in the pipeline: rather than execute each function on a collection of data, as the imperative code does, the Stream executes the fused collection of functions on each element, but only as required.

We can verify this behavior by making a small change to our original functional-style code:

List<Integer> numbers = Arrays.asList(2, 5, 8, 15, 12, 19, 50, 23);
System.out.println(
  numbers.stream()
    .peek(e -> System.out.println("processing " + e))
    .filter(e -> e > 10)
    .filter(e -> e % 2 == 0)
    .map(e -> e * 2)
    .findFirst()
    .map(e -> "The value is " + e)
    .orElse("No value found"));

Here, we've added a call to peek right before the first filter in the functional pipeline. The peek method is useful for debugging purposes, enabling us to take a peek into the Stream during execution. Here's the output of the new code:

processing 2
processing 5
processing 8
processing 15
processing 12
The value is 24

The code processed all the values up to and including 12, but it didn't touch any value past that desired one. That's because the terminal operation findFirst triggers the termination of stream processing. Furthermore, the operations in the two filters and the map calls are fused, then evaluated on each element in the sequence. The elements are not evaluated past the internal termination signal from findFirst.

In this case, it is clear that laziness leads to efficiency, as the functional pipeline performs no unnecessary work. It exemplifies efficiency with elegance.

Parallelization

Parallelization can be very useful for cases where you have a large collection, or where you need to execute tasks that will consume significant time. The following code simulates a time-consuming operation.

import java.util.*;

class Sample {
  public static int simulateTimeConsumingComputation(int number) {
    try { Thread.sleep(1000); } catch(InterruptedException ex) { Thread.currentThread().interrupt(); }
    return number * 2;
  } 
  
  public static void main(String[] args) { 
    List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
                         
    numbers.stream()
      .map(Sample::simulateTimeConsumingComputation)
      .forEachOrdered(System.out::println);
  }
}

If you run this code normally, you will find it takes about 10 seconds to run. That is too long. We can improve the speed by using a parallel stream, like so:

...
    numbers.stream()
      .parallel()
      .map(Sample::simulateTimeConsumingComputation)
...

The parallel stream yields a much faster execution time. The new code takes about one second to run on a 16-core processor or two seconds on an 8-core processor. That's because, by default, parallel streams use as many threads as the number of cores on the system.
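Under the hood, parallel streams run their work on the common fork/join pool. The following sketch (the class name ParallelismCheck is mine, not from the article) shows how to inspect the default parallelism, which is normally one less than the number of cores, with the calling thread itself contributing the remaining worker:

```java
import java.util.concurrent.ForkJoinPool;

class ParallelismCheck {
  public static void main(String[] args) {
    // Parallel streams execute on the common fork/join pool by default.
    int cores = Runtime.getRuntime().availableProcessors();
    int parallelism = ForkJoinPool.commonPool().getParallelism();

    // The common pool typically has (cores - 1) workers; the calling
    // thread pitches in too, for roughly one active thread per core.
    System.out.println("Available cores: " + cores);
    System.out.println("Common pool parallelism: " + parallelism);
  }
}
```

If you need a different level of parallelism across the whole application, the common pool can be configured at startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism.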

You will also note that it took relatively little effort to parallelize this code. The structure of a functional pipeline executing sequentially is no different from one running in parallel, which makes functional pipelines very easy to parallelize.

The rules of functional purity

So far you might like the looks of these techniques: laziness that leads to efficiency and parallelization that is as easy to write as sequential processing—sign me up! But there is a catch: the success of these techniques relies on the purity of your code. All lambda expressions and closures in your functional pipeline must be pure.

Before we go further, you should understand a few things about pure functions. First, pure functions are idempotent: there is no limit to how many times a pure function may be called. Second, no matter how many times you call it, a pure function yields the same result for the same input. Third, a pure function has no side-effects: calling it will not change any other element in your program.

This last characteristic is the most important to remember if you want to write pure functions. In essence, there are two rules to functional purity:

  • The function does not change anything.
  • The function does not depend on anything that may possibly change.

A pure function never effects a change or experiences a change in the middle of its execution.
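As a quick illustration of the two rules (the class and variable names below are mine, for demonstration only), compare a pure lambda with two impure ones:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PurityRules {
  public static void main(String[] args) {
    List<Integer> numbers = Arrays.asList(1, 2, 3);

    // Pure: depends only on its parameter, changes nothing outside itself.
    List<Integer> doubled = numbers.stream()
      .map(e -> e * 2)
      .collect(Collectors.toList());
    System.out.println(doubled); // [2, 4, 6]

    // Impure, breaks rule 1: the lambda changes state outside itself.
    int[] total = new int[] { 0 };
    numbers.forEach(e -> total[0] += e);

    // Impure, breaks rule 2: the lambda depends on a value that may change.
    int[] factor = new int[] { 2 };
    List<Integer> scaled = numbers.stream()
      .map(e -> e * factor[0])
      .collect(Collectors.toList());
    System.out.println(total[0] + " " + scaled);
  }
}
```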

Why functional purity matters

Lazy evaluation means that a function may be evaluated now, or later, or the evaluation may be skipped entirely. Anything goes as long as the desired result is obtained. But if the function has a side-effect then lazy evaluation won't work. The next example shows what happens when the function pipeline includes an impure function.

List<Integer> numbers = Arrays.asList(1, 2, 3);
                        
int[] factor = new int[] { 2 };
Stream<Integer> stream = numbers.stream()
  .map(e -> e * factor[0]);

factor[0] = 0;

stream.forEach(System.out::println);

Java assumes that lambda expressions and closures provided to operations are pure. If your code doesn't meet this expectation then you will suffer the consequences.

Just for fun, ask a few colleagues what output they would expect from this code. You are unlikely to get any single, consistent response. More likely, you will see a lot of head scratching and uncertainty.

In this example, the closure passed to map is not pure. It fails the second rule of purity because a variable the closure depends on could change (and in fact it does change). Due to lazy evaluation, the closure passed as argument to map is not evaluated until the forEach call.

Because factor[0] is mutable, the value could be anything from the time the closure was created to the time it is eventually evaluated. The mutable variable makes the code hard to follow and understand. Code that is hard to understand is hard to maintain, and is often a source of errors.

The same is true for the parallel stream: if the state passed to operations is not pure, the results will be unpredictable.
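One way to repair the example above is to capture the current value of factor[0] in an effectively final local variable before building the pipeline (the name multiplier below is my own). The closure then depends on nothing that can change:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class PureFactor {
  public static void main(String[] args) {
    List<Integer> numbers = Arrays.asList(1, 2, 3);

    int[] factor = new int[] { 2 };

    // Capture the value once; multiplier is effectively final, so the
    // closure no longer depends on anything that may change.
    final int multiplier = factor[0];
    Stream<Integer> stream = numbers.stream()
      .map(e -> e * multiplier);

    factor[0] = 0; // no longer affects the pipeline

    stream.forEach(System.out::println); // prints 2, 4, 6 on separate lines
  }
}
```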

Avoid shared mutability

The lambda expressions and closures passed to the operations should be pure. They should not modify any outside state nor depend on any mutable outside state.

Developers often ask if they should avoid mutability entirely. The short answer is no. Instead, avoid shared mutability. In both intermediate and terminal operations, if a shared mutable variable is modified, the code becomes hard to reason about. Shared mutability also makes it impossible to get correct results with parallel and/or lazy evaluation. You could choose not to use parallelization, but you have no control over lazy evaluation, because it is an implicit behavior of streams.

While shared mutability will cost you, you may get good results from carefully mutating isolated variables, which are variables that are strictly not shared by multiple threads. Mutating an isolated variable can improve performance when working with a very large volume of data. In a recent project working with a collection of millions of objects, my team used isolated mutability to increase performance to a reasonable level for the data load. It worked because we carefully verified that there was no shared mutability. We also verified that our results were not only fast but correct.

For collections of small or moderate size, or in cases where you can achieve reasonable performance without mutability, it is most prudent to avoid mutability in lambdas and closures. If you do employ mutability in one of these elements, make sure you are mutating an isolated variable and never, ever, a shared one. The state a closure depends on should never be modified by another thread between the start and finish of the functional pipeline.
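A common way to honor this guideline (the sketch below is my own, not code from the article) is to let collect manage mutation for you: each worker thread of a parallel stream fills its own intermediate container, and the containers are merged at the end, so the mutability stays isolated:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IsolatedMutation {
  public static void main(String[] args) {
    // Don't do this: mutating a shared ArrayList from a parallel
    // stream can lose elements or throw an exception:
    //   List<Integer> shared = new ArrayList<>();
    //   IntStream.rangeClosed(1, 1000).parallel()
    //     .forEach(e -> shared.add(e * 2));

    // Do this instead: collect() gives each worker thread its own
    // intermediate list and merges them, preserving encounter order.
    List<Integer> result = IntStream.rangeClosed(1, 1000)
      .parallel()
      .mapToObj(e -> e * 2)
      .collect(Collectors.toList());

    System.out.println(result.size()); // prints 1000
    System.out.println(result.get(0)); // prints 2
  }
}
```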

Conclusion

Lazy evaluation and easy parallel execution are two significant benefits of using a functional pipeline. Both features hinge on functional purity, meaning that lambdas and closures must not have any side-effects in your program. In this article you've learned these rules and why they exist, and you've also explored their one exception, where it is possible to safely mutate isolated variables.

Understanding functional purity is important because Java doesn't produce an error, or even a warning, if you violate the expectation of purity in lambdas and closures. So it is up to you to verify that your lambdas do not rely on shared mutable state, and that the results of execution are both efficient and correct.



ArticleTitle=Java 8 idioms: Functional purity
publish-date=01052018