Get Your Feet Wet With Java Parallel Streams
Updated: Nov 1, 2021
Parallelism in Java can be spooky, but it doesn’t have to be. Let’s see how streams look when they’re not sequential.
Run it #
These samples are also available in src/App.java. You can run them from the src directory with:
javac App.java && java App
Boilerplate #
To reduce boilerplate, the following functions will be used. You can see their definitions in src/App.java.
- log: an easy logger
- sleep: sleep the current thread
- time: time a function
- logAvailableProcessors: log available processors
- logCommonPoolParallelism: log available threads from the ForkJoinPool
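The real definitions live in src/App.java and aren't reproduced here, so the following is only a guess at what the first three helpers might look like. The class name HelpersSketch, the exact log format, and the choice to have time return the elapsed seconds are all assumptions made for illustration.

```java
// Hypothetical stand-ins for the helpers in src/App.java;
// the real definitions may differ.
class HelpersSketch {

    // Print a formatted message on its own line.
    static void log(String format, Object... args) {
        System.out.println(String.format(format, args));
    }

    // Sleep the current thread, restoring the interrupt flag if interrupted.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Run a task, log when it starts and how long it took,
    // and return the elapsed time in seconds.
    static double time(String name, Runnable task) {
        log("Started %s", name);
        long start = System.nanoTime();
        task.run();
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        log("Completed %s in %.2f second(s)", name, seconds);
        return seconds;
    }

    public static void main(String[] args) {
        time("sleepDemo", () -> sleep(100));
    }
}
```

Taking a Runnable is what lets a call like time("sumSequential", App::sumSequential) accept a method reference to a no-argument void method.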
Sequential Streams #
Let’s start with a list of numbers:
static List<Integer> createNumbers() {
    return Arrays.asList(1, 2, 3);
}
Then let’s simulate a costly mapper. We’ll log the current thread name, sleep for 1 second, then multiply the arg by 1 just for kicks.
static Integer costlyMapper(Integer i) {
    log("costlyMapper | val: %s | thread: %s", i, Thread.currentThread().getName());
    sleep(1000);
    return i * 1;
}
Then let’s use a stream to call our costly mapper and get the sum of the numbers. A stream is sequential by default; I just tacked on sequential() here to make it explicit.
static void sumSequential() {
    int sum = createNumbers()
        .stream()
        .sequential()
        .mapToInt(App::costlyMapper)
        .sum();
    log("Sum: %s", sum);
}
What do you think will happen when timing the function?
time("sumSequential", App::sumSequential);
Which thread(s) will be used? How long will it take?
Note that:
- The main thread was used
- It took about 3 seconds
- The stream pipeline honored the order of the list elements
Started sumSequential
costlyMapper | val: 1 | thread: main
costlyMapper | val: 2 | thread: main
costlyMapper | val: 3 | thread: main
Sum: 6
Completed sumSequential in 3.01 second(s)
Parallel Streams #
Okay, now onto the main event. This is what you came for, right?
Let’s sum the numbers again, but this time, in parallel:
static void sumParallel() {
    int sum = createNumbers()
        .stream()
        .parallel()
        .mapToInt(App::costlyMapper)
        .sum();
    log("Sum: %s", sum);
}
Then time it:
time("sumParallel", App::sumParallel);
Again: Which thread(s) will be used? How long will it take?
Note that:
- The main thread was used, along with threads from the ForkJoinPool
- It took about 1 second! 🔥
- The mapper calls did not follow the order of the list elements (the sum came out the same regardless)
Started sumParallel
costlyMapper | val: 2 | thread: main
costlyMapper | val: 3 | thread: ForkJoinPool.commonPool-worker-5
costlyMapper | val: 1 | thread: ForkJoinPool.commonPool-worker-19
Sum: 6
Completed sumParallel in 1.01 second(s)
The list has 3 elements, and in this run each one was handled by a different thread: main plus two ForkJoinPool workers.
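The out-of-order log lines above only tell us when each mapper call ran. If you need the results themselves in list order, order-aware terminal operations such as collect and forEachOrdered still respect the encounter order of a list-backed stream, even when it runs in parallel. Here's a minimal sketch; OrderSketch and doubledInParallel are names made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of src/App.java.
class OrderSketch {

    // Double each element in parallel. The mapper may run on any thread
    // in any order, but collect() assembles the results in encounter order.
    static List<Integer> doubledInParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(i -> i * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(doubledInParallel(numbers)); // prints [2, 4, 6]

        // forEachOrdered also visits elements in encounter order: 1, 2, 3.
        numbers.parallelStream().forEachOrdered(System.out::println);
    }
}
```

A plain forEach on a parallel stream makes no such promise, so reach for forEachOrdered when the order of side effects matters.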
Let’s compare the available processors on my current machine to the available threads for parallelism.
logAvailableProcessors();
This outputs:
Available processors: 12
So, this means there must be 12 available threads, right?
logCommonPoolParallelism();
The answer may surprise you:
Common pool parallelism: 11
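11 threads? What?! Did they short us? Nope. By default, the common pool’s parallelism is one less than the number of available processors, because the thread that kicks off the stream (here, the main thread) also does its share of the work.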
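If you'd like to check these two numbers on your own machine, both come straight from the standard library. This sketch is a guess at what logAvailableProcessors and logCommonPoolParallelism might wrap; the class name PoolInfoSketch and the two method names are assumptions, not the definitions from src/App.java.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical example class; the real helpers in src/App.java may differ.
class PoolInfoSketch {

    // Number of processors the JVM reports as available.
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common ForkJoinPool, which parallel
    // streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

The numbers will vary by machine. The ForkJoinPool docs also describe a system property, java.util.concurrent.ForkJoinPool.common.parallelism, that overrides the default when set at JVM startup.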
I hope this little exploration gave you more confidence with parallel streams. See the references below for deeper dives, as well as some do’s and don’ts. Happy computing.
References #
- This most excellent talk by Venkat Subramaniam. I’ve yet to listen to a talk by Venkat that did not keep my attention.
- The Java tutorials on parallelism
- The BaseStream and Stream docs
- The ForkJoinPool docs