This is a question I constantly ask myself when designing a data-intensive application: when is it appropriate to use stream() over parallelStream()? Would it make sense to use both? How do I quantify the metrics and conditions needed to decide intelligently which one to use at runtime? From what I understand, parallelStream() is a great facility for processing entries in parallel, but it all comes down to execution time and overhead. Does the end justify the means?
In my particular use case, due to the nature of the application, the velocity and volume of the data I am processing will be all over the place. There will be times when the volume is so large that my application would benefit massively from parallelizing the workload. Then there are times when a single thread will accomplish the task much more efficiently. I have profiled my application a dozen times and have had mixed results.
So this brings me to my question: is there a way in Java 8 (or later) to switch between stream() and parallelStream() intelligently? At one point I considered defining boundaries on the data that would allow for alternating between the two, but in the end, not every piece of equipment is designed the same. Some systems may handle single-threaded workloads much better than others, and vice versa.
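For illustration, the boundary idea could look roughly like the minimal sketch below. The class name AdaptiveStreams, the helper adaptiveStream, and the cutoff DEFAULT_THRESHOLD are hypothetical names invented for this example, and the threshold value is an arbitrary placeholder rather than a measured or recommended number. It would have to be tuned per machine, which is exactly the difficulty described above.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class AdaptiveStreams {

    // Hypothetical cutoff, chosen arbitrarily for this sketch.
    // A real value would come from profiling on the target hardware.
    static final int DEFAULT_THRESHOLD = 10_000;

    // Returns a parallel stream only when the collection holds at least
    // `threshold` elements and the JVM reports more than one processor;
    // otherwise returns an ordinary sequential stream.
    static <T> Stream<T> adaptiveStream(Collection<T> data, int threshold) {
        boolean parallel = data.size() >= threshold
                && Runtime.getRuntime().availableProcessors() > 1;
        return parallel ? data.parallelStream() : data.stream();
    }

    public static void main(String[] args) {
        List<Integer> small = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        List<Integer> large = IntStream.range(0, 50_000).boxed().collect(Collectors.toList());

        // Below the threshold: always sequential.
        System.out.println(adaptiveStream(small, DEFAULT_THRESHOLD).isParallel());
        // Above the threshold: parallel only if more than one processor is available.
        System.out.println(adaptiveStream(large, DEFAULT_THRESHOLD).isParallel());

        // The pipeline itself is written once and works in either mode.
        long sum = adaptiveStream(large, DEFAULT_THRESHOLD)
                .mapToLong(Integer::longValue)
                .sum();
        System.out.println(sum);
    }
}
```

The same switch can be applied to an existing stream with parallel() or sequential(). Those calls set the mode for the whole pipeline, so the last one invoked before the terminal operation wins. Element count alone is a crude signal, since the cost of the work done per element matters at least as much.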
It might be relevant to mention that I am using Apache Kafka, specifically Kafka Streams with Spring Cloud Stream. For the most part, I feel like I have squeezed everything I can out of Kafka in terms of performance, and I now want to focus on optimizing my own service internally.