69

I'm diving into the world of functional programming and I keep reading everywhere that functional languages are better for multithreading/multicore programs. I understand that functional languages do a lot of things differently, such as recursion and random numbers, but I can't figure out whether multithreading is faster in a functional language because the code is compiled differently or because I write it differently.

For example, I have written a program in Java which implements a certain protocol. In this protocol the two parties exchange thousands of messages; they encrypt those messages and resend (and receive) them again and again. As expected, multithreading is key when you deal at the scale of thousands. In this program there's no locking involved.

If I write the same program in Scala (which uses the JVM), will this implementation be faster? If yes, why? Is it because of the writing style? If it is because of the writing style, now that Java includes lambda expressions, couldn't I achieve the same results using Java with lambdas? Or is it faster because Scala compiles things differently?

Aventinus
  • AFAIK functional programming doesn't make multithreading faster. It makes multithreading easier to implement and safer, because there are features of functional programming, like immutability and functions not having side effects, which help in this regard. – Pieter B Feb 15 '16 at 15:30
  • Note that 1) better isn't really defined, and 2) it is *surely* **not** defined as simply "faster". A language X that requires a billion times the amount of code for a 0.1% performance gain with respect to Y is not better than Y by any reasonable definition of better. – Bakuriu Feb 15 '16 at 18:14
    Did you mean to ask about "functional programming" or "programs written in functional style"? Often faster programming doesn't yield a faster program. – Ben Voigt Feb 15 '16 at 22:17
    Don't forget there's always a GC that has to run in the background and keep up with your allocation demands... and I'm not sure it's multithreaded... – user541686 Feb 16 '16 at 00:00
  • Basically, functional programming makes parallel code fast because there are no locks in a functional program, since there is no global state (at least in pure functional programming). If you have managed to write a lock-free Java program then congratulations: you've probably managed to write a functional Java program. – slebetman Feb 16 '16 at 07:25
  • @slebetman Yet my program consists of classes and objects, so isn't it a bit far-fetched to claim that I wrote a functional Java program? – Aventinus Feb 16 '16 at 12:21
  • The simplest answer here is: functional programming lets you write programs with *fewer* race-condition issues; however, it doesn't mean that programs written in an imperative style will be slower. – Dawid Pura Feb 16 '16 at 13:26
  • @Aventinus the concepts of OO and functional are orthogonal; doing one doesn't mean you can't do the other. – Pieter B Feb 16 '16 at 15:43
  • It has nothing, whatsoever, to do with being "faster" (i.e. higher performance). It just (arguably) makes it easier. – Fattie Feb 16 '16 at 18:11
  • @Aventinus: OO programming also exists in functional land. Lisp, for example, treats OO as a library. In the beginning there were several OO libraries for Lisp, but over time people standardised on CLOS. Functional programming is powerful enough that OO can be implemented in the language itself rather than needing to modify the language specification. OO programming can sort of implement functional style, but you get things like the command pattern instead of real closures. – slebetman Feb 17 '16 at 02:05
  • I've tried to implement a multithreaded Monte Carlo ray tracer to render terrain in Haskell. I've invested days and weeks in making it faster. After optimization, the code was no longer maintainable. Correct, yes, but ugly. And it was wayyy slower than the far more elegant equivalent in plain C++ with only minor optimizations. I would not do this again given the current state of things. (Code is here: https://github.com/phresnel/excygen/tree/master/excyrender) – phresnel Feb 17 '16 at 10:50
  • @phresnel I don't doubt your C++ implementation is faster, but what is "ugly" about your Haskell code? I took a look and don't see anything hideous. Also, what do you mean "the current state of things", and what is "this" that you won't do again? Writing anything in Haskell, another raytracer, or what? – Andres F. Feb 17 '16 at 20:17
  • @AndresF.: Thanks :) To be honest, I can't find the heavy-optimization parts anymore. Maybe I did not even publish them; I think it had to do with stream fusion. And the divergence between how beautiful it _could be_ in Haskell and the state after optimization, that struck me. "Don't do it again": writing a ray tracer that must be very performant on single home computers. I am sure other problems fit really well (e.g. a distributed ray tracer, or insurance or medical apps; btw, I always wanted to give Erlang a try). – phresnel Feb 18 '16 at 09:29
  • @AndresF.: And I think I must have been emotional, in that the code was often not really shorter or more concise than the C++ equivalent – phresnel Feb 18 '16 at 09:33
  • @phresnel Thanks for the reply :) The code you uploaded to github seems short and not-ugly to me. Of course, I don't have the C++ implementation to compare it with! – Andres F. Feb 18 '16 at 13:37

4 Answers

103

The reason people say functional languages are better for parallel processing is that they usually avoid mutable state. Mutable state is the "root of all evil" in the context of parallel processing: it makes it very easy to run into race conditions when it is shared between concurrent processes. The solution to those race conditions then involves locking and synchronization mechanisms, as you mentioned, which cause runtime overhead, as the processes wait for one another to use the shared resource, and greater design complexity, as all of these concepts tend to be deeply nested within such applications.

When you avoid mutable state, the need for synchronization and locking mechanisms disappears along with it. Because functional languages usually avoid mutable state, they are naturally more efficient and effective for parallel processing: you won't have the runtime overhead of contention for shared resources, and you won't have the added design complexity that usually follows.

However, this is all incidental. If your solution in Java also avoids mutable state (specifically state shared between threads), converting it to a functional language like Scala or Clojure will not yield any benefit in terms of concurrency efficiency, because the original solution is already free of the overhead caused by locking and synchronization mechanisms.
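A minimal Java sketch of the difference (the summing workload is a hypothetical stand-in for real work such as encrypting a message). Both pipelines use every core, but the first funnels all updates through a single lock, so the threads mostly wait on each other:

```java
import java.util.stream.LongStream;

public class SharedStateDemo {
    public static void main(String[] args) {
        // Shared mutable state: every update contends for the same lock,
        // so the "parallel" work is largely serialized.
        long[] shared = {0};
        Object lock = new Object();
        LongStream.rangeClosed(1, 1_000_000).parallel().forEach(i -> {
            synchronized (lock) {
                shared[0] += i;
            }
        });

        // No shared mutable state: each worker reduces its own chunk and
        // the partial results are combined at the end. No locks needed.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();

        System.out.println(shared[0] + " == " + sum);
    }
}
```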

TL;DR: if a solution in Scala is more efficient at parallel processing than one in Java, it is not because of the way the code is compiled or run on the JVM, but because the Java solution is sharing mutable state between threads, either causing race conditions or adding the overhead of synchronization in order to avoid them.

MichelHenrich
  • If only one thread modifies a piece of data, no special care is needed. It's only when multiple threads may modify the same data that you need some sort of special care (synchronisation, transactional memory, locking, whatever). An example of this is a thread's stack, which is constantly mutated by functional code but not modified by multiple threads. – Brendan Feb 15 '16 at 16:55
    Having one thread mutate the data while others read it is enough that you have to start taking "special care". – Peter Green Feb 15 '16 at 19:25
    @Brendan: No, if one thread modifies data while other threads are reading from that same data, then you have a race condition. Special care is needed even if only one thread is modifying. – Cornstalks Feb 15 '16 at 19:34
  • @Brendan How many functional programming languages have a concept of a "thread's stack"? Or a "thread" for that matter. – user253751 Feb 15 '16 at 20:33
  • *Mutable state is the "root of all evil" in the context of parallel processing* => if you have not looked at Rust yet, I advise that you peek at it. It manages to allow mutability very efficiently by realizing that the true issue is mutability mixed with aliasing: if you only have aliasing or only have mutability, there is no issue. – Matthieu M. Feb 16 '16 at 07:11
    @immibis All FP languages I've used had threads. They didn't really model their "stack" explicitly as part of the language, but that's usually the case with imperative languages as well. There's still underlying machinery that uses mutable state to execute the program of course. – Cubic Feb 16 '16 at 12:56
    @MatthieuM. Right, thanks! I edited to express things more clearly in my answer. Mutable state is only "the root of all evil" when it is shared between concurrent processes - something Rust avoids with its ownership control mechanisms. – MichelHenrich Feb 16 '16 at 13:17
  • @Brendan What you wrote holds true only if a piece of data is exclusive to one thread. The difference between reading and writing is not significant enough. The stack is a prime example of thread-exclusive data. – Agent_L Feb 16 '16 at 13:20
    Good answer, but it misses one important performance advantage that functional programming brings over imperative programming: in a functional program, you can *easily* convert sequential iteration to parallel iteration. In functional code, iteration is "internal" (performed by a library) rather than "external" (performed through explicit language constructs such as "for" or "while"). So, in Java for example, there is no easy/direct way to convert a sequential "for" loop to one that uses threads, but when using streams you can easily convert it to a "parallel()" stream. – Rogério Feb 16 '16 at 17:28
  • @Rogério I do know that and agree with your point, but I thought such details should not be mentioned, because the asker just wanted to know, in a general way, how and why a solution implemented in a functional language would be more efficient for concurrent processing. I think that mentioning other advantages like that would steer the discussion toward an "FP over OO/imperative" debate. – MichelHenrich Feb 16 '16 at 17:43
  • @immibis and others: You all know what I meant. Mutability alone is not the problem; and "if only one thread modifies (and no others read) you don't need special care". Both transactional memory and functional programming solve the "people are poorly skilled and locks are hard" problem (in very different ways) by denying the use of potentially more efficient solutions. – Brendan Feb 16 '16 at 18:41
  • @Brendan Sure, it doesn't create any problem if only one thread writes and no other reads. But then it is not parallel anymore; it is just single-threaded. – David Raab Feb 18 '16 at 11:04
  • @SidBurn: It's quite common to split a large amount of work into N pieces and get N threads to do one piece each; where each thread only reads/writes to its piece. Would you call that "not parallel anymore"? – Brendan Feb 19 '16 at 04:46
8

Sort of both. It's faster because it's easier to write your code in a way that compiles into faster multithreaded code. You won't necessarily get a speed difference by switching languages, but if you had started with a functional language, you could probably have done the multithreading with a lot less programmer effort. Along the same lines, it's a lot easier for a programmer to make threading mistakes that cost speed in an imperative language, and a lot more difficult to notice those mistakes.

The reason is that imperative programmers generally try to put all the lock-free, threaded code in as small a box as possible, and escape it as soon as possible, back into their comfortable mutable, synchronous world. Most mistakes that cost you speed are made on that boundary interface. In a functional programming language, you don't have to worry as much about making mistakes on that boundary. Most of your calling code is also "inside the box," so to speak.
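A hypothetical Java illustration of such a boundary mistake: the parallel pipeline itself is pure and safe, but the results escape into a non-thread-safe collection right at the edge of the "box":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoundaryMistake {
    public static void main(String[] args) {
        // Boundary bug: the mapping is pure and safely parallel, but the
        // results are pushed into a non-thread-safe ArrayList at the edge.
        List<Integer> racy = new ArrayList<>();
        IntStream.range(0, 100_000).parallel()
                 .map(i -> i * i)
                 .forEach(racy::add); // data race at the boundary

        // Safe version: keep the collecting inside the pipeline, so the
        // library performs the thread-safe combining for you.
        List<Integer> safe = IntStream.range(0, 100_000).parallel()
                                      .map(i -> i * i)
                                      .boxed()
                                      .collect(Collectors.toList());

        // The racy list usually ends up shorter than 100000 (or the
        // unsynchronized adds throw outright).
        System.out.println(racy.size() + " vs " + safe.size());
    }
}
```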

Karl Bielefeldt
7

Functional programming doesn't make for faster programs, as a general rule. What it does make for is easier parallel and concurrent programming. There are two main keys to this:

  1. The avoidance of mutable state tends to reduce the number of things that can go wrong in a program, and even more so in a concurrent program.
  2. The avoidance of shared-memory and lock-based synchronization primitives in favor of higher-level concepts tends to simplify synchronization between threads of code.

One excellent example of point #2 is that in Haskell we have a clear distinction between deterministic parallelism vs. non-deterministic concurrency. There's no better explanation than quoting Simon Marlow's excellent book Parallel and Concurrent Programming in Haskell (quotes are from Chapter 1):

A parallel program is one that uses a multiplicity of computational hardware (e.g., several processor cores) to perform a computation more quickly. The aim is to arrive at the answer earlier, by delegating different parts of the computation to different processors that execute at the same time.

By contrast, concurrency is a program-structuring technique in which there are multiple threads of control. Conceptually, the threads of control execute “at the same time”; that is, the user sees their effects interleaved. Whether they actually execute at the same time or not is an implementation detail; a concurrent program can execute on a single processor through interleaved execution or on multiple physical processors.

In addition to this, Marlow also brings up the dimension of determinism:

A related distinction is between deterministic and nondeterministic programming models. A deterministic programming model is one in which each program can give only one result, whereas a nondeterministic programming model admits programs that may have different results, depending on some aspect of the execution. Concurrent programming models are necessarily nondeterministic because they must interact with external agents that cause events at unpredictable times. Nondeterminism has some notable drawbacks, however: Programs become significantly harder to test and reason about.

For parallel programming, we would like to use deterministic programming models if at all possible. Since the goal is just to arrive at the answer more quickly, we would rather not make our program harder to debug in the process. Deterministic parallel programming is the best of both worlds: Testing, debugging, and reasoning can be performed on the sequential program, but the program runs faster with the addition of more processors.

In Haskell the parallelism and concurrency features are designed around these concepts. In particular, what other languages group together as one feature set, Haskell splits into two:

  • Deterministic features and libraries for parallelism.
  • Non-deterministic features and libraries for concurrency.

If you're just trying to speed up a pure, deterministic computation, having deterministic parallelism often makes things much easier. Often you just do something like this:

  1. Write a function that produces a list of answers, each of which is expensive to compute but which don't depend much on each other. This is Haskell, so lists are lazy: the values of their elements are not actually computed until a consumer demands them.
  2. Use the Strategies library to evaluate the elements of your function's result list in parallel across multiple cores (see the sketch after this list).
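In Haskell this recipe is a couple of lines with the `Control.Parallel.Strategies` library. Since the question is about the JVM, here is the analogous move sketched in Java with parallel streams; the `expensive` function is a hypothetical stand-in for a pure, costly computation:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelAnswers {
    // Hypothetical stand-in for an expensive computation that depends
    // only on its input.
    static long expensive(int n) {
        long acc = n;
        for (int i = 0; i < 5_000_000; i++) acc = (acc * 31 + i) % 1_000_003;
        return acc;
    }

    public static void main(String[] args) {
        // Step 1: a list of answers that don't depend on each other.
        // Step 2: "compute the elements of this list in parallel".
        // Because each element is independent, no locking is needed.
        List<Long> answers = IntStream.range(0, 64)
                                      .parallel()
                                      .mapToObj(ParallelAnswers::expensive)
                                      .collect(Collectors.toList());
        System.out.println(answers.size() + " answers computed");
    }
}
```

(One Haskell nicety is lost in translation: in Haskell the list is lazy and consumers drive the evaluation, whereas a stream pipeline runs when its terminal operation is invoked.)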

I actually did this with one of my toy project programs a few weeks ago. It was trivial to parallelize the program—the key thing I had to do was, in effect, add some code that says "compute the elements of this list in parallel" (line 90), and I got a near-linear throughput boost in some of my more expensive test cases.

Is my program faster than if I had gone with conventional lock-based multithreading utilities? I very much doubt it. The neat thing in my case was getting so much bang for so little buck: my code is probably very suboptimal, but because it's so easy to parallelize I got a big speedup out of it with much less effort than properly profiling and optimizing it would have taken, and with no risk of race conditions. And that, I would claim, is the main way functional programming allows you to write "faster" programs.

sacundim
2

In Haskell, mutation is literally impossible unless you get special mutable variables from a library (such as `IORef` or `MVar`). Instead, functions create the variables they need at the same time as their values (which are computed lazily), and those are garbage-collected when no longer needed.

Even when you do need mutable variables, you can usually get by using them sparingly, alongside ordinary immutable values. (Another nice thing in Haskell is STM, which replaces locks with atomic transactions, but I'm not sure whether it is exclusive to functional programming.) Usually, only one part of the program needs to be made parallel to improve performance.
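Java's standard library has no STM, but the "atomic operations instead of locks" flavor can be loosely sketched with `java.util.concurrent.atomic` (an analogue only; real STM also composes transactions across several variables):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

public class AtomicDemo {
    public static void main(String[] args) {
        // Lock-free update: the hardware compare-and-set retries on
        // contention instead of blocking, so no thread ever holds a lock.
        AtomicLong balance = new AtomicLong(0);
        LongStream.rangeClosed(1, 100_000).parallel()
                  .forEach(balance::addAndGet);
        System.out.println(balance.get()); // always 5000050000
    }
}
```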

This makes parallelism in Haskell easy a lot of the time, and in fact efforts are under way to make it automatic. For simple code, the parallelism and logic can even be separated.

Also, because evaluation order doesn't matter in Haskell, the runtime just creates a queue of things that need to be evaluated and sends them to whatever cores are available, so you can create a bunch of "threads" that don't actually become threads until necessary. Evaluation order not mattering is a characteristic of purity, which usually requires functional programming.
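There is a loose JVM analogue of those cheap maybe-threads: in a fork/join pool, a forked task only runs on another core if an idle worker steals it; otherwise the current thread executes it itself. A minimal sketch (class names are hypothetical):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a range by splitting it in half; each fork() publishes work that
// may or may not actually run in parallel, much like a GHC "spark".
class SumTask extends RecursiveTask<Long> {
    private final long lo, hi;
    SumTask(long lo, long hi) { this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= 10_000) {
            long s = 0;
            for (long i = lo; i < hi; i++) s += i;
            return s;
        }
        long mid = (lo + hi) / 2;
        SumTask left = new SumTask(lo, mid);
        left.fork();                           // maybe-parallel "spark"
        long right = new SumTask(mid, hi).compute();
        return right + left.join();
    }
}

public class SparkAnalogue {
    public static void main(String[] args) {
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(0, 1_000_000));
        System.out.println(sum); // 499999500000
    }
}
```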

Further Reading
Parallelism in Haskell (HaskellWiki)
Concurrent and Multicore Programming in "Real-World Haskell"
Parallel and Concurrent Programming in Haskell by Simon Marlow

PyRulez
    `grep java this_post`. `grep scala this_post` and `grep jvm this_post` return no results :) – Andres F. Feb 15 '16 at 19:21
    The question is vague. In the title and the first paragraph, it asks about functional programming *in general*, in the second and third paragraph, it asks about Java and Scala *in particular*. That is unfortunate, especially since one of the core strengths of Scala is precisely the fact that it is *not* (just) a functional language. Martin Odersky calls it "post-functional", others call it "object-functional". There are two different definitions of the term "functional programming". One is "programming with first-class procedures" (the original definition as applied to LISP), the other is … – Jörg W Mittag Feb 15 '16 at 23:03
    "programming with referentially transparent, pure, side-effect free functions and immutable persistent data" (a much stricter, and also newer interpretation). This answer addresses the second interpretation, which makes sense, because a) the first interpretation is totally unrelated to parallelism and concurrency, b) the first interpretation has become basically meaningless since with the exception of C almost every language in even modestly wide-spread use today has first-class procedures (including Java), and c) the OP asks about the difference between Java and Scala, but there is no … – Jörg W Mittag Feb 15 '16 at 23:06
    between the two regarding definition #1, only definition #2. – Jörg W Mittag Feb 15 '16 at 23:06
  • The evaluation thing isn't quite true as written here; by default, the runtime doesn't use multithreading at all, and IIRC even if you do enable multithreading you still have to tell the runtime what things it should evaluate in parallel. – Cubic Feb 16 '16 at 12:59
  • @JörgWMittag Yeah, that's why I didn't downvote. I just found it funny, and it's unlikely the OP really cares about Haskell. As for Scala, the "object-functional" thing is both its strength and its *weakness*. Because it tries to do all things, it manages to do some of them less than greatly. Because of this, the FP aspects of Scala are messy and more convoluted than, say, Haskell. – Andres F. Feb 16 '16 at 13:39
  • @AndresF. I had a note about that very thing, but someone edited it away. – PyRulez Feb 16 '16 at 14:30
  • @Cubic I said it was easy, but not yet automatic. (Also, for `GHC` (which is basically what every Haskeller uses unless they are targeting something weird, and most everything conforms to it anyway) it is simply a command-line flag to turn on parallelism (assuming you added some annotations for parallelism).) – PyRulez Feb 16 '16 at 14:48
  • @JörgWMittag: It's not true that your "definition 2" is newer, nor was "definition 1" somehow "original" - "definition 2" precisely describes the Lambda Calculus, which is the formalism that inspired LISP, and hence preceded it. (It is, however, true that useful implementations of D1 came long before implementations of D2!) – psmears Feb 17 '16 at 13:49