50

I've been reading over and over that functional languages are ideal (or at least very often useful) for parallelism. Why is this? What core concepts and paradigms are typically employed and which specific problems do they solve?

On an abstract level, for example, I can see how immutability might be useful in preventing race conditions or other problems stemming from resource competition, but I can't put it any more specifically than that. Please note, however, that my question is broader in scope than just immutability -- a good answer will provide examples of several relevant concepts.

Louis Thibault
  • Anyone hoping to add a new answer is encouraged to briefly review (or refresh) [the list of features](http://programmers.stackexchange.com/questions/279316/what-exactly-makes-the-haskell-type-system-so-revered-vs-say-java). – rwong Aug 17 '15 at 07:45

5 Answers

72

The main reason is that referential transparency (and even more so laziness) abstracts over the execution order. This makes it trivial to parallelize evaluation.

For example, if a, b, and || are all referentially transparent, then it doesn't matter if in

a || b

a gets evaluated first, b gets evaluated first, or b doesn't get evaluated at all (because a was evaluated to true).

In

a || a

it doesn't matter if a gets evaluated once or twice (or, heck, even 5 times … which wouldn't make sense, but doesn't matter nonetheless).

So, if it doesn't matter in which order they are evaluated, and it doesn't matter whether they are evaluated needlessly, then you can simply evaluate every sub-expression in parallel. We could evaluate a and b in parallel, and then || could wait for either of the two threads to finish, check what it returned, and, if it returned true, cancel the other one and immediately return true.

Every sub-expression can be evaluated in parallel. Trivially.

Note, however, that this is not a magic bullet. Some experimental early versions of GHC did this, and it was a disaster: there was just too much potential parallelism. Even a simple program could spawn hundreds, thousands, millions of threads and for the overwhelming majority of sub-expressions spawning the thread takes much longer than just evaluating the expression in the first place. With so many threads, context switching time completely dominates any useful computation.

You could say that functional programming turns the problem on its head: typically, the problem is how to break apart a serial program into just the right size of parallel "chunks", whereas with functional programming, the problem is how to group together parallel sub-programs into serial "chunks".

The way GHC does it today is that you can manually annotate two subexpressions to be evaluated in parallel. This is actually similar to how you would do it in an imperative language as well, by putting the two expressions into separate threads. But there is an important difference: adding this annotation can never change the result of the program! It can make it faster, it can make it slower, it can make it use more memory, but it cannot change its result. This makes it way easier to experiment with parallelism to find just the right amount of parallelism and the right size of chunks.
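A minimal sketch of such an annotation, using `par` and `pseq` from the `parallel` package (the naive `fib` here is just a hypothetical stand-in for expensive work):

```haskell
import Control.Parallel (par, pseq)

-- A deliberately expensive pure function, standing in for real work.
fib :: Integer -> Integer
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

-- `par` sparks evaluation of its first argument in parallel;
-- `pseq` forces `b` before the combination. Adding or removing
-- these annotations can never change the value of `total`,
-- only how quickly it is computed.
total :: Integer
total = a `par` (b `pseq` (a + b))
  where
    a = fib 30
    b = fib 29

main :: IO ()
main = print total
```

Compile with `-threaded` and run with `+RTS -N` to actually use multiple cores; without them the program still produces the same answer, which is exactly the point.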

Jörg W Mittag
  • +1 Even though I'd noticed the problem of too much parallelism before, I had never looked at it the way you do in your answer. This gives me a great perspective; thanks for writing this. – user541686 Aug 17 '15 at 07:14
  • I like your last paragraph. You could also draw a parallel (no pun intended) to indexes in relational databases: you can add or remove them at will; they don't change the result, just how quickly the computation runs. It's an optimization detail. – jpmc26 Aug 17 '15 at 07:59
  • Why not use thread pooling instead of spawning one thread per expression? – Kevin Aug 17 '15 at 14:00
  • @Kevin: because I glossed over some details in order not to make a long answer even longer. In reality, GHC will create a *spark*, not a thread. However, I wanted to use a term that would be immediately familiar, without having to write half of a tutorial about the internals of the GHC runtime. (Plus, I don't know enough about the internals of the GHC runtime :-D ) Anyway, it doesn't really matter. The problem doesn't fundamentally change, it is just pushed farther out: yes, sparks are cheaper to create than threads, but the cost still exists. Yes, context switching sparks is cheaper than context … – Jörg W Mittag Aug 18 '15 at 00:06
  • … switching threads, but the cost still exists. Yes, sparks use less memory than threads, but the overhead is still there. Talking about sparks instead of threads will just move the thresholds around, but not fundamentally change the problem or even make it disappear. No matter how cheap sparks are, there will still be some expressions which are *so* trivial that even the small overhead of sparks is unreasonable. Okay, so how do sparks get scheduled? I don't know actually. Work stealing is all the rage these days, so maybe that's what GHC uses? Maybe it *does* use a thread pool as you suggest. – Jörg W Mittag Aug 18 '15 at 00:09
  • @Kevin: [found it](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/lang-parallel.html#idp30002304): "Sparks are queued for execution in FIFO order, but are not executed immediately. If the runtime detects that there is an idle CPU, then it may convert a spark into a real thread, and run the new thread on the idle CPU. In this way the available parallelism is spread amongst the real CPUs." – Jörg W Mittag Aug 18 '15 at 00:15
  • Clear and example-rich answer. This is exactly the information I was looking for. Many thanks! – Louis Thibault Aug 22 '15 at 23:20
  • Is that explanation not only true for Haskell? F# and ML allow functions with side effects. They also allow mutable variables, so you cannot assume referential transparency in those languages. So for those languages this answer does not work!? – David Raab Sep 15 '15 at 14:50
  • @SidBurn: There are several different definitions of the term "functional programming". One is "programming with first-class and higher-order procedures" (this is the original meaning which was invented for languages like LISP). However, nowadays this definition is quite meaningless, because all languages (except C) have first-class and higher-order procedures, and pretty much every non-trivial program will use them. IOW: all languages are functional languages and every program is a functional program. The other definition is "programming with referentially transparent functions". That's the … – Jörg W Mittag Sep 15 '15 at 19:23
  • … one I'm using here. Note that while languages like Scala, Scheme, Clojure, SML, OCaml, F# *do* allow side-effects, they also *avoid* them. In Scala, for example, the vast majority of the standard library (with the obvious exceptions of the mutable collections, I/O, and actors) is at least externally referentially transparent (i.e. it may privately use local non-shared mutable state, but doesn't leak this implementation detail outside). And there are libraries like Scalaz which introduce monadic referentially transparent I/O. – Jörg W Mittag Sep 15 '15 at 19:27
  • @JörgWMittag Is the GHC referring to Haskell? – Kellen Stuart Sep 26 '16 at 21:19
  • @KolobCanyon: GHC is one of several implementations of Haskell. (Others are Hugs, JHC, YHC, LHC.) – Jörg W Mittag Sep 27 '16 at 00:24
6

Let's first look at why procedural programming is so bad at handling concurrent threads.

With a procedural programming model, you write sequential instructions that (by default) are expected to run in isolation. When you introduce multiple threads, you need to explicitly control access to shared variables, to prevent concurrent changes from interfering with each other. This is difficult to get right, and in testing it is impossible to prove that it has been done safely. At best, you can only confirm that on this particular test run no observable issues occurred.

In functional programming, the problem is different. There is no shared mutable data and no concurrent access to the same variable. Indeed, you can only set a variable once; there are no "for loops", just blocks of code that, when executed with a given set of values, will always produce the same outcome. This makes testing both predictable and a good indicator of correctness.

The problem the developer has to solve in functional programming is how to design the solution so that it minimises shared state. When the design problem requires high levels of concurrency and tolerates minimal shared state, functional implementations are a very effective strategy.
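As a sketch of this style: each unit of work below operates only on its own immutable input, so the pieces can be evaluated in parallel with no locks at all (`parMap` here is from the `parallel` package; the function itself is a hypothetical example):

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Each element is transformed independently of the others;
-- there is no shared mutable state, hence nothing to lock.
squareAll :: [Int] -> [Int]
squareAll = parMap rdeepseq (\x -> x * x)

main :: IO ()
main = print (sum (squareAll [1 .. 1000]))
```

Because the mapped function is pure, running it on one core or on eight cannot change the result.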

Michael Shaw
4

Minimised shared state

What is it about functional programming that makes it inherently adapted to parallel execution?

The pure nature of the functions (referential transparency), i.e. having no side effects, leads to fewer shared objects and hence less shared state.

A simple example:

double CircleCircumference(double radius)
{
  return 2 * 3.14 * radius; // constants for illustration
}

The output is solely dependent on the input; there is no state involved. This is in contrast to a function such as GetNextStep(), where the output depends on what the current step is (typically held as a data member of an object).

It is shared state whose access needs to be controlled via mutexes and locks. Having stronger guarantees about shared state allows for better parallel optimisations and better parallel composition.
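The same contrast can be sketched in Haskell (the names mirror the hypothetical examples above): the pure function is safe to call from any number of threads, while the stateful one depends on hidden mutable state and must coordinate access:

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Pure: output depends only on the input. Callable from any
-- number of threads with no synchronisation at all.
circleCircumference :: Double -> Double
circleCircumference radius = 2 * pi * radius

-- Stateful: the result depends on a hidden mutable counter, so
-- concurrent callers must coordinate (here via an atomic update).
getNextStep :: IORef Int -> IO Int
getNextStep ref = atomicModifyIORef' ref (\s -> (s + 1, s + 1))

main :: IO ()
main = do
  print (circleCircumference 1.0)  -- same input, same output, always
  ref <- newIORef 0
  getNextStep ref >>= print        -- result depends on call history
  getNextStep ref >>= print
```

Note how the type system makes the difference visible: the stateful version lives in IO, so the impurity cannot be hidden from callers.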


Referential Transparency and pure expressions

More detail on what referential transparency is can be found here on Programmers.SE.

[It] means that you can replace any expression in the program with the result of evaluating that expression (or vice versa) without changing the meaning of the program. Jörg W Mittag

This in turn allows pure expressions, which are used to construct pure functions: functions that evaluate to the same result given the same input arguments. Each such expression can then be evaluated in parallel.

Niall
1

Pure functional code is thread-safe by default.

That, by itself, is already a huge win. In other programming languages, designing blocks of code that are completely thread-safe can be really, really challenging. But a pure functional language forces you to make everything thread-safe, except in the few places where you explicitly do something that's not thread-safe. You could say that in imperative code, being thread-safe is explicit [and hence typically rare], whereas in pure functional code, being thread-unsafe is explicit [and hence typically rare].

In an imperative language, most of the problem is how to add enough locking to prevent weird data races, but not too much to kill performance or provoke random deadlocks. In a pure functional language, most of the time such considerations are moot. The problem now is "only" figuring out how to divide the work up evenly.

At one extreme, you have a tiny number of parallel tasks which don't use all available cores. At the other extreme, you have billions of tiny tasks which introduce so much overhead that everything slows to a crawl. Of course, trying to figure out "how much work" a particular function call does is often very difficult indeed.
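A common way to tune this trade-off in GHC is to group work into chunks with a strategy such as `parListChunk` (the chunk size of 100 here is an arbitrary illustration, not a recommendation):

```haskell
import Control.Parallel.Strategies (withStrategy, parListChunk, rdeepseq)

-- Evaluate the mapped list in parallel, 100 elements per spark,
-- so the per-spark overhead is amortised over a meaningful amount
-- of work rather than paid for every tiny element.
sumSquares :: Int -> Int
sumSquares n =
  sum (withStrategy (parListChunk 100 rdeepseq) (map (\x -> x * x) [1 .. n]))

main :: IO ()
main = print (sumSquares 1000)
```

Changing the chunk size can make this faster or slower, but never changes the sum, which is what makes experimenting with granularity so cheap.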

All of these problems exist in imperative code as well; it's just that the imperative programmers are usually too busy wrestling with just trying to make the thing work at all to worry about delicate fine-tuning of task sizes.

1

The hardest part of writing parallel code is to stop one thread from reading data that is being updated by another thread.

A common solution to this is to use immutable objects, so that once an object is created it is never updated. But in real life data has to change, so "persistent" data structures are used, where every update returns a new object. This can be made efficient by careful choice of data structures.
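The persistent structures in Haskell's `containers` package illustrate this: an insert returns a new map while the original stays valid (the two versions share most of their structure internally), so another thread can keep reading the old version with no locking:

```haskell
import qualified Data.Map.Strict as Map

-- The original, immutable map.
m0 :: Map.Map Int String
m0 = Map.fromList [(1, "one"), (2, "two")]

-- An "update" yields a new map; m0 itself is untouched, so any
-- thread still holding m0 continues to see a consistent snapshot.
m1 :: Map.Map Int String
m1 = Map.insert 3 "three" m0

main :: IO ()
main = do
  print (Map.size m0)  -- original unchanged
  print (Map.size m1)  -- new version sees the insert
```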

As functional programming does not allow side effects, persistent data structures are normally used when doing functional programming. So the discipline that the lack of side effects forces on a functional programmer leads to a solution that works well for parallel programming.

An additional benefit is that functional language implementations check that you are keeping to the rules, and provide lots of tools to help you do so.

Ian