13

Pure functions are known to facilitate parallelization. What is it about functional programming that makes it inherently suited to parallel execution?

Are compilers such as javac smart enough to detect when a method is a pure function? One can always write a class that implements a functional interface such as `Function` but has side effects.
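For example (class and field names here are purely illustrative), a class can implement `Function` and still mutate state in `apply`:

```java
import java.util.function.Function;

// Implements the "functional" interface Function, yet apply() has a side
// effect: it mutates a counter, so it is not a pure function.
class CountingDoubler implements Function<Integer, Integer> {
    int calls = 0; // external mutable state

    @Override
    public Integer apply(Integer x) {
        calls++;        // side effect: observable state changes on every call
        return x * 2;   // the pure-looking part
    }
}

public class SideEffectDemo {
    public static void main(String[] args) {
        CountingDoubler f = new CountingDoubler();
        System.out.println(f.apply(21)); // 42
        System.out.println(f.apply(21)); // 42 again, but f.calls has changed
        System.out.println(f.calls);     // 2
    }
}
```

The type system only sees `Function<Integer, Integer>`; nothing in the interface rules out the mutation.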

Naveen
  • 139
  • 1
  • 6
  • 7
    The question is not only whether the compiler can know if a function is pure, but also whether it can intelligently schedule parallel execution of pure functions. It is not enough to fire off a new thread for each one: this is inefficient. GHC (Haskell) deals with this using laziness and "green threads"; I would honestly be surprised if any impure language even tried, given the additional difficulty of making sure the pure threads were correctly scheduled with respect to the main impure thread. – Ryan Reich Jul 09 '17 at 07:54
  • @RyanReich, Are there any performance gains of using functional programming in an impure functional language such as Java? Are the gains of functional programming purely functional, such as modularity? – Naveen Jul 09 '17 at 08:53
  • @RyanReich GHC deals with the problem by having the programmer annotate when they want parallelism. Purity implies that these annotations never change semantics, just performance. (There are also concurrency mechanisms that can give rise to parallelism, but this is a different kettle of fish.) – Derek Elkins left SE Jul 09 '17 at 09:19
  • @Naveen There are other benefits to pure functions with regard to optimization besides parallelism, such as greater freedom in reordering code, memoization, and common subexpression elimination. I could be wrong, but I doubt javac attempts to detect purity, though, as it's probably fairly rare in idiomatic code and somewhat difficult for all but the most trivial cases. For example, you need to know that there won't be any `NullPointerException`s. The benefits of optimizations based on this are also probably fairly small for typical Java applications. – Derek Elkins left SE Jul 09 '17 at 09:25
  • @Derek you don't have to annotate anything for GHC to utilize multiple cores. Just pass a runtime parameter. – Ryan Reich Jul 09 '17 at 17:18
  • 6
    javac is the java compiler, which takes java source code and generates java byte code class files. It is pretty constrained as to what it can (and is supposed to) do. It does not have the liberty to or the necessary underlying mechanisms for introducing parallelism into the byte code class file. – Erik Eidt Jul 09 '17 at 17:49
  • No, they do not. – Boann Jul 09 '17 at 18:43
  • @RyanReich Which, other than the parallel GC, will not lead to any actual parallelism unless you add `par` annotations or use `forkIO`. GHC won't automatically add `par` annotations (which is what I'm talking about, there are other mechanisms like parallel arrays), though it could, and it certainly won't automatically add `forkIO` (which is part of the concurrency mechanisms), though it does use asynchronous I/O. If you don't provide any annotations or explicitly fork threads, you will only do computation on one core no matter what flags you pass to the compiler and RTS. – Derek Elkins left SE Jul 09 '17 at 19:14
  • @ErikEidt: I don't think it's a question of being "constrained". It's more a question of return-on-investment: if Oracle's C2, Azul's Falcon, IBM's J9 JIT etc. can do a better job at optimization, why would Oracle / Azul / IBM invest money in complex static analysis and static optimizations in javac? Especially, since improvements in the JIT benefit Ruby, Clojure, Scala, ECMAScript, Python, Groovy, Frege, Kotlin, Ceylon, Scheme, Common Lisp, Ioke, Seph, and so on as well, instead of only Java. It *is* constrained in the kinds of control-flow optimizations, though, since it needs to generate … – Jörg W Mittag Jul 09 '17 at 21:22
  • … legal stack operations that will satisfy the byte code verifier. (So, no crazy control-flow optimizations; in fact, there is no (unrestricted) `GOTO` anyway in the byte code.) – Jörg W Mittag Jul 09 '17 at 21:24
  • @JörgWMittag, i take your point -- part of what I meant was it is constrained to (1) honor the Java language, and (2) generate JVM byte code (and thus doesn't have direct access to operating system or hardware constructs). – Erik Eidt Jul 09 '17 at 21:38
  • @ErikEidt: Yes, the latter is definitely a restriction, especially the lack of a universal control-flow construct like `GOTO`, proper tail-calls, or continuations (JVML *does* have `GOTO`, but it is only intra-method, you cannot jump into a different method, and due to size constraints, you also can't compile the whole program into a single method … plus, you would then no longer be able to interoperate with this class file from other class files, since it would only contain one giant method.) Exceptions can be used to encode universal control-flow (including `GOTO`), but that would be SLOW. – Jörg W Mittag Jul 09 '17 at 21:42
  • @ErikEidt And (3) generate the code in a "normal" way, so common patterns can be recognized by the JIT. The only such pattern I'm aware of, is the string concatenation using `StringBuilder`; JIT relies on how it exactly looks like and generating equivalent but different code can lead to JIT missing some optimizations (IIRC there was such a bug). `+++` As inlining and loop unrolling are the most fundamental optimizations, javac *must not* optimize when JIT should work well. – maaartinus Jul 09 '17 at 21:59

2 Answers

34

are compilers such as Javac smart enough to detect when a method is a pure function?

It's not a question of "smart enough". This is called Purity Analysis and is provably impossible in the general case: it is equivalent to solving the Halting Problem.

Now, of course, optimizers do provably impossible things all the time; "provably impossible in the general case" doesn't mean that it never works, only that it cannot work in all cases. So there are in fact algorithms to check whether a function is pure or not; it's just that more often than not the result will be "I don't know", which means that for reasons of safety and correctness you need to assume that this particular function might be impure.

And even in the cases where it does work, the algorithms are complex and expensive.

So, that is Problem #1: it only works for special cases.
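To illustrate how quickly the analysis degrades to "I don't know" (class and method names below are made up for illustration): a trivially pure method is easy to classify, but a single level of indirection already defeats a static check:

```java
import java.util.function.IntUnaryOperator;

public class PurityDemo {
    // Trivially pure: no state read or written, no calls that could throw.
    static int square(int x) { return x * x; }

    static int counter = 0;

    // Impure: mutates static state on every call.
    static int next() { return ++counter; }

    // Pure or impure? It depends entirely on which operator is passed at
    // run time, which a static analysis generally cannot determine.
    static int applyTwice(IntUnaryOperator f, int x) {
        return f.applyAsInt(f.applyAsInt(x));
    }

    public static void main(String[] args) {
        System.out.println(square(5));                      // 25
        System.out.println(applyTwice(i -> i + 1, 0));      // pure use: 2
        System.out.println(applyTwice(i -> i + next(), 0)); // impure use: depends on hidden state
    }
}
```

A conservative analyzer must label `applyTwice` "unknown", and everything that calls it inherits that verdict.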

Problem #2: Libraries. In order for a function to be pure, it can only ever call pure functions (and those functions can only call pure functions, and so on and so forth). Javac obviously only knows about Java, and it only knows about code it can see. So, if your function calls a function in another compilation unit, you cannot know whether it is pure or not. If it calls a function written in another language, you can't know. If it calls a function in a library which might not even be installed yet, you can't know. And so on.

This only works when you have whole-program analysis: the entire program is written in the same language and all of it is compiled at once, in one go. You can't use any libraries.
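A small sketch of the library problem (`java.lang.Math` here is just a stand-in for any callee resolved at run time, e.g. from configuration or a plugin jar): when the callee is looked up reflectively, javac never sees its body, so it cannot judge the purity of the caller:

```java
import java.lang.reflect.Method;

public class LateBinding {
    // The class and method names could equally come from a config file;
    // javac never sees the callee's body at compile time, so it cannot
    // decide whether callers of invokeAbs are pure.
    static int invokeAbs(int x) {
        try {
            Method abs = Class.forName("java.lang.Math").getMethod("abs", int.class);
            return (Integer) abs.invoke(null, x);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeAbs(-7)); // 7
    }
}
```

Here the callee happens to be pure, but nothing in the caller's source reveals that.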

Problem #3: Scheduling. Once you have figured out which parts are pure, you still have to schedule them to separate threads. Or not. Starting and stopping threads is very expensive (especially in Java). Even if you keep a thread pool and don't start or stop them, thread context switching is also expensive. You need to be sure that the computation will run significantly longer than the time it takes to schedule and context switch, otherwise you will lose performance, not gain it.
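A rough, unscientific way to feel that hand-off cost (the class name and iteration count are arbitrary): run the same trivial function inline and via a thread pool, and compare wall-clock times:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SchedulingCost {
    static int tiny(int x) { return x + 1; } // far cheaper than any thread hand-off

    // Run the same 10,000 tiny calls inline, then via a pool, and return
    // {inlineNanos, offloadedNanos}. Illustrative numbers, not a benchmark.
    static long[] compare() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            long t0 = System.nanoTime();
            int a = 0;
            for (int i = 0; i < 10_000; i++) a = tiny(a);
            long inline = System.nanoTime() - t0;

            t0 = System.nanoTime();
            int b = 0;
            for (int i = 0; i < 10_000; i++) {
                final int v = b;
                b = pool.submit(() -> tiny(v)).get(); // one hand-off + wake-up per call
            }
            long offloaded = System.nanoTime() - t0;

            if (a != b) throw new AssertionError("results differ");
            return new long[] { inline, offloaded };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] r = compare();
        System.out.println("inline:    " + r[0] + " ns");
        System.out.println("offloaded: " + r[1] + " ns"); // typically orders of magnitude more
    }
}
```

The exact ratio varies by machine and JVM, but for a function this cheap the offloaded version loses badly.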

As you probably guessed by now, figuring out how long a computation will take is provably impossible in the general case (we cannot even figure out whether it will take a finite amount of time, let alone how much time) and hard and expensive even in the special case.

Aside: javac and optimizations. Note that most implementations of javac don't actually perform many optimizations. Oracle's implementation of javac, for example, relies on the underlying execution engine to do optimizations.

This leads to another set of problems. Say javac decides that a particular function is pure and expensive enough, and so it compiles it to be executed on a different thread. Then the platform's optimizer (for example, the HotSpot C2 JIT compiler) comes along and optimizes the entire function away. Now you have an empty thread doing nothing. Or imagine, again, that javac decides to schedule a function on a different thread, and the platform optimizer could optimize it away completely, except it cannot perform inlining across thread boundaries; a function that could have been optimized away completely is now needlessly executed.

So, doing something like this only really makes sense if you have a single compiler making most of the optimizations in one go, so that the compiler knows about and can exploit all the different optimizations at different levels and their interactions with each other.

Note that, for example, the HotSpot C2 JIT compiler actually does perform some auto-vectorization, which is also a form of auto-parallelization.
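For illustration, a loop of the shape C2's superword pass can vectorize (whether it actually does depends on the JVM version and flags; the class name is made up):

```java
import java.util.Arrays;

public class VectorizableLoop {
    // A simple element-wise loop over arrays: the canonical shape HotSpot's
    // C2 can compile to SIMD instructions via superword vectorization.
    static void add(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, out = new float[4];
        add(a, b, out);
        System.out.println(Arrays.toString(out)); // [11.0, 22.0, 33.0, 44.0]
    }
}
```

The semantics are unchanged either way; the vectorization is invisible except in performance.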

Jörg W Mittag
  • 101,921
  • 24
  • 218
  • 318
  • Well, depending on your definition of "pure function", using impure functions in the implementation may be allowed. – Deduplicator Jul 09 '17 at 11:21
  • @Deduplicator Well, depending on your definition of `definition`, using a disparate `definition` of `purity` is probably obscure – cat Jul 09 '17 at 17:10
  • @cat Having fun? – Deduplicator Jul 09 '17 at 17:11
  • @Deduplicator indeed :D – cat Jul 09 '17 at 17:12
  • 1
    Your problem #2 is mostly invalidated by the fact that practically all optimizations get executed by the JIT (you obviously know it, but ignore it). Similarly problem #3 gets partially invalidated as JIT heavily relies on statistics gathered by the interpreter. I especially disagree with "You can't use any libraries" as there's deoptimization to the rescue. I agree that the added complexity would be a problem. – maaartinus Jul 09 '17 at 20:46
  • @maaartinus: The question *specifically* uses javac as an example. I am assuming that if the OP had wanted to ask about C2, he would have asked about C2. I specifically *do* mention that Oracle's implementation of javac heavily relies on Oracle's C2 compiler, but again, that's not what the OP is asking. He is asking about javac, and the general assumption is that javac is a compiler from Java source code to JVM byte code. There are other Java compilers out there, but they are typically not called "javac", in order to avoid confusion. (E.g. GCJ.) – Jörg W Mittag Jul 09 '17 at 21:16
  • 2
    @maaartinus: Besides, only the very end of my answer is specific to javac. I specifically *do* mention, for example, that "This only works, when you have whole-program analysis, when the entire program is written in the same language, and all is compiled at once in one go." This is obviously true for C2: it only deals with one language (JVM bytecode), and it has access to the entire program at once. – Jörg W Mittag Jul 09 '17 at 21:19
  • 1
    @JörgWMittag I know that the OP asks about javac, but I'd bet they're assuming that javac is the thing responsible for optimizations, and that they hardly know that there's C2. *I'm not saying your answer is bad.* It's just that letting javac do *any* optimization (except for trivialities like using `StringBuilder`) is nonsense, so I'd dismiss it and simply assume the OP writes javac but means HotSpot. Your problem #2 is a pretty good reason against optimizing *anything* in javac. – maaartinus Jul 09 '17 at 21:47
  • What do you all think about the notion of a `threadsafe` keyword that might impose such restrictions in advance? I was contemplating it for a C transpiler rather than Java, but outside of a critical section/lock, it could only allow other `threadsafe` functions to be called, would not allow access of static/global variables (not even for read purposes), can only accept const references/pointers, and does not allow const casts, things of this sort. We basically try to guarantee it upfront. Might that be feasible? Admittedly I haven't thought this through so deeply... –  Feb 01 '18 at 04:22
  • ... it's just kind of a pipe dream I have, because the hard guarantee of thread safety would be so, so nice. I wasn't looking to use it to automate multithreading but simply as a layer of enforced safety so that we can more easily reason about it. I could see a case where what we do inside a critical section/lock in such a function may not actually be safe to call across threads (ex: possible deadlock), but there I'd be okay making that a programmer responsibility, and hopefully not so many functions need to lock in the first place. –  Feb 01 '18 at 04:23
5

The upvoted answer failed to note one thing: synchronous communication between threads is extremely expensive. If a function can be executed at a rate of many million calls per second, parallelizing it actually hurts you more than leaving it as-is.

The fastest form of synchronous inter-thread communication, using busy loops with atomic variables, is unfortunately energy-inefficient. If you have to resort to using condition variables to save energy, the performance of your inter-thread communication suffers.
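Both hand-off styles can be sketched in a few lines of Java (class and field names are illustrative; `Thread.onSpinWait()` requires Java 9+):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class Handoff {
    // Style 1: busy-wait on an atomic flag (lowest latency, burns a core).
    static final AtomicBoolean ready = new AtomicBoolean(false);
    static int spinPayload;

    // Style 2: condition-variable style via wait/notify (saves energy, but
    // each hand-off pays for parking and waking a thread in the kernel).
    static final Object lock = new Object();
    static boolean done = false;
    static int waitPayload;

    static void run() {
        try {
            Thread p1 = new Thread(() -> { spinPayload = 42; ready.set(true); });
            p1.start();
            while (!ready.get()) Thread.onSpinWait(); // spin hint to the CPU

            Thread p2 = new Thread(() -> {
                synchronized (lock) { waitPayload = 43; done = true; lock.notify(); }
            });
            p2.start();
            synchronized (lock) {
                while (!done) lock.wait(); // blocks until notified
            }
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        run();
        System.out.println(spinPayload + " " + waitPayload);
    }
}
```

The spinning receiver gets the value with minimal latency but keeps a core at 100%; the `wait`/`notify` receiver sleeps, at the price of a slower wake-up per hand-off.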

So, the compiler would not only need to determine whether a function is pure, it would also need to estimate the execution time of the function to see whether parallelization is a net win. Also, it would need to choose between busy loops using atomic variables and condition variables. And it would need to create threads behind your back.

Creating the threads dynamically is even slower than using condition variables, so the compiler would need to set up a certain number of threads already running.

So, the answer to your question is no: compilers are not "smart" enough to auto-parallelize pure functions, especially in the Java world. They are smart by not auto-parallelizing them!

juhist
  • 2,579
  • 10
  • 14
  • 5
    **"_They are smart by not auto-parallelizing them!_"**: This goes too far. While it's true that parallelizing at every possible point just for its own sake would generally be inefficient, a smart compiler would identify a practical parallelization strategy. I think that most people understand this, so when we talk about auto-parallelization, we mean auto-practical-parallelization. – Nat Jul 09 '17 at 14:43
  • @Nat: Ridiculously too hard. This would require identifying pure functions on the runtime scale of 100s of milliseconds, and expecting the compiler to get any idea of runtime of loops that don't have constants in their iterations (and the cases you want don't) is silly. – Joshua Jul 10 '17 at 04:06
  • I agree - @Nat's comment implies that parallelization does not necessarily mean multiple threads, which is true. The JIT could, for example, inline multiple calls to a pure function and interleave their CPU instructions in certain cases. For example, if both method calls fetch a constant, it could be fetched once and kept in a CPU register for both instances of the method to use. Modern CPUs are beasts with numerous general-purpose registers and specialized instructions that can be quite helpful when optimizing code. –  Jul 10 '17 at 04:09
  • 1
    @Joshua: Much easier indeed for a JIT compiler. The JIT compiler also can figure out that a function may not be pure, but no call so far has invoked impure behaviour. – gnasher729 Jul 10 '17 at 08:29
  • I agree with @Joshua. I have a hard-to-parallelize algorithm at work. I have tried to manually parallelize it, even by doing some simplifying approximations (and thus modifying the algorithm), and have miserably failed every time. Even a program telling whether it's feasible to parallelize something is extremely hard, even though it would be much simpler than actually parallelizing it. Remember we're talking about Turing-complete programming languages. – juhist Jul 10 '17 at 12:51