I am looking for an example where an algorithm is apparently changing its complexity class due to compiler and/or processor optimization strategies.
-
This could definitely happen for space complexity with something like removing tail recursion. – Eliot Ball May 26 '13 at 14:21
-
Simple: an empty O(n) loop may be optimised away to O(1): see this SO post: http://stackoverflow.com/questions/10300253/will-an-empty-for-loop-used-as-a-sleep-be-optimized-away – Doc Brown May 26 '13 at 14:34
-
It's easier to find examples of this in Haskell, though it's not really optimisation - just the lazy semantics of the language which mean that potentially large chunks of code for functions that *were* called won't be evaluated because the results are never used. It's even quite common in Haskell to define unboundedly recursive functions returning infinite lists. As long as you only use a finite chunk of the list, only a finite amount of the recursion is ever evaluated (oddly enough, possibly non-recursively) and only that finite part of the list is computed. – May 26 '13 at 16:08
-
@Steve314: I believe there was an example of this in the Computer Language Benchmark Game, where the Haskell implementation was 10x faster than the C implementation due to the simple fact that the results of the benchmark were never printed and thus the entire Haskell program essentially compiled down to `int main(void) { exit(0); };` – Jörg W Mittag May 26 '13 at 16:11
-
@Jörg - it wouldn't surprise me, but I think the Haskell devs were cheating. You *can* force something to be evaluated eagerly in Haskell if you need to and oddly enough it's mainly there for optimisation - strict/eager evaluation is often much faster than lazy because it avoids the overheads for deferring evaluation. A good Haskell compiler catches a lot of this itself using "strictness analysis", but there are times when you have to force the issue. – May 26 '13 at 16:21
-
@Steve314: I believe the rules of the CLBG say that you must implement the algorithm *exactly* as the reference implementation specifies, and the reference algorithm never actually used the array that was being sorted, ergo forcing the array to be used would actually have been a violation of the rules. The correct fix would be to change the reference implementation to print the sorted values to stdout. – Jörg W Mittag May 26 '13 at 18:53
-
@Jörg - if because of laziness the result was never evaluated, they didn't implement the reference algorithm - they implemented a lazy-transformed variant. You *need* that forced sequential evaluation to force the Haskell compiler to implement the correct algorithm rather than the variant. If spelling things correctly so you get the algorithm as specified is a violation of the rules, the rules must be pretty strange. – May 26 '13 at 19:12
3 Answers
Tail Call Optimization may reduce the space complexity. For example, without TCO, this recursive implementation of a `while` loop has a worst-case space complexity of O(#iterations), whereas with TCO it has a worst-case space complexity of O(1):
// This is Scala, but it works the same way in every other language.
def loop(cond: => Boolean)(body: => Unit): Unit = if (cond) { body; loop(cond)(body) }
var i = 0
loop { i < 3 } { i += 1; println(i) }
// 1
// 2
// 3
// E.g. ECMAScript:
function loop(cond, body) {
  if (cond()) { body(); loop(cond, body); }
}
var i = 0;
loop(function () { return i < 3; }, function () { i++; print(i); });
This doesn't even need general TCO; it only needs a very narrow special case, namely the elimination of direct tail recursion.
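For concreteness, here is a rough C sketch of that special case (the function names are illustrative, not part of the answer above): a directly tail-recursive function and the loop it can be rewritten into.
#include <stdio.h>

/* Directly tail-recursive: without the optimization each call pushes a
   stack frame, so the space used is O(n). */
void countdown(int n) {
    if (n == 0) return;
    printf("%d\n", n);
    countdown(n - 1);  /* the recursive call is in tail position */
}

/* What eliminating the direct tail call effectively produces:
   the same behaviour in O(1) space. */
void countdown_loop(int n) {
    while (n != 0) {
        printf("%d\n", n);
        n = n - 1;
    }
}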
What would be very interesting, though, is a case where a compiler optimization not just changes the complexity class but actually changes the algorithm completely.
The Glorious Glasgow Haskell Compiler sometimes does this, but that's not really what I am talking about; that's more like cheating. GHC has a simple pattern-matching rewrite language that allows the developer of a library to detect some simple code patterns and replace them with different code. And the GHC implementation of the Haskell standard library does contain some of those annotations, so that specific usages of specific functions which are known to be inefficient are rewritten into more efficient versions.
However, these translations are written by humans, and they are written for specific cases, which is why I consider that cheating.
A Supercompiler may be able to change the algorithm without human input, but AFAIK no production-level supercompiler has ever been built.

-
Thanks for the great example, and for mentioning GHC. One more question: what about out-of-order execution? Is there any known example where this kind of optimization led to a change of an algorithm's complexity class? – Lorenz Lo Sauer May 29 '13 at 19:42
Let's take a simple program which prints the square of a number entered on the command line.
#include <stdio.h>
#include <stdlib.h>  /* for atoi */

int main(int argc, char **argv) {
    int num = atoi(argv[1]);
    printf("%d\n", num);
    int i = 0;
    int total = 0;
    for (i = 0; i < num; i++) {
        total += num;
    }
    printf("%d\n", total);
    return 0;
}
As you can see, this is an O(n) calculation: it loops num times, adding num to the total on each iteration.
Compiling this with `gcc -S`, one gets a segment that looks like this:
LBB1_1:
movl -36(%rbp), %eax
movl -28(%rbp), %ecx
addl %ecx, %eax
movl %eax, -36(%rbp)
movl -32(%rbp), %eax
addl $1, %eax
movl %eax, -32(%rbp)
LBB1_2:
movl -32(%rbp), %eax
movl -28(%rbp), %ecx
cmpl %ecx, %eax
jl LBB1_1
In this you can see the addition being done, then a compare and a jump back to the top of the loop.
Compiling instead with `gcc -S -O3` to enable optimizations, the segment between the two calls to printf becomes:
callq _printf
testl %ebx, %ebx
jg LBB1_2
xorl %ebx, %ebx
jmp LBB1_3
LBB1_2:
imull %ebx, %ebx
LBB1_3:
movl %ebx, %esi
leaq L_.str(%rip), %rdi
xorb %al, %al
callq _printf
One can now see that there is no longer a loop and, furthermore, no adds. Instead there is an `imull` instruction which multiplies the number by itself.
The compiler has recognized the loop and the arithmetic operation inside it and replaced them with the equivalent direct calculation.
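In source terms, the transformation amounts to something like this sketch (my reading of the assembly above, not literal compiler output):
/* The O(n) summation loop has effectively been collapsed to O(1):
   the guard mirrors the testl/jg pair, the multiply mirrors imull. */
int total = (num > 0) ? num * num : 0;
printf("%d\n", total);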
Note that this still includes a call to `atoi` to get the number at run time. When the number already exists in the code, the compiler will pre-calculate the value rather than making actual calls, as demonstrated in a comparison between the performance of sqrt in C# and C where `sqrt(2)` (a constant) was being summed across a loop 1,000,000 times.
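A minimal sketch of that constant-folding case, assuming a loop shaped roughly like the one described (the exact benchmark code is not shown here):
#include <math.h>
#include <stdio.h>

int main(void) {
    double total = 0.0;
    /* sqrt(2.0) is a constant expression; an optimizing compiler can
       compute it once at compile time (and may fold the whole loop into
       a single precomputed value) instead of calling sqrt 1,000,000 times. */
    for (int i = 0; i < 1000000; i++) {
        total += sqrt(2.0);
    }
    printf("%f\n", total);
    return 0;
}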
A compiler that knows the language is using big-nums and performs strength reduction (replacing a multiplication by the loop index with an addition) would change the complexity of that multiplication from O(n log n) at best down to O(n).
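As an illustration, here is a hedged sketch of the strength-reduction transformation itself, shown on machine integers for brevity (with big-nums the same rewrite replaces each multiplication with a much cheaper addition):
/* Before strength reduction: one multiplication by the loop index
   per iteration. */
void fill_before(int *a, int n, int c) {
    for (int i = 0; i < n; i++)
        a[i] = i * c;
}

/* After strength reduction (what the compiler may effectively emit):
   the multiply becomes a running addition carried across iterations. */
void fill_after(int *a, int n, int c) {
    int t = 0;
    for (int i = 0; i < n; i++) {
        a[i] = t;
        t += c;
    }
}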
