15

Every single time there's a discussion about a new programming language targeting the JVM, there are inevitably people saying things like:

"The JVM doesn't support tail-call optimization, so I predict lots of exploding stacks"

There are thousands of variations on that theme.

Now I know that some languages, like Clojure for example, have a special recur construct that you can use.

What I don't understand is: how serious is the lack of tail-call optimization? When should I worry about it?

My main source of confusion probably comes from the fact that Java is one of the most successful languages ever, and quite a few of the JVM languages seem to be doing fairly well. How is that possible if the lack of TCO is really of any concern?

Cedric Martin
  • 4
    if you have recursion deep enough to blow the stack without TCO then you'll have a problem even with TCO – ratchet freak Nov 07 '13 at 21:10
  • @ratchetfreak: that I understand. My question is how serious a problem this is. Seeing that Java doesn't (yet) have TCO and is one of the most successful languages ever, surely the problem isn't that bad!? – Cedric Martin Nov 07 '13 at 21:15
  • 18
    @ratchet_freak That's nonsense. Scheme doesn't even have loops, but because the spec mandates TCO support, recursive iteration over a large set of data is no more expensive than an imperative loop (with the bonus that the Scheme construct returns a value). – itsbruce Nov 07 '13 at 22:01
  • 6
    @ratchetfreak TCO is a mechanism for making recursive functions written a certain way (i.e. tail-recursively) be completely unable to blow the stack even if they wanted to. Your statement only makes sense for recursion that is not written tail-recursively, in which case you are correct and TCO won't help you. – Evicatos Nov 08 '13 at 00:43
  • 1
    @Evicatos: tail recursion is only one small subset of tail calls. And it's not even the most interesting one. Direct tail recursion can be replaced with loops, by using only local transformations, i.e. without changing the global structure of the program. Indirect tail recursion is uglier, but still possible. Other patterns, however, require the use of trampolines or something like that, and a global rewrite of the entire program, to avoid blowing the stack. When you write code in CPS, for example, you *never* return, the stack *never* gets smaller! You simply can't do that without PTC. – Jörg W Mittag Nov 08 '13 at 11:41
  • 2
    Last I looked, the 80x86 doesn't do (native) tail-call optimization either. But that hasn't stopped language developers from porting languages that use it. The compiler identifies when it can use a jump versus a jsr, and everyone's happy. You can do the same thing on a JVM. – kdgregory Nov 08 '13 at 17:42
  • @JörgWMittag You're right. The point I was trying to make is that ratchet freak's statement only makes sense if you are talking about recursion that doesn't use tail calls at all. – Evicatos Nov 08 '13 at 18:01
  • 1
    @kdgregory: The question is not whether you can do it. The question is what price you pay. As Rich Hickey (creator of Clojure) said: Tail Calls, Performance, Java Interop – Pick two. – Jörg W Mittag Nov 09 '13 at 03:54
  • @JörgWMittag - no, the question -- at least the OP's question -- was whether lack of tail-call optimization *in the JVM* was a problem. My comment pointed out that the question was irrelevant, because TCO is a feature of a language implementation rather than machine architecture (ie, the JVM is identical to the 80x86 in this respect). – kdgregory Nov 09 '13 at 11:17
  • 3
    @kdgregory: But the x86 has `GOTO`, the JVM doesn't. And x86 isn't used as an interop platform. The JVM doesn't have `GOTO` and one of the main reasons for choosing the Java Platform is interop. If you want to implement TCO on the JVM, you have to do *something* to the stack. Manage it yourself (i.e. don't use the JVM call stack at all), use trampolines, use exceptions as `GOTO`, something like that. In all of those cases, you become incompatible with the JVM call stack. It's impossible to be stack-compatible with Java, have TCO, and high performance. You have to sacrifice one of those three. – Jörg W Mittag Nov 10 '13 at 03:25
  • 1
    @JörgWMittag - http://docs.oracle.com/javase/specs/jvms/se5.0/html/Instructions2.doc5.html#goto.Operation (old version intentional, it's also in the SE7 edition) – kdgregory Nov 10 '13 at 12:02
  • @JörgWMittag Java doesn't have goto but the JVM does: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.goto – JimmyJames Jul 26 '16 at 14:19
  • @JimmyJames: It does have a restricted *intra-method* `GOTO`, but the whole point of TCO is that the call to an *arbitrary* method compiles to (the equivalent of) `GOTO`. To make that work on the JVM, you would have to compile your entire system into a single method and simulate your own call stack, which (apart from blowing the method size limit) breaks Java interop. – Jörg W Mittag Jul 26 '16 at 18:59
  • @JörgWMittag Unfortunately I don't have enough background in this area to do anything other than accept your word on it. I take it that the upshot is that tail recursion is simple but that these other forms are not easy on the JVM. It may not be the most interesting TCO, but would it be fair to say that tail recursion is the most commonly used? – JimmyJames Jul 26 '16 at 19:06
  • @JimmyJames: The JVM's `GOTO` only works *within* the method. It is intended for implementing `for` and `while` loops, which are exactly equivalent to tail-recursion, so it can be used for that, too. But there are lots of interesting things that can be done, once you have Proper Tail Calls with full TCO: e.g. implementing a state machine simply by methods which correspond to states and method calls which correspond to state transitions. That's a very simple encoding of state machines, but unfortunately, those methods *never* return, however, all the calls are tail calls. So, this is exactly … – Jörg W Mittag Jul 26 '16 at 19:12
  • … equivalent to the standard `GOTO`-based implementation you would use in C, but with all the niceties of nicely encapsulated named states (methods). The other thing is Continuation Passing Style. Here, subroutines never return, instead they get passed a function that represents the "rest of the computation" and when they are finished doing their thing, they call that function with their "return value" as an argument … and the next function does the same and so on and so forth. – Jörg W Mittag Jul 26 '16 at 19:15

4 Answers

17

Consider this: let's say we got rid of all loops in Java (the compiler writers are on strike or something). Now we want to write factorial, so we might write something like this

// Accumulator version: the recursive call is in tail position.
int factorial(int i) { return factorial(i, 1); }

int factorial(int i, int accum) {
  if (i == 0) return accum;
  return factorial(i - 1, accum * i);  // tail call, but the JVM still pushes a new frame
}

Now we're feeling pretty clever: we've managed to write our factorial even without loops! But when we test it, we notice that with any reasonably sized number we get StackOverflowErrors, since there's no TCO.

In real Java this isn't a problem: if we ever have a tail-recursive algorithm, we can transform it into a loop and be just fine. However, what about languages with no loops? Then you're just hosed. That's why Clojure has its recur form; without it, the language isn't even Turing complete (no way to do infinite loops).
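To make that "transform it into a loop" step concrete, here is a minimal hand-written sketch (factorialLoop is just an illustrative name); it is essentially the rewrite a TCO-capable compiler could perform for us automatically:

int factorialLoop(int i) {
    // Each "recursive call" of the accumulator version becomes an update of
    // the parameters followed by a jump back to the top of the loop, which is
    // exactly what tail-call elimination would do behind the scenes.
    int accum = 1;
    while (i != 0) {
        accum = accum * i;
        i = i - 1;
    }
    return accum;
}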

Functional languages that target the JVM, such as Frege, Kawa (Scheme), and Clojure, all have to deal with the lack of tail calls, because in these languages tail calls are the idiomatic way of doing loops! Translated to Scheme, that factorial above would be a perfectly good factorial. It'd be awfully inconvenient if looping 5000 times made your program crash. This can be worked around, though, with recur special forms, annotations hinting at optimizing self-calls, trampolining, and so on. But they all force either performance hits or unnecessary work onto the programmer.

Now Java doesn't get off free either, since there's more to TCO than just self-recursion: what about mutually recursive functions? They can't be straightforwardly translated into loops, yet they are still left unoptimized by the JVM. This makes it spectacularly unpleasant to write algorithms using mutual recursion in Java: if you want decent performance/range, you have to do dark magic to fit them into loops.
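To make the mutual-recursion problem concrete, here is a toy sketch (isEven/isOdd are invented for illustration): every call is a tail call, yet no purely local rewrite turns either method into a loop on its own, and without tail-call elimination each call still costs a JVM stack frame:

// Mutually recursive even/odd test: isEven(1_000_000) throws
// StackOverflowError on a stock JVM, even though every call is in tail position.
static boolean isEven(int n) {
    if (n == 0) return true;
    return isOdd(n - 1);   // tail call into the *other* method
}

static boolean isOdd(int n) {
    if (n == 0) return false;
    return isEven(n - 1);  // tail call back again
}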

So, in summary, this isn't a huge deal for many cases. Most tail calls either only proceed one stackframe deep, with things like

return foo(bar, baz); // foo is just a simple method

or are self-recursion. However, for the class of tail calls that don't fit into this, every JVM language feels the pain.

However, there is a decent reason why we don't yet have TCO. The JVM gives us stack traces. With TCO we systematically eliminate stack frames that we know are "doomed", but the JVM might actually want those later for a stack trace! Say we implement an FSM like this, where each state tail-calls the next. We'd erase all record of previous states, so a stack trace would show us which state we're in, but nothing about how we got there.

Additionally, and more pressingly, much of bytecode verification is stack-based, and eliminating the thing that lets us verify bytecode is not a pleasant prospect. Between this and the fact that Java has loops, TCO looks like a bit more trouble than it's worth to the JVM engineers.

manlio
daniel gratzer
  • 3
    The biggest problem is the byte code verifier, which is completely based on stack inspection. That's a major bug in the JVM specification. 25 years ago, when the JVM was designed, people already said that it would be better to have the JVM byte code language to be safe in the first place rather than having that language be unsafe and then rely on byte code verification after the fact. However, Matthias Felleisen (one of the lead figures in the Scheme community) wrote a paper demonstrating how tail calls can be added to the JVM while preserving the byte code verifier. – Jörg W Mittag Nov 07 '13 at 23:38
  • @jozefg: ooooh gotcha! I never realized that there was this entire "loop vs recursion" thing and that, basically, you at least need one of the two. I now understand better why in Lisp dialects targeting the JVM that is such a hot topic! – Cedric Martin Nov 07 '13 at 23:45
  • 2
    Interestingly, the J9 JVM by IBM *does* perform TCO. – Jörg W Mittag Nov 07 '13 at 23:53
  • 1
    @jozefg Interestingly, nobody cares about stacktrace entries for loops, hence the stacktrace argument doesn't hold water, at least for tail recursive functions. – Ingo Nov 08 '13 at 11:51
  • @Ingo: I wouldn't say that. If I'm iterating over a list of a thousand elements, sending each one to a processing method, and one of them several hundred elements in has some unusual pattern of data that my code can't handle, having an accurate stack trace for that is *crucial* to debugging it. I've had that happen before. – Mason Wheeler Nov 08 '13 at 12:03
  • 2
    @MasonWheeler That is exactly my point: the stacktrace doesn't tell you in which iteration it happened. You can see this only indirectly, by inspecting loop variables, etc. So why would you want several hundred stack trace entries of a tail recursive function? Only the last one is interesting! And, like with loops, you could determine which recursion it was by inspecting local variables, argument values, etc. – Ingo Nov 08 '13 at 12:18
  • 3
    @Ingo: If a function only recurses with itself, the stack trace may not show much. If, however, a group of functions are mutally recursive, then a stack trace may sometimes show a great deal. – supercat Apr 06 '15 at 18:48
  • @supercat True. And for such cases there should be some switch to turn off TCO, as a *debugging* help. But such cases are simply no excuse to make an important programming technique virtually impossible, especially when such techniques may help make programs that are arguably less error prone in the first place. – Ingo Apr 07 '15 at 17:11
  • @Ingo: The proper way to support tail-call invocation would be with a special "tail return" syntax, which would inform the compiler that tail-call invocation is *semantically required*. While a compiler could easily identify cases where tail-call invocation could work (if code doesn't happen to care about stack traces) code which relies upon it but doesn't specify it would be brittle, since a change which precluded that optimization wouldn't generate an error but would yield code which would crash with large data sets (but still work with small ones). – supercat Apr 07 '15 at 17:19
  • @supercat I fully agree with that, and suggested to use `goto foo(args);` for this (like in perl). However, a language change is even more problematic than a semantic-preserving JVM change, that's why I don't hope that this ever comes true. – Ingo Apr 07 '15 at 17:24
  • @Ingo: If a piece of code can handle 500,000 records on the old JVM but a new JVM which increased that limit to 1,000,000, having the code still run on the old JVM would be a good thing. If, however, code written for the new JVM could handle 1,000,000 records but on the old JVM it would die if given more than 213, having the code be "compatible" with the old JVM may not be such a good thing. I would think the best way to go would be to have it be a new language feature and specify that code which uses it will require the new compiler and JVM. – supercat Apr 07 '15 at 17:41
  • @supercat At some point there needs to be progress, and we need to let go of falsely understood backwards compatibility. Your second example is clearly an instance of this "backwards" fallacy. For, consider the following and tell me, please, which state of affairs is to be preferred: 1) We have 2 JVMs and both can run the code with a problem size of 213. 2) We have 2 JVMs and one of them can run the code with a problem size of 213, while the other can run problem size 1,000,000 or even *unlimited*. – Ingo Apr 07 '15 at 18:04
  • 1
    @Ingo: A program's use of bounded or unbounded resources should be a conscious design decision. If a program can *recognize* whether a resource is bounded or not, and vary the size of problem it will accept accordingly, that's fine. What's generally not fine is running a program that assumes a particular resource is unbounded, on systems where it's not, especially if the program is ill-equipped to handle its scarcity. – supercat Apr 07 '15 at 20:08
  • @supercat You are right in principle, but the truth is: in practice we're doing just that all the time! For example, a simple word counting program just adds its stuff to a hash map, as if there were no memory limits. This is justified by the idea that the user can just specify a big enough -Xmx value. And OTOH, in many cases you can't help it without assuming another "unlimited" resource. For example one can keep track of the words in a data base with "unlimited" tablespace, instead of an in-limited-heap-memory hash map. – Ingo Apr 07 '15 at 20:55
  • @Ingo: A word-counting program doesn't assume there are no memory limits; rather, it assumes the total length of the *distinct words* in the input file will be small enough not to exceed the memory limits. By contrast, a parser state machine implementation that uses tail call to progress from state to state would be able to handle input files of arbitrary size but finite nesting using a finite amount of stack, but bomb on input files over 1500 tokens on a JVM which doesn't support TCO but has a 64KB stack. – supercat Apr 07 '15 at 21:25
  • @supercat I can't really see a difference here. In both cases the assumption is that some resource is abundant enough to satisfy the needs of the program and those needs depend on the input and have no upper limit. – Ingo Apr 08 '15 at 16:53
  • @Ingo: Whether the difference between a program that can handle an input of size X versus size Y is qualitative or quantitative depends upon context, but as it gets larger it's more likely to be considered quantitative. There are many realistic situations where a nested-call implementation of a program would be limited to files several orders of magnitude smaller than those it needs to process, but where a tail-jump implementation would be able to handle files thousands of times times bigger than those it needs to process. I would call the difference... – supercat Apr 08 '15 at 18:25
  • ...between those programs qualitative. The difference won't be important in every case, but it should be possible (and not difficult) for a programmer to specify "This program is unlikely to work usefully in an environment where things the programmer expects to be tail-jumps would get replaced with nested method calls." If a program would be limited to processing 213 records but the programmer thinks that's adequate, the programmer could say the program should run even if tail-jumps would get replaced with nested calls. If, however, the programmer thinks... – supercat Apr 08 '15 at 18:29
  • ...that using nested calls would limit the program to the point of uselessness, it should be possible to specify that rather than have the program sometimes work and sometimes crash. – supercat Apr 08 '15 at 18:30
  • @supercat I think I see what you mean, but just because standard JVMs up to this day let stack space be a scarce resource simply does not imply that future JVMs should do the same. After all, this is not part of the JVM spec. It is possible (theoretically, at least) that a JVM works like the Haskell RTS in that it allocates stack as it is needed up to the point where the entire available memory is used for stack space (and not just a small, fixed area). Once programmers learn this and rely on this, we would have the same situation: you can't run those progs on earlier JVMs easily. – Ingo Apr 08 '15 at 21:15
  • 1
    "... much of bytecode verification is stack-based" Bytecode verification is performed statically before a method is ever executed. TCO would have no impact on it whatsoever, nor could it. – David Conrad Jul 26 '16 at 04:03
  • @Ingo “And for such cases there should be some switch to turn off TCO, as a _debugging_ help.” If TCO/TCE is for a CPS transformation, then turning it off will overflow the stack and crash the program, so no debugging would be possible. Google refused to implement TCO in V8 JS, because of this issue occurring _incidentally_. They would want some special syntax so that the programmer can declare he really wants TCO and the loss of the stack trace. Does anyone know if exceptions are also screwed by TCO? – Shelby Moore III Aug 29 '18 at 03:55
7

Tail call optimization is mainly important because of tail recursion. However, there is an argument why it is actually good that the JVM does not optimize tail calls: as TCO reuses a part of the stack, a stack trace from an exception will be incomplete, thus making debugging a bit harder.

There are ways to work around the limitations of the JVM:

  1. Simple tail recursion can be optimized to a loop by the compiler.
  2. If the program is in continuation-passing style, then it is trivial to use “trampolining”. Here, a function does not return the end result, but a continuation which is then executed on the outside. This technique allows a compiler writer to model arbitrarily complex control flow.

This may need a larger example. Consider a language with closures (e.g. JavaScript or similar). We can write the factorial as

def fac(n, acc = 1) = if (n <= 1) acc else fac(n-1, acc*n)

print fac(x)

Now we can have it return a callback instead:

def fac(n, acc = 1) =
  if (n <= 1) acc
  else        (() => fac(n-1, acc*n))  // this isn't full CPS, but you get the idea…

var continuation = (() => fac(x))
while (continuation instanceof function) {
  continuation = continuation()
}
var result = continuation
print result

This now works in constant stack space, which is kind of silly because it's tail-recursive anyway. However, this technique is able to flatten all tail calls into constant stack space. And if the program is in CPS, then this means that the callstack is constant overall (in CPS, every call is a tail call).

A major disadvantage of this technique is that it's much harder to debug, a bit harder to implement, and less performant – see all the closures and indirection I'm using.

For these reasons it would be vastly preferable to have the VM implement a tail call op – languages like Java that have good reasons not to support tail calls would not have to use it.

amon
  • 1
    "As TCO reuses a part of the stack, a stack trace from an exception will be incomplete," - yes, but then, a stacktrace from within a loop is incomplete either - it doesn't record how often the loop was executed. - Alas, even if the JVM would support proper tail calls, one still could opt out, during debugging, say. And then, for production, enable TCO to be sure that the code runs with 100,000 or 100,000,000 tail calls. – Ingo Nov 08 '13 at 12:03
  • 1
    @Ingo No. (1) When loops aren't implemented as recursion, there is no rationale for them to show up on the stack (tail call ≠ jump ≠ call). (2) TCO is more general than tail recursion optimization. My answer uses recursion as an *example*. (3) If you're programming in a style that *relies* on TCO, switching off this optimization is not an option – full TCO or full stack traces are either a language feature, or they're not. E.g. Scheme manages to balance the TCO drawbacks with a more advanced exception system. – amon Nov 08 '13 at 12:30
  • 1
    (1) fully agree. But by the same reasoning, there is no rationale to keep hundreds and thousands of stack trace entries that all point to `return foo(....);` in method `foo` (2) fully agree, of course. Still, we accept incomplete tracing from loops, assignments (!), statement sequences. For example, if you find an unexpected value in a variable, you surely want to know how it got there. But you don't complain about missing traces in that case. Because it is somehow engraved in our brains that a) it happens only on calls b) it happens on all calls. Both make no sense, IMHO. – Ingo Nov 08 '13 at 12:45
  • (3) Disagree. I can't see no reason why it should be impossible to debug my code with a problem of size N, for some N small enough to get away with the normal stack. And then, to turn the switch and turn TCO on - effectively dropping the constraint on problem size. – Ingo Nov 08 '13 at 12:52
  • @Ingo “Disagree. I can't see no reason why it should be impossible to debug my code with a problem of size N, for some N small enough to get away with the normal stack.” If TCO/TCE is for a CPS transformation, then turning it off will overflow the stack and crash the program, so no debugging would be possible. Google refused to implement TCO in V8 JS, because of this issue occurring _incidentally_. They would want some special syntax so that the programmer can declare he really wants TCO and the loss of the stack trace. Does anyone know if exceptions are also screwed by TCO? – Shelby Moore III Aug 29 '18 at 03:57
6

A significant portion of calls in a program are tail calls. Every subroutine has a last call, so every subroutine has at least one tail call. Tail calls have the performance characteristics of GOTO but the safety of a subroutine call.

Having Proper Tail Calls enables you to write programs that you cannot otherwise write. Take, for example, a state machine. A state machine can be very directly implemented by having each state be a subroutine and each state transition be a subroutine call. In that case, you transition from state to state to state, by making call after call after call, and you actually never return! Without Proper Tail Calls, you would immediately blow the stack.

Without PTC, you have to use GOTO or Trampolines or exceptions as control flow or something like that. It's much uglier, and not so much a direct 1:1 representation of the state machine.
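For a feel of what the trampoline workaround costs, here is a hedged Java sketch (State, red and green are invented names, not any library API): each state returns the next state as a value instead of calling it, and a driver loop does the "jumping":

// A trampolined state machine: instead of one state tail-calling the next
// (which would grow the JVM stack), each state *returns* the next state and
// a driver loop performs the "jump". Stack depth stays constant.
interface State {
    State step();   // returns the next state, or null when the machine halts
}

class Trampoline {
    static State red(int remaining) {
        return () -> remaining == 0 ? null : green(remaining - 1);
    }

    static State green(int remaining) {
        return () -> remaining == 0 ? null : red(remaining - 1);
    }

    public static void main(String[] args) {
        State current = red(1_000_000);   // a million transitions...
        while (current != null) {
            current = current.step();     // ...in constant stack space
        }
    }
}

It works, but every transition now costs an allocation and a virtual dispatch instead of a plain jump, and the code no longer reads as "state A calls state B".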

(Note how I cleverly avoided using the boring "loop" example. This is an example where PTCs are useful even in a language with loops.)

I deliberately used the term "Proper Tail Calls" here instead of TCO. TCO is a compiler optimization. PTC is a language feature that requires every compiler to perform TCO.

Jörg W Mittag
  • `The vast majority of calls in a program are tail calls.` Not if "the vast majority" of methods called perform more than one call of their own. `Every subroutine has a last call, so every subroutine has at least one tail call.` This is trivially demonstrable as false: `return a + b`. (Unless you're in some insane language where basic arithmetic operations are defined as function calls, of course.) – Mason Wheeler Nov 07 '13 at 23:43
  • In your example, the call to `+` is a tail call. What would it be if not a call? In modern, highly-factored code, there usually are only a few calls per subroutine. "Majority" is overstating it, that's right, I'm gonna change it to "significant portion" instead. – Jörg W Mittag Nov 07 '13 at 23:46
  • `What would it be if not a call?` An arithmetic operation, of course. (Why in the world would any sane language designer burden a primitive operation with all the overhead of a function call? If you *need* a function that adds two numbers--such as to pass to a function pointer--it's trivial to write one, but 99% of the time that's not necessary.) – Mason Wheeler Nov 07 '13 at 23:50
  • None of the languages I use implement adding two numbers any different than adding two points or two strings or two vectors or two matrices or two sets or two arrays. And why would they? It's the same thing, after all. – Jörg W Mittag Nov 07 '13 at 23:52
  • No it's not. Adding two numbers is adding two numbers. It's a single operation. All of the other things you mentioned (aside from the string concatenation) are *set operations*: they don't work on two numbers, but on each corresponding element of two datasets. There's a world of difference there. – Mason Wheeler Nov 07 '13 at 23:57
  • 1
    "Adding two numbers is adding two numbers." Except for languages where it's not. What about the + operation in Lisp/Scheme where a single arithmetic operator can take an arbitrary number of arguments? (+ 1 2 3) The only sane way to implement that is as a function. – Evicatos Nov 08 '13 at 00:55
  • @Evicatos: First, that's not adding 2 numbers; that's another set operation. Second, I've never accused any Lisp dialect of being a sane language. ;) It's hard not to look at one without saying "this entire language is one big abstraction inversion from start to finish." – Mason Wheeler Nov 08 '13 at 05:00
  • 1
    @Mason Wheeler: What do you mean by abstraction inversion? – Giorgio Nov 08 '13 at 06:20
  • @Giorgio: http://en.wikipedia.org/wiki/Abstraction_inversion – Mason Wheeler Nov 08 '13 at 12:00
  • 1
    @MasonWheeler That is, without a doubt, the most hand-wavy Wikipedia entry on a technical subject that I've ever seen. I've seen some dubious entries but that's just... wow. – Evicatos Nov 08 '13 at 18:05
  • @Evicatos: I just looked it over, and you're right. It used to be a *lot* better; I think a bunch of obnoxious wikipedians (you know the type) have been weakening it over the last few years. – Mason Wheeler Nov 08 '13 at 19:02
  • @MasonWheeler Yeah, the general idea has merit but that article is just a mess. – Evicatos Nov 08 '13 at 19:08
  • @MasonWheeler: So with your previous comment you meant that Lisp is a big abstraction inversion? If so, I do not see why. Can you make a concrete example? – Giorgio Nov 08 '13 at 22:18
  • @Giorgio: Oy, where do I even begin? The fundamental model is "let's pretend we're not actually programming a computer." (Not nearly as bad--or as pretentious about it--as Haskell, but still.) The fundamental iteration model isn't iteration, and naive recursion often doesn't work well, so you have to go way out of your way in many cases to transform your intuitive recursion into the special tail-recursive form, which is a lot more complicated and harder to read, *so that the compiler can magically turn it into iteration for you!* If that's not an abstraction inversion, I don't know what is! – Mason Wheeler Nov 08 '13 at 22:48
  • It's supposed to be designed to stay out of your way and let you focus on what you want instead of how you want to do it, but in many cases, writing simple code for what you want turns out to be horribly inefficient, and you need a lot more code describing how to do it right. (For some real fun, check out what Paul Graham says about Lisp to a non-Lisp audience. Then check out his book, *On Lisp*, in which literally the first thing he talks about is how all sorts of naive/intuitive constructs are horribly inefficient, and here is "the right way" to do them, which is about 3x more complicated.) – Mason Wheeler Nov 08 '13 at 22:50
  • @Mason Wheeler: I have the impression you missed the point of using (tail) recursion versus iteration. Recursion allows to make the inputs and outputs of each iteration explicit: in imperative programming these inputs and outputs are variables that get modified by the loop body. Recursion makes it easier to track these inputs / outputs and therefore to prove your code is correct. Of course you can never completely forget you are programming a computer, so what? Also Pascal does not allow you to completely forget about the underlying hardware: should we turn back to assembly? – Giorgio Nov 09 '13 at 09:54
  • @Giorgio: First, how does tail recursion make it easier to track your variables? Second, what does that have to do with modifying the values of variables? (This is something that's always taken as an article of faith by the Cult of Immutability, but never with any comprehensible rationale attached.) Third, "proving the correctness of code" is a bad joke for just about anything more complicated than "Hello world," and everyone knows it, so why are you even bringing that up? – Mason Wheeler Nov 09 '13 at 19:10
  • @Mason Wheeler: In an imperative loop you produce a result by repeatedly modifying the content of variables. To know which variables are modified and how you have to look at the whole body of the loop. Using immutable variables and recursion, the inputs / outputs of each iteration are written explicitly in **one place only**, namely the function signature. Anyway, I am not sure I want to continue discussing this: while you ask me to explain my reasoning, you keep making generic assertions like "proving the correctness of code is a bad joke ...etc" and present them as well-known truths. – Giorgio Nov 10 '13 at 03:17
  • @Giorgio: Umm... it *is* a well-known truth, and has been for decades. The complexity of proving correctness of code rises exponentially faster than the complexity of the code itself, making it infeasible to the point of impossibility to produce any proof worthy of the term for non-trivial code. Did you seriously not know that, or are you trolling me? – Mason Wheeler Nov 10 '13 at 03:29
  • @Mason Wheeler: While you cannot prove correctness in general, under certain assumptions you can prove it for specific parts of your code, and if you write the code appropriately. Using recursion (induction) and immutability makes such verification easier. See e.g. http://cs.cmu.edu/~rwh/smlbook/book.pdf, Sections 24.1, 24.2, especially the initial paragraph on page 198. – Giorgio Nov 10 '13 at 15:51
  • 1
    @MasonWheeler: Are you talking about the list length functions on pages 22 and 23 of On Lisp? The tail-call version is about 1.2x as complicated, nowhere near 3x. I'm also unclear on what you mean by abstraction inversion. – Michael Shaw Nov 11 '13 at 06:52
  • "Every subroutine has a last call, so every subroutine has at least one tail call." Both of these statements are obviously false. – David Conrad Jul 26 '16 at 04:06
4

"The JVM doesn't support tail-call optimization, so I predict lots of exploding stacks"

Anyone who says this either (1) doesn't understand tail-call optimization, or (2) doesn't understand the JVM, or (3) both.

I'll start with the definition of tail-calls from Wikipedia (if you don't like Wikipedia, here's an alternative):

In computer science, a tail call is a subroutine call that happens inside another procedure as its final action; it may produce a return value which is then immediately returned by the calling procedure

In the code below, the call to bar() is the tail call of foo():

private void foo() {
    // do something
    bar();
}

Tail call optimization happens when the language implementation, seeing a tail call, doesn't use normal method invocation (which creates a stack frame), but instead creates a branch. This is an optimization because a stack frame requires memory, and it requires CPU cycles to push information (such as the return address) onto the frame, and because the call/return pair is assumed to require more CPU cycles than an unconditional jump.

TCO is often applied to recursion, but that's not its only use. Nor is it applicable to all recursions. The simple recursive code to compute a factorial, for example, cannot be tail-call optimized, because the last thing that happens in the function is a multiplication operation.

public static int fact(int n) {
    if (n <= 1) return 1;
    else return n * fact(n - 1);
}
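For contrast, the standard fix is the accumulator rewrite (the same trick the first answer uses, and the Scala sum below): do the multiplication before the call, so that the recursive call really is the function's last action. A minimal sketch:

public static int fact(int n, int acc) {
    // acc carries the partial product, so the multiplication happens *before*
    // the recursive call; the call itself is now a genuine tail call that TCO
    // could turn into a jump. Invoke as fact(n, 1).
    if (n <= 1) return acc;
    else return fact(n - 1, n * acc);
}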

In order to implement tail call optimization, you need two things:

  • A platform that supports branching in addition to subroutine calls.
  • A static analyzer that can determine whether tail call optimization is possible.

That's it. As I've noted elsewhere, the JVM (like any other Turing-complete architecture) has a goto. It happens to have an unconditional goto, but the functionality could easily be implemented using a conditional branch.

The static analysis piece is what's tricky. Within a single function, it's no problem. For example, here's a tail-recursive Scala function to sum the values in a List:

def sum(acc:Int, list:List[Int]) : Int = {
  if (list.isEmpty) acc
  else sum(acc + list.head, list.tail)
}

This function turns into the following bytecode:

public int sum(int, scala.collection.immutable.List);
  Code:
   0:   aload_2
   1:   invokevirtual   #63; //Method scala/collection/immutable/List.isEmpty:()Z
   4:   ifeq    9
   7:   iload_1
   8:   ireturn
   9:   iload_1
   10:  aload_2
   11:  invokevirtual   #67; //Method scala/collection/immutable/List.head:()Ljava/lang/Object;
   14:  invokestatic    #73; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   17:  iadd
   18:  aload_2
   19:  invokevirtual   #76; //Method scala/collection/immutable/List.tail:()Ljava/lang/Object;
   22:  checkcast   #59; //class scala/collection/immutable/List
   25:  astore_2
   26:  istore_1
   27:  goto    0

Note the goto 0 at the end. By comparison, an equivalent Java function (which must use an Iterator to imitate the behavior of breaking a Scala list into head and tail) turns into the following bytecode. Note that the last two operations are now an invoke, followed by an explicit return of the value produced by that recursive invocation.

public static int sum(int, java.util.Iterator);
  Code:
   0:   aload_1
   1:   invokeinterface #64,  1; //InterfaceMethod java/util/Iterator.hasNext:()Z
   6:   ifne    11
   9:   iload_0
   10:  ireturn
   11:  iload_0
   12:  aload_1
   13:  invokeinterface #70,  1; //InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
   18:  checkcast   #25; //class java/lang/Integer
   21:  invokevirtual   #74; //Method java/lang/Integer.intValue:()I
   24:  iadd
   25:  aload_1
   26:  invokestatic    #43; //Method sum:(ILjava/util/Iterator;)I
   29:  ireturn

Tail call optimization of a single function is trivial: the compiler can see that there is no code that uses the result of the call, so it can replace the invoke with a goto.

Where life gets tricky is if you have multiple methods. The JVM's branching instructions, unlike those of a general-purpose processor such as the 80x86, are confined to a single method. It's still relatively straightforward if you have private methods: the compiler is free to inline those methods as appropriate, so it can optimize tail calls (if you're wondering how this might work, consider a common method that uses a switch to control behavior). You can even extend this technique to multiple public methods in the same class: the compiler inlines the method bodies, provides public bridge methods, and internal calls turn into jumps.
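To illustrate the switch idea, here is a hand-written sketch of what such an inlining might produce (hypothetical code, not actual compiler output): two private, mutually tail-calling methods, say isEven and isOdd, collapse into one method whose switch selects the "current" body and whose loop replaces the calls:

// The switch selects which former "method body" runs; the enclosing loop
// replaces each tail call with an update of the state plus a jump.
private static boolean isEvenOdd(int which, int n) {
    while (true) {
        switch (which) {
            case 0:                        // formerly isEven(n)
                if (n == 0) return true;
                which = 1; n = n - 1;      // "tail call" to isOdd(n - 1)
                break;
            case 1:                        // formerly isOdd(n)
                if (n == 0) return false;
                which = 0; n = n - 1;      // "tail call" back to isEven(n - 1)
                break;
        }
    }
}

// Public bridge methods preserve the original entry points.
public static boolean isEven(int n) { return isEvenOdd(0, n); }
public static boolean isOdd(int n)  { return isEvenOdd(1, n); }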

But, this model breaks down when you consider public methods in different classes, particularly in light of interfaces and classloaders. The source-level compiler simply does not have enough knowledge to implement tail-call optimizations. However, unlike "bare-metal" implementations, the *JVM* does have the information to do this, in the form of the Hotspot compiler (at least, the ex-Sun compiler does). I don't know whether it actually performs tail-call optimizations, and suspect not, but it could.

Which brings me to the second part of your question, which I'll rephrase as "should we care?"

Clearly, if your language uses recursion as its sole primitive for iteration, you care. But, languages that need this feature can implement it; the only issue is whether a compiler for said language can produce a class that can call and be called by an arbitrary Java class.

Outside of that case, I'm going to invite downvotes by saying that it's irrelevant. Most of the recursive code that I've seen (and I've worked with a lot of graph projects) is not tail-call optimizable. Like the simple factorial, it uses recursion to build state, and the tail operation is combination.

For code that is tail-call optimizable, it's often straightforward to translate that code into an iterable form. For example, that sum() function that I showed earlier can be generalized as foldLeft(). If you look at the source, you'll see that it's actually implemented as an iterative operation. Jörg W Mittag had an example of a state machine implemented via function calls; there are lots of efficient (and maintainable) state machine implementations that do not rely on function calls being translated into jumps.
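(As a hedged aside in Java rather than Scala: the same fold-style reduction is available through streams, and the library performs the accumulation iteratively, so no deep call stack is involved.)

import java.util.List;

class FoldSum {
    // Fold-style sum, the Java analogue of Scala's foldLeft; the stream
    // reduction is implemented internally as a loop, not as nested calls.
    static int sum() {
        return List.of(1, 2, 3, 4, 5).stream().reduce(0, Integer::sum);  // 15
    }
}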

I'll finish with something completely different. If you Google your way from footnotes in the SICP, you might end up here. I personally find that a much more interesting place than having my compiler replace JSR by JUMP.

kdgregory
  • If a tail-call opcode existed, why would tail-call optimization require anything other than observing at each call site whether the method making the call would need to execute any code afterward? It may be that in some cases a statement like `return foo(123);` could be better executed by in-lining `foo` than by generating code to manipulate the stack and perform a jump, but I don't see why tail-call would be any different from an ordinary call in that regard. – supercat Apr 06 '15 at 18:46
  • @supercat - I'm not sure what your question is. The first point of this post is that the compiler can't know what the stack frame of all potential callees might look like (remember that the stack frame holds not just the function arguments but also its local variables). I suppose you could add an opcode that does a runtime check for compatible frames, but that brings me to the second part of the post: what's the *real* value? – kdgregory Apr 11 '15 at 13:30