What can qualify for potential tail call optimization (TCO) or tail recursion elimination (TRE)

Question

The short question is: what can qualify for potential tail call optimization (TCO) or tail recursion elimination (TRE), if the compiler or interpreter supports it?

The summary of this question is in the section "Or is it true that, one simplest way to think about it" below.

I saw the Stack Overflow question "What is tail-recursion elimination?" as well, and read different descriptions of how a tail call recursion can happen.

First one:

a tail call is a subroutine call performed as the final action of a procedure (Wikipedia as of Jan 10, 2016)

Second one:

when the value returned by the recursive call is itself immediately returned (Programming Interviews Exposed, Wiley, 2013)

Third one:

when a function calls itself, it cannot do any operation with this value that involves a local variable, and just return it. This call happens at the end of function, and so any local variables or "state" inside the stack (inside the current scope) can be thrown away

I think the second one is very close to the third one, except that I don't see exactly what it means by "immediately". The first one made me think that return n * fact(n-1) qualifies, although later on Wikipedia says n * fact(n - 1) is not tail recursive: even though it calls itself on the last line, it still has to use n after the call returns, so it doesn't qualify for TCO.

Interestingly, the book Programming Interviews Exposed (Wiley, 2013) may need an erratum, because it says

int factorial( int n ) {
    if (n > 1) { 
        /* Recursive case */
        return factorial(n-1) * n; 
    } else { 
        /* Base case */
        return 1; 
    }
}

when the value returned by the recursive call is itself immediately returned, as in the preceding definition for factorial, the function is tail-recursive. Some compilers can perform tail call elimination on tail-recursive functions, an optimization that reuses the same stack frame for each recursive call

but I don't think it is tail recursive, is it?

Is calling on the last line important? On the page Tail Call Optimization in Ruby it says the following code can qualify for TCO:

def fact(n, acc=1)             # just assuming we pass in non-negative n
  return acc if n <= 1
  return fact(n-1, n*acc)
end

But isn't it true that even the following qualifies for TCO?

def fact(n, acc=1)
  return fact(n-1, n*acc) if n > 1
  return acc if n <= 1
end

Or is it not, because it has "unfinished business" -- you need the stack to remember where the code has reached and continue later on, so you cannot wipe out the stack?

What if you do a math operation, such as return 1 + fact(n-1)? That is, you are adding a constant, 1, instead of touching any local variable, so there shouldn't be stack info that needs to be kept each time you recurse. But if you view it as needing to remember 1 + (1 + (1 + (1 + ..., then you still need more and more stack, unless typical TCO can actually optimize it to 4 + fact(...) without needing new stack frames.

Or is it true that, one simplest way to think about it is:

If there is no need to keep any of the info in the current stack frame when you make the recursive call, then you can wipe out the whole stack frame and just use a "GO TO" instead of adding a new stack frame, and so the call can qualify for TCO? Then, at least for this recursive call alone, there will be no stack overflow.

So being able to wipe out the whole stack frame means that no local variable is needed any more?

What if it is a recursive call to traverse a binary tree or rename all files under a directory and all its subdirectories, from the form "Programming_Ruby.pdf" to "Programming Ruby.pdf", and your recursion is not on the last line, because the last line has a print "this folder done"... then, can you wipe the whole stack frame out? That is, will the position of the next line of code to run also depend on the stack frame?

If the next line to run depends on the stack, then maybe it can boil down to these 2 rules?

1. The recursive call must be the last operation of the function, so that after the recursive call the current function ends and there is no "next line of code to run" to remember.
2. There must be no operation applied to the result of this recursive call: no math operations whatsoever, especially ones involving a local variable. The function must simply return f(...) and nothing else (or just call the procedure). The ... inside can involve calculations, even ones with local variables. Outside of (...) you cannot do any calculation at all.
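
As a rough illustration of these two rules, here is a minimal Ruby sketch. The method names fact_plain, fact_acc, and fact_acc_early are made up for this example. It only shows which calls sit in tail position; whether a frame is actually reused depends on the implementation (as far as I know, CRuby does not enable tail call optimization by default).

```ruby
# Not a tail call: the multiplication runs after the recursive call
# returns, so the frame holding n has to stay alive until then.
def fact_plain(n)
  return 1 if n <= 1
  n * fact_plain(n - 1)
end

# Tail call: the result of the recursive call is returned unchanged.
# The arguments (n - 1, n * acc) may freely use local variables,
# because they are computed before the call is made.
def fact_acc(n, acc = 1)
  return acc if n <= 1
  fact_acc(n - 1, n * acc)
end

# Still a tail call, although it is not on the textually last line:
# when this branch is taken, nothing is left to do after the call.
def fact_acc_early(n, acc = 1)
  return fact_acc_early(n - 1, n * acc) if n > 1
  acc
end
```

The third variant suggests that "last line" is not the real criterion. What matters is that, on the path that makes the call, the function has nothing left to do with the result except return it.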

Short answer: Read the various "Lambda: The Ultimate" papers from the MIT AI Lab. They go into a great deal of detail on, among other things, tail recursion and tail call elimination. — John R. Strohm, Jan 11 '16 at 07:14

score 6 · Answer 1 · answered Jan 10 '16 at 18:10

Consider the functional language Clojure. Because TCO is difficult to implement on the JVM, Clojure expresses tail calls explicitly with the recur form. For example, we might have the following factorial function.

(def factorial
  (fn [n]
    (let [factorial-inner 
         (fn [cnt acc]
           (if (zero? cnt)
             acc
             (recur (dec cnt) (* acc cnt))))]
         (factorial-inner n 1))))

(Typically the loop macro is used as a way to save defining functions like factorial-inner.) This is functionally equivalent to the following

(def factorial
   (fn [n]
     (if (zero? n)
         1
         (* n (factorial (dec n))))))

However, recur may only be called in the tail position and so guarantees that TCO may be performed. It is bound essentially by the rules you list. What adds complexity to this answer is that there are a number of well known optimizations that transform recursive calls into tail calls so that TCO may be performed. Somewhat confusingly, these optimizations are sometimes also called TCO.

A well known example is Tail call modulo cons. A function written in this way has the form:

f (base_case) = base_value
f (args) = cons(g(args), f(h(args)))

Functions written in this way, when cons and g satisfy appropriate conditions, can be rewritten as

f (args) = f_inner(args, base_value)
  where
    f_inner (base_case, acc) = acc
    f_inner (args, acc) = f_inner(h(args), cons(g(args), acc))

where f_inner is clearly tail-call recursive. (Exercise: show that the non-tail call factorial definition is a tail call modulo cons and that it can be transformed into the first definition.) Even more complicated cases of such rewriting optimizations (those that rewrite a non-tail-call-recursive definition into a tail-call recursive one) are possible. The GHC implementation of the language Haskell contains perhaps the most extensive use of term-rewriting optimizations (even allowing users to contribute rewriting rules). For more details, consult this page and the linked papers.
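
A minimal Ruby sketch of the rewrite above, assuming cons is integer multiplication, g(n) = n, h(n) = n - 1, and the base value is 1. All method names here are hypothetical. The second pair of methods is a deliberately failing case, included to show why the "appropriate conditions" matter: the accumulator version combines the elements in the opposite order, so the two forms agree only for an operation where that order is irrelevant (multiplication is associative and commutative; string concatenation is not commutative).

```ruby
# Direct form: f(args) = cons(g(args), f(h(args))), with cons = *
def fact_direct(n)
  return 1 if n.zero?
  n * fact_direct(n - 1)
end

# Accumulator form: f_inner(args, acc) = f_inner(h(args), cons(g(args), acc))
def fact_inner(n, acc)
  return acc if n.zero?
  fact_inner(n - 1, n * acc)   # tail call
end

def fact_rewritten(n)
  fact_inner(n, 1)
end

# Same schema with cons = string concatenation.
def digits_direct(n)
  return "" if n.zero?
  n.to_s + digits_direct(n - 1)
end

def digits_inner(n, acc)
  return acc if n.zero?
  digits_inner(n - 1, n.to_s + acc)
end

fact_direct(6)       # 720
fact_rewritten(6)    # 720, same result
digits_direct(3)     # "321"
digits_inner(3, "")  # "123", the order is reversed
```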

score 3 · Answer 2 · answered Jan 10 '16 at 18:29

A tail call is any call that is made immediately before the calling function returns. In return n + foo(), the last operation before returning is the addition, not the call. In return foo(), the call is indeed in tail position.

Why can't a compiler magically optimize that away, since some simple operations do not affect the stack? Because in most calling conventions, a function is given a return address when it is called. A return is like a goto to this address. When the call in function a() { return b() }; a() is not tail-call optimized, we get a sequence of operations like this:

a() is called with a return address in the main
b() is called with a return address in a
b returns into a
a returns into the main

With deep recursion, all of these return addresses start to consume a lot of stack space. So with TCO, a tail call is given the return address of the current function:

a() is called with return address in the main
b() is called with return address in the main ← TCO!
b returns into the main

So after b was called as a tail call, the flow of control never re-entered the a function! And since the same return address is reused for the call, no extra stack space was used.
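
For a function that tail-calls itself, the effect of reusing the frame and return address described above can be sketched by hand as a loop. This is only an illustration of what such an optimization amounts to, not the output of any particular compiler, and fact_loop is a made-up name.

```ruby
# Hand-written equivalent of a tail-call-optimized fact(n, acc):
# instead of pushing a new frame, rebind the parameters and jump
# back to the top of the same frame.
def fact_loop(n, acc = 1)
  loop do
    return acc if n <= 1
    # Stands in for the tail call fact(n - 1, n * acc).
    # Parallel assignment evaluates both right-hand sides with the old n.
    n, acc = n - 1, n * acc
  end
end

fact_loop(5)  # 120
```

Because no frame is added per step, the depth of the "recursion" no longer affects stack use.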

Your final rules are therefore sufficiently correct. However, return n * fact(n - 1) does have an operation in the tail position! This is the multiplication *, which will be the last thing the function does before it returns. In some languages, this might actually be implemented as a function call which could then be tail-call optimized.

Another point to keep in mind is that tail calls are a special case of a “call with continuation”. A continuation is like a function or a closure, but also remembers where in that function the execution was stopped – it can therefore be continued. One surprising aspect is that every program can be translated into a form that uses continuations, where every call is a tail call! This is called Continuation-Passing-Style and is used in some compilers as an intermediate form.
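
A small Ruby sketch of that translation for factorial, using lambdas as continuations. The names fact_direct_style and fact_cps are hypothetical, and the sketch only shows the shape of the code: without TCO in the implementation, the calls below still consume stack.

```ruby
# Direct style: the multiplication happens after the recursive call returns.
def fact_direct_style(n)
  n <= 1 ? 1 : n * fact_direct_style(n - 1)
end

# Continuation-passing style: k is "what to do with the result".
# Every call here, including each k.call, is in tail position.
def fact_cps(n, k)
  if n <= 1
    k.call(1)
  else
    # The pending "multiply by n" is captured in a new closure
    # instead of being left waiting in this frame.
    fact_cps(n - 1, ->(r) { k.call(n * r) })
  end
end

fact_cps(5, ->(r) { r })  # 120
```

The work that the direct version leaves on the stack has not disappeared. It now lives in the chain of closures, which is why making every call a tail call does not by itself make the computation use constant space.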

really? even `return n * fact(n - 1)` can be tail optimized in some cases? The SICP book listed it as `20 * (19 * (18 * ... * 1 * fact(0))))))`, so there is a lot of stacks involved, to remember those 20, 19, 18, etc. I tried MIT Scheme and Berkeley STk, with the last line being a function call `(* n (factorial (- n 1)))` but it will stack overflow if n is 100000. Then if I change it to the way of using `acc` as in `(factorial-helper n acc)` and tail call, then even when n is 100000, it won't stack overflow — nonopolarity, Jan 11 '16 at 03:37
@太極者無極而生 In that example, the multiplication is in tail position and is eligible for optimization if multiplication is implemented as a function call. The recursive call to `fact(…)` isn't in tail position. Tail *recursion* optimization is just a special case of tail *call* optimization, which is also useful outside obvious recursion. But even if the multiplication is TCO'd, the recursive call isn't and we expect to see that stack use you noted. — amon, Jan 11 '16 at 09:24

score 1 · Answer 3 · answered Jan 10 '16 at 17:29

Tail call optimization indeed boils down to those final two rules you stated:

The tail call must be the last operation in the function and its result (if any) must only be used to return it unchanged to the function's caller.

Bart van Ingen Schenau


I'm trying to think of any way an aggressively optimizing compiler could do better, but it does seem like this is probably about it. The best I can come up with is that in the `return n+f();` case, maybe the compiler could keep an `int` hanging around that it adds `n` to on every `f()` call because it knows that `n+...+n+f()` reduces to `xn+f()` and that last addition can wait until the final `f()` call. But that seems unlikely to be worth the effort. – Ixrec Jan 10 '16 at 17:50
@Ixrec: A good optimizer might sometimes inline a call, tail-call or not, and/or replace recursion with a loop. – Deduplicator Jan 10 '16 at 17:54
