Most architectures I've seen rely on a call stack to save/restore context before function calls. It's such a common paradigm that push and pop operations are built into most processors. Are there systems that work without a stack? If so, how do they work, and what are they used for?
-
Given how functions are expected to behave in C-like languages (i.e. you can nest calls as deep as you like, and can return back out in reverse order), it's not clear to me how else one *could* implement function calls without it being incredibly inefficient. You could e.g. force the programmer to use continuation-passing style or some other bizarre form of programming, but nobody seems to actually work with CPS very much at the low level for some reason. – Kevin Jun 19 '16 at 22:28
-
GLSL works without a stack (as do other languages in that specific bracket). It simply disallows recursive calls. – Alex Celeste Jun 20 '16 at 10:17
-
@Leushenko Yes, but GLSL is a DSL and there's a reason we don't do systems programming in GLSL (I hope) – cat Jun 20 '16 at 13:50
-
You may also want to look into [Register windows](https://en.wikipedia.org/wiki/Register_window), which are used by some RISC architectures. – Mark Booth Jun 20 '16 at 14:55
-
2@Kevin: "Early FORTRAN compilers supported no recursion in subroutines. Early computer architectures supported no concept of a stack, and when they did directly support subroutine calls, the return location was often stored in one fixed location adjacent to the subroutine code, which does not permit a subroutine to be called again before a prior call of the subroutine has returned. Although not specified in Fortran 77, many F77 compilers supported recursion as an option, while it became a standard in Fortran 90." https://en.wikipedia.org/wiki/Fortran#FORTRAN_II – Mooing Duck Jun 20 '16 at 16:16
-
@Kevin - Continuation passing style is used in a lot of compilers as an intermediate stage. It generally gets transformed back to stack-based code prior to code generation because stacks are more efficient, but CPS is useful for implementing a lot of the interesting features of modern languages, e.g. generator functions and C#'s await/async feature. – Jules Jun 20 '16 at 17:30
-
COBOL also for many years (i.e. until the 1980s) did not allow recursion and therefore did not need a stack. If you wanted to process tree-structured data (e.g. a parts hierarchy) life was tough. – Michael Kay Jun 20 '16 at 23:01
-
Haskell uses a tree of thunks instead of a stack. – user253751 Jun 21 '16 at 02:06
-
The P8X32A ("Propeller") microcontroller has no concept of a stack in its standard assembly language (PASM). The instructions responsible for jumping also self-modify the return instruction in RAM to determine where to return to - which can be chosen arbitrarily. Interestingly, the "Spin" language (an interpreted high-level language that runs on the same chip) does have traditional stack semantics. – Wossname Jun 21 '16 at 07:27
7 Answers
A (somewhat) popular alternative to a call stack is continuations.
The Parrot VM is continuation-based, for example. It is completely stackless: data is kept in registers (like Dalvik or the LuaVM, Parrot is register-based), and control-flow is represented with continuations (unlike Dalvik or the LuaVM, which have a call stack).
Another popular data structure, typically used by Smalltalk and Lisp VMs, is the spaghetti stack, which is kind of like a network of stacks.
As @rwong pointed out, continuation-passing style is an alternative to a call stack. Programs written in (or transformed to) continuation-passing style never return, so there is no need for a stack.
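As a minimal sketch of that style (illustrative C, not any particular VM's calling convention; since C lacks closures, the pending multiplication is folded into an accumulator argument), here is a factorial in continuation-passing style. It never returns a value to its caller; it hands the result to a continuation it was given:

```c
#include <stdio.h>

/* A continuation consumes the final result; nothing is "returned". */
typedef void (*cont_t)(int result);

static void print_result(int result) {
    printf("%d\n", result);
}

/* Factorial in (accumulator-based) CPS: every call is a tail call,
   so no frame needs to survive for a later return. A compiler that
   performs tail-call elimination needs no growing stack here. */
static void fact_cps(int n, int acc, cont_t k) {
    if (n == 0)
        k(acc);                      /* invoke the continuation */
    else
        fact_cps(n - 1, n * acc, k); /* tail call, not a return */
}

int main(void) {
    fact_cps(5, 1, print_result);    /* prints 120 */
    return 0;
}
```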
Answering your question from a different perspective: it is possible to have a call stack without having a separate stack, by allocating the stack frames on the heap. Some Lisp and Scheme implementations do this.
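A sketch of that last idea, with hypothetical names: the "call stack" becomes a linked list of heap-allocated activation records, which makes it easy to traverse, keep alive past a return, or fork into the spaghetti-stack shape mentioned above (error checking omitted for brevity):

```c
#include <stdlib.h>

/* One activation record, allocated on the heap rather than on a
   contiguous stack; `caller` is the dynamic link to the calling frame. */
struct frame {
    struct frame *caller;
    int           local_n;   /* stand-in for this call's locals */
};

/* "Push": allocate a frame and link it to the current top. */
static struct frame *enter(struct frame *top, int n) {
    struct frame *f = malloc(sizeof *f);
    f->caller  = top;
    f->local_n = n;
    return f;
}

/* "Pop": free the frame and resume the caller's frame. */
static struct frame *leave(struct frame *top) {
    struct frame *caller = top->caller;
    free(top);
    return caller;
}

int main(void) {
    struct frame *top = NULL;   /* empty "stack" */
    top = enter(top, 3);        /* outer call    */
    top = enter(top, 2);        /* nested call   */
    top = leave(top);           /* inner return  */
    top = leave(top);           /* outer return  */
    return 0;
}
```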

-
It depends on your definition of a stack. I'm not sure that having a linked list (or array of pointers to or ...) of stack frames is so much "not a stack" as "a different representation of a stack"; and programs in CPS languages (in practice) tend to build what are effectively linked lists of continuations that are very similar to stacks (if you haven't, you might check out GHC, which pushes what it calls "continuations" onto a linear stack for efficiency). – Jonathan Cast Jun 20 '16 at 14:12
-
6"_Programs written in (or transformed to) continuation-passing style never return_" ... sounds ominous. – Rob Penridge Jun 20 '16 at 14:58
-
@RobPenridge: it's a bit cryptic, I agree. CPS means that instead of returning, functions take as an extra argument another function to call when their work is done. So, when you call a function and you have some other work you need to do after you have called the function, instead of waiting for the function to return and then carry on with your work, you wrap the remaining work ("the continuation") into a function, and pass that function as an additional argument. The function you called then calls that function instead of returning, and so on and so forth. No function ever returns, it just – Jörg W Mittag Jun 20 '16 at 15:01
-
… calls the next function. Therefore, you don't need a call stack, because you never need to come back and restore the binding state of a previously called function. Instead of carrying around the *past* state so that you can come back to it, you carry around the *future* state, if you will. – Jörg W Mittag Jun 20 '16 at 15:02
-
But when you get a couple calls "deep" into the call hierarchy, you have multiple continuations, which must be stored somewhere in memory. They can either be in an implicit stack, or they can be linked, which is another way to represent a stack. I can't think of any significant difference between continuations and stack-based code besides the fact that continuations make multi-threading simpler. – Mooing Duck Jun 20 '16 at 16:26
-
@jcast: The defining feature of a stack is IMO that you can only access the topmost element. A list of continuations, OTOH, would give you access to all continuations and not just the topmost stackframe. If you have Smalltalk-style resumable exceptions, for example, you need to be able to traverse the stack without popping it. And having continuations in the language while still wanting to keep the familiar idea of a call stack leads to spaghetti stacks, which is basically a tree of stacks where continuations "fork" the stack. – Jörg W Mittag Jun 20 '16 at 16:26
-
@JörgWMittag: The issue isn't just that the stack can be examined, but also that it's possible to implement co-routines using continuations. – supercat Jun 20 '16 at 20:18
In the olden days, processors didn't have stack instructions, and programming languages didn't support recursion. Over time, more and more languages chose to support recursion, and hardware followed suit with stack frame allocation capabilities. This support has varied greatly over the years with different processors. Some processors adopted stack frame and/or stack pointer registers; some adopted instructions that would accomplish the allocation of stack frames in a single instruction.
As processors advanced with single level, then multi-level caches, one critical advantage of the stack is that of cache locality. The top of the stack is almost always in the cache. Whenever you can do something that has a large cache hit rate, you're on the right track with modern processors. The cache applied to the stack means that local variables, parameters, etc.. are almost always in the cache, and enjoy the highest level of performance.
In short, the usage of the stack evolved both in hardware and software. There are other models (for example, dataflow computing was tried for an extended period); however, the locality of the stack makes it work really well. Further, procedural code is just what processors want, for performance: one instruction telling them what to do after another. When the instructions are out of linear order, the processor slows down tremendously, at least so far, since we haven't figured out how to make random access as fast as sequential access. (Btw, there are similar issues at each memory level, from cache, to main memory, to disk...)
Between the demonstrated performance of sequential access instructions and the beneficial caching behavior of the call stack, we have, at least at present, a winning performance model.
(We might throw the mutability of data structures into the mix as well...)
This doesn't mean that other programming models can't work, especially when they can be translated into the sequential instructions and call stack model of today's hardware. But there is a distinct advantage for models that match where the hardware is. However, things don't always stay the same, so we could see changes in the future as different memory and transistor technologies allow for more parallelism. It is always an interplay between programming languages and hardware capabilities, so, we'll see!

-
In fact, GPUs still don't have stacks at all. You are forbidden from recursing in GLSL/SPIR-V/OpenCL (not sure about HLSL, but probably, I see no reason why it would be different). The way they actually handle function call "stacks" is by using an absurdly large number of registers. – Linear Jun 20 '16 at 10:49
-
@Jsor: That's to a large degree an implementation detail, as can be seen from the SPARC architecture. Like your GPUs, SPARC has a huge register set, but it's unique in that it has a sliding window which on wraparound spills the very old registers to a stack in RAM. So it's really a hybrid between the two models. And SPARC didn't specify exactly how many physical registers there were, just how big the register window was, so different implementations could lie anywhere on that scale from "huge amount of registers" to "just enough for one window, on every function call spill directly to stack" – MSalters Jun 21 '16 at 09:36
-
A downside of the call stack model is that array and/or address overflows have to be watched very carefully, as exploits via self-modifying programs become possible if arbitrary bits of the heap are executable. – BenPen Sep 23 '16 at 22:26
TL;DR
- Call stack as a function call mechanism:
- Is typically simulated by hardware but is not fundamental to the construction of hardware
- Is fundamental to imperative programming
- Is not fundamental to functional programming
- Stack as an abstraction of "last-in, first-out" (LIFO) is fundamental to computer science, algorithms, and even some non-technical domains.
- Some examples of program organization that do not use call stacks:
- Continuation-passing style (CPS)
- State machine - a giant loop, with everything inlined. (Purported to be inspired by the Saab Gripen firmware architecture, and attributed to a communication by Henry Spencer and reproduced by John Carmack.) (Note #1)
- Dataflow architecture - a network of actors connected by queues (FIFO). The queues are sometimes called channels.
The rest of this answer is a random collection of thoughts and anecdotes, and therefore somewhat disorganized.
The stack that you have described (as a function call mechanism) is specific to imperative programming.
Below imperative programming, you will find machine code. Machine code can emulate the call stack by executing a small sequence of instructions.
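For example (a sketch, not any particular instruction set): even with nothing but loads, stores, and arithmetic, a stack is just an ordinary region of memory plus one variable used as the stack pointer:

```c
/* Emulating push/pop with plain memory accesses, the way machine
   code on a stack-less processor would. */
enum { STACK_WORDS = 256 };

static int stack_mem[STACK_WORDS];   /* ordinary memory region */
static int sp = 0;                   /* software stack pointer */

static void push(int v) { stack_mem[sp++] = v; }    /* store, then bump SP */
static int  pop(void)   { return stack_mem[--sp]; } /* drop SP, then load  */

int main(void) {
    push(42);                    /* e.g. save a return address */
    return pop() == 42 ? 0 : 1;  /* restore it on the way out  */
}
```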
Below machine code, you will find the hardware responsible for executing software. While a modern microprocessor is too complex to be described here, one can imagine that a very simple design exists that is slow but still capable of executing the same machine code. Such a simple design will make use of the basic elements of digital logic:
- Combinational logic, i.e. a connection of logic gates (AND, OR, NOT, ...). Note that "combinational logic" excludes feedback.
- Memory, i.e. flip-flops, latches, registers, SRAM, DRAM, etc.
- A state machine that consists of some combinational logic and some memory, just enough so that it can implement a "controller" that manages the rest of the hardware.
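As a toy software model of that last element (purely illustrative, with made-up states), a "controller" is nothing more than a state register that a pure next-state function - the combinational logic - updates on every clock:

```c
#include <stdint.h>

typedef uint8_t state_t;

/* Combinational logic: a pure function of state and input, no feedback. */
static state_t next_state(state_t s, uint8_t start) {
    switch (s) {
    case 0:  return start ? 1 : 0;   /* e.g. FETCH -> DECODE */
    case 1:  return 2;               /* DECODE -> EXECUTE    */
    default: return 0;               /* EXECUTE -> FETCH     */
    }
}

/* Memory: one register that latches the next state on each clock edge. */
static void clock_tick(state_t *reg, uint8_t start) {
    *reg = next_state(*reg, start);
}

int main(void) {
    state_t s = 0;
    clock_tick(&s, 1);   /* s == 1 */
    clock_tick(&s, 0);   /* s == 2 */
    return s == 2 ? 0 : 1;
}
```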
The following discussions contain plenty of examples of alternative ways of structuring imperative programs.
- http://number-none.com/blow/john_carmack_on_inlined_code.html
- https://news.ycombinator.com/item?id=8374345
The structure of such a program will look like this:
```c
#include <stdbool.h>

int main(void)
{
    do
    {
        // validate inputs for task 1
        // execute task 1, inlined,
        // must complete in a deterministically short amount of time
        // and limited to a statically allocated amount of memory
        // ...
        // validate inputs for task 2
        // execute task 2, inlined
        // ...
        // validate inputs for task N
        // execute task N, inlined
    }
    while (true);
    // if this line is reached, tell the programmers to prepare
    // themselves to appear before an accident investigation board.
    return 0;
}
```
This style would be appropriate for microcontrollers, i.e. for those who see the software as a companion to the functions of the hardware.

-
Interesting. I would have thought it the other way round. For example, FORTRAN is an imperative programming language, and early versions did not use a call stack. However, recursion is fundamental to functional programming, and I don't think it's possible to implement in the general case without using a stack. – T.E.D. Jun 21 '16 at 13:42
-
@T.E.D. - in a functional language implementation there's a stack (or typically, a tree) data structure in there representing pending calculations but you don't necessarily execute it with instructions using the machine's stack-oriented addressing modes or even the call/return instructions (in a nested/recursive fashion - maybe as just part of a state machine loop). – davidbak Jun 21 '16 at 17:59
-
@davidbak - IIRC, a recursive algorithm pretty much has to be tail-recursive in order to be able to get rid of the stack. There are probably some other cases where you could optimize it out, but in the *general* case, **you have to have a stack**. In fact, I was told there was a mathematical proof of this floating around somewhere. [This answer](http://stackoverflow.com/a/933979/29639) claims it is the Church-Turing theorem (I think based on the fact that Turing machines use a stack?) – T.E.D. Jun 21 '16 at 18:40
-
@T.E.D. - I agree with you. I believe the miscommunication here is that I read the OP's post to be talking about _system architecture_ which meant to me _machine architecture_. I think others who have answered here have the same understanding. So those of us who understood _that_ to be the context have answered by responding that you don't need a stack at the machine instruction/addressing mode level. But I can see the question can also be interpreted to mean, merely, does a _language system_ in general need a call stack. That answer is _also_ no, but for different reasons. – davidbak Jun 21 '16 at 19:34
-
Sorry for the confusion caused by my answer. I would like to correct my answer in this way: (1) Call stack does not exist at a very low level (machine level); (2) Call stack is essential at the middle level (imperative and structural programming); (3) Higher levels of computer programming, such as functional programming, **do not require their users to be concerned about call stacks**, but these languages tend to be implemented in imperative and structural programming, which requires the use of call stacks. – rwong Jun 22 '16 at 04:36
You've got some good answers so far; let me give you an impractical but highly educational example of how you could design a language without the notion of stacks or "control flow" at all. Here's a program that determines factorials:
```
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = f(3)
```
We put this program in a string, and we evaluate the program by textual substitution. So when we are evaluating `f(3)`, we do a search-and-replace with 3 for i, like this:
```
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = if 3 == 0 then 1 else 3 * f(3 - 1)
```
Great. Now we perform another textual substitution: we see that the condition of the "if" is false and do another string replace, producing the program:
```
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = 3 * f(3 - 1)
```
Now we do another string replace on all sub-expressions involving constants:
```
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = 3 * f(2)
```
And you see how this goes; I won't labour the point further. We could keep on doing a series of string substitutions until we got down to `let x = 6`, and we'd be done.
We use the stack traditionally for local variables and continuation information; remember, a stack doesn't tell you where you came from, it tells you where you're going next with that return value in hand.
In the string substitution model of programming, there are no "local variables" on the stack; the formal parameters are substituted for their values when the function is applied to its argument, rather than put into a lookup table on the stack. And there is no "going somewhere next" because program evaluation is simply applying simple rules for string substitution to produce a different but equivalent program.
Now, of course, actually doing string substitutions is probably not the way to go. But programming languages that support "equational reasoning" (such as Haskell) are logically using this technique.

-
[Retina](https://github.com/m-ender/retina) is an example of a Regex-based programming language that uses string operations for computation. – Andrew Jun 20 '16 at 17:45
-
@AndrewPiliser Designed and implemented by [this cool dude](https://codegolf.stackexchange.com/users/8478/martin-ender). – cat Jun 21 '16 at 12:25
-
"We use the stack traditionally for local variables and continuation information; remember, a stack doesn't tell you where you came from, it tells you where you're going next with that return value in hand." - this makes me wonder whether there are programming systems (other than ROP used for exploits) that deliberately put some address other than the next instruction on the stack, or even construct a chain of return addresses to do some processing on the way out – user253751 Jul 16 '21 at 16:45
-
@user253751: I do not know of any offhand but it is certainly possible. A simpler form of what you're describing is seen in mainstream optimizations such as inlining, where the call is eliminated entirely and therefore there is no need to return, and tail calling. That is, if A() calls void method B() and the last thing void method B() does is call void method C(), then C() can return straight to A() instead of returning to B() first. – Eric Lippert Jul 16 '21 at 17:00
-
@user253751: In that last scenario, if you examined the call stack it would look like A() called C() directly because the stack frame for B()'s locals is overwritten by C()'s locals, and C()'s return-to address is inside A() instead of B(). Runtimes which use this optimization might sensibly choose to turn it off if a debugger is attached because it looks weird, even if it does make sense. – Eric Lippert Jul 16 '21 at 17:03
No, not necessarily.
Read Appel's old paper *Garbage Collection can be faster than Stack Allocation*. It uses continuation-passing style and shows a stackless implementation.
Notice also that old computer architectures (e.g. IBM/360) did not have any hardware stack register. But the OS and compiler reserved a register for the stack pointer by convention (related to calling conventions) so that they could have a software call stack.
In principle, a whole-program C compiler and optimizer could detect the case (somewhat common for embedded systems) where the call graph is statically known and without any recursion (or function pointers). In such a system, each function could keep its return address in a fixed static location (and that is how Fortran 77 worked on 1970s-era computers).
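A sketch of what that compiles down to, in C terms (hypothetical names): every subroutine's parameters, locals, and result live at fixed static addresses, which is precisely why a second, overlapping activation - recursion - is impossible:

```c
/* A Fortran-66-style, statically allocated activation record:
   one fixed frame per subroutine, no stack, hence no recursion. */
static struct {
    int arg_n;    /* parameter slot at a fixed address    */
    int result;   /* return-value slot at a fixed address */
} square_frame;

static void square(void) {
    square_frame.result = square_frame.arg_n * square_frame.arg_n;
}

int main(void) {
    square_frame.arg_n = 7;   /* "pass" the argument */
    square();                 /* transfer control    */
    return square_frame.result == 49 ? 0 : 1;
}
```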
These days, processors also have call stacks (and call & return machine instructions) with implementations that are aware of CPU caches.

-
Pretty sure FORTRAN stopped using static return locations when FORTRAN-66 came out and required support for `SUBROUTINE` and `FUNCTION`. You are correct for earlier versions, though (FORTRAN-IV and possibly WATFIV). – TMN Jun 20 '16 at 12:42
-
And COBOL, of course. And excellent point about IBM/360 - it got quite a lot of use even though it was missing hardware stack addressing modes. (R14, I believe it was?) And it had compilers for stack-based languages, e.g., PL/I, Ada, Algol, C. – davidbak Jun 20 '16 at 19:09
-
Indeed, I studied the 360 in college and found it bewildering at first. – JDługosz Jun 21 '16 at 11:02
-
@JDługosz The best way for modern students of computer architecture to consider the 360 is as a very simple RISC machine ... albeit with more than one instruction format ... and a few anomalies like `TR` and `TRT`. – davidbak Jun 21 '16 at 18:02
-
How about "zero and add packed" to move a register? But "branch and link" rather than stack for return address has made a come-back. – JDługosz Jun 21 '16 at 19:16
Since the publication by Parnas in 1972 of *On the criteria to be used in decomposing systems into modules*, it has been reasonably accepted that information hiding in software is a good thing. This follows on from a long debate throughout the '60s on structural decomposition and modular programming.
Modularity
Black-box relationships between modules implemented by different groups, in any multi-threaded system, necessarily require a mechanism to permit reentrancy and a means to track the dynamic call graph of the system. Controlled flow of execution has to pass both into and out of multiple modules.
Dynamic scoping
As soon as lexical scoping is insufficient to track dynamic behavior, then some runtime bookkeeping is required to track the difference.
Given that any thread (by definition) has only a single current instruction pointer, a LIFO stack is appropriate for tracking each invocation.
Exceptions
So, while the continuation model does not maintain a data structure explicitly for the stack, there is still the nested calling of modules that has to be maintained somewhere!
Even declarative languages either maintain evaluation history, or conversely flatten the execution plan for performance reasons and maintain progress in some other way.
The endless loop structure identified by rwong is common in high-reliability applications with static scheduling that disallows many common programming structures but demands that the entire application be considered a white box with no significant information hiding.
Multiple concurrent endless loops do not require any structure to hold return addresses as they do not call functions, making the question moot. If they communicate using shared variables, then these can easily degenerate into legacy Fortran-style return address analogues.
-
You paint yourself into a corner by assuming "_any_ multi-threaded system". Coupled finite-state machines might have multiple threads in their implementation, but don't require a LIFO stack. There's no limitation in FSMs that you return to any previous state, let alone in LIFO order. So that's a real multi-threaded system for which it doesn't hold. And if you restrict yourself to a definition of multi-threaded as "parallel independent function call stacks" you end up with a circular definition. – MSalters Jun 21 '16 at 09:45
-
I don't read the question that way. OP is familiar with function calls, but asks about _other_ systems. – MSalters Jun 22 '16 at 08:08
-
@MSalters Updated to incorporate concurrent endless loops. The model is valid, but limits scalability. I'd suggest that even moderate state machines incorporate function calls to permit reuse of code. – Pekka Jun 22 '16 at 08:16
All the old mainframes (IBM System/360) had no notion of a stack at all. On the 360, for example, parameters were constructed in a fixed location in memory, and when a subroutine was called, it was called with `R1` pointing to the parameter block and `R14` containing the return address. The called routine, if it wanted to call another subroutine, would have to store `R14` in a known location before making that call.
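Modeled in C (illustrative only, not real S/360 code; the integers stand in for return addresses): the return point travels in a single link register, so a routine that itself calls must first copy that register into its own fixed save area, and restore it before returning:

```c
#include <stdio.h>

static int link_register;   /* stands in for R14           */
static int b_save_area;     /* routine B's fixed save slot */

static void c_routine(void) {
    /* a leaf routine: it may "return" through link_register
       directly and needs no save area of its own */
}

static void b_routine(void) {
    b_save_area = link_register;  /* save R14 before calling out  */
    link_register = 200;          /* pretend: return point into B */
    c_routine();                  /* nested call clobbers "R14"   */
    link_register = b_save_area;  /* restore R14, then "return"   */
}

int main(void) {
    link_register = 100;          /* pretend: return point into main */
    b_routine();
    printf("restored link: %d\n", link_register);   /* prints 100 */
    return 0;
}
```

Because each routine has exactly one fixed save slot, a second simultaneous activation would overwrite it - which is exactly why recursion was off the table.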
This is much more reliable than a stack because everything can be stored in fixed memory locations established at compile time and it can be 100% guaranteed that processes will never run out of stack. There is none of the "Allocate 1MB and cross your fingers" that we have to do nowadays.
Recursive subroutine calls were allowed in PL/I by specifying the keyword `RECURSIVE`. This meant that the memory used by the subroutine was dynamically rather than statically allocated. But recursive calls were as rare then as they are now.
Stackless operation also makes massive multi-threading much easier, which is why attempts are often made to make modern languages stackless. There is no reason at all, for example, why a C++ compiler could not be modified in its back end to use dynamically allocated memory rather than stacks.
