
When I compile C code and look at the assembly, the stack always grows backwards, like this:

_main:
    pushq   %rbp
    movl    $5, -4(%rbp)
    popq    %rbp
    ret

-4(%rbp) - does this mean the base pointer or the stack pointer is actually moving down the memory addresses instead of going up? Why is that?

I changed $5, -4(%rbp) to $5, +4(%rbp), compiled and ran the code, and there were no errors. So why do we still have to go backwards on the memory stack?

alex
  • Note that `-4(%rbp)` doesn't move the base pointer at all, and `+4(%rbp)` couldn't possibly have worked. – Margaret Bloom Jan 27 '19 at 20:24
  • "*why do we have to still go backwards*" - what do you think would be the advantage of going forwards? Ultimately, it doesn't matter, you just have to choose one. – Bergi Jan 27 '19 at 20:57
  • "why do we grow the stack backwards?" -- because if we don't someone else would ask why `malloc` grows the heap backwards – slebetman Jan 28 '19 at 03:24
  • @MargaretBloom: Apparently on the OP's platform, the CRT startup code doesn't care if `main` clobbers its RBP. That's certainly possible. (And yes, writing `4(%rbp)` would step on the saved RBP value). Err actually, this main never does `mov %rsp, %rbp`, so the memory access is relative to *the caller's RBP*, if that's what the OP actually tested!!! If this was actually copied from compiler output, some instructions were left out! – Peter Cordes Jan 28 '19 at 03:44
  • It seems to me that "backwards" or "forwards" (or "down" and "up") depends on your point of view. If you diagrammed memory as a column with low addresses on top, then growing the stack by decrementing a stack pointer would be analogous to a physical stack. – jamesdlin Jan 29 '19 at 08:00
  • @jamesdlin But numbers naturally have a direction in natural human language (at least in English). That's why if I ask you to count backwards you would not count from 1 to 10. – slebetman Jan 30 '19 at 02:34
  • If the stack had always been grown "forward", would you have asked "why do we still grow the stack forward"? – gnasher729 Feb 17 '22 at 09:29

4 Answers


Does this mean the base pointer or the stack pointer are actually moving down the memory addresses instead of going up? Why is that?

Yes, the push instructions decrement the stack pointer and then write to the stack, while the pop instructions do the reverse: they read from the stack and then increment the stack pointer.

This is somewhat historical: on machines with limited memory, the stack was placed high and grown downwards, while the heap was placed low and grown upwards. There is only one gap of "free memory", between the heap and the stack, and this gap is shared: either one can grow into it as individually needed. Thus, the program only runs out of memory when the stack and heap collide, leaving no free memory.

If the stack and heap both grow in the same direction, then there are two gaps: the stack cannot really grow into the heap's gap, and the heap cannot grow into the stack's.

Originally, processors had no dedicated stack handling instructions.  However, as stack support was added to the hardware, it took on this pattern of growing downward, and processors still follow this pattern today.

One could argue that on a 64-bit machine there is sufficient address space to allow multiple gaps — and as evidence, multiple gaps are necessarily the case when a process has multiple threads.  Though this is not sufficient motivation to change things around, since with multiple gap systems, the growth direction is arguably arbitrary, so tradition/compatibility tips the scale.


You'd have to change the CPU stack handling instructions in order to change the direction of the stack, or else give up on use of the dedicated pushing & popping instructions (e.g. push, pop, call, ret, others).

Note that the MIPS instruction set architecture does not have dedicated push & pop, so it is practical to grow the stack in either direction — you still might want a one-gap memory layout for a single thread process, but could grow the stack upwards and the heap downwards.  If you did that, however, some C varargs code might require adjustment in source or in under-the-hood parameter passing.

(In fact, since there is no dedicated stack handling on MIPS, we could use pre or post increment or pre or post decrement for pushing onto the stack as long as we used the exact reverse for popping off the stack, and also assuming that the operating system respects the chosen stack usage model.  Indeed, in some embedded systems and some educational systems, the MIPS stack is grown upwards.)


We refer to multi-byte items by the lowest address among them — i.e. by their first byte aka the beginning.  Another advantage of growing the stack downward is that, after pushing, the stack pointer refers to the item recently pushed onto the stack, no matter its size.  Growing the stack in the reverse direction means pointing to the logical end of the last item pushed.

Erik Eidt
  • It's not just `push` and `pop` on most architectures, but also the far more important interrupt-handling, `call`, `ret`, and whatever else has baked-in interaction with the stack. – Deduplicator Jan 27 '19 at 16:46
  • Some microcontrollers (e.g. AVR) still follow this historical model, with the stack at the top of the RAM and .data + .bss + heap at the bottom, with only one gap of free memory in between (.text is not in RAM: it is a Harvard architecture). – Edgar Bonet Jan 27 '19 at 20:04
  • ARM can have all four stack flavours. – Margaret Bloom Jan 27 '19 at 20:28
  • For what it's worth, I don't think "the growth direction is arbitrary" in the sense that either choice is equally good. Growing down has the property that overflowing the end of a buffer clobbers earlier stack frames, including saved return addresses. Growing up has the property that overflowing the end of a buffer clobbers only storage in the same or later (if the buffer is not in the latest, there may be later ones) call frame, and possibly even only unused space (all assuming a guard page after the stack). So from a safety perspective, growing up seems highly preferable – R.. GitHub STOP HELPING ICE Jan 28 '19 at 02:30
  • @EdgarBonet: Linux on x86 / x86-64 still follows the same layout, too, for user-space processes. The stack is near the very top of usable user-mode virtual address space (e.g. `RSP = 0x7ffff7f85578` on entry to `main`, near the top of the lower-47-bit canonical range that Linux reserves for user-space), and text/data/bss segments are near the bottom (in a position-dependent executable). Otherwise, in a PIE, text/data/bss are ASLRed somewhere around `0x555555555000` (that address is with ASLR disabled by GDB). Or https://notes.shichao.io/lkd/ch15/ shows some 32-bit memory map examples. – Peter Cordes Jan 28 '19 at 03:38
  • @R..: growing up doesn't eliminate buffer overrun exploits, because vulnerable functions are usually not leaf functions: they call other functions, placing a return address above the buffer. Leaf functions that get a pointer from their caller could become vulnerable to overwriting their own return address. e.g. If a function allocates a buffer on the stack and passes it to `gets()`, or does a `strcpy()` that doesn't get inlined, then the return *in those library functions* will use the overwritten return address. Currently with downward-growing stacks, it's when their caller returns. – Peter Cordes Jan 28 '19 at 06:00
  • Upward stacks would make it possible for a defensive `strcpy` to detect some errors, though: it knows where its own return address is stored, and could make that an upper limit for the destination pointer. e.g. `abort()` if passed an input that would result in overwriting the return address. It can't catch the general case of a function allocating a buffer and passing it to another function, which only then calls `strcpy`, though. But I think many vulnerabilities allocate and `strcpy` in the same function. – Peter Cordes Jan 28 '19 at 06:05
  • @PeterCordes the knowledge of the return address does not depend on the stack’s run direction. When you’re talking about target buffers within the current stack frame, it would be possible to detect when the copying would cross the return address, if the particular language/compiler was willing to let the application spend CPU cycles for that. But when changing the stack’s direction, the writing into the target buffer would run into the same direction, hence, could never cross the return address. – Holger Jan 28 '19 at 11:13
  • @Holger: The usual buffer overruns happen when calling library functions that copy a string into a buffer, not in open-coded loops inside the function itself. I don't think you understood my previous comments. A library `strcpy` function doesn't know where the buffer ends and its parent's return address begins. It only knows where *its own* return address is stored. (It could unwind the stack, but with `-fomit-frame-pointer` being the default, that's *very* expensive to look up metadata in `.eh_frame`, or whatever Microsoft does. It can't follow a saved-RBP frame pointer chain.) – Peter Cordes Jan 28 '19 at 11:42
  • @PeterCordes ok, now I understood. – Holger Jan 28 '19 at 13:12
  • @PeterCordes: Indeed my comment noted that same-level or more recent stack frames than the overflowed buffer are still potentially clobberable, but that's a lot less. In the case where the clobbering function is a leaf function directly called by the function whose buffer it is (e.g. `strcpy`), on an arch where return address is kept in a register unless it needs to be spilled, there is no access to clobber the return address. – R.. GitHub STOP HELPING ICE Jan 28 '19 at 14:40
  • Some applications and programming languages (e.g. early versions of Turbo Pascal as well as PostScript) supported a mark/release allocation strategy where the only way to free anything is to free *everything* allocated after a particular object. Such a design is very convenient with an upward-growing heap, since one can free an object and everything allocated after it by setting the "new object location" pointer to the address of an object without having to know the size of it or any other allocated object. – supercat Jan 28 '19 at 19:13
  • @R..: oh good point, yes I was thinking x86 because of the code in the question. But of course only ISAs like MIPS which don't impose a stack-growth direction and leave that purely to software could practically do this now. And that means call will use a link-register, not a baked-in stack push. (Might be possible on ARM32 outside of Thumb mode; [`stm`](http://www.keil.com/support/man/docs/armasm/armasm_dom1361289906470.htm) is available in 4 modes, and decrement-before (`push`) is only one possibility.) I'm not sure if ARM interrupt handling ever uses the stack in HW, though. – Peter Cordes Jan 28 '19 at 19:59
  • 'Originally processors had no stack handling instructions': originally when? The Burroughs B5500 had them in 1960, the PDP-11 in 1970, ... – user207421 Jan 29 '19 at 00:46
  • @user207421, the PDP-1 from the late '50s did not have a stack pointer; the call instruction stored the return address in the Accumulator. The PDP-8 (1965) also had no stack pointer: its JSR instruction stored the return address into the first word of the function. (Thus, functions started with a NOP instruction to reserve storage for the return address right there in the code! Nested subroutines, but no recursion.) I'm sure there are other examples, both older and more modern than these. Such machines are somewhat hostile to modern programming languages, like Pascal and C. – Erik Eidt Jan 29 '19 at 01:23
  • Downward growing stack makes it slightly easier to align the stack pointer to power-of-2. – Sebastian Redl Feb 17 '22 at 18:09
  • I just use `addi` with -4 because the sub variant is a pseudo-instruction – A P Jul 04 '22 at 16:51

On your specific system, the stack starts at a high memory address and "grows" downwards to low memory addresses. (The symmetric case, growing from low to high, also exists.)

And just because you changed -4 to +4 and the program still ran doesn't mean it's correct. The memory layout of a running program is more complex and depends on many other factors, which may have contributed to the fact that you didn't instantly crash in this extremely simple program.

nadir

The stack pointer points at the boundary between allocated and unallocated stack memory. Growing it downwards means that it points at the start of the first structure in allocated stack space, with other allocated items following at larger addresses. Having pointers point to the start of allocated structures is much more common than the other way round.

Now on many systems these days, there is a separate register for stack frames, which can be somewhat reliably unwound in order to figure out the call chain, with local variable storage interspersed. On some architectures, the way this stack frame register is set up means that it ends up pointing above the local variable storage, while the stack pointer sits below it. Using this stack frame register then requires negative indexing.

Note that stack frames and their indexing are a side aspect of compiled computer languages, so it is the compiler's code generator that has to deal with the "unnaturalness", rather than a poor assembly language programmer.

So while there were good historical reasons for choosing stacks to grow downward (and some of them are retained if you program in assembly language and don't bother setting up a proper stack frame), they have become less visible.

  • "Now on many systems these days, there is a separate register for stack frames" you are behind the times. Richer debug information formats have largely removed the need for frame pointers nowadays. – Peter Green Jan 29 '19 at 00:56

It's like driving a car: You can drive on the left side of the road, or on the right side, it doesn't really make a difference, except that everyone in a large area needs to agree.

It's the same with the direction the stack grows: It doesn't really matter. Of course the processor hardware will make some assumptions, so changing the direction would mean new hardware, so we are bound by that. But apart from the hardware's use of the stack, changing the direction wouldn't give us any advantage. So why are we still driving on the left side of the road? Because changing it would be pointless.

And personally, giving memory locations names like "top" or "bottom", "forward" or "backward", "up" or "down", "left" or "right" is rather pointless to me.

gnasher729