Modern computers have several layers of cache memory in addition to a large, but slow, main memory system. One can make dozens of accesses to the fastest cache memory in the time required to read or write one byte from the main memory system. Thus, accessing one location a thousand times is much faster than accessing 1,000 (or even 100) independent locations once each. Because most applications repeatedly allocate and deallocate small amounts of memory near the top of the stack, the locations at the top of the stack are used and reused constantly, with the result that the vast majority (99%+ in a typical application) of stack accesses can be satisfied from cache memory.
By contrast, if an application were to repeatedly create and abandon heap objects to store continuation information, every version of every activation frame that was ever created would eventually have to be written out to main memory. Even if the vast majority of such objects would be completely useless by the time the CPU wanted to recycle the cache lines they started out in, the CPU would have no way of knowing that. Consequently, the CPU would have to waste a lot of time performing slow memory writes of useless information. Not exactly a recipe for speed.
Another thing to consider is that in many cases it's useful to know that an object reference passed to a routine will not be used once the routine exits. If parameters and local variables are passed via the stack, and if inspection of the routine's code reveals that it does not persist a copy of the passed-in reference, then the code that calls the routine can be sure that if no outside reference to the object existed before the call, none will exist afterward. By contrast, if parameters were passed via heap objects, concepts like "after a routine returns" become somewhat more nebulous, since if code kept a copy of the continuation, it would be possible for the routine to "return" more than once following a single call.