8

In my day-to-day programming I tend to use very few pointers, not only because I want to keep my code simple and error-free, but also because I assume that none of the objects I work with are large enough to benefit from being placed on the heap.

At most, my largest object would probably be a QString of maybe a million characters.

However, that is only an assumption. How should I go about estimating whether an object I have created is large enough to benefit from being allocated on the heap and accessed through a pointer?

Anon
  • 4
    I guess avoiding pointers will not make your code less error-prone. When dealing with objects, you don't want to pass the contents and copy them back. That's simply poor design. –  Nov 10 '16 at 22:34

6 Answers

14

Since most implementations take the heap and the stack from the same block of memory (growing from either end), it doesn't matter. Size is not a reason to prefer the heap over the stack. The lifetime of your object is. Should it die once it goes out of scope or not? If not, when?
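To make the lifetime point concrete, here is a minimal sketch. `Tracker`, `scoped_use` and `make_long_lived` are hypothetical names invented for this illustration; they only count how many instances are alive at a given moment.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Tracker {
    static int& alive() { static int count = 0; return count; }
    Tracker() { ++alive(); }
    Tracker(const Tracker&) { ++alive(); }
    Tracker& operator=(const Tracker&) = default;
    ~Tracker() { --alive(); }
};

// Automatic (stack) lifetime: 'local' dies when the scope is left,
// even if the scope is left through an exception.
inline void scoped_use() {
    Tracker local;
    assert(Tracker::alive() >= 1);
}   // 'local' is destroyed here

// Dynamic (heap) lifetime: the object outlives this function and is
// destroyed when its owning unique_ptr is destroyed or reset.
inline std::unique_ptr<Tracker> make_long_lived() {
    return std::make_unique<Tracker>();
}
```

After `scoped_use()` returns, the count is back where it started, whereas the object returned by `make_long_lived()` stays alive for as long as the caller holds on to the `unique_ptr`.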

candied_orange
  • Isn't the heap/stack decision mostly an implementation detail of the compiler nowadays anyway? I didn't even know you could make the stack/heap distinction yourself in C++ – Robert Harvey Nov 11 '16 at 00:50
  • 3
    @RobertHarvey The heap/stack distinction is very important in C++, as it determines the lifetime of the object. Stack-allocated objects (aka. local variables) are destroyed automatically when the scope is left (even in case of exceptions). In contrast, heap-allocated objects must be deleted manually, which is error-prone. Only for later languages with garbage collection does the heap/stack distinction become irrelevant, because the GC now manages the object life time. – amon Nov 11 '16 at 09:55
  • 1
    You say it doesn't matter, but I have had instances where the stack ran into the heap because there were too many large objects on it. The default stack size on macOS, for example, is about 1.5 megs. If the OP is putting strings that are 1MB on the stack, that could very much be a problem. My guess is that `QString` internally allocates the memory for the actual string on the heap, though, so they probably haven't hit this. – user1118321 Dec 17 '17 at 21:46
  • @user1118321 if a room is over capacity the fire marshal will kick you out no matter which door you walk in through. – candied_orange Dec 17 '17 at 21:53
  • 1
    Yeah, but we're talking about 4-6 orders of magnitude difference in size. 10 people in a coat closet will get the fire marshal called. 10 people in a football stadium won't. – user1118321 Dec 18 '17 at 00:09
  • @user1118321 again, the stack and the heap typically occupy the SAME block of memory. They start at different ends and grow towards each other. Which is why I'm talking about the same room. – candied_orange Dec 18 '17 at 00:25
9

A million characters should not be passed by value, because anything passed by value (beyond a couple of words that fit in registers) goes through the stack, and stack space is always limited.

Fortunately, when you use a QString, only a small object is passed by value: the object itself holds a pointer to the memory region where the millions of bytes are stored. Nevertheless, passing by value still means that a copy is made (unless move semantics can be used), and that copy duplicates the millions of bytes. When passing by value is not required, prefer passing a const reference.

Finally, you can use pointers in a safe fashion through the standard smart pointers, such as unique_ptr or shared_ptr.
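A rough sketch of both suggestions follows. It uses `std::string` as a stand-in for QString so that it compiles without Qt, and `count_spaces`, `Document` and `load_document` are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Read-only access through a const reference: the character data
// is never copied, however long the string is.
inline std::size_t count_spaces(const std::string& text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

// Hypothetical object that has to outlive the scope that creates it.
struct Document {
    std::string body;
};

// unique_ptr expresses single ownership: the Document is deleted
// automatically when its owner goes away, with no manual delete.
inline std::unique_ptr<Document> load_document(std::size_t length) {
    auto doc = std::make_unique<Document>();
    doc->body.assign(length, 'x');
    assert(doc->body.size() == length);
    return doc;
}
```

A caller would write something like `auto doc = load_document(1000);` and then `count_spaces(doc->body)`; neither call copies the text.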

Christophe
  • 1
    Actually, most Qt objects (like QString) use copy-on-write semantics, so the millions of characters aren't copied if you just pass by value. Pass by (const) reference is still better, because it avoids an (atomic) update of the reference counter. – Bart van Ingen Schenau Nov 11 '16 at 12:02
  • @BartvanIngenSchenau Thanks for this useful additional insight into the Qt internals. – Christophe Nov 11 '16 at 12:25
7

So the question basically is: “when should I use new?” Local variables always use stack space, but we can explicitly allocate objects on the heap with a new expression.

When deciding for or against new, we do not decide based on the size of the object, as there are more important criteria:

  • Do we statically know the size of the object? It is rarely sensible to allocate an array on the stack – usually, arrays are either static or allocated dynamically.

  • What is the lifetime of the object? Does its lifetime end when the control flow leaves the current scope? If so, a local variable should be used to create the object. If it should be able to live longer, we need a heap-allocated object.

  • Are we trying to write exception-safe code? (Yes!) If so, handling ownership of more than one naked pointer is extremely tedious and rather bug-prone. Even if we allocate an object with new, we should manage it through a smart pointer like std::unique_ptr.

Most objects are fairly small. E.g. a std::vector local variable probably uses only three words of stack memory, regardless of the amount of data it stores. Internally, it allocates a buffer for the data on the heap. The same would be true for string types. If you create a class that is rather large, there are ways to reduce the size. But long before that happens, the growing number of instance fields should remind you of the single-responsibility principle: Is this class doing too many unrelated things? Can we describe the class in a simpler manner as a collaboration of multiple smaller objects? But that is a design question.
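As a small illustration of that point (a sketch, not a statement about any particular standard library): `sizeof` is a compile-time constant, so the stack footprint of a vector cannot depend on how many elements it holds. The names `handle_bytes`, `payload_bytes` and `handle_size_example` are invented for this example, and the exact handle size is implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Footprint of the vector object itself: fixed at compile time.
constexpr std::size_t handle_bytes = sizeof(std::vector<int>);

// Size of the element data, which the vector keeps in separately
// allocated storage (the heap, with the default allocator).
inline std::size_t payload_bytes(const std::vector<int>& v) {
    return v.size() * sizeof(int);
}

inline void handle_size_example() {
    std::vector<int> small(1);
    std::vector<int> big(1000000);
    assert(sizeof(small) == sizeof(big));       // same object size
    assert(payload_bytes(big) > handle_bytes);  // the data lives elsewhere
}
```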

When we pass an object to a function as a parameter, we as callers cannot decide how the variable is passed. That is part of the function signature. Note that we can pass objects by pointer or by reference even when they were allocated on the stack. Possible argument types are:

  • Const references const T& arg are the usual way to pass parameters. A reference is similar to a pointer, so the size of the type T is irrelevant.

  • A copy T arg is in general only done in one of two cases:

    1. The size is very small (up to a couple of words large), and the type is cheap to copy, and it has no virtual methods. This describes the built-in numeric types, and perhaps small user-defined POCOs (plain old C objects, such as structs).
    2. We would have to create a copy anyway. This is sometimes seen in operator overloads like T operator+(T lhs, T const& rhs) { lhs += rhs; return lhs; }.
  • Non-const references T& arg are used whenever we need to modify the object, or when the type T was not written with const-correctness in mind.

  • Pointers const T* arg or T* arg are rarely seen in modern C++ code. A reference can only be created from an existing object. In contrast, pointers can be null, or point into invalid memory. As such, pointers shouldn't be used to avoid copies (use references), or to make virtual method calls possible (again, use references). This leaves pointers for pointer-y stuff, like: using the pointer as an array (prefer the standard collections), when you do pointer arithmetic (prefer iterators), or when you want to reassign a pointer to point to a different object, which isn't possible with references.
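The four parameter styles above could look roughly like this. `Vec2` and the functions around it are hypothetical and exist only to show one signature of each kind.

```cpp
#include <cassert>

// Hypothetical small value type used only for illustration.
struct Vec2 {
    double x;
    double y;
    Vec2& operator+=(const Vec2& rhs) { x += rhs.x; y += rhs.y; return *this; }
};

// Const reference: read-only access, no copy.
inline double dot(const Vec2& a, const Vec2& b) { return a.x * b.x + a.y * b.y; }

// By value: a copy is needed anyway, so the parameter is the copy.
inline Vec2 operator+(Vec2 lhs, const Vec2& rhs) { lhs += rhs; return lhs; }

// Non-const reference: the function modifies the caller's object.
inline void scale(Vec2& v, double factor) { v.x *= factor; v.y *= factor; }

// Pointers for "pointer-y stuff": walking a range and reseating 'best',
// which may legitimately stay null when the range is empty.
inline const Vec2* max_x(const Vec2* first, const Vec2* last) {
    assert(first != nullptr || first == last);  // a null range must be empty
    const Vec2* best = nullptr;
    for (const Vec2* p = first; p != last; ++p) {
        if (best == nullptr || p->x > best->x) best = p;  // reseat
    }
    return best;
}
```

In real code `max_x` would more likely take iterators or a standard container, as the last bullet suggests; the raw pointers are there only to show what references cannot do.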

amon
2

At most, my largest object would probably be a QString of maybe a million characters.

sizeof(QString) is 8 bytes on 64-bit systems no matter how big or small its contents are, because the contents are always allocated on the heap, as is the case with std::vector. QString is effectively just a handle to memory on the heap. So even if you create a QString object on the stack and insert a million characters into it, it still uses at most 8 bytes of stack space, if it doesn't live directly in a register.

If you tried to allocate a million Unicode characters directly on the stack (about 2 MB at the two bytes per character that QString uses), you would be asking for a stack overflow on many platforms.

In most of the normal kind of C++ code people write, you generally don't need to think about the distinction of stack vs. heap so much since it's all abstracted away from you with the standard data structures you use. The distinction is generally more about whether you use, say, unique_ptr or not in implementing a class. If you do, that implies the additional heap overhead, an extra layer of indirection, and some memory fragmentation.

If you ever work with objects that don't manage their own memory separately on the heap, for example a std::array<T, N> on the stack, then a sane size (just a rough and crude number to go by) is typically somewhere between a few hundred bytes and a couple of kilobytes. That assumes the function is not called recursively and does not call other functions that allocate a large amount of stack space; within those limits you are reasonably safe against stack overflows, at least on most desktop machines and operating systems. For an array of 32-bit integers, I might switch to the heap once there are more than 128 to store (more than 512 bytes), at which point you might use std::vector instead. Example:

#include <array>
#include <cstddef>
#include <vector>

void some_func(std::size_t n /*, ... */)
{
    // 'std::array' is a purely contiguous structure (not
    // storing pointers to memory allocated elsewhere), so
    // here we are actually allocating sizeof(int) * 128
    // bytes on the stack.
    std::array<int, 128> stack_array;
    int* data = stack_array.data();

    // Only use the heap if n is too large to fit in our
    // stack-allocated array of 128 integers.
    std::vector<int> heap_array;
    if (n > stack_array.size())
    {
        heap_array.resize(n);
        data = heap_array.data();
    }

    // Do stuff with the first n elements of 'data'.
    (void)data;
    // ...
}

As for move semantics, I'd apply a similar rule there, because it's actually pretty cheap to deep copy even 512 bytes if that's just copying memory from stack to stack or between two regions of memory with high temporal locality. Even if you have a huge class, Foo, with a dozen data members where sizeof(Foo) is a whopping 256 bytes, I wouldn't take that as a reason to allocate its contents on the heap if that can be avoided. Typically you're perfectly fine if the decision to use the heap is based on things other than performance: avoiding stack overflows; modeling variable-sized data structures (which imply the heap unless they're optimized for common cases involving small sizes, like the small string optimization, which avoids the extra heap allocation for small strings but uses it for big ones); allowing shared ownership with shared_ptr; using a pimpl to reduce compile-time dependencies; or allocating subtypes for polymorphism, where it might be unwieldy to avoid heap allocations.

1

It depends.

First and foremost, what's most important is semantics. Returning by value makes no sense for reference types.

Also, with the advent of move semantics, passing by value doesn't imply making a copy, so it may be efficient even for 'large' objects. And let's not forget about RVO.
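Here is a minimal sketch of what that looks like, assuming a hypothetical `Message` class that takes ownership of a buffer. Passing an rvalue into the by-value parameter moves the buffer instead of copying its elements; the pointer comparison relies on the usual behaviour of `std::vector` with the default allocator, where a move hands over the existing heap block.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type that takes ownership of a buffer passed by value.
class Message {
public:
    // Callers passing an lvalue pay for one copy; callers passing an
    // rvalue (std::move or a temporary) pay only for cheap moves.
    explicit Message(std::vector<char> payload) : payload_(std::move(payload)) {}

    const std::vector<char>& payload() const { return payload_; }

private:
    std::vector<char> payload_;
};

// Returned by value: the compiler may elide the copy (RVO) or, failing
// that, move the object. Either way the elements are not copied.
inline Message make_message(std::size_t n) {
    std::vector<char> buffer(n, 'x');
    const char* before = buffer.data();
    Message m(std::move(buffer));
    assert(m.payload().data() == before);  // same heap block, no element copy
    return m;
}
```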

That being said, passing by value can be expensive even with small objects, especially if they themselves have internal heap allocations, or if it's done in a tight loop.

You can keep on passing by value until profiling shows it to be a problem, basically. My heuristic for this is not to worry about anything less than ~100 bytes, especially if it needs no heap allocation/synchronization, unless it's in a tight loop.

Bwmat
1

For the most part, Candied Orange's answer is correct. In general, if you want something to go away when you exit the function, put it on the stack. However, there is one subtlety you need to be aware of. A stack has a size. If you have a multi-threaded application, there will be multiple stacks, one for each thread. These stacks are often contiguous; that is, one stack comes immediately after the other in the block of memory allocated for your process. If you put too much information on one stack, it can overflow and hit the next stack, possibly corrupting it, or possibly writing over a protected page and causing a crash.

This is more likely with large objects that are composed of smaller objects, and with recursive functions that call themselves and put multiple copies of their local variables onto the stack. If you have a particularly large object on your stack in a recursive function (or on the stack in several non-recursive functions that call each other), you can have problems. So it is important to be aware of the size of objects you are putting onto the stack. As stack depth grows, and the size of objects you're working with on the stack grows, you run the risk of problems.

There are a number of solutions. You can do what QString does and have the object allocate its memory on the heap so that putting one on the stack only uses up a little bit of stack space and the rest is on the heap. You can just allocate from the heap and use smart pointers on the stack, as others have mentioned. Or you can look for smaller objects that don't have this problem. It all depends on the problem you're solving.
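The first two options can be sketched together as follows. `Samples` and `recurse` are hypothetical names, and the element count is an arbitrary assumption chosen so that the payload (roughly 800 KB) would be risky to place on the stack in every frame of a recursive function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical class with a large fixed-size payload. Only a smart
// pointer lives inside the object; the array itself is on the heap.
class Samples {
public:
    static constexpr std::size_t kCount = 100000;

    Samples() : data_(std::make_unique<std::array<double, kCount>>()) {}

    double& operator[](std::size_t i) {
        assert(i < kCount);
        return (*data_)[i];
    }

private:
    std::unique_ptr<std::array<double, kCount>> data_;
};

// Each frame holds one small Samples handle, so recursion depth costs
// little stack space even though every level owns a large array.
inline double recurse(int depth) {
    Samples s;
    s[0] = depth;
    return depth == 0 ? 0.0 : s[0] + recurse(depth - 1);
}
```

The trade-off is one heap allocation per `Samples` object, which is usually acceptable when the alternative is overflowing the stack.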

But just be aware there are cases where the size of your object may determine where it should be allocated or how it should be handled.

user1118321