93

Private variables are a way to hide complexity and implementation details from the user of a class. This is a rather nice feature. But I do not understand why, in C++, we need to put them in the header of a class. I see two annoying downsides to this:

  • It clutters the header for the user
  • It forces recompilation of all client libraries whenever the internals are modified

Is there a conceptual reason behind this requirement? Is it only to make the compiler's work easier?

Billjk
Simon Bergot

5 Answers

91

It is because the C++ compiler must know the actual size of the class in order to allocate the right amount of memory at instantiation. And the size includes all members, private ones too.

One way to avoid this is using the Pimpl idiom, explained by Herb Sutter in his Guru of the Week series #24 and #28.
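A minimal sketch of what the Pimpl idiom can look like, assuming C++14 or later. The class `Widget`, its nested `Impl`, and the file names in the comments are hypothetical and chosen only for illustration; both "files" are shown in one listing so that it compiles on its own.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- widget.h (hypothetical public header) ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                          // defined where Impl is a complete type
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    void rename(std::string name);
    const std::string& name() const;

private:
    struct Impl;                        // forward declaration only
    std::unique_ptr<Impl> impl_;        // clients see a single pointer
};

// ---- widget.cpp (hypothetical implementation file) ----
struct Widget::Impl {
    std::string name;
    int rename_count = 0;               // can change without touching widget.h
};

Widget::Widget(std::string name) : impl_(std::make_unique<Impl>()) {
    impl_->name = std::move(name);
}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

void Widget::rename(std::string name) {
    impl_->name = std::move(name);
    ++impl_->rename_count;
}

const std::string& Widget::name() const { return impl_->name; }
```

With this layout, adding or removing members of `Impl` only requires recompiling the implementation file. The cost is one heap allocation per object and an extra pointer indirection on each access.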

Update

Indeed, this (or more generally, the header / source file distinction and #includes) is a major hurdle in C++, inherited from C. Back when C was created, there was no experience yet with large-scale software development, where this starts to cause real problems. The lessons learned since then were heeded by designers of newer languages, but C++ is bound by backward compatibility requirements, making it really hard to address such a fundamental issue in the language.

Péter Török
  • Isn't this kind of information only contained in the class library? Is it used for linking? – Simon Bergot Apr 10 '12 at 10:02
  • @Simon, what do you mean by "class library"? – Péter Török Apr 10 '12 at 10:07
  • I mean the collection of object files containing the class definition and methods – Simon Bergot Apr 10 '12 at 10:09
  • @Simon, I am not an expert in compilers, but yes, I would assume it is stored in the object file. – Péter Török Apr 10 '12 at 10:14
  • Pimpl could easily be integrated into the core language by just adding some syntactic sugar. That'd be backward compatible. – Per Johansson Apr 10 '12 at 12:37
  • 15
    When C++ was created, AT&T/Bell Labs (Stroustrup's employer at the time) certainly had experience with large-scale C development. Their 5ESS phone switch software was at the time probably the largest single C program in the world. Early ideas about OO are already visible in that code base, and Cfront mimicked those techniques. However, the notion of `private` is more modern. – MSalters Apr 10 '12 at 12:47
  • @MSalters, interesting to know. I would be curious though how large was "large" in those days? – Péter Török Apr 10 '12 at 12:55
  • 2
    @PéterTörök: About 15 MLoc (turn of the century), not counting comments. Clean-build times were measured in days. – MSalters Apr 10 '12 at 13:20
  • @MSalters, whoops, I stand corrected. Modified my wording above to "when C was created...". – Péter Török Apr 10 '12 at 13:49
  • 3
    In C, you'd simply put the allocator in a library function; the client wouldn't be able to allocate such a structure at all. This increases the overhead a bit, but makes migrating the code across versions trivial so it's often worthwhile. However, it does tend to lead to a code style that's very distinct from that seen with C++. – Donal Fellows Apr 10 '12 at 14:07
  • @Simon, the object files aren't aware of C++ features at all, and they don't include information about the types used by functions within a specific object file. An object file could not be used as a substitute for a .h file, if that's what you were getting at. They only include a bunch of functions, one for each method and free function, with name decoration to make sure functions from different classes/namespaces/overloads don't collide. The information in the header file is used by the compiler to generate the correct machine code implementing these functions. – nw. Jun 18 '20 at 15:08
20

The class definition needs to be sufficient for the compiler to produce an identical layout in memory wherever you've used an object of the class. For example, given something like:

class X { 
    int a;
public:
    int b;
};

The compiler will typically have a at offset 0, and b at offset 4. If the compiler saw this as just:

class X { 
public:
    int b;
};

It would "think" that b should be at offset 0 instead of offset 4. When code using that definition assigned to b, code using the first definition would see a get modified, and vice versa.
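The mismatch can be checked at compile time. The sketch below uses two hypothetical stand-in structs, `XFull` and `XPublicOnly`, for the two views of `X`. They are plain structs with only public members, an assumption made so that `offsetof` is well defined; the offsets shown assume no padding between two `int` members, which holds on common platforms.

```cpp
#include <cstddef>  // offsetof

// What the implementation would see: both members.
struct XFull {
    int a;   // stands in for the private member
    int b;
};

// What a client would see if the private member were hidden from it.
struct XPublicOnly {
    int b;
};

// The two views disagree about where b lives and how big the object is.
static_assert(offsetof(XFull, b) == sizeof(int), "b follows a in the full layout");
static_assert(offsetof(XPublicOnly, b) == 0, "b comes first when a is unknown");
static_assert(sizeof(XFull) != sizeof(XPublicOnly), "the object sizes differ too");
```

Code compiled against the second view would read and write `b` at the offset where the first view keeps `a`.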

The usual way to minimize the effects of changes to the private parts of a class is the pimpl idiom (about which I'm sure Google can give a great deal of information).

Jerry Coffin
  • 1
    I am asking about a design decision. Of course you would need to put the private member declaration somewhere for the language to work. But why should it be in the header and not in a more private place? – Simon Bergot Apr 10 '12 at 10:04
  • 8
    @Simon: The header is all the compiler sees to tell it what the class/struct looks like. There have been discussions of adding something like modules to C++, which would hide that sort of data a bit more, but so far it hasn't been approved (though it hasn't been entirely dropped either). – Jerry Coffin Apr 10 '12 at 10:05
  • 4
    Still, a trivial rule would be to allocate such ".cpp-defined" private members last. That means the offsets of public and "normal" private members would not depend on them. IMO the real reason is that you can't inherit from such a class, since the derived part has to follow even those private members. – MSalters Apr 10 '12 at 12:51
3

There are most likely several reasons. While private members can't be accessed by most other classes, they can still be accessed by friend classes. So at least in this case they may be needed in the header, so the friend class can see they exist.

The recompilation of dependent files may depend on your include structure. Including the .h files in a .cpp file instead of another header can in some cases prevent long chains of recompilations.
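As a rough sketch of that second point, a header can often forward-declare a class instead of including its header, provided it only uses pointers or references to it. The classes `Engine` and `Car` and the file names in the comments are hypothetical; the three "files" are shown in one listing so that it compiles on its own.

```cpp
// ---- car.h (hypothetical) ----
class Engine;                 // forward declaration instead of #include "engine.h"

class Car {
    Engine* engine_;          // a pointer to an incomplete type is allowed
public:
    explicit Car(Engine* engine);
    int start();
};

// ---- engine.h (hypothetical) ----
class Engine {
    int rpm_ = 0;             // private detail that may change often
public:
    void start() { rpm_ = 800; }
    int rpm() const { return rpm_; }
};

// ---- car.cpp (hypothetical): the only place that needs engine.h ----
Car::Car(Engine* engine) : engine_(engine) {}

int Car::start() {
    engine_->start();         // Engine must be complete here, not in car.h
    return engine_->rpm();
}
```

In this arrangement, a change to `Engine`'s private members would recompile car.cpp but not the files that only include car.h.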

thorsten müller
3

The primary reason this is needed is that any code that uses a class needs to know about private class members in order to generate code that can handle it.

Consider the following class:

//foo.h
class foo {
    char private_member[0x100];
public:
    void do_something();
};

which is used by the following code:

#include "foo.h"
void f() {
    foo x;
    x.do_something();
}

If we compile this code on 32-bit Linux using gcc, with some flags to simplify the analysis, the function f compiles to (with comments):

;allocate 256 bytes on the stack for a foo, plus 4 bytes for a foo*
   0:   81 ec 04 01 00 00       sub    esp,0x104
;the trivial constructor for foo is elided here
;compute the address of x
   6:   8d 44 24 04             lea    eax,[esp+0x4]
;pass the foo* to the function being called (the implicit first argument, this)
   a:   89 04 24                mov    DWORD PTR [esp],eax
;call x.do_something()
   d:   e8 fc ff ff ff          call   e <_Z1fv+0xe>
;deallocate the stack space used for this function
  12:   81 c4 04 01 00 00       add    esp,0x104
;return
  18:   c3                      ret    

There are two things of note here:

  • The code for f() needs to know sizeof(foo) in order to allocate the correct amount of space for it.
  • There is no call to foo's constructor. This is because the constructor is trivial, but it is impossible to know if foo has a trivial constructor without knowing its private class members.
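Both points can be checked at compile time. In the sketch below, `FooTrivial` and `FooNonTrivial` are hypothetical variants of `foo` with the same public interface and different private members; C++11 or later is assumed.

```cpp
#include <string>
#include <type_traits>

class FooTrivial {
    char private_member[0x100];
public:
    void do_something() { private_member[0] = 0; }
};

class FooNonTrivial {
    std::string private_member;   // has its own constructor and destructor
public:
    void do_something() { private_member.clear(); }
};

// The caller has to reserve exactly this much space for each object.
static_assert(sizeof(FooTrivial) == 0x100, "256 bytes per object");

// Whether the caller must emit a constructor call depends on private members.
static_assert(std::is_trivially_default_constructible<FooTrivial>::value,
              "no constructor call is needed");
static_assert(!std::is_trivially_default_constructible<FooNonTrivial>::value,
              "the caller must call the constructor");
```

Neither answer could be computed by a compiler that saw only the public part of the class.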

Essentially, while the programmer does not need to know about the implementation of a class in order to use it, the compiler does. The C++ designers could have allowed private class members to be unknown to client code by introducing some levels of indirection, but that would have serious performance implications in some cases. Instead, the programmer can decide to implement this indirection themselves (via the pImpl idiom, for instance) if they decide the trade-off is worth it.

Chris
  • excellent answer – Martin K Feb 12 '20 at 10:59
  • Could you please explain why the extra 4 bytes for a `foo*`? [This](https://godbolt.org/z/6zWhGT) shows only 256 bytes being `sub`tracted from the stack. – Zoso Oct 12 '20 at 20:54
  • 1
    @Zoso It's a compiler optimization. The compiler could instead subtract 256 bytes from the stack and then explicitly `push` the `foo*` to pass it as an argument (the implicit `this` pointer argument), but it saves some instructions by reserving space for both at once. Your example is a 64-bit compiler, so it uses a different calling convention (`this` is passed in a register rather than the stack), so reserving the space is unnecessary. – Chris Oct 13 '20 at 02:02
0

The underlying problem is that the header file in C++ must contain information that the compiler needs, but is also used as reference for a human user of a class.

As the human user of a class, there are many things I don't care about. Private fields are one; inline implementations of functions and methods are another. The compiler, on the other hand, cares a lot.

More modern languages like Swift solve this using a bit more CPU time: there is only one permanently stored file, and that is the source file. Behind the scenes, the compiler creates something like a header file containing everything it needs to compile other source files that use the class. This also allows showing the human user what would be the header file in C++, with only the things the user wants: no private methods, but all the comments for types, constants, methods etc. present. (The Xcode IDE optionally shows how methods are called from a completely different language.)

gnasher729