15

Why didn't ISO/ANSI standardize C++ at the binary level? Many of C++'s portability problems exist only because it lacks standardization at the binary level.

Don Box writes (quoting from his book Essential COM, chapter "COM As A Better C++"):

C++ and Portability


Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ development environment other than the one used to build the FastString DLL.

Are there further benefits or drawbacks to this lack of binary standardization?

Nawaz
  • 1,515
  • 1
  • 12
  • 22
  • Is this better asked on http://programmers.stackexchange.com/, seeing as how it's more of a subjective question? – Stephen Furlani Dec 22 '10 at 15:59
  • 1
    Related question of mine actually: http://stackoverflow.com/questions/2083060/what-could-c-c-lose-if-they-defined-a-standard-abi – Khaled Alshaya Dec 22 '10 at 16:00
  • 4
    Don Box is a zealot. Ignore him. – John Dibling Dec 22 '10 at 16:16
  • 8
    Well, C isn't standardized by ANSI/ISO in the binary level either; OTOH C has a *de facto* standard ABI rather than a *de jure* one. C++ doesn't have such a standardized ABI because different manufacturers had different goals with their implementations. For example, exceptions in VC++ piggyback on top of Windows SEH. POSIX has no SEH and therefore taking that model wouldn't have made sense (So G++ and MinGW don't use that model). – Billy ONeal Dec 22 '10 at 16:18
  • @Stephen Furlani: It is not subjective at all; there is an objective answer; the lack of standardisation is deliberate. –  Dec 22 '10 at 16:47
  • 3
    I see this as a feature, not a weakness. If you bind an implementation to a specific ABI, we will never have innovation, and new hardware will be bound to the design of the language (and since there are 15 years between each new version, that's a long time in the hardware industry); by stifling innovation, new ideas to make code execute more efficiently will not be made. The price is that all code in an executable must be built by the same compiler/version (a problem, but not a major one). –  Dec 22 '10 at 19:29
  • I agree that it is better suited for http://programmers.stackexchange.com. If it reappears on SO, then I was wrong. – mmyers Dec 22 '10 at 22:01

10 Answers

17

Languages with a binary-compatible compiled form are a relatively recent phenomenon[*]; examples are the JVM and .NET runtimes. C and C++ compilers usually emit native code.

The advantage is that there is no need for a JIT, or a bytecode interpreter, or a VM, or any other such thing. For example, you can't write the bootstrap code that runs at machine startup as nice, portable Java bytecode, unless perhaps the machine can natively execute Java bytecode, or you have some kind of converter from Java to a non-binary-compatible native executable code (in theory: not sure this can be recommended in practice for bootstrap code). You could write it in C++, more or less, albeit not portable C++ even at the source level, since it will do a lot of messing with magic hardware addresses.

The disadvantage is that of course native code only runs at all on the architecture it was compiled for, and the executables can only be loaded by a loader that understands their executable format, and only link with and call into other executables for the same architecture and ABI.

Even if you get that far, linking two executables together will only actually work correctly as long as: (a) you don't violate the One Definition Rule, which is easy to do if they were compiled with different compilers/options/whatever, such that they were using different definitions of the same class (either in a header, or because they each statically linked against different implementations); and (b) all relevant implementation details such as structure layout are identical according to the compiler options in force when each was compiled.
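To make point (b) concrete, here is a minimal sketch (file names and flags chosen only for illustration) of how two translation units that include the same header can disagree about a structure's layout when built with different packing options:

// shared.h - included by both modules
struct Record {
    char tag;
    int  value;
};

// module_a.cpp - built with default alignment; sizeof(Record) is typically 8
#include "shared.h"
extern "C" int record_size_a() { return (int)sizeof(Record); }

// module_b.cpp - built with -fpack-struct=1 (GCC) or /Zp1 (MSVC);
// sizeof(Record) shrinks to 5, so any Record passed between the two modules
// is read with the wrong member offsets even though the source is identical.
#include "shared.h"
extern "C" int record_size_b() { return (int)sizeof(Record); }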

For the C++ standard to define all of this would remove a lot of the freedoms currently available to implementers. Implementers are using those freedoms, especially when writing very low-level code in C++ (and C, which has the same issue).

If you want to write something that looks a bit like C++, for a binary-portable target, there's C++/CLI, which targets .NET, and Mono so that you can (hopefully) run .NET elsewhere than Windows. I think it's possible to persuade MS's compiler to produce pure CIL assemblies that will run on Mono.

There are also potentially things that can be done with for example LLVM to create a binary-portable C or C++ environment. I don't know that any widespread example has emerged, though.

But these all rely on pinning down a lot of things that the C++ standard leaves implementation-defined (such as the sizes of types). Then the environment that understands the portable binaries must be available on the system where the code is to run. By allowing non-portable binaries, C and C++ can go places where portable binaries can't, and that's why the standard doesn't say anything at all about binaries.

Then on any given platform, implementations usually still don't provide binary compatibility between different sets of options, although the standard isn't stopping them. If Don Box doesn't like that Microsoft's compilers can produce incompatible binaries from the same source, according to compiler options, then it's the compiler team he needs to complain about. The C++ language does not forbid a compiler or an OS from pinning down all the necessary details, so once you limit yourself to Windows it's not a fundamental problem with C++. Microsoft has chosen not to do so.

The differences often manifest as one more thing that you can get wrong and crash your program, but there may be considerable gains to be made in efficiency between, for example, incompatible debug vs release versions of a dll.

[*] I'm not sure when the idea was first invented, probably 1642 or something, but their current popularity is relatively new, compared to the time when C++ committed to the design decisions which prevent it defining binary-portability.

Steve Jessop
  • 5,051
  • 20
  • 23
  • @Steve But C has a well defined ABI on i386 and AMD64, so I can pass a pointer to a function compiled by GCC version X to a function compiled by MSVC version Y. Doing that with a C++ function is impossible. – user877329 Jun 15 '15 at 07:39
7

Cross-platform and cross-compiler compatibility were not primary goals of C and C++. They were born in an era, and intended for purposes, in which platform-specific and compiler-specific minimization of time and space was crucial.

From Stroustrup's "The Design and Evolution of C++":

"The explicit aim was to match C in terms of run-time, code compactness and data compactness. ... The ideal - which was achieved - was that C with Classes could be used for whatever C could be used for."

Andy Thomas
  • 1,250
  • 8
  • 9
  • 1
    +1 -- exactly. How would one build a standard ABI that worked on both ARM and Intel boxes? Wouldn't make sense! – Billy ONeal Dec 22 '10 at 16:20
  • 1
    unfortunately, it failed in this. You can do everything C does... except dynamically load a C++ module at runtime. you have to 'revert' to using C functions in the exposed interface. – gbjbaanb Mar 27 '12 at 19:13
6

It's not a bug, it's a feature! This gives implementors freedom to optimize their implementation at the binary level. The little-endian i386 and its offspring are not the only CPUs that have or do exist.

6

The problem described in the quotation is caused by the quite deliberate avoidance of standardising symbol-name mangling schemes (I think "standardisation at the binary level" is a misleading phrase in this respect, although the issue is related to a compiler's Application Binary Interface (ABI)).

C++ encodes a function or data object's signature and type information, and its class/namespace membership into the symbol-name, and different compilers are allowed to use different schemes. Consequently a symbol in a static library, DLL, or object file will not link with code compiled using a different compiler (or possibly even a different version of the same compiler).
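For example, the same declaration produces different symbols under the two mangling schemes in common use today (the names below are what I'd expect from current GCC/Clang and MSVC; verify with nm or dumpbin on your own toolchain):

// C++ declaration in a header
void f(int);

// Itanium C++ ABI (GCC, Clang):   _Z1fi
// MSVC:                           ?f@@YAXH@Z
// extern "C" void f(int);         f   (plain name, no type information at all)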

The issue is described and explained probably better than I can here, with examples of schemes used by different compilers.

The reasons for the deliberate lack of standardisation are also explained here.

Clifford
  • 191
  • 1
  • 4
3

The aim of ISO/ANSI was to standardize the C++ language, a task that already seems complex enough to require years for each revision of the standard and for compiler support to catch up.

Binary compatibility is far more complex, given that binaries need to run on different CPU architectures and in different OS environments.

  • True, but the problem described in the quotation is in fact nothing to do with "binary level compatibility" (despite the author's use of the term) in any sense other than such things are defined in something called an "Application Binary Interface". He is in fact describing the issue of incompatible name mangling schemes. –  Dec 22 '10 at 16:53
  • @Clifford: the name mangling scheme is just a subset of binary-level compatibility; the latter is more like an umbrella term! – Nawaz Dec 24 '10 at 06:31
  • I doubt there's a problem with trying to run a Linux binary on a windows machine. Things would be a lot better if there was an ABI per-platform, as at least then a script language could dynamically load and run a binary on the same platform, or apps could use components built with a different compiler. You can't use a C dll on linux today, and no-one complains, but that C dll can still be loaded by a python app which is where the benefit accrues. – gbjbaanb Mar 27 '12 at 19:04
2

As Andy said, cross-platform compatibility wasn't a big goal, whereas being implementable on a broad range of platforms and hardware was, with the net result that you can write conforming implementations for a very wide selection of systems. Binary standardization would have made this practically unachievable.

C compatibility was also important and would have significantly complicated this.

There have subsequently been some efforts to standardise the ABI for a subset of implementations, such as the Itanium C++ ABI followed by GCC and Clang.

Flexo
  • 712
  • 2
  • 7
  • 17
1

I think the lack of a binary standard for C++ is a problem in today's world of decoupled, modular programming. However, we have to define what we want from such a standard.

No-one in their right mind wants to define the implementation or platform for a binary, so you can't take an x86 Windows dll and start using it on an x86_64 Linux platform. That would be a bit much.

However, what people do want is the same thing we have with C modules: a standardised interface at the binary level (i.e. once compiled). Currently, if you want to load a dll in a modular app, you export C functions and bind to them at runtime; you cannot do that with a C++ module. It would be great if you could, which would also mean that dlls written with one compiler could be loaded by a different one. Sure, you still wouldn't be able to load a dll built for an incompatible platform, but that's not a problem that needs fixing.
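For illustration, here is a minimal sketch of what that C-style binding looks like today on a POSIX system (the library name and the plugin_run entry point are made up):

#include <dlfcn.h>
#include <cstdio>

// The shared library exposes a C entry point, e.g.
//   extern "C" int plugin_run(int arg);
// because an unmangled name with a known calling convention is all
// that can be looked up portably across compilers.
int main() {
    void* lib = dlopen("./libplugin.so", RTLD_NOW);   // hypothetical module
    if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    using plugin_run_t = int (*)(int);
    auto run = reinterpret_cast<plugin_run_t>(dlsym(lib, "plugin_run"));
    if (!run) { std::fprintf(stderr, "%s\n", dlerror()); dlclose(lib); return 1; }

    std::printf("plugin returned %d\n", run(42));
    dlclose(lib);
    return 0;
}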

So if the standards body defined what the interface a module exposed, then we'd have a lot more flexibility in loading C++ modules, we wouldn't have to expose C++ code as C code, and we'd probably get a lot more use of C++ in script languages.

We also wouldn't have to suffer things like COM that attempt to provide a solution to this problem.

gbjbaanb
  • 48,354
  • 6
  • 102
  • 172
  • 1
    +1. Yeah, I agree. The other answers here basically handwave away the problem by saying that binary standardization would prohibit architecture-specific optimizations. But that's not the point. Nobody is arguing for some cross-platform binary executable format. The problem is that there's no standard *interface* to load C++ modules dynamically. – Charles Salvia May 03 '12 at 15:06
1

Many of C++'s portability problems exist only because it lacks standardization at the binary level.

I don't think it's quite this simple. The answers provided already provide excellent rationale as to the lack of focus on standardization, but C++ may be too rich of a language to be well-suited to genuinely compete with C as an ABI standard.

We can go into name mangling resulting from function overloading, vtable incompatibilities, incompatibilities with exceptions thrown across module boundaries, etc. All of these are a real pain, and I do wish they could at least standardize vtable layouts.

But an ABI standard isn't just about making C++ dylibs produced by one compiler usable from a binary built by a different compiler. An ABI is used across languages. It would be nice if they could at least cover the first part, but there's no way I see C++ ever truly competing with C at the sort of universal-ABI level so crucial for making the most widely compatible dylibs.

Imagine a simple pair of functions exported like this:

void f(Foo foo);
void f(Bar bar, int val);

... and imagine Foo and Bar were classes with parameterized constructors, copy constructors, move constructors, and non-trivial destructors.

Then take the scenario of a Python/Lua/C#/Java/Haskell/etc. developer trying to import this module and use it in their language.

First we'd need a name-mangling standard for how to export symbols that use function overloading. This is the easier part. Yet it shouldn't really be name "mangling": since users of the dylib have to look up symbols by name, the overloads here should lead to names that don't look like a complete mess. Maybe the symbol names could be like "f_Foo" and "f_Bar_int" or something of that sort. We'd have to be sure they can't clash with a name actually defined by the developer, perhaps reserving some symbols/characters/conventions for ABI usage.

But now a tougher scenario. How does the Python developer, for example, invoke move constructors, copy constructors, and destructors? Maybe we could export those as part of the dylib. But what if Foo and Bar are exported in different modules? Should we duplicate the symbols and implementations associated in this dylib or not? I'd suggest we do, since it might get really annoying really fast otherwise to start having to get tangled up in multiple dylib interfaces just to create an object here, pass it here, copy one there, destroy it here. While the same basic concern could somewhat apply in C (just more manually/explicitly), C tends to avoid this just by nature of the way people program with it.

This is just a small sample of the awkwardness. What happens when one of the f functions above throws a BazException (also a C++ class with constructors and destructors and deriving std::exception) into JavaScript?

At best I think we can only hope to standardize an ABI that works from one binary produced by one C++ compiler to another binary produced by another. That would be great, of course, but I just wanted to point this out. Typically, the desire to distribute a generalized library that works across compilers is accompanied by the desire to make it truly general and compatible across languages as well.

Suggested Solution

My suggested solution after struggling to find ways to use C++ interfaces for APIs/ABIs for years with COM-style interfaces is to just become a "C/C++" (pun) developer.

Use C to create those universal ABIs, with C++ for the implementation. We can still do things like export functions that return pointers to opaque C++ classes with explicit functions to create and destroy such objects on the heap. Try to fall in love with that C aesthetic from an ABI perspective even if we're totally using C++ for the implementation. Abstract interfaces can be modeled using tables of function pointers. It's tedious to wrap this stuff up into a C API, but the benefits and the compatibility of the distribution that comes with that will tend to make it very worthwhile.
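A minimal sketch of that pattern, with made-up names: a pure C ABI in the header, an ordinary C++ class behind it.

// widget_api.h - the only header clients see; pure C ABI
#ifdef __cplusplus
extern "C" {
#endif
typedef struct widget widget;              /* opaque handle */
widget* widget_create(int size);
void    widget_draw(const widget* w);
void    widget_destroy(widget* w);
#ifdef __cplusplus
}
#endif

// widget_impl.cpp - implemented in C++ behind the C interface
#include "widget_api.h"
#include <vector>

struct widget {                            // full C++ class, invisible to clients
    std::vector<int> pixels;
    explicit widget(int size) : pixels(size) {}
    void draw() const { /* ... */ }
};

// In a real wrapper you'd also catch exceptions here so they never
// cross the C boundary.
widget* widget_create(int size)      { return new widget(size); }
void    widget_draw(const widget* w) { w->draw(); }
void    widget_destroy(widget* w)    { delete w; }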

Then if we don't like using this interface so much directly (we probably shouldn't at least for RAII reasons), we can wrap it all we want in a statically-linked C++ library we ship with the SDK. C++ clients can use that.

Python clients won't want to use either a C or C++ interface directly, as there's no way to make those Pythonic. They'll want to wrap it up into their own Pythonic interfaces, so it's actually a good thing that we're just exporting a bare-minimum C API/ABI to make that as easy as possible.

I think a lot of the C++ industry would benefit from doing this more rather than trying to stubbornly ship COM-style interfaces and so forth. It would also make all of our lives easier as users of these dylibs to not have to fuss about with awkward ABIs. C makes it simple, and the simplicity of it from an ABI perspective allows us to create APIs/ABIs that work naturally and with minimalism for all kinds of FFIs.

  • 1
    *"Use C to create those universal ABIs, with C++ for the implementation."*... I do the same, like many others! – Nawaz Jan 04 '16 at 18:58
-1

I don't know why it isn't standardized at the binary level, but I know what I do about it. On Windows I declare functions extern "C" BOOL WINAPI. (Of course, replace BOOL with whatever the function's return type is.) They are then exported cleanly.
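For what it's worth, a declaration of that shape looks something like the following (the function name is made up; WINAPI expands to __stdcall):

#include <windows.h>

// extern "C" suppresses C++ name mangling and WINAPI fixes the calling
// convention, so GetProcAddress can find the function from any compiler
// or language.  On 32-bit builds stdcall still adds @N decoration to the
// name unless a .def file is used.
extern "C" __declspec(dllexport) BOOL WINAPI InitEngine(int level);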

mike jones
  • 99
  • 1
  • 2
    But if you declare it `extern "C"`, it will use the C ABI, which is a *de facto* standard on common PC hardware even though it's not imposed by any sort of committee. – Billy ONeal Dec 22 '10 at 16:21
-3

Use `unzip foo.zip && make foo.exe && foo.exe` if you want portability of your source.

Sjoerd
  • 2,906
  • 1
  • 19
  • 18