119

A recurring theme I've noticed in many questions on SE is the ongoing argument that C++ is faster and/or more efficient than higher-level languages like Java. The counter-argument is that a modern JVM or CLR can be just as efficient, thanks to JIT compilation and so on, for a growing number of tasks, and that C++ is only more efficient if you know what you're doing and why doing things a certain way will yield performance increases. That's obvious and makes perfect sense.

I'd like a basic explanation (if there is such a thing...) of why and how certain tasks are faster in C++ than in the JVM or CLR. Is it simply because C++ is compiled into machine code, whereas the JVM or CLR still has the processing overhead of JIT compilation at run time?

When I try to research the topic, all I find are the same arguments I've outlined above, without any detailed information on exactly how C++ can be utilized for high-performance computing.

Anonymous
  • Performance also depends on the complexity of the program. – pandu Aug 03 '12 at 14:30
  • Start with as low a level as possible. E.g., with Verilog, learn how CPUs and memories are designed, then go up to assembly, then C, then C++. – SK-logic Aug 03 '12 at 15:59
  • 23
    I'd add to "C++ is only ever more efficient if you know what you're doing and why doing things a certain way will merit performance increases." by saying that it's not only a matter of knowledge, it's a matter of developer time. It's not always efficient to maximize optimization. This is why higher level languages such as Java and Python exist (among other reasons)--to decrease the amount of time a programmer has to spend programming to accomplish a given task at the expense of highly tuned optimization. – Joel Cornett Aug 03 '12 at 20:26
  • 4
    @Joel Cornett: I totally agree. I am definitely more productive in Java than in C++ and I only consider C++ when I need to write really fast code. On the other hand I have seen poorly written C++ code being really slow: C++ is less useful in the hands of unskilled programmers. – Giorgio Aug 04 '12 at 09:35
  • 3
    Any compilation output that can be produced by a JIT can be produced by C++, but code that C++ can produce may not necessarily be produced by a JIT. So the capabilities and performance characteristics of C++ are a superset of those of any higher-level language. *Q.E.D* – tylerl Aug 05 '12 at 06:17
  • possible duplicate of [Is there any reason to use C++ instead of C, Perl, Python, etc.?](http://programmers.stackexchange.com/questions/29109/is-there-any-reason-to-use-c-instead-of-c-perl-python-etc) – Konrad Rudolph Aug 05 '12 at 09:07
  • @tylerl JIT compilation can make optimizations that wouldn't be safe during ahead-of-time compilation. A JIT compiler can speculate using runtime information, and if any of its assumptions turn out to be wrong it still has the option of recompiling the code. So no, AOT compilation can't produce any compilation output of a JIT. – Doval Mar 22 '14 at 05:35
  • 1
    @Doval Technically true, but as a rule you can count the possible runtime factors affecting a program's performance on one hand. Usually without using more than two fingers. So worst case you ship multiple binaries... except it turns out that you don't even need to do that because the potential speedup is negligible, which is why nobody even bothers. – tylerl Mar 22 '14 at 05:56

11 Answers

205

It's all about the memory (not the JIT). The JIT 'advantage over C' is mostly limited to optimizing out virtual or non-virtual calls through inlining, something that the CPU BTB is already working hard to do.

On modern machines, accessing RAM is really slow (compared to anything the CPU does), which means applications that use the caches as much as possible (which is easier when less memory is used) can be up to a hundred times faster than those that don't. And there are many ways in which Java uses more memory than C++ and makes it harder to write applications that fully exploit the cache:

  • There is a memory overhead of at least 8 bytes for each object, and the use of objects instead of primitives is required or preferred in many places (namely the standard collections).
  • Strings consist of two objects and have an overhead of 38 bytes
  • UTF-16 is used internally, which means that each ASCII character requires two bytes instead of one (the Oracle JVM recently introduced an optimization to avoid this for pure-ASCII strings).
  • There is no aggregate value type (i.e. structs), and in turn, there are no arrays of aggregate value types. A Java object, or array of Java objects, has very poor L1/L2 cache locality compared to C structs and arrays.
  • Java generics use type-erasure, which has poor cache locality compared to type-instantiation.
  • Object allocation is opaque and has to be done separately for each object, so it is impossible for an application to deliberately lay out its data in a cache-friendly way and still treat it as structured data.

Some other memory- but not cache-related factors:

  • There is no stack allocation, so all non-primitive data you work with has to be on the heap and go through garbage collection (some recent JITs do stack allocation behind the scenes in certain cases).
  • Because there are no aggregate value types, there is no stack passing of aggregate value types. (Think efficient passing of Vector arguments.)
  • Garbage collection can hurt L1/L2 cache contents, and GC stop-the-world pauses hurt interactivity.
  • Converting between data types always requires copying; you cannot take a pointer to a bunch of bytes you got from a socket and interpret them as a float.

Some of these things are tradeoffs (not having to do manual memory management is worth giving up a lot of performance for most people), some are probably the result of trying to keep Java simple, and some are design mistakes (though possibly only in hindsight; notably, UTF-16 was a fixed-length encoding when Java was created, which makes the decision to choose it a lot more understandable).

It's worth noting that many of these tradeoffs are very different for Java/JVM than they are for C#/CIL. The .NET CIL has value-type structs, stack allocation/passing, packed arrays of structs, and type-instantiated generics.

Michael Borgwardt
  • 38
    +1 -- overall, this is a good answer. However, I'm not sure the "there is no stack allocation" bullet point is entirely accurate. Java JITs often do escape analysis to allow for stack allocation where possible -- perhaps what you should say is that the Java language doesn't allow the programmer to decide when an object is stack-allocated versus heap-allocated. Additionally, if a generational garbage collector (which all modern JVMs use) is in use, "heap allocation" means a completely different thing (with completely different performance characteristics) than it does in a C++ environment. – Daniel Pryden Aug 03 '12 at 19:20
  • 5
    I would think there are two other things but I mostly work with stuff at a much higher level so tell if I'm wrong. You can't really write C++ without developing some more general awareness of what's actually happening in memory and how machine code actually works whereas scripting or virtual machine languages abstract all that stuff away from your attention. You also have much more fine-grained control over how things work whereas in a VM or interpreted language you're relying on what core library authors may have optimized for an overly specific scenario. – Erik Reppen Aug 03 '12 at 21:10
  • 18
    +1. One more thing I'd add (but am not willing to submit a new answer for): array indexing in Java always involves bounds checking. With C and C++, this is not the case. – riwalk Aug 03 '12 at 21:28
  • 7
    It's worth noting that Java's heap allocation is significantly faster than a naive version with C++ (due to internal pooling and things), but memory allocation in C++ *can* be significantly better if you know what you're doing. – Brendan Long Aug 03 '12 at 21:52
  • 2
    @Stargazer712 Much like the stack allocation optimization, the JIT compiler can optimize away bounds checking if it can prove the access is valid. – stonemetal Aug 03 '12 at 22:36
  • 10
  • @BrendanLong, true.. but only if the memory is clean - once an app has been running for a while, memory allocation will be slower due to the need to GC, which slows things down dramatically as it has to free memory, run finalisers and then compact. It's a trade-off that benefits benchmarks but (IMHO) overall slows down apps. – gbjbaanb Aug 03 '12 at 22:43
  • 2
    @gbjbaanb GC with finalizers is substantially different (and slower) than GC without at least on the JVM. Modern tracing GCs (without finalizers) tend to provide *better* performance than malloc and free most of the time since 1. memory is allocated by increasing a pointer (no data structure traversals) 2. less overhead from management strategies like reference counting 3. you only ever "free" giant blocks of memory. This is to say, you have to work hard to get the performance boost from C++ and even in the best of cases it mostly is from having less indirection. – Philip JF Aug 04 '12 at 03:29
  • 1
    @Stargazer712 "array indexing in Java always involves bounds checking." this is a good thing IMHO . – Geek Aug 04 '12 at 08:43
  • 2
  • Very complete answer, but to a slightly different question. None of this applies to JIT in general (e.g. all of those features exist in .NET). – skolima Aug 04 '12 at 10:18
  • 1
    @skolima: My impression is that the OP only asked specifically about JIT because he's under the (mistaken) impression that that's the main reason why C++ can be faster when it is in fact largely irrelevant. – Michael Borgwardt Aug 04 '12 at 10:37
  • 2
    Actually, using a JIT is relevant to a degree. It's certainly true that C and C++ compilers use an intermediate language too (see for example [LLVM in Clang](http://llvm.org/). However, a JIT is expected to compile stuff on the users machine while the user is waiting, which means that some compiler optimizations aren't practical for a JIT that are perfectly valid for a traditional compiler. Some of those can be done in the translation to VM code, but not all, and not necessarily as effectively. –  Aug 04 '12 at 16:51
  • @Steve314: Modern JIT compilers compile only the most frequently executed parts of the code while those where speed is irrelevant or those that are being compiled run in interpreted mode. That means that there aren't really any optimizations that are impractical, especially since Java and C# are many times easier to compile than C++. – Michael Borgwardt Aug 04 '12 at 17:56
  • 3
    You will find extreme examples of optimizations that can be supported by a traditional compiler, but not by a JIT, in e.g. Haskell (so not exactly a traditional language, but that's beside the point). These optimizations require a deep analysis of the code, and they take time and resources you don't want to spend in a JIT. Simple rule - more sophisticated optimizations = more time spent optimizing. Choosing not to compile some things saves some resources, but not enough to make up for any possible amount of that. Having an interpreter *and* a compiler alongside the app is an overhead too. –  Aug 04 '12 at 18:41
  • But be happy - a JIT can tune more for the specific machine it's on and based on profiling. If I seemed to be saying "JIT is inferior", that's not the case - it's a tradeoff, and one where JITs are often the overall winners. I was just making a specific point, not writing a pros-vs-cons analysis, is all. –  Aug 04 '12 at 18:43
  • 1
    I'm not 100% certain which JVMs have this improvement, but at least [some](http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html) provide the option `-XX:+UseCompressedStrings` which uses a `byte[]` instead of a `char[]` for pure-ASCII `String`s. – Louis Wasserman Aug 06 '12 at 14:05
  • This is interesting, but it seems to be more of an answer about why *Java* is slow than why *garbage collection* or *JIT* is slow. – Aaronaught Aug 06 '12 at 21:26
  • `"Java generics use type-erasure, which has poor cache locality compared to type-instantiation."` Do you have a link to back that up? (and most of the other statements). I don't see the actual causation here. – haylem Aug 08 '12 at 12:09
  • 1
    Escape analysis is not stack allocation. There is no way to allocate objects on the stack in the definition of the VM. And you can't expect any compiler to reverse it from the bytecode; JIT compilers are not very smart. And usually a profile-guided C/C++ run is now much more efficient than the hotspot analysis of Java. – Lothar Dec 28 '15 at 05:26
  • @Lothar: you are wrong. Java JIT compilers are extremely smart and have been able to do stack allocation of objects for well over 5 years now: http://www-01.ibm.com/support/docview.wss?uid=swg1IZ70114 – Michael Borgwardt Dec 28 '15 at 13:22
  • 1
    @Lothar: As an aside, C++ compilers are allowed to not allocate on the heap too. – Deduplicator Jan 27 '16 at 16:14
  • From my experience, C++ is not faster than Java in general, and I would even claim that Java is faster in most of the cases, in practice. –  Nov 17 '16 at 15:22
  • Not sure if JVMs got better since this answer was posted, but on the Language BenchmarkGame page at least Java and C# seem to be very close: http://benchmarksgame.alioth.debian.org/u64q/csharp.html –  Aug 18 '17 at 02:48
  • @MaxB: the question is about C++, not C# – Michael Borgwardt Aug 18 '17 at 19:02
  • I'm referring to the last paragraph in your answer that compares JVM and .NET. –  Aug 18 '17 at 21:16
  • You *can* “take a pointer to a bunch of bytes you got from a socket and interpret them as a float” since JDK1.4, see [`ByteBuffer.asFloatBuffer()`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/nio/ByteBuffer.html#asFloatBuffer()) This is not copying but re-interpreting the data. The underlying byte buffer may be a wrapper around a `byte[]` array or a memory segment outside the heap. In either case, you tell the socket to receive the data into that buffer. – Holger Feb 13 '23 at 17:01
69

Is it simply because C++ is compiled into assembly/machine code whereas Java/C# still have the processing overhead of JIT compilation at runtime?

Partially, but in general, assuming an absolutely fantastic state-of-the-art JIT compiler, proper C++ code still tends to perform better than Java code for TWO main reasons:

1) C++ templates provide better facilities for writing code that is both generic AND efficient. Templates provide the C++ programmer with a very useful abstraction that has ZERO runtime overhead. (Templates are basically compile-time duck-typing.) In contrast, the best you get with Java generics is basically virtual functions. Virtual functions always have a runtime overhead, and generally can't be inlined.

In general, most languages, including Java, C# and even C, make you choose between efficiency and generality/abstraction. C++ templates give you both (at the cost of longer compile times.)

2) The fact that the C++ standard doesn't have much to say about the binary layout of a compiled C++ program gives C++ compilers much more leeway than a Java compiler, allowing for better optimizations (at the cost of more difficulty in debugging sometimes). In fact, the very nature of the Java language specification enforces a performance penalty in certain areas. For example, you can't have a contiguous array of Objects in Java. You can only have a contiguous array of Object pointers (references), meaning that iterating over an array in Java always incurs the cost of indirection. The value semantics of C++, however, enable contiguous arrays. Another difference is that C++ allows objects to be allocated on the stack, whereas Java does not; in practice, since most C++ programs tend to allocate objects on the stack, the cost of allocation is often close to zero.

One area where C++ might lag behind Java is any situation where many small objects need to be allocated on the heap. In this case, Java's garbage collection system will probably result in better performance than standard new and delete in C++ because the Java GC enables bulk deallocation. But again, a C++ programmer can compensate for this by using a memory pool or slab allocator, whereas a Java programmer has no recourse when faced with a memory-allocation pattern that the Java runtime is not optimized for.

Also, see this excellent answer for more information about this topic.

Charles Salvia
  • 4
    You mentioned C# in the templates section. It's worth noting that C# generics are very C++-like (no type erasure) and, therefore, very efficient. Also, the "binary layout of the executable" is irrelevant in the Java world, since JIT means the executed code is not bit-identical to the executable. Perhaps a better way to explain it would be to mention that JIT needs to be fast and therefore the optimizations are limited. – luiscubal Aug 03 '12 at 21:39
  • 3
    The GC will perform better for initial allocations, but when the GC has to kick in, it will slow down more than the equivalent C++ : it has to free memory just like C++, but it also has to compact the heap as well. Those little memory moves to compact obviously take time. – gbjbaanb Aug 03 '12 at 22:45
  • 7
    Good answer but one minor point: "C++ templates give you both (at the cost of longer compile times.)" I would also add at the cost of larger program size. Might not always be a problem, but if developing for mobile devices, it definitely can be. – Leo Aug 03 '12 at 23:54
  • 9
    @luiscubal: no, in this respect, C# generics are very Java-like (in that the same "generic" code path is taken no matter which types are passed through.) The trick to C++ templates is that the code is instantiated once for every type it is applied to. So `std::vector<int>` is a dynamic array designed *just* for ints, and the compiler is able to optimize it accordingly. A C# `List<int>` is still just a `List<T>`. – jalf Aug 04 '12 at 09:52
  • 12
    @jalf C# `List<int>` uses an `int[]`, not an `Object[]` like Java does. See http://stackoverflow.com/questions/116988/do-c-sharp-generics-have-a-perfomance-benefit – luiscubal Aug 04 '12 at 12:52
  • 1
    Fair point, bad example then. It would have made more sense to use two reference types. I am aware that primitive types avoid being boxed in C# generics. My point was that the same code path is taken regardless of type (and I believe that is *mostly* true for primitive types as well). C++ templates are effectively expanded at compile-time, yielding completely separate data structure for *every* type. (For better or worse. In addition to giving the compiler room for aggressive optimization, it can easily cost you in code size) – jalf Aug 04 '12 at 13:15
  • 1
    @user946850: it'd be more accurate to say that virtual functions can't *always* be inlined. Sometimes they can. But assume a naive compiler comes across a call to a function `foo()` on an object of type `B`. If `foo` is not virtual, then it is known at compile-time which function is to be called (`B::foo()`), so it is trivial to inline. If `foo` is virtual, then the function being called *might* be `B::foo()`, but it could also be `D::foo()`, the overridden function in a derived class. We know which function to call at runtime, but at compile-time it's not clear – jalf Aug 04 '12 at 13:17
  • Of course, in simple cases, we might be able to determine which type the object actually is, so we know which `foo` to call, so it can be inlined even if it is virtual. But in the general case, you don't know which function is called until the code is executed. And by then, it is too late to inline it. ;) – jalf Aug 04 '12 at 13:18
  • 1
    @jalf What makes you think C# generics aren't expanded at compile-time? Again, since C# is JIT compiled, the executed code is not bit-identical to the executable. See http://stackoverflow.com/questions/5342345/how-do-generics-get-compiled-by-the-jit-compiler – luiscubal Aug 04 '12 at 14:31
  • 5
    @luiscubal: your terminology is unclear. The JIT doesn't act at what I'd consider to be "compile-time". You're right, of course, given a sufficiently clever and aggressive JIT compiler, there's effectively no limits to what it could do. But C++ *requires* this behavior. Further, C++ templates allow the programmer to specify explicit specializations, enabling additional explicit optimizations where applicable. C# has no equivalent for that. For example, in C++, I could define a `vector` where, for the specific case of `vector<4>`, my hand-coded SIMD implementation should be used – jalf Aug 04 '12 at 15:17
  • 1
    @jalf `JIT compilation` is, by definition, a type of compilation, so I think the time of JIT execution could be reasonably called compile-time. Languages like C# and Java have effectively *two* compile times. Looking at the link I provided, generics *might* have some effect on the inlining of virtual calls, but not much more. Most other optimizations should run perfectly fine even with generics, since .NET JITers essentially do the same as C++ compilers do. However, I guess you're right about explicit specializations. – luiscubal Aug 04 '12 at 15:35
  • It's true that C++ templates are expanded at compile-time instead of using a run-time polymorphism mechanism, but this isn't always the most efficient way. Run-time polymorphism can allow more code-sharing. In fact, one technique (more in the past) use a template as a thin wrapper around pointer-arithmetic-based unsafe code, which means that the programmer has to manually implement the run-time polymorphism mechanism if it's needed. Luckily, the need for this is rare - some architecture astronauts probably end up doing it when it's *less* efficient, as I have in the past. –  Aug 04 '12 at 17:00
  • What I'd love to see is an easy way to choose between these mechanisms case-by-case, so you don't have to spend time coding a pointer-arithmetic mess just to test which is better for your app, and don't get so overcommitted that you can't stand to throw it away. –  Aug 04 '12 at 17:03
  • 1
    modern JVMs can allocate memory much faster than C++ and manage memory in a much more time efficient manner sacrificing space for time. There is no reason for the JVM to try and maintain as much free space as possible all the time when it can free space so quickly and only when it need to release it when there is pressure. So even more time is captured back by not freeing memory unless there is pressure to do so. –  Aug 04 '12 at 22:42
  • This comment discussion will be purged soon, please move essential comments into the answer, and if you really need to continue the discussion take it in chat. – yannis Aug 05 '12 at 00:17
  • 5
    @Leo: Code bloat through templates was a problem 15 years ago. With heavy templatization and inlining, plus abilities compilers picked up since (like folding identical instances), lots of code gets _smaller_ through templates nowadays. – sbi Aug 07 '12 at 09:09
47

What the other answers (6 so far) seem to have forgotten to mention, but what I consider very important for answering this, is one of C++'s most basic design philosophies, which Stroustrup formulated and employed from day one:

You don't pay for what you don't use.

There are some other important underlying design principles that greatly shaped C++ (like that you shouldn't be forced into a specific paradigm), but You don't pay for what you don't use is right there among the most important ones.


In his book The Design and Evolution of C++ (usually referred to as [D&E]), Stroustrup describes what need he had that made him come up with C++ in the first place. In my own words: For his PhD thesis (something to do with network simulations, IIRC), he implemented a system in SIMULA, which he liked a lot, because the language was very good in allowing him to express his thoughts directly in code. However, the resulting program ran way too slow, and in order to get a degree, he rewrote the thing in BCPL, a predecessor of C. Writing the code in BCPL he describes as a pain, but the resulting program was fast enough to deliver results, which allowed him to finish his PhD.

After that, he wanted a language that would allow translating real-world problems into code as directly as possible, but also allow the code to be very efficient.
In pursuit of that, he created what would later become C++.


So the goal quoted above isn't merely one of several fundamental underlying design principles; it's very close to the raison d'être of C++. And it can be found just about everywhere in the language: functions are only virtual when you want them to be (because calling virtual functions comes with a slight overhead), PODs are only initialized automatically when you explicitly request it, exceptions only cost you performance when you actually throw them (whereas it was an explicit design goal to make the setup/cleanup of stack frames very cheap), there's no GC running whenever it feels like it, etc.

C++ explicitly chose not to give you some conveniences ("do I have to make this method virtual here?") in exchange for performance ("no, I don't, and now the compiler can inline it and optimize the heck out of the whole thing!"), and, not surprisingly, this indeed resulted in performance gains compared to languages that are more convenient.

sbi
  • 5
    **You don't pay for what you don't use.** => and then they added RTTI :( – Matthieu M. Aug 05 '12 at 09:49
  • 1
    Very good principle also valid for start-up time of CLI programs. If some basic tool starts 10 seconds and forces heavy memory paging, then the user pays for what he doesn't use. – Maksee Aug 05 '12 at 11:31
  • 11
    @Matthieu: While I understand your sentiment, I can't help but notice that even that has been added with care regarding performance. RTTI is specified so that it can be implemented using virtual tables, and thus adds very little overhead if you don't use it. If you don't use polymorphism, there is no cost at all. Am I missing something? – sbi Aug 05 '12 at 13:11
  • @sbi: you are right, but there is a reason that compilers such as gcc and clang propose a non-standard compliant switch to deactivate it :) – Matthieu M. Aug 06 '12 at 06:18
  • 9
    @Matthieu: Of course, there's reason. But is this reason rational? From what I can see, the "cost of RTTI", if not used, is an additional pointer in every polymorphic class' virtual table, pointing at some RTTI object statically allocated somewhere. Unless you want to program the chip in my toaster, how could this ever be relevant? – sbi Aug 06 '12 at 07:15
  • 2
    This answer is purely rhetorical, it does not introduce any real evidence. The OP didn't ask for a history lesson. – Aaronaught Aug 06 '12 at 21:28
  • 4
    @Aaronaught: I am at a loss as to what to reply to that. Did you really just dismiss my answer because it points out the underlying philosophy that made Stroustrup et al add features in a way that allows for performance, rather than listing these ways and features individually? – sbi Aug 07 '12 at 09:11
  • 1
    Yes, I really did. The question asked for evidence, not philosophy. This is a poor answer. – Aaronaught Aug 08 '12 at 01:02
  • 10
    @Aaronaught: You have my sympathy. – sbi Aug 08 '12 at 09:30
29

Do you know the Google research paper about that topic?

From the conclusion:

We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.

This is at least a partial explanation, in the sense of "because real-world C++ compilers produce faster code than Java compilers, by empirical measures".

Doc Brown
  • 4
    Besides the memory and cache usage differences, one of the most important ones is the amount of optimization performed. Compare how many optimizations GCC/LLVM (and probably Visual C++/ICC) do relative to the Java HotSpot compiler: a lot more, especially regarding loops, eliminating redundant branches and register allocation. JIT compilers usually don't have the time for these aggressive optimizations, even though they could implement them better using the available run-time information. – Gratian Lup Aug 09 '12 at 08:03
  • 2
    @GratianLup: I wonder if that's (still) true with LTO. – Deduplicator Feb 22 '15 at 19:36
  • 2
    @GratianLup: Let's not forget profile-guided optimization for C++... – Deduplicator Jan 27 '16 at 16:15
23

This is not a duplicate of your question, but the accepted answer there answers most of it: A modern review of Java

To sum up:

Fundamentally, the semantics of Java dictate that it is a slower language than C++.

So, depending on which other language you compare C++ with, you may or may not get the same answer.

In C++ you have:

  • the capacity to do smart inlining,
  • generic code generation with strong locality (templates),
  • data that is as small and compact as possible,
  • opportunities to avoid indirection,
  • predictable memory behaviour,
  • compiler optimizations that are possible only because of the use of high-level abstractions (templates).

These are the features or side effects of the language definition that make it theoretically more efficient in memory and speed than any language that:

  • uses indirection massively ("everything is a managed reference/pointer" languages): indirection means that the CPU has to jump around in memory to get the necessary data, increasing cache misses and thus slowing down processing (C also uses indirection a lot, even though its data can be as small as C++'s);
  • generates big objects whose members are accessed indirectly: this is a consequence of having references by default; members are pointers, so when you access a member you might not get data close to the rest of the parent object, again triggering cache misses;
  • uses a garbage collector: it just makes predictability of performance impossible (by design).

The C++ compiler's aggressive inlining reduces or eliminates a lot of indirection. The ability to generate small sets of compact data makes your program cache-friendly, provided you pack those data together instead of spreading them all over memory (both are possible; C++ just lets you choose). RAII makes C++'s memory behaviour predictable, eliminating a lot of problems in real-time or semi-real-time simulations, which require high speed. Locality problems can generally be summed up like this: the smaller the program/data, the faster the execution. C++ provides diverse ways to make sure your data are where you want them to be (in a pool, an array, or wherever) and that they are compact.

Obviously, there are other languages that can do the same, but they are just less popular because they don't provide as many abstraction tools as C++, so they are less useful in a lot of cases.

Klaim
  • 14,832
  • 3
  • 49
  • 62
7

It is mainly about memory (as Michael Borgwardt said), with a bit of JIT inefficiency added in.

One thing not mentioned is the cache. To use the cache fully, you need your data laid out contiguously (i.e., all together). With a GC system, memory is allocated on the GC heap, which is quick, but as memory gets used the GC kicks in regularly, removes blocks that are no longer used, and compacts the rest together. Apart from the obvious slowness of moving those used blocks together, this means the data you're using may not be stuck together. If you have an array of 1000 elements, then unless you allocated them all at once (and then updated their contents rather than deleting and creating new ones, which will be created at the end of the heap), they will become scattered all over the heap, requiring several memory hits to read them all into the CPU cache. A C/C++ app will most likely allocate the memory for these elements together and then update the blocks in place. (OK, there are data structures, like a list, that behave more like GC memory allocations, but people know these are slower than vectors.)

You can see this in operation simply by replacing any StringBuilder objects with String... StringBuilders work by pre-allocating memory and filling it, and this is a well-known performance trick for Java/.NET systems.

Don't forget that the 'delete old and allocate new copies' paradigm is used very heavily in Java/C#, simply because people are told that memory allocations are really fast thanks to the GC. So the scattered memory model gets used everywhere (except for StringBuilders, of course), and all your libraries tend to be wasteful of memory and use a lot of it, none of which gets the benefit of contiguity. Blame the hype around GC for this - they told you memory was free.

The GC itself is obviously another perf hit. When it runs, it not only has to sweep through the heap, it also has to free all unused blocks, run any finalisers (though this used to be done separately, the next time round, with the app halted; I don't know if it is still such a perf hit, but all the docs I've read say to use finalisers only if really necessary), move the surviving blocks into position so the heap is compacted, and update every reference to each block's new location. As you can see, it's a lot of work!

Perf hits for C++ memory come down to allocation: when you need a new block, you have to walk the heap looking for the next free space that is big enough. With a heavily fragmented heap, this is nowhere near as fast as a GC's 'just allocate another block on the end', but I think it is still not as slow as all the work the GC's compaction does, and it can be mitigated by using multiple fixed-size block heaps (otherwise known as memory pools).

There's more... like loading assemblies out of the GAC, which requires security checking and probing paths (turn on sxstrace and just look at what it gets up to!), and other general over-engineering that seems to be much more popular with Java/.NET than with C/C++.

gbjbaanb
  • 48,354
  • 6
  • 102
  • 172
  • 2
    Many things you write are not true for modern generational garbage collectors. – Michael Borgwardt Aug 07 '12 at 08:42
  • 3
    @MichaelBorgwardt such as? I say "the GC runs regularly" and "it compacts the heap". The rest of my answer concerns how application data structures use memory. – gbjbaanb Feb 19 '14 at 11:46
6

"Is it simply because C++ is compiled into assembly/machine code whereas Java/C# still have the processing overhead of JIT compilation at runtime?" Basically, yes!

A quick note, though: Java has more overheads than just JIT compilation. For example, it does much more checking for you (which is how it produces things like ArrayIndexOutOfBoundsException and NullPointerException). The garbage collector is another significant overhead.

There's a pretty detailed comparison here.

vaughandroid
  • 7,569
  • 4
  • 27
  • 37
2

Bear in mind that the following is only comparing the difference between native and JIT compilation, and doesn't cover the specifics of any particular language or frameworks. There might be legitimate reasons to choose a particular platform beyond this.

When we claim that native code is quicker, we are talking about the typical use case of natively compiled code versus JIT-compiled code, where the typical use of a JIT-compiled application is to be run by the user with immediate results (e.g. no waiting on the compiler first). In that case, I don't think anyone can claim with a straight face that JIT-compiled code can match or beat native code.

Let's assume we have a program written in some language X, and we can compile it with a native compiler, and again with a JIT compiler. Each workflow has the same stages involved, which can be generalized as Code -> Intermediate Representation -> Machine Code -> Execution. The big difference between the two is which stages are seen by the user and which by the programmer. With native compilation, the programmer sees all but the execution stage; with the JIT solution, the compilation to machine code is seen by the user, in addition to execution.

The claim that A is faster than B refers to the time taken for the program to run, as seen by the user. If we assume that both pieces of code perform identically in the Execution stage, we must conclude that the JIT workflow is slower for the user, as he must also see the time T of the compilation to machine code, where T > 0. So, for the JIT workflow to have any chance of performing the same as the native workflow for the user, we must decrease the Execution time of the code, such that Execution + Compilation to machine code is lower than just the Execution stage of the native workflow. This means we must optimize the code better in the JIT compilation than in the native compilation.

This, however, is rather infeasible, since performing the optimizations necessary to speed up Execution means spending more time in the compilation-to-machine-code stage, so any time we save from the optimized code is actually lost, as we add it back to the compilation. In other words, the "slowness" of a JIT-based solution is not merely the added time for JIT compilation; the code produced by that compilation also performs slower than a native solution.

I'll use an example: register allocation. Since main-memory access is orders of magnitude slower than register access, we ideally want to use registers wherever possible and have as few memory accesses as we can. But we have a limited number of registers, and we must spill state to memory when we run out. If we use a register allocation algorithm that takes 200ms to compute and as a result saves 2ms of execution time, we're not making the best use of time in a JIT compiler. Solutions like Chaitin's algorithm, which can produce highly optimized code, are unsuitable.

The role of the JIT compiler is to strike the best balance between compilation time and quality of produced code, however, with a large bias on fast compilation time, since you don't want to leave the user waiting. The performance of the code being executed is slower in the JIT case, as the native compiler is not bound (much) by time in optimising code, so is free to use the best algorithms. The possibility that overall compilation+execution for a JIT compiler can beat only execution time for natively compiled code is effectively 0.

But our VMs are not limited to JIT compilation. They employ ahead-of-time compilation, caching, hot swapping, and adaptive optimizations. So let's modify our claim that the performance is what the user sees, and limit it to the time taken for execution of the program (assuming we've AOT-compiled). We can effectively make the executing code equivalent to the native compiler's (or perhaps better?). A big claim for VMs is that they may be able to produce better-quality code than a native compiler, because they have access to more information about the running process, such as how often a certain function is executed. The VM can then apply adaptive optimizations to the most essential code via hot swapping.

There's a problem with this argument though: it assumes that profile-guided optimization and the like are something unique to VMs, which is not true. We can apply it to native compilation too, by compiling our application with profiling enabled, recording the information, and then recompiling the application with that profile. It's probably also worth pointing out that code hot swapping is not something only a JIT compiler can do; we can do it for native code too, although JIT-based solutions for this are more readily available and much easier on the developer. So the big question is: can a VM offer us some information that native compilation cannot, which can boost the performance of our code?

I can't see it myself. We can apply most of the techniques of a typical VM to native code too - although the process is more involved. Similarly, we can apply any optimizations of a native compiler back to a VM which uses AOT compilation or adaptive optimizations. The reality is that the difference between natively run code, and that run in a VM is not as big as we've been made to believe. They ultimately lead to the same result, but they take a different approach to get there. The VM uses an iterative approach to produce optimized code, where the native compiler expects it from the start (and can be improved with an iterative approach).

A C++ programmer might argue that he needs the optimizations from the get-go, and shouldn't be waiting for a VM to figure out how to do them, if at all. This is probably a valid point with our current technology though, as the current level of optimizations in our VMs is inferior to what native compilers can offer - but that may not always be the case if the AOT solutions in our VMs improve, etc.

Mark H
  • 2,470
  • 1
  • 19
  • 12
0

This article is a summary of a set of blog posts comparing the speed of C++ vs C# and the issues you have to overcome in each language to get high-performance code. The summary is "your library matters way more than anything, but if you are in C++ you can overcome that", or "modern languages have better libraries and thus get faster results with lower effort", depending on your philosophical slant.

0

I think that the real question here is not "which is faster?" but "which has the best potential for higher performance?". Viewed on those terms, C++ clearly wins out - it's compiled to native code, there's no JITting, it's a lower level of abstraction, etc.

That's far from the full story.

Because C++ is compiled, any compiler optimizations must be done at compile time, and compiler optimizations that are appropriate for one machine may be completely wrong for another. It's also the case that any global compiler optimizations can and will favour certain algorithms or code patterns over others.

On the other hand, a JITted program will optimize at JIT time, so it can pull some tricks that a precompiled program cannot and can make very specific optimizations for the machine it's actually running on and the code that it's actually running. Once you get past the initial overhead of the JIT it has potential in some cases to be faster.

In both cases a sensible implementation of the algorithm and other instances of the programmer not being stupid will likely be far more significant factors, however - for example, it's perfectly possible to write completely brain-dead string code in C++ that will be walloped by even an interpreted scripting language.

Maximus Minimus
  • 1,498
  • 10
  • 11
  • 3
    "compiler optimizations that are appropriate for one machine may be completely wrong for another" Well, that's not really to blame on the language. Truely performance-critical code can be compiled separately for each machine it will run on, which is a no-brainer if you compile locally from source (`-march=native`). — "it's a lower level of abstraction" isn't really true. C++ uses just as high-level abstractions as Java (or, in fact, higher ones: functional programming? template metaprogramming?), it just implements the abstractions less "cleanly" than Java does. – leftaroundabout Aug 05 '12 at 11:00
  • "Truely performance-critical code can be compiled separately for each machine it will run on, which is a no-brainer if you compile locally from source" - this fails because of an underlying assumption that the end-user is also a programmer. – Maximus Minimus Aug 05 '12 at 16:29
  • Not necessarily the end user, just the person responsible for installing the program. On the desktop and mobile devices, that typically _is_ the end user, but these aren't the only applications there are, certainly not the most performance-critical ones. And you don't really need to be a _programmer_ to build a program from source, if it has properly written build scripts like all good free / open software projects do. – leftaroundabout Aug 05 '12 at 16:40
  • 1
    While in theory yes, a JIT can pull more tricks than a static compiler, in practice (for .NET at least, I don't know java as well), it doesn't actually do any of this. I've done a bunch of dissassembly of .NET JIT code recently, and there's all sorts of optimizations like hoisting code out of loops, dead code elimination, etc, that the .NET JIT simply does not do. I wish it would, but hey, the windows team inside microsoft has been trying to kill .NET for years, so I'm not holding my breath – Orion Edwards Aug 05 '12 at 21:29
-1

JIT compilation actually has a negative impact on performance. If you design a "perfect" compiler and a "perfect" JIT compiler, the first option will always win in performance.

Both Java and C# are compiled to intermediate languages, which are then compiled to native code at runtime; this reduces performance.

But now the difference is not that obvious for C#: the Microsoft CLR produces different native code for different CPUs, thus making the code more efficient for the machine it's running on, which isn't always done by C++ compilers.

P.S. C# is written very efficiently and doesn't have many abstraction layers. This isn't true for Java, which isn't that efficient. So, in this case, with its great CLR, C# programs often show better performance than C++ programs. For more about .NET and the CLR, take a look at Jeffrey Richter's "CLR via C#".

superM
  • 7,363
  • 4
  • 29
  • 38
  • 8
    If JIT actually had a negative impact on performance, surely it would not be used? – Zavior Aug 03 '12 at 14:39
  • 1
    This part of the concept I understand. My question could probably have been better worded to something like "is the only performance difference between compiled and intermediate languages the fact that there is no JIT compiler overhead in compiled languages, if not, what are the other contributors?". Though the simple answer to that question i'm assuming is 'No' - and that's what I'm interested in finding out - what other factors contribute to performance deviations. – Anonymous Aug 03 '12 at 14:40
  • @Zavior, JIT has MANY advantages. People at Microsoft and Sun/Oracle aren't that stupid to create something new if it has no valuable advantages. – superM Aug 03 '12 at 14:41
  • 2
    @Zavior - I can't think of a good answer to your question, but I don't see how JIT can't add extra performance overhead - the JIT is an extra process to be completed at run time that requires resources that aren't being spent on execution of the program itself, whereas a fully compiled language is 'ready to go'. – Anonymous Aug 03 '12 at 14:43
  • 1
    @Anonymous, I can't say for sure, but Java has many abstraction layers. By this I mean that the string, for example, doesn't turn immediately to call native-api. C++ and C# don't have this folly. – superM Aug 03 '12 at 14:44
  • @Anon I agree that it has extra overhead, but I guess the gains make up for it. Dont have any any sources to back it up, would make sense though! – Zavior Aug 03 '12 at 14:45
  • 3
    JIT has a **positive** effect on performance, not a negative, if you put it into context -- It's compiling **byte** code into machine code before running it. The results can also be cached, allowing it to run faster than equivalent byte-code that is interpreted. – Casey Kuball Aug 03 '12 at 14:56
  • 1
    @Darthfett: it has a positive effect compared to interpreting, but if you already have a precompiled binary optimized to the current platform, there isn't much a JIT compiler can do to beat it, except adjusting its optimization parameters based on actual usage and recompiling on the fly. – tdammers Aug 03 '12 at 15:03
  • 3
    JIT (or rather, the bytecode approach) isn't used for performance, but for convenience. Instead of pre-building binaries for each platform (or a common subset, which is sub-optimal for every one of them), you compile only halfway and let the JIT compiler do the rest. 'Write once, deploy anywhere' is why it's done this way. The convenience can be had with just a bytecode interpreter, but JIT does make it faster than the raw interpreter (though not necessarily fast enough to beat a pre-compiled solution; JIT compilation *does* take time, and the result doesn't always make up for it). – tdammers Aug 03 '12 at 15:06
  • 4
    @Tdammmers, actually there is a performance component too. See http://java.sun.com/products/hotspot/whitepaper.html. Optimizations can include things like dynamic adjustments to improve branch prediction and cache hits, dynamic inlining, de-virtualization, disabling of bounds checking, and loop unrolling. The claim is that in many cases these can more than pay for the cost of JIT. – Charles E. Grant Aug 03 '12 at 16:54
  • 1
    Why are you linking to Sun for performance claims about their own products? And why are guys even trying to argue Java might be up to C++ standards for performance? It is simply not possible to have that fine-grained level of control over performance when you're writing to an abstraction that translates to multiple platforms. – Erik Reppen Aug 03 '12 at 22:13
  • @Zaviour: JIT provides benefits other than performance. The inventors assumed that everyone would have faster and faster CPUs that made the perf issues with JIT insignificant. They were partly right - my quad 3ghz CPU is good enough, but that doesn't mean it's as fast as a native binary, as phone and cloud devs are finding out. They also assumes memory would become cheaper and cheaper, but again - it isn't used as efficiently as native binaries use it. Its a big trade-off between dev productivity and hardware efficiency which sometimes is a trade-off worth having. – gbjbaanb Aug 03 '12 at 22:55
  • 1
    @tdammers: did you ever hear of HP's [Project Dynamo](http://www.hpl.hp.com/techreports/1999/HPL-1999-77.html)? They demonstrated that they could improve the performance of a native, statically optimized program binary, by using JIT techniques to modify the code during execution, effecting compiling machine language to more optimized machine language. – Carson63000 Aug 03 '12 at 23:58
  • 1
    @Carson63000: Yes, of course this is possible - when you distribute binaries, you can't optimize for one particular target; this is what JIT compilers are good at - adapting themselves to a particular platform. I'm pretty sure however that if you *could* make builds for individual machines, a precompiled binary would be just as fast as a JIT-compiled one, and it would have the additional advantage of not having to run the JIT compiler. With source distributions, this approach is actually being used; a typical configure script probes the system and adjusts compiler settings accordingly. – tdammers Aug 04 '12 at 12:54
  • @ErikReppen the statement "an abstraction that translates to multiple platforms" also fits c++, so by your own statement c++ is as slow as java. In reality java is just more restrictive about the behavior of its abstract machine, which makes it impossible to write those nice non portable speedups in java. – josefx Aug 04 '12 at 19:39
  • 2
    @superM which CLR are you talking about? While the .NET JIT compiler is often pretty good, there's lots of optimizations it simply doesn't seem to do. That, plus the fact that every object in .NET no matter how small consumes at least 12 bytes, means I've yet to see a .NET program run faster than a C++ one – Orion Edwards Aug 05 '12 at 21:31
  • @Orion Edwards, yes. That's what I say, and I also described why this is so in a few words. For more info, please see the book I've added to my answer. – superM Aug 06 '12 at 07:49