47

Can anyone explain in detail how exactly the virtual table works and which pointers are involved when virtual functions are called?

If they are actually slower, can you show that the time a virtual function takes to execute is greater than that of a normal class method? It is easy to lose track of how/what is happening without seeing some code.

gnat
MdT
  • 6
    Looking up the correct method call from a [vtable](http://en.wikipedia.org/wiki/Virtual_method_table) is obviously going to take longer than calling the method directly, since there is more to do. How much longer, or whether that additional time is significant within the context of your own program, is another question. – Robert Harvey Mar 22 '13 at 22:47
  • 14
    Slower than what exactly? I've seen code that had a broken, slow implementation of dynamic behavior with lots of switch statements just because some programmer had heard that virtual functions are slow. – Christopher Creutzig Mar 22 '13 at 23:06
  • @ChristopherCreutzig Ironically, such `switch` statements might very well be significantly slower than virtual calls. It pays to think and understand... –  Mar 22 '13 at 23:10
  • @delnan: Exactly my point. – Christopher Creutzig Mar 22 '13 at 23:13
  • 9
    Often times, it's not that virtual calls themselves are slow, but that the compiler has no ability to inline them. – Kevin Hsu Mar 23 '13 at 00:40
  • 5
    @Kevin Hsu: yes, this is it absolutely. Almost any time someone tells you they got a speedup from eliminating some "virtual function call overhead", if you look into it, the speedup has actually come from optimisations which are now possible because the compiler previously couldn't optimise across the indeterminate call. – timday Mar 23 '13 at 01:26
  • 7
    Even a person who can read the assembly code cannot accurately predict its overhead in actual CPU execution. Desktop-based CPU makers have invested in decades of research in not only branch prediction, but also value prediction and speculative execution for the primary reason of masking the latency of virtual functions. Why? Because desktop OSes and software uses them a lot. (I wouldn't say the same about mobile CPUs.) – rwong Mar 23 '13 at 04:44
  • 2
    The virtual method doesn't take longer to execute. It does, however, take (slightly) longer to *call*. But once the virtual dispatch has been taken care of, the code of the method executes exactly as fast as any other method. – Mason Wheeler Apr 21 '15 at 11:48

4 Answers

66

Virtual methods are commonly implemented via so-called virtual method tables (vtable for short), in which function pointers are stored. This adds indirection to the actual call (gotta fetch the address of the function to call from the vtable, then call it -- as opposed to just calling it right ahead). Of course, this takes some time and some more code.
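To make that concrete, here is a rough hand-rolled equivalent of what the compiler and the object layout do behind the scenes. The names and exact layout are purely illustrative; real ABIs differ in the details, but the shape of the indirection is the same:

#include <cstdio>

// Hypothetical hand-rolled "vtable": one function pointer per virtual function.
struct AnimalVTable {
    void (*speak)(const struct Animal* self);
};

// Every polymorphic object carries a hidden pointer to its class's vtable.
struct Animal {
    const AnimalVTable* vptr;
    int age;
};

void dog_speak(const Animal*) { std::puts("woof"); }
void cat_speak(const Animal*) { std::puts("meow"); }

// One table per concrete class, shared by all instances of that class.
const AnimalVTable dog_vtable = { &dog_speak };
const AnimalVTable cat_vtable = { &cat_speak };

void make_it_speak(const Animal* a) {
    // The "virtual call": fetch the vtable pointer, fetch the function
    // pointer from the right slot, then call through it indirectly.
    a->vptr->speak(a);
}

int main() {
    Animal dog{ &dog_vtable, 3 };
    Animal cat{ &cat_vtable, 5 };
    make_it_speak(&dog);  // prints "woof"
    make_it_speak(&cat);  // prints "meow"
}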

However, it is not necessarily the primary cause of slowness. The real problem is that the compiler (generally/usually) cannot know which function will be called. So it can't inline it or perform any other such optimizations. This alone might add a dozen pointless instructions (preparing registers, calling, then restoring state afterwards), and might inhibit other, seemingly unrelated optimizations. Moreover, if you branch like crazy by calling many different implementations, you suffer the same hits you'd suffer from branching like crazy via other means: The cache and branch predictor won't help you, the branches will take longer than a perfectly predictable branch.

Big but: These performance hits are usually too tiny to matter. They're worth considering if you are writing high-performance code and are thinking of adding a virtual function that would be called at an alarming frequency. However, also keep in mind that replacing virtual function calls with other means of branching -- if .. else, switch, function pointers, etc. -- won't solve the fundamental issue, and may very well reduce performance. Eliminating unnecessary indirection, whether due to virtual functions or other control flow statements, improves performance.

Edit: The difference in the call instructions is described in other answers. Basically, the code for a static ("normal") call is:

  • Copy some registers onto the stack, to allow the called function to use those registers.
  • Copy the arguments into predefined locations, so that the called function can find them regardless of where it is called from.
  • Push the return address.
  • Branch/jump to the function's code, which is a compile-time address and hence hardcoded in the binary by the compiler/linker.
  • Get the return value from a predefined location and restore registers we want to use.

A virtual call does exactly the same thing, except that the function address is not known at compile time. Instead, a couple of additional instructions determine it at run time:

  • Get the vtable pointer from the object; it points to an array of function pointers (function addresses), one for each virtual function.
  • Get the right function address from the vtable into a register (the index where the correct function address is stored is decided at compile-time).
  • Jump to the address in that register, rather than jumping to a hardcoded address.

As for branches: A branch is anything which jumps to another instruction instead of just letting the next instruction execute. This includes if, switch, parts of various loops, function calls, etc. and sometimes the compiler implements things that don't seem to branch in a way that actually needs a branch under the hood. See Why is processing a sorted array faster than an unsorted array? for why this may be slow, what CPUs do to counter this slowdown, and how this isn't a cure-all.
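To illustrate the "branching like crazy" scenario, here is a sketch with made-up types and no timing claims; whether and how much this matters depends entirely on the compiler, the CPU and the workload:

#include <algorithm>
#include <memory>
#include <random>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Circle : Shape { double r = 1.0; double area() const override { return 3.14159 * r * r; } };
struct Square : Shape { double s = 1.0; double area() const override { return s * s; } };

double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes)
        sum += s->area();  // indirect call: the target changes with each object's dynamic type
    return sum;
}

int main() {
    std::vector<std::unique_ptr<Shape>> shapes;
    std::mt19937 rng(42);
    for (int i = 0; i < 100000; ++i) {
        if (rng() % 2) shapes.push_back(std::make_unique<Circle>());
        else           shapes.push_back(std::make_unique<Square>());
    }

    // Randomly interleaved dynamic types make the indirect branch hard to predict.
    double mixed = total_area(shapes);

    // Grouping objects of the same dynamic type tends to make the call target
    // predictable again -- the same effect as in the sorted-array question above.
    std::stable_partition(shapes.begin(), shapes.end(),
        [](const std::unique_ptr<Shape>& s) { return dynamic_cast<Circle*>(s.get()) != nullptr; });
    double grouped = total_area(shapes);

    return (mixed + grouped) > 0.0 ? 0 : 1;  // use the results so the calls aren't optimized away
}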

  • 1
    BTW: I really don't understand why the C++ community so adamantly ignores all of the 5 decades of research that the Lisp, Smalltalk, Java and JavaScript communities have been doing in making virtual dispatch blazingly fast. Stuff like Polymorphic Inline Caching and Speculative Inlining, just the entire field of Adaptive Optimizations in general. – Jörg W Mittag Mar 23 '13 at 02:12
  • 9
    @JörgWMittag they are all interpreter stuff, and they are still slower than the binary code generated by C++ compilers – Sam Mar 23 '13 at 07:36
  • 1
    @vasile Nonsense, everything he mentioned is JIT compiler technology, and could be applied to AOT compilers with some effort (worth a paper, but not nearly as hard as inventing it in the first place). While many JIT compilers are married to an interpreter, this is only an optimization (for performance, and sometimes for simplicity). A prominent JIT compiler that features nothing resembling an interpreter whatsoever is V8, it *always* compiles straight to machine code. –  Mar 23 '13 at 10:36
  • 15
    @JörgWMittag These optimizations primarily exist to make indirection/late binding (almost) free *when it isn't needed*, because in those languages *every* call is technically late-bound. If you really do call lots of different virtual methods from one place over a short time, these optimizations don't help or actively hurt (create lots of code for naught). C++ guys aren't very interested in those optimizations because they're in a very different situation ... –  Mar 23 '13 at 10:41
  • 12
    @JörgWMittag ... C++ guys aren't very interested in those optimizations because they're in a very different situation: The AOT-compiled vtable way is already pretty fast, very few calls are actually virtual, many cases of polymorphism are early-bound (via templates) and hence amendable to AOT optimization. Finally, doing these optimizations *adaptively* (instead of just speculating at compile time) requires run-time code generation, which introduces *tons* of headache. JIT compilers have already solved those problems for other reasons, so they don't mind, but AOT compilers want to avoid it. –  Mar 23 '13 at 10:44
  • " The problem (if it exists at all) isn't virtual functions but (unnecessary) indirection". - virtual functions *are* indirection – James Mar 23 '13 at 16:58
  • 3
    great answer, +1. One thing to note though is sometimes the results of branching are known at compile time, for example when you write framework classes that need to support different uses but once application code interacts with those classes the specific use is already known. In this case, the alternative to virtual functions, could be C++ templates. Good example would be CRTP, which emulates virtual function behavior without any vtables: http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern – DXM Mar 23 '13 at 17:26
  • 3
    @James You have a point. What I tried to say is: Any indirection has the same problems, it's nothing specific to `virtual`. –  Mar 23 '13 at 17:57
  • @delnan thanks. Could you explain a bit more on how the instructions (preparing registers, calling, then restoring state afterwards) work with normal function calls & how they would differ if it's a virtual function? Also, I know what the cache is, but I have no idea about the branch predictor & the branches stuff you mentioned. – MdT Mar 23 '13 at 19:45
29

Here's some actual disassembled code from a virtual function call and a non-virtual call, respectively:

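# virtual call: load the object pointer, then its vptr, then the function address from the vtable, then call indirectly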
mov    -0x8(%rbp),%rax
mov    (%rax),%rax
mov    (%rax),%rax
callq  *%rax

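# non-virtual call: the target address is fixed at compile/link time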
callq  0x4007aa

You can see that the virtual call requires three additional instructions to look up the correct address, whereas the address of the non-virtual call can be compiled in.
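For reference, source code along these lines is the kind of thing that produces call sequences of that shape (the names here are made up, and the exact instructions depend on the compiler, optimization level, and ABI):

#include <cstdio>

struct Base {
    virtual void f() { std::puts("Base::f"); }  // virtual: dispatched through the vtable
    void g()         { std::puts("Base::g"); }  // non-virtual: target known at compile/link time
    virtual ~Base() = default;
};

void call_both(Base* b) {
    b->f();  // roughly: load b, load its vptr, load the slot for f, callq *%rax
    b->g();  // roughly: callq <fixed address of Base::g>
}

int main() {
    Base base;
    call_both(&base);
}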

However, note that most of the time this extra lookup can be considered negligible. In situations where the lookup time would be significant, such as in a loop, the result can usually be cached by hoisting the first three instructions out of the loop.

The other situation where the lookup time becomes significant is if you have a collection of objects and you're looping through them, calling a virtual function on each one. However, in that case, you're going to need some means of selecting which function to call anyway, and a virtual table lookup is as good a means as any. In fact, since the vtable lookup is so widely used, it is heavily optimized, so trying to work around it manually has a good chance of resulting in worse performance.

Karl Bielefeldt
  • 1
    The thing to understand is that the vtable lookup and indirect call will in almost all cases have negligible impact on the total running time of the method being called. – John R. Strohm Mar 23 '13 at 05:46
  • 17
    @JohnR.Strohm One man's negligible is another man's bottleneck – James Mar 23 '13 at 17:00
  • 2
    `-0x8(%rbp)`. oh my... that AT&T syntax. – Abyx Mar 24 '13 at 17:22
  • "_three additional instructions_" no, only two: loading the vptr and loading the function pointer – curiousguy Oct 07 '15 at 05:28
  • 1
    @curiousguy it is in fact *three* additional instructions. You have forgotten that a virtual method is always **called on a pointer**, so you first have to load the pointer into a register. To sum up: the first step is to load the address held by the pointer variable into register %rax; then, from that address, load the vptr into %rax; then, from that address, load the address of the method to be called into %rax; then callq *%rax. – Gab是好人 Aug 18 '16 at 20:19
  • @curiousguy You can draw a picture to better illustrate the relations between an object pointer, the object's vptr, the vtbl that the vptr points to, and the methods that each entry of the vtbl points to. – Gab是好人 Aug 18 '16 at 20:23
  • "three additional instructions" compared to what? – curiousguy Aug 21 '16 at 02:32
  • 1
    @curiousguy 3 additional instructions compared to a non-virtual function call on an object or on an object pointer, using the object variable name, like myObject.method() or myObjectPtr->method(), because in this case the exact address of the method to be called is already known at compile time, so no layer of indirection is needed. – Gab是好人 Aug 22 '16 at 08:37
  • @Gab是好人 What you're missing is that if you compare it to a non-virtual function call **on an object** you will in the non-virtual case need to have the pointer to start with too. That is, the first instruction in the virtual call will effectively be included in the non-virtual call too, so only two extra instructions are needed. – skyking Oct 18 '21 at 13:29
23

Slower than what?

Virtual functions solve a problem that cannot be solved by direct function calls. In general, you can only compare two programs which compute the same thing. "This ray tracer is faster than that compiler" doesn't make sense, and this principle generalizes even to small things like individual functions or programming language constructs.

If you don't use a virtual function to dynamically switch to a piece of code based on a datum, such as an object's type, then you will have to use something else, like a `switch` statement, to accomplish the same thing. That something else has its own overheads, plus implications for the organization of the program which influence its maintainability and global performance.
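For instance, a hand-rolled replacement might look like this sketch (the types and numbers are made up); it still branches on data at run time, and every new kind of object means editing the `switch`:

#include <cstdio>

// An explicit type tag plus a switch, in place of virtual dispatch.
enum class Kind { Circle, Square };

struct TaggedShape {
    Kind kind;
    double size;
};

double area(const TaggedShape& s) {
    switch (s.kind) {                        // the run-time branch the vtable would otherwise hide
        case Kind::Circle: return 3.14159 * s.size * s.size;
        case Kind::Square: return s.size * s.size;
    }
    return 0.0;                              // unreachable if every Kind is handled
}

int main() {
    TaggedShape c{Kind::Circle, 2.0};
    std::printf("%f\n", area(c));
}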

Note that in C++, calls to virtual functions are not always dynamic. When calls are made on an object whose exact type is known (because the object isn't a pointer or reference, or because its type can otherwise be statically inferred) then the calls are just regular member function calls. That not only means that there isn't dispatch overhead, but also that these calls can be inlined in the same way as ordinary calls.
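A small example of the distinction (illustrative names; whether the second call is devirtualized in practice is up to the compiler):

#include <cstdio>

struct Base {
    virtual int value() const { return 1; }
    virtual ~Base() = default;
};

struct Derived : Base {
    int value() const override { return 2; }
};

int main() {
    Derived d;
    // Call on an object whose exact type is known: no dynamic dispatch is
    // required, and the compiler is free to inline Derived::value() here.
    std::printf("%d\n", d.value());

    // Call through a base pointer: in general the dynamic type is not known
    // statically, so this is a genuine virtual (indirect) call -- though in a
    // tiny example like this the compiler may well see through it anyway.
    Base* p = &d;
    std::printf("%d\n", p->value());
}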

In other words, your C++ compiler can work out when virtual functions do not require virtual dispatch, so there is usually no reason to worry about their performance relative to non-virtual functions.

New: Also, we must not forget shared libraries. If you're using a class which is in a shared library, the call to an ordinary member function will not simply be a nice one-instruction sequence like `callq 0x4007aa`. It has to go through a few hoops, such as indirecting through a "procedure linkage table" or some such structure. Therefore, shared-library indirection can somewhat (if not completely) level the cost difference between a (truly indirect) virtual call and a direct call. So reasoning about virtual function tradeoffs must take into account how the program is built: whether the target object's class is monolithically linked into the program which is making the call.

Kaz
  • 5
    "Slower than what?" - if you make a method virtual that doesn't have to be, you have pretty good comparison material. – tdammers Mar 23 '13 at 13:03
  • 3
    Thank you for pointing out that calls to virtual functions are not always dynamic. Every other response in here makes it look like declaring a function virtual means an automatic performance hit, regardless of circumstance. – Syndog Jun 23 '15 at 19:34
  • @tdammers only if the compiler doesn't devirtualise it – Caleth Jul 21 '22 at 11:37
14

Because a virtual call is roughly equivalent to:

res_t (*foo)(obj_t*, arg_t);     // function pointer matching the method's signature (including the hidden 'this')
foo = obj->vtable[foo_offset];   // fetch the target address from the object's vtable
res_t res = foo(obj, arg);       // indirect call, passing the object as 'this'

With a non-virtual function, the compiler can constant-fold that lookup: a dereference, an addition, and a dynamic call are replaced by just a static call.

This also lets the compiler inline the function (with all the optimization consequences that follow).

ratchet freak