13

I read that virtual calls are slower than non-virtual calls in C#. However, the IL instruction for both is the same, callvirt, except in cases where base.somemethod() is called.

So how does a virtual method hurt performance?

Pang
jtkSource
  • 1
    This question has been asked many times before: https://www.google.com/search?q=virtual+slower+c%23+site:stackoverflow.com – Greg Hewgill Apr 02 '14 at 00:19
  • 4
    Since the CPU doesn't execute IL, it doesn't matter what the IL uses, it matters what machine code the JIT produces. For virtual methods that's an indirect call, for non virtual methods it's a direct call or the callee even gets inlined into the caller. – CodesInChaos Apr 02 '14 at 09:35

4 Answers

18

You are making a single mistake: thinking that the IL opcode by itself determines performance. The machine code (JIT) compiler still performs many optimizations when it compiles the IL, and it can tell a call to a virtual method from a call to a non-virtual one even when both use the same opcode, because it looks at how the target method is declared. This results in different machine code and different performance.

Here is a simple performance benchmark I did:

using System.Diagnostics; // Stopwatch

class ClassA
{
    public int Func1(int a)
    {
        return a + 1;
    }

    public virtual int Func2(int a)
    {
        return a + 2;
    }
}

class ClassB : ClassA
{
    public override int Func2(int a)
    {
        return a + 3;
    }
}

class Program
{
    public static void Main()
    {
        int x = 0;
        ClassA a = new ClassA();
        ClassB b = new ClassB();
        ClassA c = new ClassB();

        Stopwatch watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < 10000000; i++)
        {
            x = x + 1; // no function
        }
        watch.Stop();
        System.Console.WriteLine(watch.ElapsedTicks);

        watch.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            x = a.Func1(x); // non virtual
        }
        watch.Stop();
        System.Console.WriteLine(watch.ElapsedTicks);

        watch.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            x = a.Func2(x); // virtual on a
        }
        watch.Stop();
        System.Console.WriteLine(watch.ElapsedTicks);

        watch.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            x = b.Func2(x); // virtual on b
        }
        watch.Stop();
        System.Console.WriteLine(watch.ElapsedTicks);

        watch.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            x = c.Func2(x); // virtual on B typed as A
        }
        watch.Stop();
        System.Console.WriteLine(watch.ElapsedTicks);

        System.Console.WriteLine(x); // so the compiler doesn't optimize it away
    }
}

When I run it (Release, without debugging, Any CPU, .NET 4.5, x64 Windows 8.1) I get these results (elapsed ticks, in the order the loops appear):

9904
8921
56921
60076
58289

The first interesting thing is that calling the non-virtual method is no slower than using no method at all. This strongly suggests that the compiler is inlining the method's code at the call site, so there is no function call overhead. The second is that the virtual calls are not inlined here and need an additional lookup, which makes them much slower (roughly six times in this run) than the non-virtual call.

But don't get the wrong idea. In the grand scheme of things, the overhead caused by virtual methods is negligible compared to algorithm overhead and cache misses due to memory access.
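
As a rough illustration of where the difference comes from, the sketch below uses reflection to show that a method's metadata records whether it is virtual, independently of the opcode at the call site. Reflection is only a stand-in here for what the machine code compiler can see. ClassA repeats the class from the benchmark, and VirtualFlagDemo and IsDeclaredVirtual are hypothetical names introduced for this example.

```csharp
using System;
using System.Reflection;

class ClassA
{
    public int Func1(int a) { return a + 1; }          // non-virtual
    public virtual int Func2(int a) { return a + 2; }  // virtual
}

static class VirtualFlagDemo
{
    // True when the method is marked virtual in metadata and is still overridable.
    // IsFinal is checked because non-virtual methods that implement an interface
    // are emitted as "virtual final".
    public static bool IsDeclaredVirtual(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        return method != null && method.IsVirtual && !method.IsFinal;
    }
}
```

With these definitions, IsDeclaredVirtual(typeof(ClassA), "Func1") returns false and IsDeclaredVirtual(typeof(ClassA), "Func2") returns true, even though C# emits callvirt for calls to both.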

Euphoric
  • 2
    I wonder why the downvotes. Is there something wrong with my benchmark? – Euphoric Apr 02 '14 at 11:37
  • I didn't downvote, but this doesn't show how virtual calls are slower because the clr is inlining ? – jtkSource Apr 02 '14 at 13:53
  • 2
    @jtkSource The point is that virtual method calls cannot be inlined and that virtual methods require one more lookup to call. Even if IL contains callvirt, the machine code compiler can still figure out the method is not really virtual and can inline it. – Euphoric Apr 02 '14 at 15:09
  • 4
    The selected answer is fine but I found this more helpful... thx – Frank V May 16 '14 at 17:34
  • @I also find this benchmark helpful – Dave Feb 27 '15 at 23:23
  • Can you clarify how the machine code compiler can differentiate between virtual and non-virtual methods? I don't see any indication of a difference in the accepted answer's CIL. – Jeroen Vannevel Apr 12 '15 at 18:22
  • @JeroenVannevel The example is call site. It differentiates it based on how the method itself is actually defined. – Euphoric Apr 12 '15 at 19:52
17

As pointed out here, calling callvirt is slower, but by a very small margin. I'm not sure how you're getting the CIL code, but as Eric Gunnerson points out, the C# compiler does not use call for ordinary instance method calls. It uses callvirt, which also null-checks the receiver, and he even states in a follow-up post that the difference in performance impact is minimal.

Just a very quick test, where all classes have a void PrintTest() method (which only prints a message to the console)...

  1. BaseVirtual is a base class with the method defined as virtual.
  2. DerivedVirtual and DerivedVirtual2 use override to redefine the virtual method, inheriting from BaseVirtual.
  3. Base is a regular class with a regular instance method (no virtual or sealed).
  4. Seal is a sealed class, just for kicks.
  5. Stat is a class with the method defined as static.

Here's the main code:

using System;

namespace CSharp
{
    class Program
    {
        static void Main(string[] args)
        {
            BaseVirtual baseVirtual = new BaseVirtual();
            DerivedVirtual derivedVirtual = new DerivedVirtual();
            DerivedVirtual2 derivedVirtual2 = new DerivedVirtual2();
            Base regularBase = new Base();
            Seal seal = new Seal();

            baseVirtual.PrintTest();
            derivedVirtual.PrintTest();
            derivedVirtual2.PrintTest();
            regularBase.PrintTest();
            seal.PrintTest();
            Stat.PrintTest();
        }
    }
}

And, here's the CIL code:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       68 (0x44)
  .maxstack  1
  .locals init ([0] class CSharp.BaseVirtual baseVirtual,
           [1] class CSharp.DerivedVirtual derivedVirtual,
           [2] class CSharp.DerivedVirtual2 derivedVirtual2,
           [3] class CSharp.Base regularBase,
           [4] class CSharp.Seal seal)
  IL_0000:  newobj     instance void CSharp.BaseVirtual::.ctor()
  IL_0005:  stloc.0
  IL_0006:  newobj     instance void CSharp.DerivedVirtual::.ctor()
  IL_000b:  stloc.1
  IL_000c:  newobj     instance void CSharp.DerivedVirtual2::.ctor()
  IL_0011:  stloc.2
  IL_0012:  newobj     instance void CSharp.Base::.ctor()
  IL_0017:  stloc.3
  IL_0018:  newobj     instance void CSharp.Seal::.ctor()
  IL_001d:  stloc.s    seal
  IL_001f:  ldloc.0
  IL_0020:  callvirt   instance void CSharp.BaseVirtual::PrintTest()
  IL_0025:  ldloc.1
  IL_0026:  callvirt   instance void CSharp.BaseVirtual::PrintTest()
  IL_002b:  ldloc.2
  IL_002c:  callvirt   instance void CSharp.BaseVirtual::PrintTest()
  IL_0031:  ldloc.3
  IL_0032:  callvirt   instance void CSharp.Base::PrintTest()
  IL_0037:  ldloc.s    seal
  IL_0039:  callvirt   instance void CSharp.Seal::PrintTest()
  IL_003e:  call       void CSharp.Stat::PrintTest()
  IL_0043:  ret
} // end of method Program::Main

Nothing too fancy, but this shows that only the static method used call. Even so, I don't think this will really make a difference for most applications, so we shouldn't sweat the small stuff or worry about this.

So, in conclusion, as the SO post states, you may get a performance hit because of the lookup the runtime needs to call the virtual methods. Compared to static methods, you get the overhead of callvirt.
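
One commonly cited reason the C# compiler emits callvirt even for non-virtual instance methods is that callvirt null-checks the receiver. A minimal sketch of the observable effect, assuming a hypothetical Greeter class whose method never touches this:

```csharp
using System;

class Greeter
{
    // Non-virtual, and the body never uses 'this'.
    public string Hello() { return "hello"; }
}

static class CallvirtNullCheckDemo
{
    // Returns true if calling Hello() on a null reference throws.
    public static bool ThrowsOnNull()
    {
        Greeter g = null;
        try
        {
            g.Hello(); // compiled as callvirt, so the receiver is null-checked
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

ThrowsOnNull() returns true: the call fails on the null receiver even though the method body would run fine without an instance.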

... Well, that was fun...

Pang
ArthurChamz
6

Since I and others found @Euphoric's benchmark above illuminating, below is an expanded version that adds interfaces, lambdas, delegates and static methods. As one example, with .NET 4.5.2 and x86 output, the results (in millions of iterations per second) consistently grouped into three buckets:

No function                   1,388.9 MOps/s
Non-virtual                   1,479.3 MOps/s
Static                        1,201.9 MOps/s

Virtual via class               456.2 MOps/s
Overridden via class            425.9 MOps/s
Base class                      394.3 MOps/s
Non-virtual via interface       357.4 MOps/s
Virtual via interface           466.0 MOps/s

Lambda                          286.9 MOps/s
Delegate                        277.6 MOps/s

Expanded benchmark:

using System;
using System.Diagnostics;

    interface IClassA
    {
        int Func1(int a);
        int Func2(int a);
    }

    class ClassA : IClassA
    {
        public int Func1(int a)
        {
            return a + 2;
        }

        public virtual int Func2(int a)
        {
            return a + 2;
        }

        public static int StaticFunc(int a)
        {
            return a + 2;
        }
    }

    class ClassB : ClassA
    {
        public override int Func2(int a)
        {
            return a + 2;
        }
    }

    delegate int MyDelegate(int a);

    class Program
    {
        static int forDelegate(int a)
        {
            return a + 2;
        }

        public static void Main()
        {
            MethodCall();
        }

        public static void MethodCall()
        {
            const int loops = 500000000;
            int x = 0;
            ClassA a = new ClassA();
            ClassB b = new ClassB();
            ClassA c = new ClassB();

            Console.WriteLine("Method Call Overhead:");

            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < loops; i++)
            {
                x = x + 2;
            }
            watch.Stop();
            Report("No function", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = a.Func1(x);
            }
            watch.Stop();
            Report("Non-virtual", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = ClassA.StaticFunc(x);
            }
            watch.Stop();
            Report("Static", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = a.Func2(x); 
            }
            watch.Stop();
            Report("Virtual via class", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = b.Func2(x); 
            }
            watch.Stop();
            Report("Overridden via class", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = c.Func2(x); 
            }
            watch.Stop();
            Report("Base class", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            IClassA iClassA = a;
            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = iClassA.Func1(x);
            }
            watch.Stop();
            Report("Non-virtual via interface", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = iClassA.Func2(x);
            }
            watch.Stop();
            Report("Virtual via interface", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            Func<int, int> func = l => l + 2;
            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = func(x);
            }
            watch.Stop();
            Report("Lambda", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            MyDelegate myDelegate = forDelegate;
            watch.Restart();
            for (int i = 0; i < loops; i++)
            {
                x = myDelegate(x);
            }
            watch.Stop();
            Report("Delegate", loops, watch.ElapsedMilliseconds);
            x -= 2 * loops;

            System.Console.ReadKey();
            System.Console.WriteLine(x); // so the compiler doesn't optimize it away
        }

        static void Report(string message, int iterations, long milliseconds)
        {
            System.Console.WriteLine(string.Format("{0,-26:} {1,10:N1} MOps/s, {2,7:N3} s", message, (double)iterations / 1000.0 / milliseconds, milliseconds / 1000.0));
        }
    }

Hope it helps!

  • 1
    Nice benchmark, except it may be a bit misleading. The method call overhead for the non-virtual and static case is incorrect, at least on my computer in a release build because the methods are inlined. You should add the `[MethodImpl(MethodImplOptions.NoInlining)]` attribute to the methods to prevent inlining to actually get the method-call overhead! – DeCaf Nov 02 '15 at 12:22
  • Useful comment, much obliged. Prohibiting inlining brings the speed down to around ~400MOps/s in my case. Other times the inlining is realistic, and it's interesting to see which versions gets inlined or not. – Kristian Wedberg Nov 03 '15 at 21:01
  • Also note that these benchmarks all run from the same (very large) method. Running each test in its own method can certainly change and often improve the result - use whatever mimics your situation better. – Kristian Wedberg Aug 30 '17 at 12:22
  • 3
    Nice idea, but this is not how you should go about benchmarking this stuff. Influence of JITting, warmup, cache hits/misses, GC and other external factors are way bigger than any perceived difference. BenchmarkDotNet is a good way to test these micro-benchmarks as it tries hard to eliminate such side effects and automatically runs it long enough to reach a certain preset fault-margin of the timings. (I know that in 2015 such benchmarking frameworks were rare, but as written, it won't prove anything, unf.) – Abel Jun 11 '20 at 14:40
6

I was curious why lambdas and delegates performed so badly in @Kristian Wedberg's test. I thought perhaps they should be called once before the timed loop starts, to make sure they are JITted, as JIT time might otherwise distort the measurement.

In my test on .NET 4.6.1 and x86, the lambda consistently performs about as well as the interface calls; only the delegate was slower. I then split the delegate test case in three to test different ways of creating the delegate:

  1. allocation from a method group (instance method)
  2. allocation from a method group (static method)
  3. allocation from a lambda

Results:

No function                   1.404,5 MOps/s,   0,356 s
Non-virtual                     429,2 MOps/s,   1,165 s
Static                          432,9 MOps/s,   1,155 s
Virtual via class               427,7 MOps/s,   1,169 s
Overridden via class            399,4 MOps/s,   1,252 s
Base class                      363,6 MOps/s,   1,375 s
Non-virtual via interface       331,3 MOps/s,   1,509 s
Virtual via interface           337,2 MOps/s,   1,483 s
Lambda                          339,4 MOps/s,   1,473 s
Delegate (inst. mthd grp)       338,5 MOps/s,   1,477 s
Delegate (stat. mthd grp)       204,0 MOps/s,   2,451 s
Delegate (lambda)               341,5 MOps/s,   1,464 s

Interestingly, 1) and 3) consistently perform as well as the lambda or interface tests; only 2) is slower (this was the version used in Kristian's test). I'm baffled as to why a static method, of all things, should be slower.

Here is the modified test:

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

interface IClassA {
    int Func1(int a);
    int Func2(int a);
}

class ClassA : IClassA {
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Func1(int a) {
        return a + 2;
    }

    public virtual int Func2(int a) {
        return a + 2;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int StaticFunc(int a) {
        return a + 2;
    }
}

class ClassB : ClassA {
    public override int Func2(int a) {
        return a + 2;
    }
}

delegate int MyDelegate(int a);

class Program {
    static int staticDelegate(int a) {
        return a + 2;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Func<int, int> GetFunc() {
        return l => l + 2;
    }

    public static void Main() {
        MethodCall();
    }

    public static void MethodCall() {
        const int loops = 500000000;
        int x = 0;
        ClassA a = new ClassA();
        ClassB b = new ClassB();
        ClassA c = new ClassB();

        Console.WriteLine("Method Call Overhead:");

        Stopwatch watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < loops; i++) {
            x = x + 2;
        }
        watch.Stop();
        Report("No function", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = a.Func1(x);
        }
        watch.Stop();
        Report("Non-virtual", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = ClassA.StaticFunc(x);
        }
        watch.Stop();
        Report("Static", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = a.Func2(x);
        }
        watch.Stop();
        Report("Virtual via class", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = b.Func2(x);
        }
        watch.Stop();
        Report("Overridden via class", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = c.Func2(x);
        }
        watch.Stop();
        Report("Base class", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        IClassA iClassA = a;
        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = iClassA.Func1(x);
        }
        watch.Stop();
        Report("Non-virtual via interface", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = iClassA.Func2(x);
        }
        watch.Stop();
        Report("Virtual via interface", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        Func<int, int> func = GetFunc();
        x += func(0) - 2; // call once to JIT
        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = func(x);
        }
        watch.Stop();
        Report("Lambda", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        MyDelegate myDelegate;

        myDelegate = a.Func1;
        x += myDelegate(0) - 2; // call once to JIT
        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = myDelegate(x);
        }
        watch.Stop();
        Report("Delegate (inst. mthd grp)", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        myDelegate = Program.staticDelegate;
        x += myDelegate(0) - 2; // call once to JIT
        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = myDelegate(x);
        }
        watch.Stop();
        Report("Delegate (stat. mthd grp)", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        myDelegate = new MyDelegate(i => i + 2);
        x += myDelegate(0) - 2; // call once to JIT
        watch.Restart();
        for (int i = 0; i < loops; i++) {
            x = myDelegate(x);
        }
        watch.Stop();
        Report("Delegate (lambda)", loops, watch.ElapsedMilliseconds);
        x -= 2 * loops;

        Console.ReadKey();
        Console.WriteLine(x); // so the compiler doesn't optimize it away
    }

    static void Report(string message, int iterations, long milliseconds) {
        Console.WriteLine(string.Format("{0,-26:} {1,10:N1} MOps/s, {2,7:N3} s", message, (double)iterations / 1000.0 / milliseconds, milliseconds / 1000.0));
    }
}
enzi