
Any idea why the C# version of sqrt (System.Math.Sqrt) is ~10 times slower than the C++ version? Furthermore, the C# version seems to have one extra digit of precision. I ran my test under MSVC 2012.

I used double and called System.Math.Sqrt once before running the benchmark in order to force JIT compilation.

Guillaume Paris
  • It's managed code. Having it run significantly slower--even with a JIT--is to be expected, although a figure of 10X is a bit surprising. – Mason Wheeler Dec 24 '12 at 18:42
  • @mason that doesn't really make sense. Such functions are exactly where a JIT can compete with native code, since precise memory control is not imperative for performance. – Max Dec 24 '12 at 18:55
  • Can you post your benchmark code? – Trevor Pilley Dec 24 '12 at 18:57
  • My code is trivial and peer reviewed; I don't have it here since I did it today at work, sorry. I can post it by Thursday. I used the Stopwatch class (System.Diagnostics) and boost::chrono::nanoseconds for 1*10^6 repetitions. – Guillaume Paris Dec 24 '12 at 19:01
  • If not done right, it might be caused by differences in the optimizers. For instance, if the result is not used, the C++ compiler might simply erase it. If the computations are constant, the compiler might run some sort of constant folding. – luiscubal Dec 24 '12 at 19:15
  • I have of course stored the 10^6 sqrt(2.0) results by summing them in a variable (i.e. var += sqrt(2.0)) and printing it on screen at the end, to make sure the compiler will not skip any code. – Guillaume Paris Dec 24 '12 at 19:19
  • Still, `sqrt(2.0)` is a constant value. One of the optimizers might replace that with the result. – luiscubal Dec 24 '12 at 19:20
  • Really? OK, I will post this point on Stack Overflow. – Guillaume Paris Dec 24 '12 at 19:23
  • So yes, C++ will call sqrt only once. – Guillaume Paris Dec 24 '12 at 19:55
  • @Guillaume07: no, if the optimizer works the way we suspect, the sqrt function is *never* called at run time, only once at compile time (but that is not what you measure). – Doc Brown Dec 24 '12 at 21:04
  • This has nothing to do with C# or C++. At best this has to do with *certain* implementations of *certain* functions in *certain* libraries. The only true way to figure out what's *really* happening is to profile the actual executed code and step through it instruction by instruction to evaluate what happens where, why, and what it costs. Anything else is mere guessing, which more often than not goes wrong. – zxcdw Dec 24 '12 at 21:07
  • @zxcdw And that's what MichaelT has done below? Why do you think that compiler hurdles have nothing to do with languages? – MrFox Jan 31 '13 at 17:43
  • @suslik Note that I made my comment before MichaelT made his answer to this question. – zxcdw Feb 01 '13 at 10:20

1 Answer


I am speaking only from the C side (which applies to C++ as well); I have no system that can run C# to work from.

The first program I wrote was this trivial one:

#include <math.h>
#include <stdio.h>

int main(void) {
    printf("%f\n",sqrt(2.0));
}

Using gcc -S -O3 sqrt.c I got the generated assembly in sqrt.s and looked at that.

    .file   "sqrt.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC1:
    .string "%f\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
.LFB14:
    .cfi_startproc
    movsd   .LC0(%rip), %xmm0
    movl    $.LC1, %edi
    movl    $1, %eax
    jmp printf
    .cfi_endproc
.LFE14:
    .size   main, .-main
    .section    .rodata.cst8,"aM",@progbits,8
    .align 8
.LC0:
    .long   1719614413
    .long   1073127582
    .ident  "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
    .section    .comment.SUSE.OPTs,"MS",@progbits,1
    .string "Ospwg" 
    .section    .note.GNU-stack,"",@progbits

One will note that there is no call to sqrt in the code - it looks like it's just loading a constant (which it is).
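
As a quick sanity check, one can reassemble the two .long values emitted at .LC0 into the double they encode; the following is a minimal sketch, assuming the usual little-endian x86-64 layout (low word first):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* the two .long values from .LC0, low 32 bits first */
    uint32_t words[2] = { 1719614413u, 1073127582u };
    double d;
    memcpy(&d, words, sizeof d);   /* reinterpret the 8 bytes as an IEEE 754 double */
    printf("%.17g\n", d);          /* prints 1.4142135623730951, i.e. sqrt(2.0) */
    return 0;
}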

This became more apparent when I wrote a version that used a variable, and compiled it to demonstrate what a call to sqrt would look like.

I'm not going for any sort of elegance with this code.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    double num = atoi(argv[0]);   /* a value the compiler cannot know at compile time */
    printf("%f\n",sqrt(num));
}

While gcc -O3 sqrt.c built and linked fine without the math library, building this second program with gcc -O3 sqrt2.c returned

/tmp/cckmgfMS.o: In function `main':
sqrt2.c:(.text+0x46): undefined reference to `sqrt'
collect2: ld returned 1 exit status

It really was calling sqrt, and I had forgotten to link the math library.

With -lm added the program links, and in the generated assembly (gcc -O3 -S sqrt2.c) one can see the call to sqrt:

        .file   "sqrt2.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%f\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
.LFB14:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movq    (%rsi), %rdi
        xorl    %eax, %eax
        call    atoi
        cvtsi2sd        %eax, %xmm1
        sqrtsd  %xmm1, %xmm0
        ucomisd %xmm0, %xmm0
        jp      .L5
.L2:
        movl    $.LC0, %edi
        movl    $1, %eax
        addq    $8, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        jmp     printf
.L5:
        .cfi_restore_state
        movapd  %xmm1, %xmm0
        call    sqrt
        jmp     .L2
        .cfi_endproc
.LFE14:
        .size   main, .-main
        .ident  "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
        .section        .comment.SUSE.OPTs,"MS",@progbits,1
        .string "Ospwg"
        .section        .note.GNU-stack,"",@progbits

One can see in this code the real square-root work: gcc emits an inline sqrtsd instruction, plus a fallback call to the library sqrt (taken only if sqrtsd produced NaN, i.e. the input was negative), and the constant that the optimizer put in the first version is gone.
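
The fallback call exists because, with gcc's default -fmath-errno behaviour, the C library's sqrt must set errno on a domain error, and the bare sqrtsd instruction cannot do that. A minimal sketch of that behaviour (assuming a glibc-style libm; link with -lm):

#include <errno.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double neg = -1.0;   /* volatile keeps the compiler from folding the call */
    errno = 0;
    double d = sqrt(neg);         /* domain error: the library returns NaN ... */
    printf("%f, errno == EDOM: %d\n", d, errno == EDOM);   /* ... and sets errno to EDOM */
    return 0;
}

With -fno-math-errno (implied by -ffast-math), that fallback call to sqrt disappears from the assembly entirely.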

From the comment above:

I have of course stored the 10^6 sqrt(2.0) results by summing them in a variable (i.e. var += sqrt(2.0)) and printing it on screen at the end, to make sure the compiler will not skip any code. – Guillaume07 Dec 24 '12 at 19:19

So, consider: if you are dealing with constants, this is something that the C and C++ optimizers will identify and optimize away.
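
If the goal is to time sqrt itself rather than the optimizer, the input has to be something the compiler cannot fold to a constant. A minimal sketch of one way to do that (the variable names and the 10^6 iteration count are only illustrative):

#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double input = 2.0;   /* volatile forces a reload every iteration,       */
    double sum = 0.0;              /* so sqrt(input) cannot be folded at compile time */
    int i;
    for (i = 0; i < 1000000; i++) {
        sum += sqrt(input);
    }
    printf("%f\n", sum);           /* use the result so the loop is not removed */
    return 0;
}

Built with gcc -O3 and linked with -lm, this keeps the sqrtsd (and its library fallback) inside the loop instead of collapsing everything into a precomputed constant.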


Lacking access to C#, I looked at how Java deals with the line:

System.out.println(Math.sqrt(2.0));

This compiles to the following Java bytecode:

0  getstatic java.lang.System.out : java.io.PrintStream [16]
3  ldc2_w <Double 2.0> [22]
6  invokestatic java.lang.Math.sqrt(double) : double [24]
9  invokevirtual java.io.PrintStream.println(double) : void [30]

One can see that the Java compiler doesn't have access to information about the output of sqrt() that would let it optimize the call into a constant. It is possible that the JIT optimizer has enough information about the purity of the calls through Math to StrictMath to replace multiple calls of Math.sqrt(2.0) with the same value (and not call it again); however, it still has to call it once at that point to get the value. That said, I don't have any insight into what goes on at runtime in the JIT and how calls to pure functions that end up in native code might be optimized.

However, the C optimizer is still ahead of the game with a big loop (assuming that the JIT optimizer only needs to make one call to sqrt() to get that first value).

When looking at the optimization of the loop in C, the optimizer even precalculates the loop.

#include <math.h>
#include <stdio.h>

int main(void) {
    double sum = 0;
    int i;
    for (i = 0; i < 10; i++) {
        sum += sqrt(2.0);
    }
    printf("%f\n",sum);
}

through gcc -O3 -S sqrt3.c (still no -lm needed) becomes:

    .file   "sqrt3.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC1:
    .string "%f\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
.LFB14:
    .cfi_startproc
    movsd   .LC0(%rip), %xmm0
    movl    $.LC1, %edi
    movl    $1, %eax
    jmp printf
    .cfi_endproc
.LFE14:
    .size   main, .-main
    .section    .rodata.cst8,"aM",@progbits,8
    .align 8
.LC0:
    .long   2034370
    .long   1076644038
    .ident  "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
    .section    .comment.SUSE.OPTs,"MS",@progbits,1
    .string "Ospwg"
    .section    .note.GNU-stack,"",@progbits

And one can see that this code is identical to the first one, apart from different constants in the .LC0 section - the new constant decodes to 14.142136, i.e. 10 * sqrt(2.0). The loop has been calculated down to just "the ultimate value is this, don't bother doing it at run time."

  • This is the answer. I'm wondering, though, how come the MSVC C# compiler isn't optimizing it out to a constant? It seems this is one of the simpler optimizations for compilers to implement. – MrFox Jan 31 '13 at 17:10
  • @suslik The C# compiler is likely looking at it and seeing "this is a call to a system library that is calling a native code library - I'll let it stay as it is because I don't know what it is going to do." That is, no information on the purity of the function call to native code is available for the optimizer to examine. – Jan 31 '13 at 17:13
  • @suslik I've added another two bits of code - what the Java call compiles to, and another that shows how gcc handles looping over the constant (precalculating the value at compile time rather than actually doing the loop). – Jan 31 '13 at 17:33
  • I have to correct the author on one important thing. There is no such thing as the "MSVC C#" compiler. The C# compiler is called "csc.exe", which likely stands for "C Sharp Compiler". Besides, MSVC stands for "Microsoft Visual C", and "Visual C" is NOT Visual C#. There is also MSVC++, "Microsoft Visual C++"; in both cases it's a managed variation of C or C++. – Ramhound Feb 01 '13 at 14:33
  • @MichaelT: Interestingly, one conclusion that one might draw from this is that Java is slow because part of it is written in C (and thus can't be understood by the Java compiler) whereas if parts of Java *weren't* written in C but rather all in Java, then it would be as fast as C! (In fact, that's precisely how e.g. the Self system got its amazing performance; when it came out, it was faster than some production-quality C++ compilers despite being even more dynamic than, say, Ruby, Python or JavaScript and certainly Java and C#.) – Jörg W Mittag May 26 '13 at 16:05
  • @JörgWMittag It really boils down to the compiler not trying too hard, trusting that the JIT and HotSpot will do the job... and that people aren't writing silly things. The JIT/HotSpot isn't aware of the purity of a function at this time and so can't optimize that out, while GCC is aware of the purity of math functions and can recognize that in loops. The [StrictMath](http://stackoverflow.com/questions/4232231/) library is also an interesting read on the subject of how Java deals with math functions. – May 26 '13 at 21:07
  • You should be able to download Mono on your system and build C# programs – JoelFan Mar 02 '15 at 18:07
  • @JoelFan I could, though the analysis here is more than just "have it on your system" - it also requires looking at the decompiled output and understanding it, which I would find significantly more difficult with C# under Mono than with Java. – Mar 02 '15 at 18:41