4

In something as simple as

int sq(int x) { int y = x*x; return y; }

versus

int sq(int x) { return (x*x); }

the former function requires an extra IL step.

EDIT: IL code from the first example, using the current C# compiler from VS 2013 Update 4 (.NET Framework 4.5.1), code from the optimized release version:

  IL_0000:  ldarg.0
  IL_0001:  ldarg.0
  IL_0002:  mul
  IL_0003:  stloc.0   // These two instructions
  IL_0004:  ldloc.0   // seem to be unnecessary
  IL_0005:  ret

and from the second example:

  IL_0000:  ldarg.0
  IL_0001:  ldarg.0
  IL_0002:  mul
  IL_0003:  ret

Is that IL extra step indicative of data being copied in memory? Is this difference still present when using something seemingly better optimized? (C?)

If yes to first or both: can I use the latter, especially when the returned data takes a lot of space, and presumably a lot of time to be moved in memory? Or should I favor the former, because it is more readable? LINQ makes it easy to have whole functions inside a long return() line, but my example faces the same choice were it C. Speed or readability?

In general, would you put the last bit of maths between the parentheses in the return line?

Doc Brown
nvja
    While your question has not been closed, you might want to check out [Why was my question closed as primarily opinion-based?](http://meta.programmers.stackexchange.com/a/6491/). – Adam Zuckerman Dec 21 '14 at 04:57
  • 7
    @AdamZuckerman: there is nothing opinionated about the fact that the compiler does not produce the same output here. – Doc Brown Dec 21 '14 at 08:42
  • 4
    @DocBrown it is, however, purely opinion that the first way of writing the code is clearer... – Jules Dec 21 '14 at 08:58
  • @Jules good point. Especially because I consider the second much, much clearer :] If the first had been formatted across multiple lines, it would be more readable though. – stijn Dec 21 '14 at 09:17
  • @DocBrown That doesn't look like the optimized machine code I'm getting. Look at my updated answer. – svick Dec 21 '14 at 14:15
  • 1
    @svick: upvoted your answer. I will leave the question as it is, maybe the JIT does indeed not optimize more when used in debugger context. – Doc Brown Dec 21 '14 at 14:21
  • @DocBrown 1. [That behavior is documented.](http://msdn.microsoft.com/en-us/library/ms241594.aspx) 2. If you don't believe me and the documentation, you can try it yourself. – svick Dec 21 '14 at 14:26
  • @svick: did I write that I don't believe you? Was something in "upvoted your answer" unclear? The question that remains is: why does the C# compiler not optimize this itself when producing the IL code? – Doc Brown Dec 21 '14 at 14:29
  • 2
    @DocBrown Why should it? Such optimization wouldn't result in faster execution (because the JIT can already optimize this), but would likely result in worse debugging experience (if you're for some reason debugging Release mode assembly). – svick Dec 21 '14 at 14:35
  • @svick: sounds reasonable, why not add that to your answer? – Doc Brown Dec 21 '14 at 14:38
  • 1
    The `nop`s in your assembly code are a clear indication that JIT optimizations are disabled. Using the default settings VS disables JIT optimizations if you run in the debugger. – CodesInChaos Dec 21 '14 at 15:04
  • @docbrown, the question, before editing was and still is asking for an opinion: which code is better? It appears the question has morphed somewhat into asking about IL since I reviewed it. – Adam Zuckerman Dec 21 '14 at 15:05
  • @AdamZuckerman: I am not going to change the original question. All I did was give a proof of what the OP already stated ("function requires an extra IL step"), since Telastyn's answer seemed to ignore it. And yes, I agree, the original question has an opinionated part, but as we see, the question is quite answerable with good reasoning about how the optimizers at the different levels work. – Doc Brown Dec 21 '14 at 15:22
  • In truth I have the option of doing `return(somelist.Where(..).Select(..).Union(anotherlist).Where(..))`, and so on. Or more explicitly `newlist = somelist.Where(..).Select(..); secondnewlist = .. ; newlist.UnionWith(..); return(newlist)`. Should I put loads of things into the return line, with the benefit that I am getting rid of some data duplication steps? I am not a compiler designer, and so would have hoped that both compile to the same EXE - and would have gone for readability. But then LinqPad showed me different IL codes. So now there is a dilemma: speed or readability? – nvja Dec 21 '14 at 19:22
  • 1
    @NickSandor There is no dilemma, both will be compiled to the same machine code. And even if there was a speed difference, it's very unlikely that it would be measurable. – svick Dec 21 '14 at 22:05
  • The time we've spent typing these comments is more than the summed difference in performance over the entire lifetime of this code :) – sea-rob Dec 22 '14 at 05:58
  • one may argue that conceptually, this has been addressed in [How would you know if you've written readable and easily maintainable code?](http://programmers.stackexchange.com/a/141010/31260) If your peers keep complaining about your way of doing things, be it one way or another, you better change to make them feel better – gnat Dec 22 '14 at 10:37
  • 2
    @NickSandor This sounds like an extremely premature micro-optimization. I can't imagine the performance impact of storing a reference in a variable showing up in any benchmark where the rest of the code is busy mapping, filtering, and merging lists. – Doval Dec 22 '14 at 12:36

4 Answers

16

Any self-respecting compiler will compile both examples to the same IL. Further, the first isn't any more readable, since you have a garbage variable that carries no information. It's just noise that decreases readability. If that temporary actually had a good name that provided some useful information to the reader, then the first way would (usually) be better.

But in general, favor readability over speed (which is only barely correlated to length of code).

Telastyn
  • IL came from LinqPAD. – nvja Dec 21 '14 at 03:13
  • 1
    +1'd you for the point of your answer, but saying "Any self respecting compiler will compile both examples to the same IL" is a bit misleading maybe. Not all languages compile to the IL the .NET languages do. The example mentioned (C) actually compiles straight to assembler – Dylan Meeus Dec 21 '14 at 03:13
  • 4
    @DylanMeeus - whatever. Temporary removal is one of the basic optimizations made in any optimizing compiler. – Telastyn Dec 21 '14 at 03:34
  • I think LinqPad does not work with C. Hence my question: is this a quirk of C#, meaning a C compiler would fare better, and I should just focus on readability? For all I know it could be a quirk of LinqPad. – nvja Dec 21 '14 at 03:53
  • 1
    @NickSandor Did you build your application in debug or release mode? As far as I'm aware, the compiler won't make any optimisations in debug mode by default. – Benjamin Hodgson Dec 21 '14 at 08:29
  • @BenjaminHodgson: see my edit, I tried it out and can confirm what the OP wrote. – Doc Brown Dec 21 '14 at 08:39
  • 2
    -1 for a (wrong) assumption about the compiler without actually trying it out (though I agree 100% with the remaining part of your answer). Actually, I would have expected what you write in your first sentence, too, but it really seems to be wrong. – Doc Brown Dec 21 '14 at 08:40
  • Note that IL isn't executed directly, but is compiled to native code by the JIT (hence the I in its name, which stands for Intermediate). I can only presume that the redundant store/load is eliminated at this stage (otherwise the compiler would make the effort to optimize it away). – Jules Dec 21 '14 at 09:04
  • @Jules: see my edit, wrong assumption again – Doc Brown Dec 21 '14 at 09:09
  • 3
    @DocBrown interestingly, it turns out that the optimization isn't necessary. Modern processors implement a feature called store-load forwarding that would spot this pair of instructions before they reach actual execution units and essentially work out that they are a no-op. A memory write would be added to a queue, but as long as something else gets written to the same address reasonably quickly, it turns out the cpu can work out that even the store instruction has no actual effect. See https://fgiesen.wordpress.com/2013/03/04/speculatively-speaking/ – Jules Dec 21 '14 at 10:08
  • 1
    `Further, the first isn't any more readable` - well that's opinion-based again, personally I do often find it more readable, depending on the context of course. Also it's somewhat more convenient for debugging; I can set a breakpoint and preview the returned result more easily. – Konrad Morawski Dec 21 '14 at 12:23
  • @KonradMorawski - I would tend to agree, but in _this_ example, it does nothing but add noise. If the function were more complex, or the temporary gave it a good name, then that would be better. – Telastyn Dec 21 '14 at 14:48
13

First of all, micro-optimizations like this rarely make sense. You should focus on readability and worry about these kinds of optimizations only when you've identified the code that you actually need to optimize by profiling.


Now, if you're still interested in performance, looking at IL doesn't make much sense, because it's not the IL that's executed.

Instead, what you should do is to measure.

Another option would be to look at the generated machine code (usually x86). But the problem with that is that CPUs are very complicated, and it can be very difficult to figure out which version of the code is going to be faster.


Note: looking at the machine code is not as simple as opening the Disassembly window. You need to run in Release mode and make sure the CLR actually optimizes the code (either by unchecking "Suppress JIT optimization on module load", or by starting without the debugger attached: Ctrl+F5, and attaching it later, e.g. using Debugger.Break() and selecting to debug the code).

If you do that, you will most likely notice that both versions of the method are going to be inlined, which makes it hard to see which instructions belong to your method (but both versions generate the same machine code for me).

If you want to compare the two versions of the method in isolation, you can use [MethodImpl(MethodImplOptions.NoInlining)]. With that, the code I'm getting for both versions of the method is the same:

push        ebp  
mov         ebp,esp  
mov         eax,edx  
imul        eax,edx  
pop         ebp  
ret  
svick
  • 3
    Unless you are calling this millions of times in a tight loop, you shouldn't be wasting your time worrying about performance. – user949300 Dec 21 '14 at 04:37
  • 1
    @user949300 I know what you mean, but that is maybe not the best way to put it. Counterexample: if this gets called 10 million times in a loop but I'm only going to look at the output after a year, for some reason, it's still plenty fast and I shouldn't waste time on any worries.. *Usually* the principle is: don't do anything if there's no performance problem which manifests. If there is, measure. Then decide what needs fixing. – stijn Dec 21 '14 at 09:23
  • @user949300 I have added a note about that. – svick Dec 21 '14 at 14:36
  • At the risk of upsetting moderators, I admit my question could have been clearer. Perhaps removing the examples would have allowed people to focus on the question that gives the topic its title. Thanks for answering all the points, including the one I cared about: speed vs. readability. – nvja Dec 22 '14 at 19:13
4

The IL that you're looking at isn't necessarily indicative of the native code that the JIT compiler will generate at runtime (or that ngen will generate if you pre-compile your native binaries).

In this case, the IL obviously remains fairly representative of the original code.

EDIT: The comments posted after the OP posted the IL and disassembly have noted that the assembly is a debug build, so optimizations are turned off, and the JIT will likely optimize this away if you compile a release build.

EDIT: And I definitely agree with the comments that debugging the "assign to variable, return the variable" variant is easier, for the simple fact that you can inspect the result of the assignment before the method returns.

However, every compiler (C#, VB.NET, FORTRAN, COBOL, F#, etc.) is a different beast, transforming those languages into common IL, which is then transformed by the JIT into native code for the CPU the CLR finds itself running on.

IL is an acronym for "Intermediate Language," after all, not for "Fully Optimized Native Machine Language." I'd imagine that producing fully optimized IL, when the JIT itself is already an optimizing compiler, may not be worth the investment in most cases.

Also, it might be interesting if you would post the actual IL from each example.

I guess I wouldn't really worry about it.

It's probably a rare occurrence that you're going to outwit the optimizations built into the compiler. Those optimizations are what those guys (and/or gals) are focusing on all the time.

I'd worry more about correct, human-readable code that's easy to debug, and the 80/20 rule. 80% of the performance issues are going to be in 20% of the code. Actually, it's probably more like 95/5.

Profile your code. Run tests and time them, then optimize the parts that are actually problematic. I'd bet you lunch that the problem bits are often in places you don't expect, and a lot of code you thought was going to be a performance issue just won't be, because it runs rarely or isn't in a super-critical path. And I'll bet the JIT optimizes away whatever you're worried about with this example, anyway. ;-)

Craig Tullis
1

I'm going to directly challenge the claim that the first example is more readable. In this code, it is very clear what is happening: you take an int x and return x*x.

int sq(int x) {
    return (x*x);
}

Adding an extra statement (y = x*x; return y) adds nothing to readability, and in fact makes me wonder what I might be missing (e.g., were there lines you removed?), because you are assigning a new variable.


However, in the general "readability vs. speed" argument, I would suggest readability trumps all other concerns. In most cases, programmer time is vastly more expensive than computation or storage. CodeGolf.StackExchange is a prime example of this, as it is filled with a lot of very fast, very clever solutions to problems, but the time to actually understand how some of them work is quite long.

"Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is." — Rob Pike

Optimisations in code to improve bytecode performance should only be considered after every other part of the code has been profiled to identify potential bottlenecks. If your bytecode optimisation makes a naive O(n!) operation a little faster for each n, finding an O(n^2) solution in code will grant much better results.

Even then, when you start investigating bytecode operation to squeeze out extra performance, the cost of adding an extra core to a server, or buying a little extra server time, might be minuscule compared to the time required for the code changes and investigation a bytecode-optimal solution demands. The reason I say this is that if a bytecode-optimal solution were trivially easy, it would already be incorporated into the compiler/interpreter.