100

I've been learning some C++, and I often have to return large objects that are created within a function. I know there are solutions such as passing by reference, returning a pointer, and returning a reference, but I've also read that C++ compilers (and the C++ standard) allow for return value optimization (RVO), which avoids copying these large objects through memory, saving the time and memory all of that copying would cost.

Now, I feel that the syntax is much clearer when the object is explicitly returned by value, and the compiler will generally employ RVO and make the process more efficient. Is it bad practice to rely on this optimization? It makes the code clearer and more readable for the user, which is extremely important, but should I be wary of assuming the compiler will catch the RVO opportunity?

Is this a micro-optimization, or something I should keep in mind when designing my code?
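For concreteness, the two styles I'm comparing look roughly like this sketch (std::vector standing in for my large matrix type; makeMatrixByValue and makeMatrixByOutParam are made-up names):

```cpp
#include <cstddef>
#include <vector>

// Return-by-value style: the compiler can construct `result` directly
// in the caller's storage via (N)RVO, so no big copy need ever happen.
std::vector<double> makeMatrixByValue(std::size_t n) {
    std::vector<double> result(n * n, 0.0);
    // ... fill in result ...
    return result;
}

// Output-parameter style: avoids the copy by construction, but the
// call site is noisier and the result object must already exist.
void makeMatrixByOutParam(std::size_t n, std::vector<double>& out) {
    out.assign(n * n, 0.0);
    // ... fill in out ...
}
```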

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Matt
  • 1,033
  • 2
  • 7
  • 10
  • 1
    Possible duplicate of [Is micro-optimisation important when coding?](https://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) – gnat Oct 12 '17 at 12:42
  • 1
    Well, this is very dependent on multiple factors; the most obvious one that comes to mind is how the object you're returning was created. Using new? If so, you have quite a problem there, because you won't be able to delete it once it's returned, and you thus provoke a memory leak. Note that if I just take what you say in the title, I would say no; a typical example would be writing recursive code. It's a bit less optimized than writing the same code as a loop, but if the compiler can do it, it will transform it into a loop anyway, so why bother? – Walfrat Oct 12 '17 at 12:43
  • @Walfrat I don't new the objects, since I do know that failing to delete them would result in a memory leak. Specifically, I am using Eigen libraries for matrix objects, and use the "Object obj(params)" syntax to nominally create the object on the stack. The Eigen libraries generally put the data on the heap and manage the memory for you, so I rely on their management to prevent memory leaks. So not using new, is this an acceptable practice? – Matt Oct 12 '17 at 12:49
  • 7
    To answer your edit: it is a micro-optimisation, because even if you tried to benchmark what you're gaining in nanoseconds, you'd barely see it. For the rest, I'm too rusty in C++ to give you a strict answer as to why it wouldn't work. One reason is likely that there are cases where you need dynamic allocation and thus have to use new/pointers/references. – Walfrat Oct 12 '17 at 12:53
  • 4
    @Walfrat even if the objects are quite large, on the order of megabytes? My arrays can get enormous because of the nature of the problems I am solving. – Matt Oct 12 '17 at 13:06
  • 6
    @Matt I wouldn't. References/pointers exist precisely for this. Compiler optimizations are supposed to be beyond what programmers should take into consideration when building a program, even though yes, often times the two worlds overlap. – Neil Oct 12 '17 at 13:11
  • 5
    @Matt Unless you're doing something extremely specific that requires developers with 10+ years of experience in C/kernels and low-level hardware interaction, you shouldn't need that. If you think you're in a very specific situation, edit your post and add an accurate description of what your application is supposed to do (real-time? heavy math computation? ...) – Walfrat Oct 12 '17 at 13:28
  • 2
    If your objects are that large, then they don't belong in automatic storage in the first place. – Daniel Jour Oct 12 '17 at 13:52
  • 2
    @DanielJour you can have an object that *uniquely owns* a large amount of dynamic storage, which you *do* want to elide copies of. That belongs in automatic storage fine – Caleth Oct 12 '17 at 15:23
  • 37
    In the *particular case* of C++'s (N)RVO, yes, relying on this optimisation is perfectly valid. This is because the C++17 standard **specifically mandates** that it happen, in the situations that modern compilers were already doing it. – Caleth Oct 12 '17 at 15:24
  • What would you call 'large', and how often does the copying occur? Be careful not to optimize too early, especially if you can't yet answer these questions. – Casey Kuball Oct 12 '17 at 16:44
  • Re "I feel that the syntax is much clearer...", that may be your feeling as an inexperienced C++ programmer. It is not, I think, a feeling that will be shared by many experienced programmers. Which may be how you tell inexperienced from experienced programmers :-) – jamesqf Oct 12 '17 at 17:46
  • 2
    If it feels possibly confusing now, then whatever you do, document it. A couple comments won't hurt and may do wonders later. –  Oct 12 '17 at 21:03
  • 1
    Some advice - *if* your code functionally depends on a specific optimization, or is heavily impacted by it, make it as explicit as you can, where it will be seen. I'd certainly add a top-of-file comment pointing to the relevant function(s), and a mention of the relevant file(s) in the Makefile where I set `CFLAGS`. (That also encourages me to minimise the number of functions/files that depend on the optimization). – Toby Speight Oct 13 '17 at 14:20
  • @jamesqf, as an experienced C++ guy, I'd say that return-by-value is much clearer than writing to a passed reference (and that's partly why C++ is moving to mandated RVO). Good move-constructors help, too. – Toby Speight Oct 13 '17 at 14:22
  • Optimizations should only affect a program's performance (in terms of run time, memory usage, etc), not its semantics. As such, your program should behave the same regardless of what optimizations a compiler applies. – chepner Oct 15 '17 at 15:05
  • 2
    @Caleth: "*This is because the C++17 standard specifically mandates that it happen, in the situations that modern compilers were already doing it.*" This is incorrect. C++17 redefines what a prvalue means, which turns uses of prvalues into non-copy/move operations. But modern compilers are able to optimize named return values, as well as other cases that don't involve prvalues. And those are *not* part of C++17's rules. – Nicol Bolas Oct 16 '17 at 05:59

14 Answers

130

Employ the least astonishment principle.

Is it you and only ever you who is going to use this code, and are you sure the same you in 3 years is not going to be surprised by what you do?

Then go ahead.

In all other cases, use the standard way; otherwise, you and your colleagues are going to run into hard to find bugs.

For example, my colleague was complaining about my code causing errors. Turns out, he had turned off short-circuit Boolean evaluation in his compiler settings. I nearly slapped him.
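To illustrate why that setting was so dangerous: a lot of perfectly ordinary code uses the left-hand operand to guard the right-hand one, as in this sketch:

```cpp
#include <cstddef>

// Safe only because && short-circuits: when p is null, the
// dereference on the right-hand side is never evaluated.
bool firstIsPositive(const int* p, std::size_t len) {
    return p != nullptr && len > 0 && p[0] > 0;
}
```

Turn short-circuiting off, and that null check no longer protects the dereference.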

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
Pieter B
  • 12,867
  • 1
  • 40
  • 65
  • 6
    I disagree, but at the same time, I depend on short-circuit evaluation for optimizations sometimes. So meh. – Neil Oct 12 '17 at 13:05
  • 90
    @Neil that's my point: everyone relies on short-circuit evaluation, and you shouldn't have to think twice about it; it should be turned on. It's a de facto standard. Yes, you can change it, but you shouldn't. – Pieter B Oct 12 '17 at 14:11
  • 51
    "I changed how the language works, and *your dirty rotten code* broke! Arghh!" Wow. Slapping would be appropriate, send your colleague to Zen training, there is a lot of it there. –  Oct 12 '17 at 14:18
  • 6
    Why would that even be an option? There is already a way to turn short-circuit off... `if (expr1 & expr2) { }` – TheCatWhisperer Oct 12 '17 at 14:42
  • 8
    @TheCatWhisperer it was some external code we really didn't want to change, but had to test in interaction with our own. Now, there were functions in the evaluation with side effects, but one of those (the first) was almost always false, so we never got to check the side effects of the other... hence my colleague turned it off for testing and forgot to turn it back on. – Pieter B Oct 12 '17 at 15:02
  • Ha! Where does 'use auto everywhere' fit into the least astonishment principle? – James Oct 12 '17 at 15:48
  • 111
    @PieterB I'm pretty sure the C _and_ C++ language specs guarantee short-circuit evaluation. So it's not just a de facto standard, it's _the_ standard. Without it, you're not even using C / C++ anymore, but something that is suspiciously like it :P – marcelm Oct 12 '17 at 16:08
  • 48
    Just for reference, the standard way here is to return by value. – DeadMG Oct 12 '17 at 16:13
  • 1
    @DeadMG thank you, it took me a few reads of the answer to grok that :) – Matt Oct 12 '17 at 17:41
  • 10
    @TheCatWhisperer Please, don't do that. There are so many "boolean" values in C++ for which `&` instead of `&&` is entirely broken. And if you need to prefix one, or both of the expressions with a `!!`, what have you won? Please, just write readable code instead of micro-optimized code. – cmaster - reinstate monica Oct 12 '17 at 19:27
  • 2
    @cmaster no disagreement from me, was not encouraging the practice – TheCatWhisperer Oct 12 '17 at 19:39
  • 9
    He didn't say that it was a C or C++ compiler. Maybe it was some other language where short-circuit evaluation isn't guaranteed by the language standard (Pascal? BASIC?). – dan04 Oct 12 '17 at 20:31
  • 28
    @dan04 yes it was in Delphi. Guys, don't get caught up in the example it's about the point I made. Don't do surprising stuff nobody else does. – Pieter B Oct 12 '17 at 21:53
  • @PieterB I tried editing your example cause run-on sentences read weird. If you don't like it, no worries. – godskook Oct 13 '17 at 21:16
  • 1
    @PieterB The difference is that short-circuiting is a common idiom in most languages, so no experienced programmer would consider it to be "surprising stuff nobody else does". – Barmar Oct 13 '17 at 21:29
  • 4
    @Barmar I think you missed the point. *Turning off* short-circuit evaluation when compiling *other people's code* is surprising stuff nobody should do. – Wildcard Oct 14 '17 at 09:35
  • 6
    Pascal was the last language I know of where short-circuit evaluation of logical and/or was neither guaranteed to happen nor guaranteed not to happen. Which was an absolute pain in the ass, requiring convoluted code to get the right behaviour. – gnasher729 Oct 14 '17 at 21:01
82

For this particular case, definitely just return by value.

  • RVO and NRVO are well-known and robust optimizations that really ought to be made by any decent compiler, even in C++03 mode.

  • Move semantics ensure that objects are moved out of functions if (N)RVO didn't take place. That's only useful if your object uses dynamic data internally (like std::vector does), but that should really be the case if it is that big -- overflowing the stack is a risk with big automatic objects.

  • C++17 mandates copy elision (guaranteed RVO) when a prvalue is returned. So don't worry: it won't disappear on you, and it will only finish establishing itself completely once compilers are up to date.

And in the end, forcing an additional dynamic allocation to return a pointer, or forcing your result type to be default-constructible just so you can pass it as an output parameter are both ugly and non-idiomatic solutions to a problem you will probably never have.

Just write code that makes sense and thank the compiler writers for correctly optimizing code that makes sense.
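A minimal sketch of the recommended style, with std::vector standing in for any big dynamically-backed type:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// `big` is a named local returned by value: an NRVO candidate.
// Compilers elide the copy; if one ever doesn't, C++11 move
// semantics turn the return into a cheap pointer swap instead
// of a deep copy.
std::vector<std::string> buildBig(std::size_t n) {
    std::vector<std::string> big;
    big.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        big.push_back("row " + std::to_string(i));
    return big;
}
```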

Quentin
  • 1,465
  • 9
  • 10
  • 9
    Just for fun see [how Borland Turbo C++ 3.0 from 1990-ish handles RVO](https://www.youtube.com/watch?v=RWavTVo7D3M). Spoiler: It basically works just fine. – nwp Oct 12 '17 at 16:13
  • 9
    The key here is it is not some random compiler-specific optimization or "undocumented feature," but something that, while technically optional in several versions of the C++ standard, was heavily pushed by the industry and pretty much every major compiler has done it for a very long time. –  Oct 12 '17 at 21:04
  • 7
    This optimization isn't quite as robust as one might like. Yes, it is rather reliable in the most obvious cases, but looking for instance at gcc's bugzilla, there are many barely-less-obvious cases where it is missed. – Marc Glisse Oct 13 '17 at 09:32
63

Now, I feel that the syntax is much clearer when the object is explicitly returned by value, and the compiler will generally employ the RVO and make the process more efficient. Is it bad practice to rely on this optimization? It makes the code clearer and more readable for the user, which is extremely important, but should I be wary of assuming the compiler will catch the RVO opportunity?

This isn't some little-known, cutesy micro-optimization that you read about in some small, little-trafficked blog and then get to feel clever and superior about using.

After C++11, RVO is the standard way to write this kind of code. It is common, expected, taught, mentioned in talks, mentioned in blogs, mentioned in the standard, and will be reported as a compiler bug if not implemented. In C++17, the language goes one step further and mandates copy elision in certain scenarios.

You should absolutely rely on this optimization.

On top of that, return-by-value just leads to massively easier code to read and manage than return-by-reference code. Value semantics is a powerful thing that can itself lead to more optimization opportunities.
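A sketch of what that buys you at the call site: value-returning functions compose, while the out-parameter style forces a named temporary per step (firstN and doubled are invented names):

```cpp
#include <vector>

std::vector<int> firstN(int n) {
    std::vector<int> v;
    for (int i = 1; i <= n; ++i)
        v.push_back(i);
    return v;  // NRVO or a move: no deep copy in practice
}

// Taking and returning by value lets the compiler move the
// argument straight through when it is a temporary.
std::vector<int> doubled(std::vector<int> v) {
    for (int& x : v)
        x *= 2;
    return v;
}

// Usage reads left to right, with no dangling-reference or
// pointer-ownership questions to answer:
//   auto result = doubled(firstN(5));  // {2, 4, 6, 8, 10}
```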

Barry
  • 1,308
  • 8
  • 11
  • 3
    Thanks, this makes a lot of sense and is consistent with the "least astonishment principle" mentioned above. It would make the code very clear and understandable, and makes it harder to mess up with pointer shenanigans. – Matt Oct 12 '17 at 17:39
  • 3
    @Matt Part of the reason I upvoted this answer is that it does mention "value semantics". As you get more experience in C++ (and programming in general), you will find occasional situations where value semantics cannot be used for certain objects because they are mutable and their changes need to be made visible to other code that uses that same object (an example of "shared mutability"). When these situations happen, the affected objects will need to be shared via (smart) pointers. – rwong Oct 13 '17 at 21:53
18

The correctness of the code you write should never depend on an optimization. It should produce the correct result when executed on the abstract C++ "virtual machine" that the specification describes.

However, what you're talking about is more a question of efficiency. Your code runs better if optimized with an RVO-capable compiler. That's fine, for all the reasons pointed out in the other answers.

However, if you require this optimization (such as if the copy constructor would actually cause your code to fail), now you're at the whims of the compiler.

I think the best example of this in my own practice is tail call optimization:

   int sillyAdd(int a, int b)
   {
      if (b == 0)
          return a;
      return sillyAdd(a + 1, b - 1);
   }

It's a silly example, but it shows a tail call: a function called recursively right at the end of a function. The C++ virtual machine will show that this code operates properly, though it may cause a little confusion as to why I bothered writing such an addition routine in the first place. However, in practical implementations of C++, we have a stack, and it has limited space. If implemented pedantically, this function would have to push at least b + 1 stack frames onto the stack as it does its addition. If I want to calculate sillyAdd(5, 7), this is not a big deal. If I want to calculate sillyAdd(0, 1000000000), I could be in real trouble of causing a StackOverflow (and not the good kind).

However, we can see that once we reach that last return line, we're really done with everything in the current stack frame. We don't really need to keep it around. Tail call optimization lets you "reuse" the existing stack frame for the next function. In this way, we only need 1 stack frame, rather than b+1. (We still have to do all those silly additions and subtractions, but they don't take more space.) In effect, the optimization turns the code into:

   int sillyAdd(int a, int b)
   {
      begin:
      if (b == 0)
          return a;
      // return sillyAdd(a + 1, b - 1);
      a = a + 1;
      b = b - 1;
      goto begin;  
   }

In some languages, tail call optimization is explicitly required by the specification. C++ is not one of those. I cannot rely on C++ compilers to recognize this tail call optimization opportunity, unless I go case-by-case. With my version of Visual Studio, the release version does the tail call optimization, but the debug version does not (by design).

Thus it would be bad for me to depend on being able to calculate sillyAdd(0, 1000000000).

ojdo
  • 131
  • 4
Cort Ammon
  • 10,840
  • 3
  • 23
  • 32
  • I wonder why languages don't include an explicit "tail call" syntax which would require that a compiler either perform a tail call (in which case arbitrary "recursion" depth would be no problem) or squawk at compile time if unable to do so (rather than crashing if code attempts deep recursion). That would seem better than having tail-call optimization as a feature that may improve performance, but couldn't safely be relied upon. – supercat Oct 12 '17 at 20:57
  • 2
    This is an interesting corner-case, but I do not think you can generalize it to the rule in your first paragraph. Suppose I have a program for a small device, that will load if and only if I use the compiler's size-reducing optimizations - is it wrong to do so? it seems rather pedantic to say that my only valid choice is to rewrite it in assembler, especially if that rewrite does the same things as the optimizer does to solve the problem. – sdenham Oct 12 '17 at 21:29
  • 5
    @sdenham I suppose there is a little room in the argument. If you're no longer writing for "C++," but rather writing for "WindRiver C++ compiler version 3.4.1," then I can see the logic there. However, as a general rule, if you're writing something that does not function properly according to the spec, you're in a very different sort of scenario. I know the Boost library has code like that, but they always put it in `#ifdef` blocks, and have a standards compliant workaround available. – Cort Ammon Oct 12 '17 at 21:48
  • I know there's a lot of contention regarding stuff like that with thread safety, such as double-checked locking, which is undefined behavior by the spec, but may have defined behavior on your compiler (such as gcc on x86/64) – Cort Ammon Oct 12 '17 at 21:50
  • Fair points. There are quite a few programs that will run out of resources on a given machine, if given a large enough problem. Tail-call optimization is perhaps unique in the extent to which it can change that behavior. – sdenham Oct 12 '17 at 22:22
  • 4
    is that a typo in the second block of code where it says `b = b + 1`? – stib Oct 13 '17 at 04:27
  • There;s a `goto` in the code!!! *screams* – Swanand Oct 13 '17 at 06:45
  • 1
    No, I disagree with this answer. In my experience, well-written C++ code relies heavily on compiler optimisations; in fact, *all* well-written C++ code does to some extent. Heck, the *C++ standard library* relies heavily on aggressive inlining. Without it, the standard library design would violate one of C++’s core guidelines: “you don’t pay for what you don’t use”. Your specific example is another case of that. *Do* rely on (simple) tail call optimisation: every modern compiler supports it (most go way beyond “simple”), and it sometimes yields more readable code. – Konrad Rudolph Oct 13 '17 at 10:34
  • 1
    … (continued) A more relevant example is: *do* use exceptions in C++. By doing so in performance sensitive code, you rely on exceptions being (near) zero-cost in the non-throwing case. This expectation is far from obvious — in fact, it used to be false for much of C++’s existence (hence guidelines, such as Google’s, to avoid exceptions). And yet it’s true in modern compilers, and many C++ experts consequently recommend the use of exceptions. – Konrad Rudolph Oct 13 '17 at 10:36
  • 2
    You might want to explain what you mean by a "C++ virtual machine", as that's not a term used in any standard document. I *think* you're talking about the execution model of C++, but not completely certain - and your term is deceptively similar to a "bytecode virtual machine" which relates to something entirely different. – Toby Speight Oct 13 '17 at 14:16
  • @CortAmmon: Most often, what one really wants to write for is neither "C++" nor "Acme C++ Version 2.7", but rather "C++ with the features needed for low-level programming on platform X implemented in the most natural fashion". Unfortunately, even though some common features could easily be implemented in consistent fashion across a wide range of implementations, the authors of the Standard don't seem to have much interest in features that would be useful in many but not all implementations. – supercat Oct 13 '17 at 18:51
  • @supercat, For your interest, [Kotlin does](https://kotlinlang.org/docs/reference/functions.html#tail-recursive-functions) include an explicit tail recursion syntax. – chris Oct 13 '17 at 21:36
  • @supercat As always, feel free to write a spec if you think you know what features everyone wants. That's how new languages get created! – Cort Ammon Oct 13 '17 at 23:33
  • 1
    @supercat Scala also has explicit tail recursion syntax. C++ is its own beast, but I think tail recursion is unidiomatic for non-functional languages, and mandatory for functional languages, leaving a small set of languages where it's reasonable to have explicit tail recursion syntax. Literally translating tail recursion into loops and explicit mutation simply is a better option for many languages. – prosfilaes Oct 15 '17 at 07:18
  • 2
    `void burnElectricity() { sillyAdd(0,-1); }` – xryl669 Oct 16 '17 at 10:48
  • Coming back to this a few years later, I thought it might be useful to add to the comments: destructors can complicate this pattern greatly. There's a very specific order in which destructors are called. Tail calling may break this. Its hard in general to prove that the arguments to the tail call do not hold a reference to an object in the frame. Such an object would need to be alive during the call for correctness. – Cort Ammon Sep 08 '22 at 14:44
8

In practice, C++ programs rely on some compiler optimizations.

Look notably into the standard headers of your standard container implementations. With GCC, you can ask for the preprocessed form (g++ -C -E) and the GIMPLE internal representation (g++ -fdump-tree-gimple, or GIMPLE SSA with -fdump-tree-ssa) of most source files (technically, translation units) using containers. You'll be surprised by the amount of optimization that is done (with g++ -O2). So the implementors of containers rely on the optimizations; most of the time, the implementor of a C++ standard library knows what optimizations will happen and writes the container implementation with those in mind. Sometimes they will also write the optimization pass in the compiler to deal with features required by the standard C++ library.

In practice, it is the compiler optimizations which make C++ and its standard containers efficient enough. So you can rely on them.

And likewise for the RVO case mentioned in your question.

The C++ standard was co-designed (notably by experimenting with good enough optimizations while proposing new features) to work well with the possible optimizations.

For instance, consider the program below:

#include <algorithm>
#include <vector>

extern "C" bool all_positive(const std::vector<int>& v) {
  return std::all_of(v.begin(), v.end(), [](int x){return x >0;});
}

Compile it with g++ -O3 -fverbose-asm -S. You'll find out that the generated function doesn't run any CALL machine instruction. So most of the C++ steps (construction of a lambda closure, its repeated application, getting the begin and end iterators, etc.) have been optimized away. The machine code contains only a loop (which does not appear explicitly in the source code). Without such optimizations, C++11 wouldn't have been successful.

addenda

(added December 31st, 2017)

See CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” talk.

Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125
4

Whenever you use a compiler, the understanding is that it will produce machine- or byte-code for you. It does not guarantee anything about what that generated code is like, except that it will implement the source code according to the specification of the language. Note that this guarantee is the same regardless of the level of optimization used, and so, in general, there is no reason to regard one output as more 'right' than the other.

Furthermore, in those cases, like RVO, where it is specified in the language, it would seem to be pointless to go out of your way to avoid using it, especially if it makes the source code simpler.

A lot of effort is put into making compilers produce efficient output, and clearly the intent is for those capabilities to be used.

There may be reasons for using unoptimized code (for debugging, for example), but the case mentioned in this question does not appear to be one (and if your code fails only when optimized, and it is not a consequence of some peculiarity of the device you are running it on, then there is a bug somewhere, and it is unlikely to be in the compiler.)

sdenham
  • 253
  • 1
  • 6
3

I think others covered the specific angle about C++ and RVO well. Here is a more general answer:

When it comes to correctness, you should not rely on compiler optimizations, or compiler-specific behavior in general. Fortunately, you don't seem to be doing this.

When it comes to performance, you have to rely on compiler-specific behavior in general, and compiler optimizations in particular. A standard-compliant compiler is free to compile your code in any way it wants to, as long as the compiled code behaves according to the language specification. And I'm not aware of any specification for a mainstream language that specifies how fast each operation has to be.

svick
  • 9,999
  • 1
  • 37
  • 51
2

No.

That's what I do all the time. If I need to access an arbitrary 16 bit block in memory, I do this

void *ptr = get_pointer();
uint16_t u16;
memcpy(&u16, ptr, sizeof(u16)); // ntohs omitted for simplicity

...and rely on the compiler doing whatever it can to optimize that piece of code. The code works on ARM, i386, AMD64, and practically every single architecture out there. In theory, a non-optimizing compiler could actually call memcpy, resulting in totally bad performance, but that's no problem for me, since I use compiler optimizations.

Consider the alternative:

void *ptr = get_pointer();
uint16_t *u16ptr = ptr;
uint16_t u16;
u16 = *u16ptr;  // ntohs omitted for simplicity

This alternative code fails to work on machines that require proper alignment, if get_pointer() returns a non-aligned pointer. Also, there may be aliasing issues in the alternative.

The difference between -O0 and -O2 when using the memcpy trick is great: 3.2 Gbps of IP checksum performance at -O0 versus 67 Gbps at -O2. Over an order of magnitude difference!

Sometimes you may need to help the compiler. So, for example, instead of relying on the compiler to unroll loops, you can do it yourself. Either by implementing the famous Duff's device, or by a more clean way.
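For instance, a hand-unrolled sum (the "more clean way", as opposed to Duff's device) might look like this sketch; whether it actually beats the compiler's own unrolling is for the profiler to say:

```cpp
#include <cstddef>
#include <cstdint>

// Sum with 4x manual unrolling into independent accumulators;
// the second loop handles the remaining 0-3 elements.
uint64_t sum4(const uint32_t* p, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    uint64_t s = s0 + s1 + s2 + s3;
    for (; i < n; ++i)
        s += p[i];  // tail elements
    return s;
}
```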

The drawback of relying on the compiler optimizations is that if you run gdb to debug your code, you may find out that a lot has been optimized away. So, you may need to recompile with -O0, meaning performance will totally suck when debugging. I think this is a drawback worth taking, considering the benefits of optimizing compilers.

Whatever you do, please make sure your way is not actually undefined behaviour. Certainly accessing an arbitrary block of memory through a cast to a 16-bit integer pointer, as in the alternative above, is undefined behaviour due to the aliasing and alignment issues.

juhist
  • 2,579
  • 10
  • 14
1

Compiler optimizations should only affect a program's performance, not its results. Relying upon compiler optimizations to meet non-functional requirements is not only reasonable, it is frequently the reason one compiler is picked over another.

Flags that determine how particular operations are performed (index or overflow conditions, for example) are frequently lumped in with compiler optimizations, but shouldn't be. They explicitly affect the results of calculations.

If a compiler optimization causes different results, that is a bug -- a bug in the compiler. Relying upon a bug in the compiler is, in the long term, a mistake: what happens when it gets fixed?

Using compiler flags that change how calculations work should be well documented, but used as needed.

jmoreno
  • 10,640
  • 1
  • 31
  • 48
  • Unfortunately, a lot of compiler documentation does a poor job of specifying what is or is not guaranteed in various modes. Further, "modern" compilers writers seem oblivious to the combinations of guarantees that programmers do and don't need. If a program would work fine if `x*y>z` arbitrarily yields 0 or 1 in case of overflow, *provided that it has no other side-effects*, requiring that a programmer must either prevent overflows at all costs or force the compiler to evaluate the expression a particular way will needless impair optimizations vs. saying that... – supercat Oct 16 '17 at 15:34
  • ...the compiler might *at its leisure* behave as though `x*y` promotes its operands to some arbitrary longer type (thus allowing forms of hoisting and strength reduction that would change the behavior of some overflow cases). Many compilers, however, require that programmers either prevent overflow at all costs or force compilers to truncate all intermediate values in case of overflow. – supercat Oct 16 '17 at 15:40
1

All attempts at efficient code written in anything but assembly rely very, very heavily on compiler optimizations, starting with the most basic, like efficient register allocation to avoid superfluous stack spills all over the place, and at least reasonably good, if not excellent, instruction selection. Otherwise we'd be back in the 80s, when we had to put register hints all over the place and use the minimum number of variables in a function to help archaic C compilers, or even earlier, when goto was a useful branching optimization.

If we didn't feel like we could rely on our optimizer's ability to optimize our code, we'd all still be coding performance-critical execution paths in assembly.

It's really a matter of how reliably you feel the optimization can be made which is best sorted out by profiling and looking into the capabilities of the compilers you have and possibly even disassembling if there's a hotspot you can't figure out where the compiler seems to have failed to make an obvious optimization.

RVO is something that has been around for ages, and, at least excluding very complex cases, is something compilers have been reliably applying well for ages. It's definitely not worth working around a problem that doesn't exist.

Err on the Side of Relying on the Optimizer, Not Fearing It

To the contrary, I'd say err on the side of relying too much on compiler optimizations rather than too little, and this suggestion comes from a guy who works in very performance-critical fields where efficiency, maintainability, and perceived quality among customers are all one giant blur. I'd rather have you rely too confidently on your optimizer and find some obscure edge cases where you relied too much than rely too little and just code out of superstitious fears all the time for the rest of your life. That'll at least have you reaching for a profiler and investigating properly if things don't execute as quickly as they should, gaining valuable knowledge, not superstitions, along the way.

You're doing well to lean on the optimizer. Keep it up. Don't become like that guy that starts explicitly requesting to inline every function called in a loop before even profiling out of a misguided fear of the optimizer's shortcomings.

Profiling

Profiling is really the roundabout but ultimate answer to your question. The problem beginners eager to write efficient code often struggle with is not what to optimize but what not to optimize, because they develop all kinds of misguided hunches about inefficiencies that, while humanly intuitive, are computationally wrong. Developing experience with a profiler will really start giving you a proper appreciation not only of your compiler's optimization capabilities, which you can confidently lean on, but also of the capabilities (and the limitations) of your hardware. There's arguably even more value in profiling in learning what wasn't worth optimizing than in learning what was.

-1

Software can be written in C++ on very different platforms and for lots of different purposes.

It completely depends on the purpose of the software. Should it be easy to maintain, extend, patch, refactor, etc., or are other things more important, like performance, cost, compatibility with some specific hardware, or the time it takes to develop?

mathreadler
  • 200
  • 10
-2

I think the boring answer to this is: 'it depends'.

Is it bad practice to write code that relies on a compiler optimization that is likely to be turned off, where the reliance is not documented, and where the code in question is not unit tested so that you'd know if it broke? Probably.

Is it bad practice to write code that relies on a compiler optimization that is not likely to be turned off, that is documented and is unit tested? Maybe not.

Dave Cousineau
  • 313
  • 3
  • 10
-6

Unless there is more you are not telling us, this is bad practice, but not for the reason you suggest.

Possibly unlike other languages you have used before, returning the value of an object in C++ yields a copy of the object. If you then modify the copy, you are modifying a different object. That is, if I have `Obj a; a.x = 1;` and `Obj b = a;`, and then I do `b.x += 2; b.f();`, then `a.x` still equals 1, not 3.

So no, using an object as a value instead of as a reference or pointer does not provide the same functionality and you could end up with bugs in your software.

Perhaps you know this and it does not negatively affect your specific use case. However, based on the wording of your question, it appears you might not be aware of the distinction, given wording such as "create an object in the function."

"Create an object in the function" sounds like `new Obj;`, whereas "return the object by value" sounds like `Obj a; return a;`.

`Obj a;` and `Obj* a = new Obj;` are very, very different things; the former can result in memory corruption if not properly used and understood, and the latter can result in memory leaks if not properly used and understood.

Aaron
  • 231
  • 1
  • 5
  • 8
    Return value optimization (RVO) is a well-defined semantic whereby the compiler constructs the returned object one level up in the stack frame, specifically avoiding unnecessary object copies. It is well-defined behavior that was supported long before it was mandated in C++17. Even 10-15 years ago, all major compilers supported this feature and did so consistently. –  Oct 12 '17 at 21:12
  • @Snowman I am not talking about the physical, low-level memory management, and I did not discuss memory bloat or speed. As I specifically showed in my answer, I am talking about the logical data. *Logically*, supplying the value of an object is creating a copy of it, regardless of how the compiler is implemented or what assembly is used behind the scenes. The behind-the-scenes low-level stuff is one thing, and the logical structure and behavior of the language is another; they are related, but they are not the same thing - both should be understood. – Aaron Oct 12 '17 at 21:38
  • 7
    your answer says "returning the value of an object in C++ yields a copy of the object" which is completely false in the context of RVO - the object is constructed _directly_ at the calling location, and no copy is ever made. You can test this by deleting the copy constructor and returning the object _that is constructed in the `return` statement_ which is a requirement for RVO. Furthermore, you then go on to talk about keyword `new` and pointers, which is not what RVO is about. I believe you either do not understand the question, or RVO, or possibly both. –  Oct 12 '17 at 22:33
-7

Pieter B is absolutely correct in recommending least astonishment.

To answer your specific question, what this (most likely) means in C++ is that you should return a `std::unique_ptr` to the constructed object.

The reason is that this is clearer for a C++ developer as to what's going on.

Although your approach would most probably work, you're effectively signalling that the object is a small value type when, in fact, it isn't. On top of that, you're throwing away any possibility of interface abstraction. That may be OK for your current purposes, but such abstraction is often very useful when dealing with matrices.

I appreciate that if you've come from other languages, all the sigils can be confusing initially. But be careful not to assume that, by not using them, you make your code clearer. In practice, the opposite is likely to be true.

Alex
  • 3,882
  • 1
  • 15
  • 16
  • When in Rome, do as the Romans do. –  Oct 12 '17 at 14:19
  • Thank you, I'll learn about the use of unique_ptr's. I do come from interpreted and memory managed languages, so I'm sorting out what's best practice for some of these cases. – Matt Oct 12 '17 at 15:02
  • @Matt This is quite a common reaction coming from other languages hence my educated guess. Unfortunately, C++ can be a bit of a mess wrt syntax in this area so you'll need to persevere. However, if you want to take advantage of the performance and control that C++ gives you then you'll have to deal with this pretty soon. Good luck! – Alex Oct 12 '17 at 15:09
  • 15
    This is not a good answer for types which do not themselves perform dynamic allocations. That the OP feels the natural thing in his use case is to return by value indicates that his objects have automatic storage duration on the caller side. For simple, not-too-large objects even a naive copy-return-value implementation will be orders of magnitudes faster than a dynamic allocation. (If, on the other hand, the function returns a container, then returning a unique_pointer may even be advantageous compared to a naive compiler return by value.) – Peter - Reinstate Monica Oct 12 '17 at 15:56
  • 9
    @Matt In case you didn't realize this is not best practice. Unnecessarily doing memory allocations and forcing pointer semantics on users is bad. – nwp Oct 12 '17 at 16:21
  • @nwp I didn't know that. It seems I have a lot to learn, and I'll see if I can figure out more to understand all of the different opinions here. Thank you. – Matt Oct 12 '17 at 16:25
  • 5
    First of all, when using smart pointers, one should return `std::make_unique`, not a `std::unique_ptr` directly. Second of all, RVO is not some esoteric, vendor-specific optimization: it is baked into the standard. Even back when it was not, it was widely supported and expected behavior. There is no point is returning a `std::unique_ptr` when a pointer is not needed in the first place. –  Oct 12 '17 at 21:09
  • 5
    @Snowman: There is no "when it was not". Although it's only recently become *mandatory*, every C++ standard ever has recognized [N]RVO, and made accommodations to enable it (e.g., the compiler has always been given explicit permission to omit use of the copy constructor on the return value, even if it has visible side effects). – Jerry Coffin Oct 13 '17 at 07:04