9

Whether in C or C++, I think that this illegal program, whose behavior the C and C++ standards leave undefined, is interesting:

#include <stdio.h>

int foo() {
    int a;              // uninitialized: reading it is the undefined behavior
    const int b = a;    // b gets whatever happens to occupy a's stack slot
    a = 555;            // overwrite that slot before returning
    return b;
}

void bar() {
    int x = 123;        // without optimization, 123 and 456 are left behind
    int y = 456;        // on the stack when bar() returns
}

int main() {
    bar();
    const int n1 = foo();
    const int n2 = foo();
    const int n3 = foo();
    printf("%d %d %d\n", n1, n2, n3);
    return 0;
}

Output on my machine (after compilation without optimization):

123 555 555

I think that this illegal program is interesting because it illustrates stack mechanics: without optimization, bar() presumably leaves 123 in the stack slot that foo()'s a later occupies, and foo() overwrites that slot with 555 before returning, which the second and third calls then observe. The very reason one uses C or C++ (instead of, say, Java) is to program close to the hardware, close to stack mechanics and the like.

However, on StackOverflow, when a questioner's code inadvertently reads from uninitialized storage, the most heavily upvoted answers invariably quote the C or C++ (especially C++) standard to the effect that the behavior is undefined. This is true, of course, as far as the standard goes; the behavior is indeed undefined. But it is curious that alternative answers, which try from a hardware or stack-mechanical perspective to investigate why a specific undefined behavior (such as the output above) might have occurred, are rare and tend to be ignored.

I even remember one answer that suggested that undefined behavior could include reformatting my hard drive. I didn't worry too much about that, though, before running the program above.

My question is this: Why is it more important to teach readers merely that behavior is undefined in C or C++, than it is to understand the undefined behavior? I mean, if the reader understood the undefined behavior, then would he not be the more likely to avoid it?

My education happens to be in electrical engineering, and I work as a building-construction engineer, and the last time I had a job as a programmer per se was 1994, so I am curious to understand the perspective of users with more conventional, more recent software-development backgrounds.

Sisir
thb
  • Sometimes it's really hard to understand what your program actually does until you look at the produced assembly and see that the compiler has suddenly optimized away a good chunk of code all due to one little piece of undefined behaviour. – chris Sep 13 '14 at 01:18
  • Undefined behavior means that anything could happen. Whether the output makes sense or not, it doesn't matter... It's just random luck that the compiler is implemented as you would expect it to be.... – Jaa-c Sep 13 '14 at 01:18
  • I see one close vote that the question is opinion-based. Before the total rises to five, if the next closing voter would help me to rephrase the question so that it is not primarily opinion-based, I would appreciate it. What I would like to understand is the rationale that underlies a class of answers that are common on this site. – thb Sep 13 '14 at 01:20
  • @Barmar: I had thought of that. Maybe you are right. If the consensus were indeed that the question belonged on meta, then I would repost it there. – thb Sep 13 '14 at 01:22
  • @thb Super-close power only applies to duplicates. – Barmar Sep 13 '14 at 01:22
  • How a compiler chooses to compile UB is too specific to be a useful SO question: it depends on the particular compiler, OS, machine architecture, optimisation levels, and which exact version of the compiler you're using. The series of articles at http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html is a good overview of why you should avoid UB and some of the things that can go wrong. – Paul Hankin Sep 13 '14 at 01:25
  • @Insilico Actually, that's exactly what I was talking about. I submitted a close vote saying that it belongs on meta.SO. – Barmar Sep 13 '14 at 01:26
  • @Barmar: Argh, serves me right for reading the question with my C++ glasses on. Ignore everything I've said. – In silico Sep 13 '14 at 01:27
  • Any knowledge you gain from analyzing the behavior of a particular piece of code exhibiting undefined behavior applies to that piece of code compiled using a particular version of a particular compiler using a particular set of compiler settings only. There is little point in doing so. – T.C. Sep 13 '14 at 01:27
  • There are four close votes now. I will add the fifth vote, myself, and go to meta, per consensus. – thb Sep 13 '14 at 01:33
  • @Barmar: If this question is about the philosophy of C++ UB, then it's not a question for our meta, because it's not about the site itself. Whether it is on-topic for this site is a separate issue (it *probably* isn't on-topic). The intent of this question can only be clarified by the asker - it would be unwise of them to follow community consensus because in actuality it's unclear what exactly is being asked. – BoltClock Sep 13 '14 at 08:49
  • @BoltClock I don't see this question as being about C++. It's about the proper way to answer a common type of question on SO. These are usually about C and C++, but they can in principle be about other languages. – Barmar Sep 13 '14 at 19:17
  • A different compiler, or the same compiler under different settings, different optimization levels, or perhaps even a different system, might compile the code differently. You cannot know for certain what the results will be. Because the result is up to the inner "black magic" of the compiler, and possibly influenced by options and other outside parameters, it may not be reproducible; even if it were, relying on it would not be advisable. If you want to learn about the stack, there are better ways to do so; I would perhaps suggest looking at the assembly output of valid code. – Tommy Andersen Sep 13 '14 at 22:12
  • The problem with this question is in how you define "undefined" (ha!). If you know what the compiler is going to do, *it is not undefined*: it is implementation-defined (if the ISO C standard doesn't give the implementation explicit permission to define it, then it's implementation-defined and also you're now using GNU C or whatever rather than ISO C). It isn't meaningful to talk about "understanding" *true* UB; if it can be consistently understood, it isn't. – Alex Celeste Sep 15 '14 at 07:26
  • In this case, though, you don't actually know what it is going to do. You've got a pretty good theory based on experience. Unless the compiler vendor explicitly defined it, the behavior could easily change with a patch or even with a different set of compiler flags. But yes, "undefined" is probably not the best word. "Undependable" might be better. You can't assume that it will always work as observed. – Gort the Robot Nov 03 '14 at 20:55
  • On a similar context - https://softwareengineering.stackexchange.com/questions/398703/why-does-c-have-undefined-behaviour-and-other-languages-like-c-or-java-don – Sisir Sep 23 '19 at 11:52

5 Answers

8

Undefined behavior ultimately means the behavior is non-deterministic. Programmers who are unaware that they are writing non-deterministic code are simply ignorant programmers. This site aims to make programmers better (and less ignorant).

Writing a correct program in the face of non-deterministic behavior is not impossible. However, it is a specialized programming environment, and requires a different kind of programming discipline.

Even in your example, if the program receives an externally raised signal, the values on the "stack" may change in such a way that you don't get the expected values. Moreover, if the machine has trap values, reading random values may very well cause something strange to happen.
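
A minimal sketch of the signal point (my own illustration, not part of the original answer; whether the handler's stack frame actually overlaps the one bar() used is implementation-dependent, and the handler name spill is made up):

#include <signal.h>
#include <stdio.h>

int foo(void) {
    int a;              /* uninitialized: the undefined read in question */
    const int b = a;
    a = 555;
    return b;
}

void bar(void) {
    int x = 123;
    int y = 456;
    (void)x;
    (void)y;
}

void spill(int sig) {
    /* The handler's locals may occupy the same stack region bar() used,
       overwriting the leftover 123 before foo() ever runs. */
    volatile int scratch[16];
    for (int i = 0; i < 16; i++)
        scratch[i] = -1;
    (void)sig;
}

int main(void) {
    signal(SIGINT, spill);
    bar();
    raise(SIGINT);          /* stand-in for an externally raised signal */
    printf("%d\n", foo());  /* plausibly no longer 123 */
    return 0;
}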

jxh
  • I am not very familiar with *trap values.* I should investigate. – thb Sep 13 '14 at 01:30
  • @jxh I'm not sure *non-deterministic* is right. A program could be *undefined* but completely repeatable on a given platform, right? – quant Sep 13 '14 at 01:32
  • @Arman: It may or may not be repeatable on a given platform, that is the point. – jxh Sep 13 '14 at 01:33
  • @Arman: I agree with you, non-deterministic usually means that in a computation different choices / solutions are considered at the same time, while undefined behaviour means that the language specification does not define a specific behaviour for a given construct. – Giorgio Sep 13 '14 at 21:33
  • @Giorgio: I know that usage of "non-deterministic", but it is not what I mean in this context. The usage you refer to deals with formal systems such as automata. When dealing with debugging software, "non-deterministic behavior" means "the software may behave in an arbitrary and unpredictable way". For most systems, this means the program has a bug. – jxh Sep 14 '14 at 00:01
  • @jxh: The point is that undefined behaviour can be deterministic, but different from one implementation to another. – Giorgio Sep 14 '14 at 02:13
  • @Giorgio: The other point is that undefined behavior need not be deterministic, even for the exact same platform and implementation. – jxh Sep 14 '14 at 02:54
  • C and C++ use two different terms: undefined behavior and unspecified behavior. There's also indeterminately sequenced. And the distinction is important. It is possible, albeit difficult, to write a correct program in the presence of unspecified behavior. But no amount of careful coding can ensure correctness in the presence of undefined behavior. Undefined behavior removes the semantic meaning of your entire program. On the other hand, behavior left undefined by the language may be defined by the platform. [A sketch contrasting the two terms follows this comment thread.] – Ben Voigt Sep 15 '14 at 13:32
  • @quant Yes, undefined behavior can be totally repeatable on a given platform and sometimes on multiple platforms. And there are plenty of successful software products that utilize undefined behavior. – Adam S Sep 16 '14 at 19:08
  • @BenVoigt: Thanks for clarifying the difference between undefined, unspecified behavior. When I was speaking about a correct program for non-deterministic behavior, I meant systems such as onboard space computers, where redundant systems vote for consensus to deal with reading data from core memory that may have suffered partial memory damage (e.g., from radiation). – jxh Sep 21 '14 at 23:41
  • @jxh: Fault-tolerant systems are indeed quite interesting. But they aren't undefined-behavior tolerant. Copies running in lockstep which encounter undefined behavior may all make the wrong choice, and voting won't help then. – Ben Voigt Sep 22 '14 at 11:26
  • @BenVoigt: Thanks, I agree that a correct program should not have undefined behavior! – jxh Sep 22 '14 at 13:32
  • @jxh: Unfortunately, the authors of the Standard essentially rely upon implementations to behave sensibly in some cases where it actually imposes no requirements. Even crazily-aggressive compilers will process some of those cases sensibly because if they did otherwise they couldn't plausibly claim that all forms of UB were intended as invitations for compilers to jump the rails. – supercat Jul 03 '18 at 20:34
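
A minimal sketch contrasting the two terms Ben Voigt distinguishes above (my illustration, not his; the function name trace is made up):

#include <stdio.h>

static int trace(const char *tag) {
    printf("%s ", tag);
    return 1;
}

int main(void) {
    /* Unspecified (not undefined) behavior: the two calls to trace()
       are indeterminately sequenced, so either "left right " or
       "right left " may be printed, but nothing else may happen and
       the rest of the program keeps its meaning. */
    int sum = trace("left") + trace("right");
    printf("\nsum=%d\n", sum);
    return 0;
}
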
6

Why is it more important to teach readers merely that behavior is undefined in C or C++, than it is to understand the undefined behavior?

Because the specific behavior may not be repeatable, even from run to run without rebuilding.

Chasing down exactly what happened may be a useful academic exercise for better understanding the quirks of your particular platform, but from a coding perspective the only relevant lesson is "don't do that". An expression like a++ * a++ is a coding error, full stop. That's really all anyone needs to know.
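
To illustrate, a sketch of mine (not part of the original answer; actual results vary by compiler and options):

#include <stdio.h>

int main(void) {
    int a = 1;
    /* Two unsequenced modifications of a in one expression: a coding
       error.  Different compilers, or the same compiler at different
       optimization levels, may print different values, warn, or
       transform the surrounding code in surprising ways. */
    printf("%d\n", a++ * a++);
    return 0;
}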

John Bode
5

Frama-C's value analysis, a static analyzer whose purported goal is to find all undefined behaviors in a C program, considers the assignment const int b = a; to be okay. This is a deliberate design decision, made to allow memcpy() (typically implemented as a loop over the unsigned char elements of a virtual array, and which the C standard arguably allows to be re-implemented as such) to copy one struct (which can have padding and uninitialized members) to another.

The “exception” is only for lvalue = lvalue; assignments without an intervening conversion, that is, for an assignment that amounts to a copy of a slice of memory from one memory location to another.
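
As a rough sketch of the memcpy() scenario this exception is meant to permit (my illustration, not Frama-C's code; my_memcpy is a made-up name):

#include <stddef.h>

/* A byte-by-byte copy through unsigned char, the one type the C
   standard guarantees has no trap representations. */
void *my_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

struct padded {
    char c;   /* padding bytes typically follow this member */
    int  i;   /* left uninitialized below */
};

int main(void) {
    struct padded a, b;
    a.c = 'x';                    /* a.i and the padding stay indeterminate */
    my_memcpy(&b, &a, sizeof a);  /* the byte copy itself is treated as okay */
    return b.c;                   /* reading b.i as an int would not be */
}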

I (as one of the authors of Frama-C's value analysis) discussed this with Xavier Leroy at a time when he was himself wondering about the definition to pick in the verified C compiler CompCert, so he may have ended up using the same definition. It is, in my opinion, cleaner than what the C standard tries to do with indeterminate values that can be trap representations and with the type unsigned char, which is guaranteed not to have any trap representations. But both CompCert and Frama-C assume relatively non-exotic targets, and perhaps the standardization committee was trying to accommodate platforms where reading an uninitialized int can indeed abort the program.

Returning b, or at the very least passing n1, n2, or n3 to printf at the end, can nevertheless be considered undefined behavior, because copying an uninitialized slice of memory does not make it initialized. With an oldish Frama-C version:

$ frama-c -val t.c
…
t.c:19:… accessing uninitialized left-value: assert \initialized(&n1);

And in an oldish version of CompCert, after minor modifications to make the program acceptable to it:

$ ccomp -interp t.c
Time 33: in function foo, expression <loc> = <undef>
ERROR: Undefined behavior
5

"Undefined Behavior" is shorthand for "This behavior is not deterministic; not only will it probably behave differently in different compilers or hardware platforms, it may even behave differently in different versions of the same compiler."

Most programmers would consider this an undesirable characteristic, especially since C and C++ are standards-based languages; that is, you use them, in part, because the language specification makes certain guarantees about how the language will behave, if you are using a standards-compliant compiler.

As with most things in programming, you have to weigh the advantages and disadvantages. If the benefit of some operation that is UB exceeds the difficulty of getting it to behave in a stable, platform-agnostic fashion, then by all means, use the undefined behavior. Most programmers will think it is not worth it, most of the time.

The remedy for any undefined behavior is to examine the behavior that you actually get, given a particular platform and compiler. That sort of examination is not one that an expert programmer is likely to carry out for you in a Q&A setting.

Robert Harvey
  • +1 As @aschepler has explained better than I, the detailed specifics of undefined behavior tend to be of interest during debugging. If my unit test segfaults, and I understand the memory-management mechanics that produce segfaults, then I can debug my program faster. Of course you are right: it is hard to think of a case in which one would purposely invoke UB in finished code! – thb Sep 13 '14 at 21:51
  • You missed "with different compile options". Always fun when the Develop/Test/Release versions behave differently. – H H Sep 14 '14 at 12:46
  • Or even "may produce different results in consecutive runs of the same binary, resulting from a single compilation". – Vatine Nov 04 '14 at 10:00
  • Undefined Behavior was sometimes intended to mean that, and sometimes intended to mean "This action should behave identically on all implementations for platforms we know about, but would be allowed to behave differently on platforms where that would be problematic; there's no need to mandate the normal behavior on common platforms because compiler writers who aren't being deliberately obtuse will process things that way whether or not the Standard requires them to". An example of the latter would be `(-1)<<1` which C89 defined as -2 on platforms that use non-padded two's-complement... – supercat Jul 04 '18 at 17:34
  • ...integer types, but C99 regards as Undefined Behavior without giving any reason for the change. If one interprets the intended meaning as above, then it wouldn't be a breaking change except on platforms where the C89 behavior was impractical but some code relied upon it anyway. – supercat Jul 04 '18 at 17:36
1

If the documentation for a particular compiler says what it will do when code does something which is considered "Undefined Behavior" by the standard, then code which relies upon that behavior will work correctly when compiled with that compiler, but may behave in arbitrary fashion when compiled using some other compiler whose documentation does not specify the behavior.

If the documentation for a compiler does not specify how it will handle some particular "undefined behavior", the fact that a program's behavior seems to obey certain rules says nothing about how any similar program will behave. Any of a variety of factors may cause a compiler to emit code which handles unexpected situations differently, sometimes in seemingly bizarre fashion.

Consider, for example, on a machine where int is a 32-bit integer:

#include <stdint.h>

int undef_behavior_example(uint16_t size1, uint16_t size2)
{
  int flag = 0;
  if ((uint32_t)size1 * size2 > 2147483647u)  /* unsigned multiply: well-defined */
    flag += 1;
  if (((size1 * size2) & 127) != 0)  /* signed multiply after promotion to int;
                                        true when the product is NOT a multiple of 128 */
    flag += 2;
  return flag;
}

If size1 and size2 were both equal to 46341 (their product is 2147488281), one might expect the function to return 3. A compiler, however, could legitimately skip the first test entirely: in the second test, size1 and size2 are promoted to int, so the first condition is true exactly when that signed multiplication would overflow, and since overflow would relieve the compiler of any requirement to do, or have done, anything, it may simply assume the condition is false. While such behavior may seem bizarre, some compiler authors seem to take great pride in their compilers' ability to eliminate such "unnecessary" tests. Some people might expect that an overflow in the second multiplication would, at worst, arbitrarily corrupt the bits of that particular product; in fact, however, in any case where a compiler can determine that overflow either must have occurred, or would be inevitable before the next sequenced observable side effect, the compiler is free to do anything it likes.
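
A hypothetical sketch of that transformation, written out as source (an illustration of the reasoning above, not the output of any particular compiler; the function name is made up):

#include <stdint.h>

/* What the optimizer may act "as if" you had written: the first test
   is gone, because its condition being true would imply the signed
   multiply below overflows, which the compiler assumes cannot happen. */
int undef_behavior_example_as_optimized(uint16_t size1, uint16_t size2)
{
  int flag = 0;
  if (((size1 * size2) & 127) != 0)
    flag += 2;
  return flag;
}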

supercat
  • Wouldn't the multiplication be done modulo UINT16_MAX? – curiousguy Jul 03 '18 at 20:24
  • @curiousguy: If `int` is a 32-bit integer, then values of type `uint16_t` will be promoted to `int` before any computations involving them. A rule which would generally be just fine if implementations only treated signed arithmetic as different from unsigned in cases where they would have different defined behaviors. – supercat Jul 03 '18 at 20:30
  • I believe any operand of unsigned type caused the operation to be unsigned. – curiousguy Jul 03 '18 at 20:59
  • @curiousguy: Some compilers worked that way in the days before the Standard, but the Standard specifies that unsigned types which rank below `unsigned` and have range of values that will fit entirely within that of `int`, get promoted to a signed `int`. – supercat Jul 03 '18 at 21:35
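
To make the promotion rule discussed just above concrete, here is a small sketch of mine, assuming a platform with 32-bit int:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t u = 65535;
    /* u is promoted to signed int before the multiply, because the
       whole range of uint16_t fits in a 32-bit int; 65535 * 65535
       then overflows int: undefined behavior, not arithmetic mod 65536. */
    /* int bad = u * u; */               /* undefined on this platform */
    uint32_t ok = (uint32_t)u * u;       /* force unsigned: well-defined */
    printf("%lu\n", (unsigned long)ok);  /* prints 4294836225 */
    return 0;
}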