18

I was reading this question on SO which discusses some common undefined behavior in C++, and I wondered: does Java also have undefined behaviour?

If that is the case, then what are some common causes of undefined behaviour in Java?

If not, then which features of Java make it free from such behaviours and why haven't the latest versions of C and C++ been implemented with these properties?

Eight
  • 325
  • 2
  • 7
  • 4
    Java is very rigidly defined. Check the Java Language Specification. –  Jun 22 '12 at 08:50
  • 1
    Related: http://blogs.msdn.com/b/ericlippert/archive/2012/06/18/implementation-defined-behaviour.aspx – CodesInChaos Jun 22 '12 at 08:58
  • 5
    @user1249, "undefined behavior" is actually pretty rigidly defined as well. – Pacerier Jun 25 '14 at 16:00
  • Possible duplicate on SO: http://stackoverflow.com/questions/376338/what-are-the-common-undefined-behaviours-that-java-programmers-should-know-about – Ciro Santilli OurBigBook.com Mar 17 '15 at 11:18
  • What does Java say about when you violate a "Contract"? Such as happens when you overload .equals to be incompatible with .hashCode? https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode() Is that colloquially undefined, but not technically in the same way that C++ is? – Mooing Duck Apr 17 '17 at 23:27
  • A related question: https://softwareengineering.stackexchange.com/questions/398703/why-does-c-have-undefined-behaviour-and-other-languages-like-c-or-java-don – Sisir Sep 22 '19 at 10:33

5 Answers

18

In Java, you can consider the behavior of an incorrectly synchronized program to be undefined.
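A minimal sketch of what "incorrectly synchronized" looks like in practice (the class name and thread counts here are mine, for illustration): two threads incrementing a plain field race with each other, so the final value is unpredictable, while an `AtomicInteger` increment is fully determined:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a racy, unsynchronized counter next to a
// correctly synchronized one. The racy result is unpredictable under
// the Java Memory Model; the atomic result is exact.
public class RaceDemo {
    static int racy = 0;                              // plain field, no synchronization
    static final AtomicInteger safe = new AtomicInteger();

    public static int runSafe(int threads, int increments) throws InterruptedException {
        safe.set(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    racy++;                           // data race: lost updates possible
                    safe.incrementAndGet();           // atomic: every update counts
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return safe.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runSafe(8, 100_000));      // always 800000
        // racy, by contrast, may be anything up to 800000 on a given run
    }
}
```

The atomic counter's result is guaranteed by the specification; the racy field's final value is exactly the kind of outcome the JLS leaves open.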

The Java 7 JLS uses the word "undefined" once, in 17.4.8. Executions and Causality Requirements:

We use f|d to denote the function given by restricting the domain of f to d. For all x in d, f|d(x) = f(x), and for all x not in d, f|d(x) is undefined...

The Java API documentation specifies some cases where results are undefined - for example, in the (deprecated) constructor Date(int year, int month, int day):

The result is undefined if a given argument is out of bounds...

The Javadocs for ExecutorService.invokeAll(Collection) state:

The results of this method are undefined if the given collection is modified while this operation is in progress...

A less formal kind of "undefined" behavior can be found, for example, in ConcurrentModificationException, where the API docs use the term "best effort":

Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness...
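A small sketch of that best-effort behavior (the class and method names are mine, for illustration): structurally modifying an `ArrayList` while iterating it usually, but not guaranteedly, triggers the exception:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {
    // Returns true if the fail-fast check fired. The spec only promises
    // best effort, though the standard ArrayList reliably detects this case.
    public static boolean modifyWhileIterating() {
        List<String> names = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String n : names) {
                if (n.equals("a")) {
                    names.remove(n); // structural modification mid-iteration
                }
            }
            return false; // permitted outcome: detection is best-effort only
        } catch (ConcurrentModificationException e) {
            return true;  // the usual outcome on typical JDKs
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true on typical JDKs
    }
}
```

Note that a program depending on either outcome would be wrong, which is exactly the point the Javadoc makes.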


## Appendix

One of the question comments refers to an article by Eric Lippert which provides a helpful introduction to the topic: Implementation-defined behaviour.

I recommend this article for its language-agnostic reasoning, although it is worth keeping in mind that the author targets C#, not Java.

Traditionally we say that a programming language idiom has undefined behaviour if use of that idiom can have any effect whatsoever; it can work the way you expect it to or it can erase your hard disk or crash your machine. Moreover, the compiler author is under no obligation to warn you about the undefined behaviour. (And in fact, there are some languages in which programs that use "undefined behaviour" idioms are permitted by the language specification to crash the compiler!)...

By contrast, an idiom that has implementation-defined behaviour is behaviour where the compiler author has several choices about how to implement the feature, and must choose one. As the name implies, implementation-defined behaviour is at least defined. For example, C# permits an implementation to throw an exception or produce a value when an integer division overflows, but the implementation must pick one. It cannot erase your hard disk...

What are some of the factors that lead a language design committee to leave certain language idioms as undefined or implementation-defined behaviours?

The first major factor is: are there two existing implementations of the language in the marketplace that disagree on the behaviour of a particular program? ...

The next major factor is: does the feature naturally present many different possibilities for implementation, some of which are clearly better than others? ...

A third factor is: is the feature so complex that a detailed breakdown of its exact behaviour would be difficult or expensive to specify? ...

A fourth factor is: does the feature impose a high burden on the compiler to analyze? ...

A fifth factor is: does the feature impose a high burden on the runtime environment? ...

A sixth factor is: does making the behaviour defined preclude some major optimization? ...

Those are just a few factors that come to mind; there are of course many, many other factors that language design committees debate before making a feature "implementation defined" or "undefined".

The above is only very brief coverage; the full article contains explanations and examples for the points mentioned in this excerpt, and it is well worth reading. For example, the details given for the "sixth factor" can give one insight into the motivation for many statements in the Java Memory Model (JSR 133), helping to understand why some optimizations are allowed, leading to undefined behavior, while others are prohibited, leading to constraints like the happens-before and causality requirements.

None of the article's material is particularly new to me, but I'll be damned if I've ever seen it presented in such an elegant, concise and understandable way. Amazing.

Glorfindel
  • 3,137
  • 6
  • 25
  • 33
gnat
  • 21,442
  • 29
  • 112
  • 288
  • I'll add that the JMM != the underlying hardware, and the end result of an executing program with regards to concurrency can vary between, say, WinIntel and Solaris – Martijn Verburg Jun 22 '12 at 09:33
  • 2
    @MartijnVerburg that's a pretty good point. The only reason why I hesitate to tag it as "undefined" is that the memory model poses constraints like _happens-before_ and _causality_ on the execution of a correctly synced program – gnat Jun 22 '12 at 09:41
  • True, the spec defines how it should behave under the JMM, however, Intel et al don't always agree ;-) – Martijn Verburg Jun 22 '12 at 09:49
  • @MartijnVerburg I think the main point of JMM is to prevent _over-optimizing_ leaks from "disagreeing" processor makers. As far as I understand Java before 5.0 had this kind of headache with DEC Alpha, when speculative writes done under the hood could leak into program like "out of thin air" - hence, _causality_ requirement went into JSR 133 (JMM) – gnat Jun 22 '12 at 09:53
  • 9
    @MartinVerburg - it is a JVM implementer's job to make sure that the JVM behaves according to the JLS/JMM spec on any supported hardware platform. If different hardware behaves differently, it is the JVM implementer's job to deal with it ... and make it work. – Stephen C Jun 22 '12 at 13:11
  • Entirely true :-) – Martijn Verburg Jun 22 '12 at 13:44
  • @gnat, +1. But why did you abuse the rollover for your last paragraph? – Pacerier Jun 25 '14 at 16:05
  • @Pacerier I did that because it's more like a personal observation, only tangentially related to the answer – gnat Jun 25 '14 at 16:35
  • @gnat, Hm, shouldn't that be edited with a post scriptum? It seems like an abuse of the rollover, and in any case, small mobile screens would not have that effect anyway. – Pacerier Jun 26 '14 at 18:22
  • @Pacerier well, per my reading, the way it's used here is compliant with [markdown help instructions](http://meta.stackexchange.com/editing-help#spoilers "'To hide a certain piece of text and have it only be visible when a user moves the mouse over it...'") and with [MSE guidance I could find](http://meta.stackexchange.com/questions/114876 "which in brief is currently 'there's no policy'"). However, now that you made me look closer at it, it feels visually inferior and probably is indeed worth deletion. I can't figure how to best edit to address this, need some time to figure – gnat Jun 26 '14 at 18:36
  • 3 more examples of undefined behavior in the standard library are in [`java.util.Arrays`](https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html): `binarySearch()`, `deepHashCode()`, and `deepEquals()` have undefined behavior in certain cases. – Adam Rosenfield Jun 08 '17 at 00:38
  • Upon reread, the examples in the question (and comments) are mostly about undefined *values* (which is *unspecified* behavior) rather than *undefined behaviors*, which is not quite the same thing. – Mooing Duck Aug 25 '23 at 15:35
10

Off the top of my head, I don't think there is any undefined behaviour in Java, at least not in the same sense as in C++.

The reason for this is that there is a different philosophy behind Java than behind C++. A core design goal of Java was to allow programs to run unchanged across platforms, so its specification defines everything very explicitly.

In contrast, a core design goal of C and C++ is efficiency: there should not be any features (including platform independence) that cost performance even if you don't need them. To this end, the specification deliberately does not define some behaviours, because defining them would cause extra work on some platforms and thus reduce performance even for people who write programs specifically for one platform and are aware of all its idiosyncrasies.

There's even an example where Java was forced to retroactively relax its guarantees for exactly that reason: up to Java 1.1, the spec demanded that floating-point calculations follow the IEEE 754 standard exactly, but doing so required extra work and made all floating-point calculations slower on some common CPUs, while actually producing worse results in some cases. Java 1.2 therefore allowed intermediate results to deviate from strict IEEE 754 by default, and introduced the strictfp keyword for code that still needs the exact, platform-independent semantics.

Michael Borgwardt
  • 51,037
  • 13
  • 124
  • 176
  • 2
    I think it's important to note the other main goal of Java: security and isolation. I think this, too, is a reason for the lack of 'undefined' behaviour (as in C++). – K.Steff Jun 22 '12 at 10:21
  • 3
    @K.Steff: Hyper-modern C/C++ is totally unsuitable for anything remotely security related. Given `int x=-1; foo(); x<<=1;` hyper-modern philosophy would favor rewriting `foo` so that any path which doesn't exit must be unreachable. Thus, if `foo` is `if (should_launch_missiles) { launch_missiles(); exit(1); }` a compiler could (and according to some people should) simplify that to simply `launch_missiles(); exit(1);`. The traditional UB was random code execution, but that used to be bound by the laws of time and causality. New improved UB is bound by neither. – supercat Apr 19 '15 at 14:52
3

Java tries quite hard to exterminate undefined behaviour, precisely because of the lessons of earlier languages. For instance, class-level variables are automatically initialized; local variables are not auto-initialized for performance reasons, but there is sophisticated data-flow analysis to prevent anyone from writing a program that would be able to detect this. References are not pointers, so invalid references cannot exist, and dereferencing null causes a specific exception.
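For instance (a made-up snippet of mine to illustrate the null-reference point): dereferencing `null` raises a specific, catchable `NullPointerException` rather than undefined behavior:

```java
// Sketch: a null dereference in Java has one well-defined outcome,
// a NullPointerException, which a program can even recover from.
public class NullDemo {
    static int lengthOrMinusOne(String s) {
        try {
            return s.length(); // throws NullPointerException if s is null
        } catch (NullPointerException e) {
            return -1;         // the failure mode is specified and catchable
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOrMinusOne("hello")); // 5
        System.out.println(lengthOrMinusOne(null));    // -1
    }
}
```

In C++, the analogous dereference of an invalid pointer could do literally anything; here the language pins down exactly one behavior.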

Of course there remain some behaviours that are not fully specified, and you can write unreliable programs if you assume that they are. For instance, if you iterate over a normal (non-sorted) Set, the language guarantees that you will see each element exactly once, but not in which order you will see them. The order might be the same on successive runs, or it might change; or it might stay the same as long as no other allocations occur, or as long as you don't update your JDK, etc. It is near-impossible to get rid of all such effects; for instance, you would have to explicitly order or randomize all Collections operations, and that is simply not worth the small additional un-undefined-ness.
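The Set guarantee can be sketched like this (class and method names are illustrative): every element is visited exactly once, in whatever order the iterator happens to choose:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: iterating a HashSet visits each element exactly once, but the
// visiting order is unspecified and may differ between runs or JDK versions.
public class SetOrderDemo {
    public static int countVisits(Set<Integer> set) {
        int visits = 0;
        for (Integer ignored : set) {
            visits++; // guaranteed: each element appears exactly once
        }
        return visits;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 100; i++) set.add(i);
        System.out.println(countVisits(set)); // always 100, order unspecified
    }
}
```

A program that depends on the count is correct; a program that depends on the order is relying on unspecified behavior.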

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
  • References are pointers under another name – curiousguy Jun 02 '18 at 06:30
  • @curiousguy - "references" generally are assumed not to permit the use of arithmetic manipulation of their numeric value, which is often allowed for "pointers". The former is therefore a safer construct than the latter; combined with a memory management system that doesn't allow an object's storage to be reused while a valid reference to it exists, references prevent memory use errors. Pointers cannot do so, even when appropriate memory management is used. – Jules Jul 06 '18 at 09:39
  • @Jules Then it's a matter of terminology: you may call one thing a pointer or a reference, and decide to use "reference" in "safe" languages and "pointer" in languages that allow the use of pointer arithmetic and manual memory management. (AFAIK "pointer arithmetic" is only done in C/C++.) – curiousguy Nov 28 '18 at 00:00
2

You have to understand "undefined behavior" and its origin.

Undefined behavior means behavior which is not defined by the standards. C and C++ have many different compiler implementations with additional features. Those additional features tied code to a particular compiler, because there was no centralized language development. So some of the advanced features from some of the compilers became "undefined behaviors".

Whereas in Java, the language specification is controlled by Sun (now Oracle), nobody else is trying to make specifications, and thus there are no undefined behaviors.

**Edited:** specifically answering the question

  1. Java is free from undefined behaviors because the standards were created before the compilers.
  2. Modern C/C++ compilers have more or less standardized their implementations, but features implemented before standardization remain tagged as "undefined behavior" because ISO kept mum on these aspects.
Sarvex
  • 217
  • 1
  • 3
  • 7
  • 3
    You may be right that there is no UB in Java, but even when one entity controls everything, there may be reasons to have UB, so the reason you give doesn't lead to the conclusion. – AProgrammer Jun 22 '12 at 14:09
  • 3
    Besides, both C and C++ are standardized by ISO. While there may be multiple compilers, there's just one standard at a time. – MSalters Jun 22 '12 at 14:48
  • @AProgrammer I am not sure why you missed the conclusion part. **Undefined behavior** is a direct result of different compiler manufacturers trying to add **new features** to their products. To state it explicitly: **undefined behavior** is a sequential hazard, arising when the products precede the specification – Sarvex Jun 22 '12 at 15:46
  • @MSalters Agreed that there is only one standard at a time, but the important question is **when**. When ISO started standardizing C/C++, there were already many different compilers behaving differently in the market. So ISO chose the most common subset and left out the so-called "advanced features" of the existing compilers. – Sarvex Jun 22 '12 at 15:53
  • 2
    @SarvexJatasra, I don't agree that it is the only source of UB. For instance, one UB is dereferencing dangling pointer and there are good reasons to leave it an UB in any language which hasn't a GC, even if you start your spec now. And those reasons have nothing to do with existing practice or existing compilers. – AProgrammer Jun 22 '12 at 16:07
  • @AProgrammer you don't agree because you are mixing **Undefined Behavior** with **Runtime Bugs**. A quote from Wikipedia: _In computer programming, undefined behavior is a feature of some programming languages—most famously C. In these languages, to simplify the specification and allow some flexibility in implementation, the specification leaves the results of certain operations specifically undefined_ – Sarvex Jun 22 '12 at 16:44
  • 1
    @SarvexJatasra, and if I take the overflow of signed integers as an example, is this also a runtime bug? – AProgrammer Jun 22 '12 at 18:23
  • @AProgrammer "Signed Integers Overflow" is undefined behavior in C because different compilers did different things, i.e. some compilers reset the signed integer to zero, whereas some others picked the lowest negative number, and others simply ignored the operation. Because of these inconsistencies this was marked as Undefined Behavior. – Sarvex Jun 22 '12 at 18:32
  • 2
    @SarvexJatasra, signed overflow is UB because the standard says explicitly so (it is even the example given with the definition of UB). Dereferencing an invalid pointer is also an UB for the same reason, the standard says so. – AProgrammer Jun 22 '12 at 18:37
  • @AProgrammer you are still arguing about **WHAT**, When I am trying to explain **WHY**. Ask yourself why is only signed integer overflow undefined and not unsigned integer and why only in C/C++ – Sarvex Jun 22 '12 at 18:50
  • 1
    I believe signed int overflow in C is only undefined behavior if the result cannot be represented. I can find no normative text in the standard saying signed int overflow is always undefined per definition. However, C11 6.2.5/9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." –  Jun 26 '12 at 11:49
  • As for the topic of undefined behavior, it has (nowadays) nothing to do with different implementations doing different things, that is known either as unspecified- or implementation-defined behavior. Undefined behavior in C could mean either 1) the standard committee was too meek (avoiding favouring some particular party or vendor) or too lazy to enforce a standard for a particular language mechanism, or 2) the behavior is a blatant bug that the standard can do nothing about, i.e. divide by zero at runtime. –  Jun 26 '12 at 11:56
  • 1
    @SarvexJatasra: The why is simple, by not specifying certain behaviors, compilers can make optimizations that assume those behaviors never happen, causing faster programs, for whatever the target architecture may be. That's also why they refuse to fix it, removing the undefined behavior would force our programs to go slower on some processors. – Mooing Duck Aug 05 '13 at 23:53
  • 1
    @SarvexJatasra '"Signed Integers Overflow" is undefined behavior in C because different compilers did different things' No, that's not why it's undefined. See [What Every C Programmer Should Know About Undefined Behavior](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html), in particular the section titled _Advantages of Undefined Behavior in C, with Examples_. – bames53 May 01 '14 at 18:06
  • 2
    @bames53: None of the cited advantages would require the level of latitude hypermodern compilers are taking with UB. With the exceptions of out-of-bounds memory accesses and stack overflows, which can "naturally" induce random code execution, I can't think of any useful optimization which would require broader latitude than to say that most UB-ish operations yield indeterminate values (which might behave as though they have "extra bits") and may only have consequences beyond that if an implementation's docs expressly reserve the right to impose such; docs may give "Unconstrained behavior"... – supercat Apr 15 '15 at 20:06
  • ...as the consequence of an action, relieving the implementation of any obligations in that regard, but a program shall be deemed standards-compliant if it will work correctly on all legitimate compilers whose documentation is consistent with the program's documented requirements. – supercat Apr 15 '15 at 20:12
  • 1
    @supercat That would be awful. It's effectively the same as having undefined behavior in the spec, but it would encourage people to actually write such code, and encourage compiler developers to support some such code. Probably with different compilers supporting different, incompatible subsets. It would be a resource drain and cause lots of problems for no benefit that I can see. – bames53 Apr 15 '15 at 20:36
  • 1
    @bames53: What is awful about giving programmers ways to write smaller and faster code, *which is supposed to be the whole point of optimization*? Or allowing programmers to have *existing* code continue to work the way it always had (a concept which underlies C's historical failure to specify things like two's-complement behavior)? Can you offer examples of *real-world-useful* optimizations which could not be achieved by a programmer using a compiler that was limited to having things like `-1<<4` yield an Indeterminate Value rather than UB? I would genuinely like to see some. – supercat Apr 15 '15 at 21:05
  • 1
    @supercat I described what's awful about it: it's effectively the same as UB, because programmers still can't safely write any of that code but they'd be encouraged to do so. It's far, far better to have UB in the spec and then have compilers document things they support, like -fwrapv. As for an example: take this from the linked article: "if the variable is defined to wrap around on overflow, then the compiler must assume that the loop is possibly infinite [...] which then disables these important loop optimizations." This statement is also true if overflow produces an indeterminant value. – bames53 Apr 15 '15 at 21:29
  • 1
    @bames53: I don't follow your last statement. If N is 2147483647, then following the pass where i is 2147483647, it would become Indeterminate Value, and the compiler would be entitled to have the comparison arbitrarily yield true or false as it saw fit. In case my terminology is unclear, I don't mean "an unspecified value" but something much looser. Every calculation or comparison involving Indeterminate Value may arbitrarily yield a different result. If any trap representations exist in the same flavor as the Indeterminate Value (integer, floating-point, data pointer, or f-pointer), ... – supercat Apr 15 '15 at 21:50
  • ...then Indeterminate Value may arbitrarily act as one of them. *VERY* loose semantics, but quite useful if one is writing a function (e.g. a JPEG decoder) which must yield correct output when given valid input, and may yield arbitrary output when given invalid input, but in any case must not exhibit any Undefined Behavior beyond its output. Such requirements are very common, and 1990s-style constraints allow them to be satisfied much more cheaply than would be possible otherwise. – supercat Apr 15 '15 at 21:55
  • @supercat In that case what you're describing is pretty much exactly how current compilers already behave in many cases; e.g., LLVM IR already uses `undef` values rather than, say, assuming code must be dead and eliminating it. This behavior _still permits exactly the kinds of optimizations people who complain about UB hate_. – bames53 Apr 15 '15 at 22:53
  • But your suggestion does still prevent certain optimizations that do depend on fully unconstrained behavior; e.g., eliminating a null pointer check when the pointer is never null. So you've both failed to fix the language for people that want consistent behavior, and have mandated reduced performance for people who want maximum performance. – bames53 Apr 15 '15 at 22:53
  • @bames53: My beef is with the idea that if e.g. code is supposed to call a method if N is less than 128, then shift some value left by N and pass the result to a second method which won't care about its parameter in cases where N is greater than 31, generated code should call the first method even if N was 128 or more. If the purpose of aggressive Undefined-Behavior treatment is to allow optimization, how is that purpose served by forcing programmers to include extra code that the optimizer will likely be unable to remove (since it may have no way of knowing that the called method will... – supercat Apr 15 '15 at 23:10
  • ...ignore its argument). – supercat Apr 15 '15 at 23:11
  • @supercat That example just sounds like the code that figures out to ignore the shift result when N isn't valid shift value should be rearranged; instead of ignoring it after the fact, move it to where it can prevent the shift from taking place at all. – bames53 Apr 15 '15 at 23:36
  • I have seen some reasonable complaints about code necessary to avoid UB: for example an overflow check that relies on the value wrapping around is simple and generically works for different int sizes, whereas checking for overflow by relying on `INT_MAX` is not generic. – bames53 Apr 15 '15 at 23:40
  • @bames53: The given example was intended for simplicity rather than realism, but such issues are frequently encountered when writing many types of performance-critical code. Further, it strikes me that compiler-vendors are using a very blunt tool which is far less effective than would be a better-designed tool, and which has an excessively-high level of collateral damage. Additionally, the examples I've seen to promote the "necessity" of UB for optimization merely suggest that in exchange for throwing away anything resembling safety one can achieve optimizations comparable to 1960s FORTRAN... – supercat Apr 16 '15 at 02:59
  • ... (or, for that matter, a C compiler I used in the 1990s for a TI DSP). Do you disagree that it is common for things like graphic decoders to have requirements that invalid input files won't have consequences beyond either a recognizable trap or arbitrary output, and that testing everything to ensure no sequence of data can generate invalid shifts or integer overflows imposes much greater overhead than would be the case if such things merely yielded Indeterminate Values? – supercat Apr 16 '15 at 03:09
1

Java eliminates essentially all the undefined behavior found in C/C++. (For example: Signed integer overflow, division by zero, uninitialized variables, null pointer dereference, shifting more than bit width, double-free, even "no newline at end of source code".) But Java has a few obscure undefined behaviors that are rarely encountered by programmers.
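The contrast can be sketched with a few of those examples (the class name here is mine): each operation that is undefined in C has a single specified outcome in Java:

```java
// Sketch: three operations that are undefined behavior in C/C++ but
// have fully specified results in Java.
public class DefinedDemo {
    // Signed overflow wraps around in two's complement, by specification.
    public static int overflow() { return Integer.MAX_VALUE + 1; }

    // Shift distances on int are masked to their low 5 bits, so 1 << 33 == 1 << 1.
    public static int bigShift() { return 1 << 33; }

    // Integer division by zero throws a specific, catchable exception.
    public static boolean divByZeroThrows() {
        int zero = 0;
        try {
            int x = 1 / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflow());        // -2147483648
        System.out.println(bigShift());        // 2
        System.out.println(divByZeroThrows()); // true
    }
}
```

All three results are mandated by the JLS, so every conforming JVM must produce them; a C compiler facing the same source may do anything at all.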

  • Java Native Interface (JNI), a way for Java to call C or C++ code. There are many ways to screw up in JNI, like getting the function signature wrong, making invalid calls to JVM services, corrupting memory, allocating/freeing stuff incorrectly, and more. I have made these mistakes before, and generally the whole JVM crashes when any one thread executing JNI code commits an error.

  • Thread.stop(), which is deprecated. Quote:

    Why is Thread.stop deprecated?

    Because it is inherently unsafe. Stopping a thread causes it to unlock all the monitors that it has locked. (The monitors are unlocked as the ThreadDeath exception propagates up the stack.) If any of the objects previously protected by these monitors were in an inconsistent state, other threads may now view these objects in an inconsistent state. Such objects are said to be damaged. When threads operate on damaged objects, arbitrary behavior can result. This behavior may be subtle and difficult to detect, or it may be pronounced. Unlike other unchecked exceptions, ThreadDeath kills threads silently; thus, the user has no warning that his program may be corrupted. The corruption can manifest itself at any time after the actual damage occurs, even hours or days in the future.

    https://docs.oracle.com/javase/8/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html

Nayuki
  • 184
  • 2
  • 11