91

I'm sure the designers of languages like Java or C# knew about the issues related to the existence of null references (see Are null references really a bad thing?). Also, implementing an option type isn't really much more complex than null references; a minimal sketch of what I mean is below.
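
For concreteness, here is roughly what I mean by an option type - a minimal sketch in C# (the names and shape are illustrative, not any particular library's API):

using System;

// A value of Option<T> is either Some(value) or None; absence is visible in the type.
public abstract class Option<T>
{
    private Option() { }  // closed hierarchy: the only cases are the two nested classes

    public sealed class Some : Option<T>
    {
        public T Value { get; }
        public Some(T value) { Value = value; }
    }

    public sealed class None : Option<T> { }

    // The caller has to say what happens in both cases.
    public TResult Match<TResult>(Func<T, TResult> ifSome, Func<TResult> ifNone) =>
        this is Some s ? ifSome(s.Value) : ifNone();
}

Usage would look something like new Option<string>.Some("Bob") versus new Option<string>.None(), with Match forcing both cases to be handled.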

Why did they decide to include it anyway? I'm sure the lack of null references would encourage (or even force) better quality code (especially better library design), both from language creators and users.

Is it simply because of conservatism - "other languages have it, we have to have it too..."?

zduny
  • 105
    null is great. I love it and use it every day. – Pieter B May 02 '14 at 12:22
  • 17
    @PieterB But do you use it for the majority of references, or do you want most references not to be null? The argument is not that there shouldn't be nullable data, only that it should be explicit and opt-in. –  May 02 '14 at 12:54
  • 3
    @delnan I don't want it for the majority of my references. But when I use it, it makes things trivially easy. Like with lists with objects. – Pieter B May 02 '14 at 13:04
  • 6
    @PieterB Unfortunately, it also makes bugs trivially easy. – Doval May 02 '14 at 13:06
  • 11
    @PieterB But when the majority should not be nullable, wouldn't it make sense to make null-ability the exception rather than the default? Note that while the usual design of option types is to force explicit checking for absence and unpacking, one can also have the well-known Java/C#/... semantics for opt-in nullable references (use as if not nullable, blow up if null). It would at least prevent some bugs, and make a static analysis that complains about missing null checks much more practical. –  May 02 '14 at 13:09
  • Have you checked the King James programming bible to see if NULL is truly evil? http://kingjamesprogramming.tumblr.com ;) – FrustratedWithFormsDesigner May 02 '14 at 18:03
  • 8
    @GrandmasterB He doesn't have to make his own language - there's already Standard ML, OCaml, and Haskell. There's also Scala and F# which only have `null` for interoperability with Java and .NET respectively. Considering `null` offers no advantages and big problems compared to `Maybe`/`Option` types, "evil" is a fitting term for it. – Doval May 02 '14 at 19:40
  • 21
    WTF is up with you guys? Of all the things that can and do go wrong with software, trying to dereference a null is no problem at all. It ALWAYS generates an AV/segfault and so gets fixed. Is there so much of a bug shortage that you have to worry about this? If so, I have plenty spare, and none of them involves problems with null references/pointers. – Martin James May 02 '14 at 20:55
  • 1
    @GrandmasterB: The questions in here are reasonable questions. The editorial comments are off-topic and I would encourage the OP to delete them and rather focus on the question being asked rather than on expressing an opinion about a counterfactual world. – Eric Lippert May 02 '14 at 21:22
  • 3
    @MartinJames It's true that there are much worse bugs, but this style of crash is annoying and can be eradicated very easily. Option types provide good bang for the buck; they aren't more complex than pervasive nullability, just different. In contrast, it's not really clear how a language might help preventing more insidious bugs, and such features would probably be more complicated. –  May 02 '14 at 22:48
  • 13
    @MartinJames "It ALWAYS generates an AV/segfault and so gets fixed" - no, no it doesn't. – detly May 03 '14 at 08:46
  • 9
    -1. This is like asking why mathematics has zero, when it causes so many problems for division. – Dawood ibn Kareem May 03 '14 at 09:52
  • 7
    @DavidWallace 0 is not like null at all. 0 is a valid member of its type (integers, rationals, real numbers, etc.) while null has no type at all - it's a special marker. – zduny May 03 '14 at 10:17
  • 8
    @MartinJames: The fewer invalid states your program has, the fewer ways it can be incorrect. Simple as that. – user541686 May 03 '14 at 10:23
  • 1
    No, "null" is a valid member of its type (references). It refers to "no object" in exactly the same way that "zero cows" refers to no cows. – Dawood ibn Kareem May 03 '14 at 10:31
  • The fact that it's not a character is why I use it; it leaves me free to use whatever character I want without it being interpreted as null. – JVE999 May 03 '14 at 12:21
  • 3
    @MartinJames Considering that the inventor himself claims the introduction of null pointers to be his biggest mistake and the fact that your claim that it's not a big deal is completely wrong (e.g. the Linux guys had to use a special command-line flag in gcc while they tracked down all those - possibly security relevant - bugs due to null referencing before checking - no idea if they still do), do you want to reconsider your statement? ;) – Voo May 03 '14 at 20:28
  • 4
    Null is not evil. If you watch his misleadingly named famous speech "The Billion dollar Mistake", Tony Hoare talks about how allowing *any* variable to be able to hold null was a huge mistake. The alternative - using Options - does *not* in fact get rid of null references. Instead it allows you to specify which variables are allowed to hold null, and which aren't. – B T May 04 '14 at 03:24
  • 'myStruct *thing;' what do you want it pointing at, some random shit on the stack, which may, or may not, point to a valid object, or something that will immediately raise a segfault/AV if dereferenced? Your choice... – Martin James May 04 '14 at 16:12
  • @MartinJames Neither. I want to write applications in a safe language with a decent type system. – Doval May 04 '14 at 17:55
  • @Doval well that's good, me too, right up to the point where I want to do I/O and need to interact with the OS API which, in most cases, requires C-style vars and calls. That means pointers, and they can be unassigned or null. – Martin James May 04 '14 at 23:03
  • Also embedded, also drivers. Those gonna need pointers, which can be null or unassigned. – Martin James May 04 '14 at 23:09
  • I will take Objective-C's nil over Java's null (and NullPointerExceptions) and Haskell's Maybe any day. – Jonathan. May 04 '14 at 23:21
  • @MartinJames That's what a foreign function interface is for. – Doval May 04 '14 at 23:59
  • 1
    "WTF is up with you guys?" -- Intelligence and knowledge. "Of all the things that can and do go wrong with software, trying to dereference a null is no problem at all." -- false. " This is like asking why mathematics has zero, when it causes so many problems for division." -- No, it's nothing like that. " It refers to "no object" in exactly the same way that "zero cows" refers to no cows." -- false. "using Options - does not in fact get rid of null references. Instead it allows you to specify which variables are allowed to hold null, and which aren't." -- false. – Jim Balter Apr 27 '15 at 01:18
  • The existence and use of null isn't actually the problem. The problem with null in languages like Java is that ANY VARIABLE can be set to null. Even though Java is statically typed and tries so hard to catch type-related errors, it provides absolutely no way of preventing a variable from holding null during static analysis. This is a huge gaping hole in an otherwise very tight, very restrictive type system. It's really an inconsistency in Java, and languages don't have to have this inconsistency. In Kotlin, for example, `val a: String? = null` is fine while `val b: String = null` is a compiler error. – B T Sep 07 '18 at 00:00
  • Using `null` isn't a sin. The real sin is programming languages that don't produce compile errors when the user tries to dereference a nullable variable without checking that it's not null. Having used `flow` to type check my JavaScript programs for the past few years, I can tell you that I rarely see null pointer errors anymore. The Java compiler is a crime against humanity for not warning/erroring on code that may produce a `NullPointerException`. – Andy Oct 16 '18 at 05:35
  • Actual answer: Null references are *unavoidable* in Java-style OO, because object fields may be assigned in the constructor. This means at any point during the execution of the constructor, some fields might not be assigned yet. Since other methods can be called in the constructor, it is just not possible for the compiler to ensure a field always has been assigned a value before it is accessed. Even the advanced nullability analysis in C# or TypeScript cannot ensure this. But typing fields as optional would be wrong if the field is always assigned after the execution of the constructor. – JacquesB Jul 02 '23 at 13:45

10 Answers

127

I'm sure designers of languages like Java or C# knew issues related to existence of null references

Of course.

Also implementing an option type isn't really much more complex than null references.

I beg to differ! The design considerations that went into nullable value types in C# 2 were complex, controversial and difficult. They took the design teams of both the languages and the runtime many months of debate, implementation of prototypes, and so on, and in fact the semantics of nullable boxing were changed very very close to shipping C# 2.0, which was very controversial.

Why did they decide to include it anyway?

All design is a process of choosing amongst many subtly and grossly incompatible goals; I can only give a brief sketch of just a few of the factors that would be considered:

  • Orthogonality of language features is generally considered a good thing. C# has nullable value types, non-nullable value types, and nullable reference types. Non-nullable reference types don't exist, which makes the type system non-orthogonal.

  • Familiarity to existing users of C, C++ and Java is important.

  • Easy interoperability with COM is important.

  • Easy interoperability with all other .NET languages is important.

  • Easy interoperability with databases is important.

  • Consistency of semantics is important; if we have reference TheKingOfFrance equal to null does that always mean "there is no King of France right now", or can it also mean "There definitely is a King of France; I just don't know who it is right now"? or can it mean "the very notion of having a King in France is nonsensical, so don't even ask the question!"? Null can mean all of these things and more in C#, and all these concepts are useful.

  • Performance cost is important.

  • Being amenable to static analysis is important.

  • Consistency of the type system is important; can we always know that a non-nullable reference is never under any circumstances observed to be invalid? What about in the constructor of an object with a non-nullable field of reference type? What about in the finalizer of such an object, where the object is finalized because the code that was supposed to fill in the reference threw an exception? A type system that lies to you about its guarantees is dangerous. (A sketch illustrating the constructor case appears at the end of this answer.)

  • And what about consistency of semantics? Null values propagate when used, but null references throw exceptions when used. That's inconsistent; is that inconsistency justified by some benefit?

  • Can we implement the feature without breaking other features? What other possible future features does the feature preclude?

  • You go to war with the army you have, not the one you'd like. Remember, C# 1.0 did not have generics, so talking about Maybe<T> as an alternative is a complete non-starter. Should .NET have slipped for two years while the runtime team added generics, solely to eliminate null references?

  • What about consistency of the type system? You can say Nullable<T> for any value type -- no, wait, that's a lie. You can't say Nullable<Nullable<T>>. Should you be able to? If so, what are its desired semantics? Is it worthwhile making the entire type system have a special case in it just for this feature?
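
To make that last bullet concrete, a tiny C# sketch (the commented-out declaration is the one the compiler rejects):

Nullable<int> a = 5;              // fine: the same thing as int?
// Nullable<Nullable<int>> b;     // does not compile: the struct constraint on T in
//                                // Nullable<T> excludes nullable value types
int? c = null;                    // the shorthand only nests one level: int?? is not valid syntax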

And so on. These decisions are complex.
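
As a concrete illustration of the constructor bullet in the list above - a hedged sketch in C# with made-up class names, imagining that the name field could be declared non-nullable:

using System;

class Base
{
    protected Base() { Describe(); }   // the base constructor runs before Derived's body
    public virtual void Describe() { }
}

class Derived : Base
{
    private readonly string name;      // suppose this could be declared "never null"

    public Derived(string name) { this.name = name; }

    public override void Describe()
    {
        // Called from the Base constructor, before name has been assigned:
        // the field is observed as null even though every constructor assigns it.
        Console.WriteLine(name.Length);   // NullReferenceException at run time
    }
}

// new Derived("Alice") traps inside the base constructor.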

Eric Lippert
  • 12
    +1 for everything but especially bringing up generics. It's easy to forget there were periods of time in both Java and C#'s history where generics didn't exist. – Doval May 02 '14 at 21:24
  • 2
    Maybe a dumb question (I'm just an IT undergraduate) - but couldn't an option type be implemented on the syntax level (with the CLR not knowing anything about it) as a regular nullable reference that requires a "has-value" check before being used in code? I believe option types don't need any checks at runtime. – zduny May 02 '14 at 21:31
  • 2
    @mrpyo: Sure, that's a possible implementation choice. None of the other design choices go away, and that implementation choice has many pros and cons of its own. – Eric Lippert May 02 '14 at 21:37
  • But it could be done without generics. Ha! I got you there ;) – zduny May 02 '14 at 23:01
  • 1
    @mrpyo I think forcing a "has-value" check is not a good idea. Theoretically it is a very good idea, but in practice, IMO it would bring all sorts of empty checks, just to satisfy the compiler - like checked exceptions in Java and people fooling it with `catches` that do nothing. I think it is better to let the system blow up instead of continuing operation in a possibly invalid state. – NothingsImpossible May 03 '14 at 00:51
  • 1
    +1 for interoperability. One thing I've come to value in languages is that they *can* interoperate easily when they need to, and null isn't going away there. In addition, I've found that if there's a concept I find unsafe in a language, if the language is rich enough I can often avoid using the concept at least within my codebase by simulating a "better" concept, perhaps like Maybe. I'm all for *me* writing safe code with my own rules vs. *everybody* writing code like me; convention is a powerful tool of safety and quality. – J Trana May 03 '14 at 04:56
  • Another issue is that being able to trap reads of array elements which have not been written would, for some types, impose substantial per-element costs. Having reads of not-yet-written array elements be legal for some types but not others would be icky. Consequently, types stored in arrays pretty much have to have a default value, and for mutable reference types there is no default value that could work better than `null`. – supercat May 03 '14 at 15:55
  • @supercat I don't see the issue. Clearly an array of `Optional` (just using some common syntax) could theoretically be implemented in a way that'd give you exactly the same performance as now. An array of non-nullable references would need a default constructor of some kind, which would come at a cost sure. But since that's something you can't express at all in Java/C# right now I'd think this wouldn't be such a big issue (and I think the mental model would be easy to understand for programmers) – Voo May 03 '14 at 20:42
  • 2
    @voo: Arrays of nonnullable reference type are hard for a lot of reasons. There are many possible solutions and all of them impose costs on different operations. Supercat's suggestion is to track whether an element can legally be read before it is assigned, which imposes costs. Yours is to ensure that an initializer runs on each element before the array is visible, which imposes a different set of costs. So here's the rub: no matter which of these techniques one chooses, someone is going to complain that it's not efficient for *their* pet scenario. These are serious points against the feature. – Eric Lippert May 03 '14 at 21:01
  • 1
    @Eric I thought supercat meant we'd also have costs for the `Optional` array (which would map to arrays we have right now). Certainly we have some kind of cost attributed to arrays for nonnullable reference types no doubt, but since that's something we can't express at all right now and the existing version wouldn't be impacted that seems not such a big problem. C# in many situations already picks correctness and clarity over raw performance after all. Anybody who needs that last bit of performance could just use nullable arrays! – Voo May 03 '14 at 21:13
  • `What about in the finalizer of such an object, where the object is finalized because the code that was supposed to fill in the reference threw an exception?` C++ solves that problem by not destructing objects that weren't fully constructed :) – fredoverflow May 04 '14 at 10:02
  • @Voo: Requiring a type system to recognize a difference between "Array of nullable references" from "Array of references that aren't really supposed to be null, but might be anyway" would add complexity. There are probably ways it could be done, but it's not trivial. – supercat May 05 '14 at 03:08
  • Can you expand on a few of these problems? I guess I'm not familiar enough with C# to see the subtleties involved. For example regarding interoperability with null supporting languages it seems obvious to translate such a value with `Maybe` or `Optional`. Regarding the KingOfFrance: All the semantics can just as well be understood with `Maybe`. – Perseids May 05 '14 at 06:37
  • @Perseids: Suppose there are three values of `Maybe`, true, false and null. What is the value of `true | null` ? If null means "I don't know" then the answer should be true because the null is "really" either true or false, and either would produce true. If null means "the question is nonsensical and should not have been asked" then `true | null` should also be null. Since the desired values are different under different interpretations, `Maybe` cannot cover both of them. – Eric Lippert May 05 '14 at 13:47
  • But why not issue a type error in this case? I would argue `Maybe` is not a valid input to `|`. In the same context now we would often encounter a NullPointerException which carries basically the same semantic but is more cumbersome to handle as an exception. – Perseids May 05 '14 at 13:54
  • 2
    @Perseids: And now you see how it goes. This is just one tiny little design problem of thousands, all of which have to be considered carefully. – Eric Lippert May 05 '14 at 13:58
  • 1
    Oh, the items weren't meant to be read as a pro/contra list, but as a list of work that had to be done, details that had to be worked out. Ok, thx. – Perseids May 05 '14 at 14:06
100

Disclaimer: Since I don't know any language designers personally, any answer I give you will be speculative.

From Tony Hoare himself:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

Emphasis mine.

Naturally it didn't seem like a bad idea to him at the time. It's likely that it's been perpetuated in part for that same reason - if it seemed like a good idea to the Turing Award-winning inventor of quicksort, it's not surprising that many people still don't understand why it's evil. It's also likely in part because it's convenient for new languages to be similar to older languages, both for marketing and learning curve reasons. Case in point:

"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." -Guy Steele, co-author of the Java spec

(Source: http://www.paulgraham.com/icad.html)

And, of course, C++ has null because C has null, and there's no need to go into C's historical impact. C# kind of superseded J++, which was Microsoft's implementation of Java, and it's also superseded C++ as the language of choice for Windows development, so it could've gotten it from either one.

EDIT Here's another quote from Hoare worth considering:

Programming languages on the whole are very much more complicated than they used to be: object orientation, inheritance, and other features are still not really being thought through from the point of view of a coherent and scientifically well-based discipline or a theory of correctness. My original postulate, which I have been pursuing as a scientist all my life, is that one uses the criteria of correctness as a means of converging on a decent programming language design—one which doesn’t set traps for its users, and ones in which the different components of the program correspond clearly to different components of its specification, so you can reason compositionally about it. [...] The tools, including the compiler, have to be based on some theory of what it means to write a correct program. -Oral history interview by Philip L. Frana, 17 July 2002, Cambridge, England; Charles Babbage Institute, University of Minnesota.[ http://www.cbi.umn.edu/oh/display.phtml?id=343]

Again, emphasis mine. Sun/Oracle and Microsoft are companies, and the bottom line of any company is money. The benefits to them of having null may have outweighed the cons, or they may have simply had too tight a deadline to fully consider the issue. As an example of a different language blunder that probably occurred because of deadlines:

It's a shame that Cloneable is broken, but it happens. The original Java APIs were done very quickly under a tight deadline to meet a closing market window. The original Java team did an incredible job, but not all of the APIs are perfect. Cloneable is a weak spot, and I think people should be aware of its limitations. -Josh Bloch

(Source: http://www.artima.com/intv/bloch13.html)

Doval
  • 33
    Dear downvoter: how can I improve my answer? – Doval May 02 '14 at 13:38
  • 9
    You didn't actually answer the question; you only provided some quotes about some after-the-fact opinions and some extra hand-waving about "cost." (If null is a billion-dollar mistake, shouldn't the dollars saved by MS and Java by implementing it reduce that debt?) – DougM May 02 '14 at 18:54
  • 30
    @DougM What do you expect me to do, hit up every language designer from the past 50 years and ask him why he implemented `null` in his language? Any answer to this question will be speculative unless it comes from a language designer. I don't know of any that frequent this site besides Eric Lippert. The last part is a red herring for numerous reasons. The amount of 3rd party code written on top of MS and Java's APIs obviously outweighs the amount of code in the API itself. So if your customers want `null`, you give them `null`. You also suppose they've accepted `null` is costing them money. – Doval May 02 '14 at 18:59
  • 4
    If the only answer you can give is speculative, state that clearly in your opening paragraph. ( You asked how you could improve your answer, and I responded. Any parenthetical is merely commentary you can feel free to ignore; that's what parenthesis are for in English, after all.) – DougM May 02 '14 at 19:54
  • 1
    @DougM Fair enough. Done. – Doval May 02 '14 at 19:56
  • 8
    This answer is reasonable; I've added some more considerations in mine. I note that `ICloneable` is similarly broken in .NET; unfortunately this is one place where the shortcomings of Java were not learned from in time. – Eric Lippert May 02 '14 at 21:16
  • It's probably worth noting that your Hoare quote is now twelve years old. I had an OO book that old (it was actually a bit older), and based on what we now know about OO, the book felt like it was 50% complete. – Robert Harvey May 02 '14 at 22:04
  • 1
    @RobertHarvey That's fair. However, Java and C# were both around by then, and it's my opinion that in the decade that's elapsed we've gained more insight into how mainstream OOP languages are *flawed*. (Such is life, of course; time has uncovered design flaws in, say, Haskell as well, which has many merits over mainstream languages.) So if anything, the quote is perhaps more generous than it would be today. – Doval May 02 '14 at 22:45
  • Excellent quotes. And Cloneable is indeed rather broken. – ApproachingDarknessFish May 03 '14 at 05:10
  • 1
    I think the leading disclaimer is out of place, and detracts from the article which is quite adequately supported without it. I'm adding an up-vote to offset @DougM, and suggesting that the disclaimer be removed. – mc0e May 04 '14 at 15:26
  • @mc0e What if I move it to the end of the answer? – Doval May 04 '14 at 16:09
  • 1
    One of the Hoare quotes is 12 years old. The first one is 34 years old! Certainly does nothing to degrade the truthfulness of what he says. – Adam Crossland May 04 '14 at 18:25
  • An actual statement rather than a "disclaimer" would make this a better answer. – DougM May 04 '14 at 20:24
  • "that's what parenthesis are for in English" -- "parenthesis" is singular in English. – Jim Balter Apr 27 '15 at 01:21
32

Null serves a very valid purpose of representing a lack of value.

I will say I'm the most vocal person I know about the abuses of null and all the headaches and suffering they can cause, especially when used liberally.

My personal stance is people may use nulls only when they can justify that it's necessary and appropriate.

Example justifying nulls:

Date of Death is typically a nullable field. There are three possible situations with date of death. Either the person has died and the date is known, the person has died and the date is unknown, or the person is not dead and therefore a date of death does not exist.

Date of Death is also a DateTime field and doesn't have an "unknown" or "empty" value. It does have the default date that comes up when you create a new DateTime (which varies based on the language used), but there is technically a chance that a person did in fact die at exactly that time, and they would then be flagged as your "empty value" if you used the default date.

The data would need to represent the situation properly.

Person is dead, date of death is known (3/9/1984)

Simple, '3/9/1984'

Person is dead, date of death is unknown

So what's best? Null, '0/0/0000', or '01/01/1869' (or whatever your default value is)?

Person is not dead, date of death is not applicable

So what's best? Null, '0/0/0000', or '01/01/1869' (or whatever your default value is)?

So let's think each value over...

  • Null, it has implications and concerns you need to be wary of; accidentally trying to manipulate it without confirming it's not null first, for example, would throw an exception. But it also best represents the actual situation... If the person isn't dead, the date of death doesn't exist... it's nothing... it's null...
  • 0/0/0000, this could be okay in some languages, and could even be an appropriate representation of no date. Unfortunately some languages and validation will reject this as an invalid datetime, which makes it a no-go in many cases.
  • 1/1/1869 (or whatever your default datetime value is), the problem here is it gets tricky to handle. You could use that as your "lack of value" value, except what happens if I want to filter out all my records I don't have a date of death for? I could easily filter out people who actually died on that date, which could cause data integrity issues.

The fact is sometimes you do need to represent nothing, and sure, sometimes a variable type works well for that, but often variable types need to be able to represent nothing.

If I have no apples I have 0 apples, but what if I don't know how many apples I have?

By all means null is abused and potentially dangerous, but it's necessary at times. It's only the default in many cases because, until I provide a value, there is a lack of a value, and something needs to represent it. (Null)
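
For illustration, here is one way the three situations above could be modelled so that "unknown" and "not applicable" never collide with a real date - a sketch in C#; the type and member names are made up for this example:

using System;

public enum DeathStatus { Alive, DiedOnKnownDate, DiedOnUnknownDate }

public class Person
{
    public DeathStatus Status { get; set; } = DeathStatus.Alive;

    // Deliberately nullable: only meaningful when Status == DiedOnKnownDate.
    public DateTime? DateOfDeath { get; set; }
}

// var p = new Person { Status = DeathStatus.DiedOnKnownDate,
//                      DateOfDeath = new DateTime(1984, 3, 9) };
// Filtering on "known date of death" checks Status, so no real date is ever
// mistaken for the "empty value".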

Eric J Fisher
  • 41
    `Null serves a very valid purpose of representing a lack of value.` An `Option` or `Maybe` type serves this very valid purpose without bypassing the type system. – Doval May 02 '14 at 19:45
  • Correct, so in a case let's say we have an int variable that we've made clear is an int type, so technically it's not null, it's a data type. But if you look into your int type, its value property is null. So while the type is how it should be, it's really just shifting the null. – Eric J Fisher May 02 '14 at 19:51
  • I'm not clear on what you mean by "it's not null, it's a data type. But if you look into your int type, its value property is null." – Doval May 02 '14 at 19:58
  • 38
    Nobody is arguing that there shouldn't be a lack-of-value value; they are arguing that values that may be missing should *explicitly* be marked as such, rather than *every* value being potentially missing. –  May 02 '14 at 20:05
  • 2
    I guess RualStorge was talking in relation to SQL, because there are camps that state every column should be marked as `NOT NULL`. My question wasn't related to RDBMS though... – zduny May 02 '14 at 20:13
  • 1
    We're not talking about a specific language here. C# and Java were provided as examples, but I am answering with other languages like PHP, TSQL, etc. in mind, and mrpyo is correct that I was using TSQL as my example because it's probably the least accommodating here, but the problem does extend to many other languages. I do believe you *could* get around nulls entirely with a language as modern as C# or Java, but I think that's more fighting around the problem than fixing it. We can handle nulls, Empty, Options, Maybes, etc. At the end of the day, no matter what you call it, you'll need to handle it. – Eric J Fisher May 02 '14 at 20:23
  • @RualStorge: no matter what you call it you'll need to handle it – to some degree, sure. But if you make it explicit it becomes way more obvious _how to correctly handle it_. And indeed easier if the language is powerful enough, since option types form a monad, so it can basically be as simple as just changing the signatures to _mention_ there's a lack-of-value possibility; the handling itself is taken care for by the compiler (with the help of some mighty mathematical proofs). – leftaroundabout May 02 '14 at 20:45
  • 6
    +1 for distinguishing between "no value" and "unknown value" – David May 02 '14 at 20:57
  • 1
    @RualStorge The argument is more convincing if you look at it from the other direction. We sure need Nulls or Maybes for some functions, but the real advantage of having a non-null reference type *by-default* lies in all the other functions that *don't* need null values. As it is now (with null references) I have to ask myself with every reference "could the author of the function I got the reference from or the callee under reasonable circumstances put a null in there" (reasonable is in there to exclude unrecoverable error situations). Most of the time the answer is "no", but sometimes – Perseids May 05 '14 at 06:54
  • 1
    the programmer using the reference judges wrong. The advantage of a Maybe type with non-null references is that the programmer responsible for the reference can (and has to) explicitly tell you whether you need to consider the none-value case or not. – Perseids May 05 '14 at 06:57
  • None of these answers show any understanding of the state of the art. Null has been completely removed from the programming model of Rust. Ceylon has null but it is always safe; there is no NullPointerException or equivalent because it isn't possible for it to occur. This is without any loss of power. – Jim Balter Apr 27 '15 at 01:31
  • 2
    Wouldn't it make more sense to separate out a person's state? I.e. A `Person` type has a `state` field of type `State`, which is a discriminated union of `Alive` and `Dead(dateOfDeath : Date)`. – jon hanson Nov 14 '15 at 16:55
  • in the case of relational databases, any column with a nullable field can have that column extracted to another table with a foreign key constraint on the original table. in other words: a 0..1 relation does not imply a need for either null or default values. – sara Apr 10 '16 at 18:45
  • Great answer, have been thinking about a practical purpose to having null all day... – Adam Hughes Aug 17 '17 at 20:47
  • 1
    @JimBalter As others said, the null was just shifted around, in case of rust to [`std::option`](https://doc.rust-lang.org/std/option/index.html). Of course, now having the default be non-nullable is extremely useful. – Deduplicator May 21 '19 at 20:09
  • @jon-hanson So, either a date OfDeath or Null? – Deduplicator May 21 '19 at 20:11
  • "As others said, the null was just shifted around" -- as I noted, this completely lacks understanding. std::option is a monad. There is no null in the semantics of the language. – Jim Balter May 22 '19 at 00:28
  • @Deduplicator No, that's self-evidently not what I said. – jon hanson May 22 '19 at 07:24
  • @jon-hanson Yes, you painted it green. Big deal. – Deduplicator May 22 '19 at 08:28
10

I wouldn't go so far as "other languages have it, we have to have it too..." like it's some sort of keeping up with the Joneses. A key feature of any new language is the ability to interoperate with existing libraries in other languages (read: C). Since C has null pointers, the interoperability layer necessarily needs the concept of null (or some other "does not exist" equivalent that blows up when you use it).

The language designer could've chosen to use Option Types and force you to handle the null path everywhere that things could be null. And that almost certainly would lead to fewer bugs.

But (especially for Java and C# due to the timing of their introduction and their target audience) using option types for this interoperability layer would likely have harmed if not torpedoed their adoption. Either the option type is passed all the way up, annoying the hell out of C++ programmers of the mid to late 90's - or the interoperability layer would throw exceptions when encountering nulls, annoying the hell out of C++ programmers of the mid to late 90's...
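
As a rough sketch of that interoperability pressure (C#; the "legacy" library and its legacy_lookup function are made up here, but the shape is typical):

using System;
using System.Runtime.InteropServices;

static class LegacyInterop
{
    // Hypothetical C function: returns a pointer to a C string, or NULL on failure.
    [DllImport("legacy")]
    private static extern IntPtr legacy_lookup(string key);

    // The wrapper has to surface "NULL came back from C" somehow:
    // return null, throw, or wrap the result in an option type.
    public static string Lookup(string key)
    {
        IntPtr p = legacy_lookup(key);
        return p == IntPtr.Zero ? null : Marshal.PtrToStringAnsi(p);
    }
}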

Telastyn
  • 3
    The first paragraph doesn't make sense to me. Java does not have C interop in the shape you suggest (there's JNI but it already jumps through a dozen hoops for *everything* pertaining to references; plus it's rarely used in practice), same for other "modern" languages. –  May 02 '14 at 13:00
  • @delnan - sorry, I am more familiar with C#, which does have this sort of interop. I rather assumed that many of the foundational Java libraries use JNI at the bottom as well. – Telastyn May 02 '14 at 13:06
  • 7
    You make a good argument for allowing null, but you can still *allow* null without *encouraging* it. Scala is a good example of this. It can seamlessly interoperate with Java apis that use null, but you're encouraged to wrap it in an `Option` for use within Scala, which is as easy as `val x = Option(possiblyNullReference)`. In practice, it doesn't take very long for people to see the benefits of an `Option`. – Karl Bielefeldt May 02 '14 at 13:28
  • @KarlBielefeldt - a good point. C# eventually put in option types for structs, and I wouldn't be surprised if they went that way if initial development weren't done later. – Telastyn May 02 '14 at 13:30
  • 1
    Option types go hand-in-hand with (statically verified) pattern matching, which C# unfortunately doesn't have. F# does though, and it's wonderful. – Steven Evers May 02 '14 at 18:25
  • 1
    @SteveEvers It's possible to fake it using an abstract base class with private constructor, sealed inner classes, and a `Match` method that takes delegates as arguments. Then you pass lambda expressions to `Match` (bonus points for using named arguments) and `Match` calls the right one. – Doval May 02 '14 at 18:50
  • @Doval Do you happen to have a link or something? I tried doing something similar and it turned out kinda nasty. Yours sounds like it might not be so bad. – Steven Evers May 02 '14 at 20:31
  • @SteveEvers I can't take credit for it, but [here you go](http://programmers.stackexchange.com/a/228125/116461). Replace `Either = Left | Right` with the ADT of your choice. The missing detail is that the subclasses need to be `sealed` inner classes, and the base class needs to have a private constructor. That ensures no one can add additional subclasses and break it. – Doval May 02 '14 at 20:37
9

First of all, I think we can all agree that a concept of nullity is necessary. There are some situations where we need to represent the absence of information.

Allowing null references (and pointers) is only one implementation of this concept, and possibly the most popular, although it is known to have issues: C, Java, Python, Ruby, PHP, JavaScript, ... all use a similar null.

Why? Well, what's the alternative?

In functional languages such as Haskell you have the Option or Maybe type; however those are built upon:

  • parametric types
  • algebraic data types

Now, did the original C, Java, Python, Ruby or PHP support either of those features? No. Java's flawed generics are recent in the history of the language, and I somehow doubt the others even implement them at all.

There you have it. null is easy, parametric algebraic data types are harder. People went for the simplest alternative.
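
To spell out why those two features matter, here is a minimal Maybe written in C# as it might look today, annotated with which prerequisite each part corresponds to (an illustrative sketch only):

// parametric type      ->  the <T> type parameter below
// algebraic data type  ->  the closed choice between Just and Nothing
public abstract class Maybe<T>
{
    private Maybe() { }   // nobody outside can add a third case

    public sealed class Just : Maybe<T>
    {
        public T Value { get; }
        public Just(T value) { Value = value; }
    }

    public sealed class Nothing : Maybe<T> { }
}

// Java had no generics until Java 5 (2004), and C# none until 2.0 (2005), so their
// original designers could not have written this.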

Matthieu M.
  • +1 for "null is easy, parametric algebraic data types are harder." But I think it wasn't so much an issue of parametric typing and ADTs being harder; it's just that they're not perceived as necessary. If Java had shipped without an object system, on the other hand, it would've flopped; OOP was a "showstopping" feature, in that if you didn't have it, no one was interested. – Doval May 03 '14 at 16:07
  • @Doval: well, OOP might have been necessary for Java, but it was not for C :) But it's true that Java aimed at being simple. Unfortunately people seem to assume that a simple language leads to simple programs, which is kinda strange (Brainfuck is a very simple language...), but we certainly agree that complicated languages (C++...) are not a panacea either even though they can be incredibly useful. – Matthieu M. May 03 '14 at 17:39
  • 1
    @MatthieuM.: Real systems are complex. A well-designed language whose complexities match the real-world system being modeled can allow the complex system to be modeled with simple code. Attempts to oversimplify a language simply push the complexity onto the programmer who's using it. – supercat May 08 '14 at 13:43
  • @supercat: I could not agree more. Or as Einstein is paraphrased: "Make everything as simple as possible, but not simpler." – Matthieu M. May 08 '14 at 14:23
  • @MatthieuM.: Einstein was wise in many ways. Languages which try to assume "everything is an object, a reference to which may be stored in `Object`" fail to recognize that practical applications need unshared mutable objects and sharable immutable objects (both of which should behave like values), as well as sharable and unsharable entities. Using a single `Object` type for everything doesn't eliminate the need for such distinctions; it merely makes it harder to use them correctly. – supercat May 08 '14 at 15:07
6

Null/nil/none itself is not evil.

If you watch his misleadingly named famous speech "The Billion dollar Mistake", Tony Hoare talks about how allowing any variable to be able to hold null was a huge mistake. The alternative - using Options - does not in fact get rid of null references. Instead it allows you to specify which variables are allowed to hold null, and which aren't.

As a matter of fact, with modern languages that implement proper exception handling, null dereference errors aren't any different from any other exception - you find it, you fix it. Some alternatives to null references (the Null Object pattern for example) hide errors, causing things to silently fail until much later. In my opinion, it's much better to fail fast.

So the question is then, why do languages fail to implement Options? As a matter of fact, the arguably most popular language of all time, C++, has the ability to define object variables that cannot be assigned NULL. This is a solution to the "null problem" Tony Hoare mentioned in his speech. Why does the next most popular typed language, Java, not have it? One might ask why it has so many flaws in general, especially in its type system. I don't think you can really say that languages systematically make this mistake. Some do, some don't.
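
For what it's worth, this is exactly the direction mainstream languages later took; C# 8's nullable reference types, for instance, let you annotate which references may hold null (a small sketch; the compiler reports these as warnings, and the wording here is paraphrased):

#nullable enable

string  a = "hello";   // declared non-nullable: assigning null here would be flagged
string? b = null;      // declared nullable: dereferencing without a check is flagged

int length = b != null ? b.Length : 0;   // fine: the null case is handled explicitly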

B T
  • 1
    One of the biggest strengths of Java from an implementation perspective, but weaknesses from a language perspective, is that there is only one non-primitive type: the Promiscuous Object Reference. This enormously simplifies the runtime, making possible some extremely lightweight JVM implementations. That design, however, means that every type must have a default value, and for a Promiscuous Object Reference the only possible default is `null`. – supercat May 05 '14 at 03:05
  • Well, one *root* non-primitive type at any rate. Why is this a weakness from a language perspective? I don't understand why this fact requires that every type have a default value (or conversely why multiple root types would allow types to not have a default value), nor why that is a weakness. – B T May 05 '14 at 07:33
  • What other kind of non-primitive could a field or array element hold? The weakness is that some references are used to encapsulate identity, and some to encapsulate the values contained within the objects identified thereby. For reference-type variables used to encapsulate identity, `null` is the only sensible default. References used to encapsulate value, however, could have a sensible default behavior in cases where a type would have or could construct a sensible default instance. Many aspects of how references should behave depend upon whether and how they encapsulate value, but... – supercat May 05 '14 at 14:56
  • ...the Java type system has no way of expressing that. If `foo` holds the only reference to an `int[]` containing `{1,2,3}` and code wants `foo` to hold a reference to an `int[]` containing `{2,2,3}`, the fastest way to achieve that would be to increment `foo[0]`. If code wants to let a method know that `foo` holds `{1,2,3}`, the other method won't modify the array nor persist a reference beyond the point where `foo` would want to modify it, the fastest way to achieve that would be to pass a reference to the array. If Java had an "ephemeral read-only reference" type, then... – supercat May 05 '14 at 15:05
  • ...the array could be passed safely as an ephemeral reference, and a method which wanted to persist its value would know that it needed to copy it. In the absence of such a type, the only ways to safely expose the contents of an array are to either make a copy of it or encapsulate it in an object created just for that purpose. – supercat May 05 '14 at 15:22
  • Ah i see, so you're not saying that requiring values to have a default value is bad in any way. You're saying allowing only one default value (null) is bad. – B T May 05 '14 at 20:34
  • My point was not so much that having only one default value is bad, but rather that having `null` as the only default value is a necessary consequence of having a single Promiscuous Object Reference type. – supercat May 06 '14 at 01:49
  • Rust has no null at all and Ceylon has it as a separate type that can be used to compose "reference or null" types but null can never be the value of a reference ... Ceylon has no NullPointerException because such exceptions aren't semantically possible. And these languages are no less powerful for not having null ... but their designs reflect good, modern understanding of type theory. "Instead it allows you to specify which variables are allowed to hold null, and which aren't." -- this is quite wrong ... Option is a union type; no reference variable ever has a value of "Nothing". – Jim Balter Apr 27 '15 at 01:48
4

Because programming languages are generally designed to be practically useful rather than technically correct. The fact is that null states are a common occurrence due to either bad or missing data or a state that has not yet been decided. The technically superior solutions are all more unwieldy than simply allowing null states and sucking up the fact that programmers make mistakes.

For example, if I want to write a simple script that works with a file, I can write pseudocode like:

file = openfile("joebloggs.txt")

for line in file
{
  print(line)
}

and it will simply fail if joebloggs.txt doesn't exist. The thing is, for simple scripts that's probably okay, and for many situations in more complex code I know it exists and the failure won't happen, so forcing me to check wastes my time. The safer alternatives achieve their safety by forcing me to deal correctly with the potential failure state, but often I don't want to do that; I just want to get on.
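
For reference, here is roughly what the "safer alternative" looks like when you just want to get on with it - a sketch in C#; TryOpen is a made-up helper standing in for an option-returning open, and the absence is acknowledged only by converting it into an immediate failure:

using System;
using System.IO;

class Sketch
{
    // Made-up helper so the sketch is self-contained: null stands in for "no file".
    static StreamReader TryOpen(string path)
    {
        try { return new StreamReader(path); }
        catch (IOException) { return null; }
    }

    static void Main()
    {
        // One extra operator opts into failing fast instead of handling the case properly.
        using (StreamReader file = TryOpen("joebloggs.txt")
               ?? throw new InvalidOperationException("joebloggs.txt is missing"))
        {
            string line;
            while ((line = file.ReadLine()) != null)
                Console.WriteLine(line);
        }
    }
}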

Jack Aidley
  • 14
    And here you gave an example of exactly what's wrong with nulls. A properly implemented "openfile" function should throw an exception (for a missing file) that would stop execution right there with an exact explanation of what happened. Instead, if it returns null it propagates further (to `for line in file`) and throws a meaningless null reference exception, which is OK for such a simple program but causes real debugging problems in much more complex systems. If nulls didn't exist, the designer of "openfile" wouldn't be able to make this mistake. – zduny May 02 '14 at 17:41
  • 2
    +1 for "Because programming languages are generally designed to be practically useful rather than technically correct" – Martin Ba May 02 '14 at 17:53
  • 2
    Every option type I know of allows you to do the failing-on-null with a single short extra method call (Rust example: `let file = something(...).unwrap()`). Depending on your POV, it's an easy way to not handle errors or a succinct assertion that null cannot occur. The time wasted is minimal, and you save time in other places because you don't have to figure out whether something *can* be null. Another advantage (which may by itself be worth the extra call) is that you *explicitly* ignore the error case; when it fails there is little doubt what went wrong and where the fix needs to go. –  May 02 '14 at 18:14
  • 5
    @mrpyo Not all languages support exceptions and/or exception handling (a la try/catch). And exceptions can be abused as well -- "exception as flow control" is a common anti-pattern. This scenario -- a file doesn't exist -- is AFAIK the most frequently cited example of that anti-pattern. It would appear you're replacing one bad practice with another. – David May 02 '14 at 20:22
  • @David Than you can use `?file` (option type) which forces you to to check if value is there, and handle the opposite case. Choose the one you prefer, both solve potential problems. – zduny May 02 '14 at 20:28
  • Also, if in your case the file might not exist, you should write `if (fileexists("joebloggs.txt")) { file = openfile("joebloggs.txt"); ... }` and not use `try ... catch`, therefore not using exceptions as flow control... – zduny May 02 '14 at 20:36
  • "Because programming languages are generally designed to be practically useful rather than technically correct.": A technically correct language is more useful than one that is not: It gives you less chances of making mistakes. This is an example of the more general myth that in the real world you (always) need some kind of hack in order to do useful things. – Giorgio May 02 '14 at 20:45
  • @mrpyo Not all languages support option types either. – David May 02 '14 at 20:58
  • 9
    @mrpyo `if file exists { open file }` suffers from a race condition. The only reliable way to know if opening a file will succeed is to try opening it. –  May 02 '14 at 22:34
  • @delnan: That `.unwrap()` is exactly the kind of thing I'm talking about. I know plenty of programmers who dislike languages like Ruby because you have to go to the lengths of typing `end`. When writing Python it irritates me that I have to type `+= 1` rather than `++`. The extra typing is, most likely, worth it in the majority of cases but it doesn't feel like it until you've got a really solid grasp of programming and, frankly, many programmers - even professional programmers - don't. – Jack Aidley May 02 '14 at 23:02
  • 1
    Frankly, "technically correct" means little more than "does things how i think they should be done". – cHao May 03 '14 at 00:57
  • "The technically superior solutions are all more unwieldy " -- No, they aren't; both Rust and Ceylon are counterexamples. – Jim Balter Apr 27 '15 at 01:33
4

There are clear, practical uses of the NULL (or nil, or Nil, or null, or Nothing or whatever it is called in your preferred language) pointer.

For those languages that do not have an exception system (e.g. C), a null pointer can be used as a marker of error when a pointer should be returned. For example:

#include <stdio.h>   /* for perror */
#include <stdlib.h>  /* for malloc, exit */

char *buf = malloc(20);
if (!buf)
{
    perror("memory allocation failed");
    exit(1);
}

Here a NULL returned from malloc(3) is used as a marker of failure.

When used in method/function arguments, it can indicate "use the default for this argument" or "ignore this output argument". See the example below.

Even for those languages with an exception mechanism, a null pointer can be used as an indication of a soft error (that is, an error that is recoverable), especially when exception handling is expensive (e.g. Objective-C):

NSError *err = nil;
NSString *content = [NSString stringWithContentsOfURL:sourceFile
                                         usedEncoding:NULL // This output is ignored
                                                error:&err];
if (!content) // If the object is null, we have a soft error to recover from
{
    fprintf(stderr, "error: %s\n", [[err localizedDescription] UTF8String]);
    if (error) // Only propagate err if the caller supplied an error out-parameter
        *error = err;
    return nil; // Go back to parent layer, with another soft error.
}

Here, the soft error does not cause the program to crash if not caught. This eliminates the crazy try-catch blocks that Java has and gives better control over program flow, as soft errors are not interrupting (and the few remaining hard exceptions are usually not recoverable and are left uncaught).

Maxthon Chan
  • 5
    The problem is that there's no way to distinguish variables which should never contain `null` from those that should. For example, if I want a new type that contains 5 values in Java, I could use an enum, but what I get is a type that can hold 6 values (the 5 I wanted + `null`). It's a flaw in the type system. – Doval May 02 '14 at 17:51
  • @Doval If that is the situation just assign NULL a meaning (or if you have a default, treat it as a synonym of the default value) or use the NULL (which never should appear in the first place) as a marker of soft error (i.e. error but at least not crashing just yet) – Maxthon Chan May 02 '14 at 17:59
  • 1
    @MaxtonChan `Null` can only be assigned a meaning when the values of a type carry no data (e.g. enum values). As soon as your values are anything more complicated (e.g. a struct), `null` can't be assigned a meaning that makes sense for that type. There is no way to use a `null` as a struct or a list. And, again, the problem with using `null` as an error signal is that we can't tell what might return null or accept null. Any variable in your program could be `null` unless you're extremely meticulous to check every single one for `null` before every single use, which no one does. – Doval May 02 '14 at 18:06
  • 1
    @Doval: There would be no particular inherent difficulty in having an immutable reference type regard `null` as a usable default value (e.g. have the default value of `string` behave as an empty string, the way it did under the preceding Common Object Model). All that would have been necessary would have been for languages to use `call` rather than `callvirt` when invoking non-virtual members. – supercat May 03 '14 at 15:58
  • @supercat That's a good point, but now don't you need to add support for distinguishing between immutable and non-immutable types? I'm not sure how trivial that is to add to a language. – Doval May 03 '14 at 16:04
  • @Doval: The types whose methods can sensibly work with `null` should use non-virtual calls and implement sensible behavior when `this` is null. Those which can't, shouldn't. – supercat May 03 '14 at 16:34
  • @Doval: Incidentally, while having immutable types include default values wouldn't require special Framework "immutable type" support, designing in support for such types could offer huge benefits. For example, `ImmutableObject` could include an `Equate` method which, if given a reference to another object of the exact same type, would arrange things so that on the next GC cycle, only one object would be kept, and all references to either of the orginal objects would refer to the one that was kept. Unlike interning, which involves comparing objects to things with which they might be equal... – supercat May 03 '14 at 18:47
  • ...the aforementioned `Equate` approach would operate upon things which had already been discovered to be equal. – supercat May 03 '14 at 18:48
  • @Doval Every class can be assigned a default value - the soft error I mentioned multiple times is one, and empty string is another. I work with Objective-C extensively and there is no non-virtual in Objective-C, and Objective-C ignores method calls to `nil` and returns, yet again, `nil`. – Maxthon Chan Jun 10 '14 at 18:08
  • @MaxthonChan I'm not convinced every type can have a *sensible* default value. If I declare an enum for colors (i.e. `Red | Green | Blue`) which color is the sensible default? What is the sensible default for function types like `(int * int) -> int`? And why would I ever want the language to *silently default* to *doing nothing*? That's a recipe for masking bugs. – Doval Jun 10 '14 at 18:34
  • @Doval Any nonsense value can be treated as a soft error. The "silent default to do nothing" behaviour of Objective-C is a feature as it can cut your coding in half, eliminating most (if not all) null checks. – Maxthon Chan Jun 10 '14 at 18:49
  • @MaxthonChan It's not a feature when I neither want nonsense values nor noop behavior. If you want an extra value in your type, you should opt into it. If you want to ignore a certain value, you should opt into it. Anything else is problematic. – Doval Jun 10 '14 at 18:55
4

There are two related, but slightly different issues:

  1. Should null exist at all? Or should you always use Maybe<T> where null is useful?
  2. Should all references be nullable? If not, which should be the default?

    Having to explicitly declare nullable reference types as string? or similar would avoid most (but not all) of the problems null causes, without being too different from what programmers are used to.

I at least agree with you that not all references should be nullable. But avoiding null is not without its complexities:

.NET initializes all fields to default(T) before they can first be accessed by managed code. This means that for reference types you need null or something equivalent, and that value types can be initialized to some kind of zero without running code. While both of these have severe downsides, the simplicity of default initialization may have outweighed those downsides.

  • For instance fields you can work around this by requiring initialization of fields before exposing the this pointer to managed code. Spec# went this route, using different syntax from constructor chaining compared with C#.

  • For static fields ensuring this is harder, unless you impose strong restrictions on what kind of code may run in a field initializer, since you can't simply hide the this pointer.

  • How to initialize arrays of reference types? Consider a List<T> which is backed by an array with a capacity larger than the length. The remaining elements need to have some value.

Another problem is that it doesn't allow methods like bool TryGetValue<T>(key, out T value) which return default(T) as value if they don't find anything. Though in this case it's easy to argue that the out parameter is bad design in the first place and this method should return a discriminating union or a maybe instead.
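
To spell that out (C#; the option-returning alternative is hypothetical, not an existing framework method):

using System;
using System.Collections.Generic;

class Sketch
{
    static void Demo(Dictionary<string, string> d)
    {
        // Today's pattern: on a miss, value is filled in with default(T) - null here.
        if (d.TryGetValue("key", out string value))
            Console.WriteLine(value.Length);

        // Hypothetical alternative with no out parameter and no default(T):
        // Option<string> v = d.GetValueOption("key");   // absence carried in the return type
    }
}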

All of these problems can be solved, but it's not as easy as "forbid null and all is well".

CodesInChaos
  • The `List` is IMHO the best example, because it would require either that every `T` have a default value, that every item in the backing store be a `Maybe` with an extra "isValid" field, even when `T` is a `Maybe`, or that the code for the `List` behave differently depending upon whether `T` is itself a nullable type. I would consider initialization of the `T[]` elements to a default value to be the least evil of those choices, but it of course means that the elements need to *have* a default value. – supercat May 03 '14 at 15:51
  • Rust follows point 1 -- no null at all. Ceylon follows point 2 -- non-null by default. References that can be null are explicitly declared with a union type that includes either a reference or null, but null can never be the value of a plain reference. As a result, the language is completely safe and there's no NullPointerException because it isn't semantically possible. – Jim Balter Apr 27 '15 at 01:36
3

Most useful programming languages allow data items to be written and read in arbitrary sequences, such that it is often impossible to determine statically, before a program runs, the order in which reads and writes will occur. There are many cases where code will in fact store useful data into every slot before reading it, but where proving that would be difficult. Thus, it will often be necessary to run programs where it would be at least theoretically possible for code to attempt to read something which has not yet been written with a useful value. Whether or not it is legal for code to do so, there is no general way to stop code from making the attempt. The only question is what should happen when that occurs.

Different languages and systems take different approaches.

  • One approach would be to say that any attempt to read something that has not been written will trigger an immediate error.

  • A second approach is to require code to supply some value in every location before it would be possible to read it, even if there would be no way for the stored value to be semantically useful.

  • A third approach is to simply ignore the problem and let whatever would happen "naturally" just happen.

  • A fourth approach is to say that every type must have a default value, and any slot which has not been written with anything else will default to that value.

Approach #4 is vastly safer than approach #3, and is in general cheaper than approaches #1 and #2. That then leaves the question of what the default value should be for a reference type. For immutable reference types, it would in many cases make sense to define a default instance, and say that the default for any variable of that type should be a reference to that instance. For mutable reference types, however, that wouldn't be very helpful. If an attempt is made to use a mutable reference type before it has been written, there generally isn't any safe course of action except to trap at the point of attempted use.
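
For the immutable case, a hedged sketch of what such a "default instance" could look like; Money is a made-up name, and this is only what a language could define as the default, since in today's CLR a field of type Money would still default to null:

```csharp
// Hypothetical immutable type with a canonical default instance.
sealed class Money
{
    public static readonly Money Zero = new Money(0m);

    public decimal Amount { get; }
    public Money(decimal amount) => Amount = amount;
}
```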

Semantically speaking, if one has an array customers of type Customer[20], and one attempts customers[4].GiveMoney(23) without having stored anything to customers[4], execution is going to have to trap. One could argue that an attempt to read customers[4] should trap immediately, rather than waiting until code attempts to GiveMoney, but there are enough cases where it's useful to read a slot, find out that it doesn't hold a value, and then make use of that information, that having the read attempt itself fail would often be a major nuisance.
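
A small sketch of that distinction (Customer and GiveMoney are just the placeholder names used above): reading and testing the empty slot is harmless and often useful, while invoking a member through it is what traps.

```csharp
using System;

class Customer
{
    public void GiveMoney(decimal amount) { /* ... */ }
}

static class Demo
{
    static void Main()
    {
        var customers = new Customer[20];   // every slot starts out null

        // Reading and testing a slot is fine, and often useful:
        if (customers[4] == null)
            Console.WriteLine("No customer in slot 4 yet.");

        // Using the slot as if it held a Customer is what traps:
        customers[4].GiveMoney(23m);        // throws NullReferenceException
    }
}
```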

Some languages allow one to specify that certain variables should never contain null, and any attempt to store a null should trigger an immediate trap. That is a useful feature. In general, though, any language which allows programmers to create arrays of references will either have to allow for the possibility of null array elements, or else force the initialization of array elements to data which cannot possibly be meaningful.
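
Where a language doesn't offer such a declaration, a common manual approximation, sketched here with made-up names, is to trap at the store rather than at some later, distant dereference:

```csharp
using System;

// Hypothetical example: the setter rejects null immediately, so any
// violation of the "never null" intent fails fast at the assignment.
sealed class Tag
{
    private string _name;

    public string Name
    {
        get => _name;
        set => _name = value ?? throw new ArgumentNullException(nameof(value));
    }

    public Tag(string name) => Name = name;
}
```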

supercat
  • 8,335
  • 22
  • 28
  • Wouldn't a `Maybe`/`Option` type solve the problem with #2, since if you don't have a value for your reference *yet* but will have one in the future, you can just store `Nothing` in a `Maybe<T>`? – Doval May 03 '14 at 03:51
  • @Doval: No, it wouldn't solve the problem -- at least, not without introducing null references all over again. Should a "nothing" act like a member of the type? If so, which one? Or should it throw an exception? In which case, how are you any better off than simply using `null` correctly/sensibly? – cHao May 03 '14 at 14:02
  • @Doval: Should the backing type of a `List<T>` be a `T[]` or a `Maybe<T>`? What about the backing type of a `List<Maybe<T>>`? – supercat May 03 '14 at 14:22
  • @supercat I'm not sure how a backing type of `Maybe<T>` makes sense for `List<T>` since `Maybe<T>` holds a single value. Did you mean `Maybe<T>[]`? – Doval May 03 '14 at 15:19
  • @cHao `Nothing` can only be assigned to values of type `Maybe<T>`, so it's not quite like assigning `null`. `Maybe<T>` and `T` are two distinct types. – Doval May 03 '14 at 15:21
  • @Doval: Correct, `Maybe<T>[]`. Can you see any good way to deal with the fact that the entire array must be created before sensible values can be known for its contents, other than by requiring that the array-element type have a default value? – supercat May 03 '14 at 15:37
  • @supercat No, not particularly. I'm just thinking out loud, and have been programming in `Standard ML` as of late. The approach there is that the array constructors take either the default value, or a function that takes the array index and returns the value to initialize that index with. If I can't do either, I make it an array of `'a option` instead of an array of `'a`. And if I still don't like that, I just accept that an array is fundamentally the wrong choice - if I want to add to a collection later, I should pick a collection that I can grow. I'm not sure how efficient that approach is, – Doval May 03 '14 at 15:45
  • ...but to me it makes the most sense from a semantics and correctness point of view. If the performance later turns out to be bad I can always rewrite the critical code later. – Doval May 03 '14 at 15:46
  • @Doval: What would an efficiently-expandable collection type use for backing storage if not a slightly-over-allocated array? I guess maybe the system array type could have separate "allocated" and "valid" sizes, but I don't know of implementations that work that way. – supercat May 03 '14 at 16:41
  • +1 for enumerating all the possible ways the problem can be solved – B T May 06 '14 at 03:57
  • @BT: Probably not all of them (there are likely others I haven't thought of), but I'm glad you like the answer. It's easy to require all variables to be written with "something" before they can be read, but it's not possible to define something meaningful to write them with; of the meaningless things one could write, `null` has the advantage of being obviously meaningless. – supercat May 08 '14 at 13:50
  • "No, it wouldn't solve the problem -- at least, not without introducing null references all over again." -- Such comments show a complete lack of understanding of type theory. – Jim Balter Apr 27 '15 at 01:41
  • @JimBalter: How is a `Maybe<T>` which either holds a non-nullable reference to a `T` or doesn't, different from a reference that can either identify a `T` or be null? If one wishes to duplicate the state of an array which has had some but not all items written, not necessarily in sequential order, having a type which can hold the state of any numbered element, empty or not, would seem a very useful thing to have, and using a combination of a reference and a flag seems more expensive than using a nullable reference. – supercat Apr 27 '15 at 02:11
  • Your question is probably meant to be rhetorical, but it just reveals a complete failure to understand or even attempt to understand -- all too common among C programmers who have a good grasp of the hardware level but fail at abstraction. First, on the expense: optimized implementations avoid the extra flag by representing the Nothing case of Maybe with an invalid address -- null is a good candidate -- so the representation is identical. But while null is a legal but invalid value for a reference type, so it can be dereferenced, resulting in undefined behavior, ... – Jim Balter Apr 27 '15 at 17:34
  • ... neither Nothing nor Maybe is a reference type and so cannot be dereferenced. In order to extract a T from Maybe you must test the type first ... this is a compile-time restriction imposed by the semantics of the language and the Maybe type. Thus, it is impossible to dereference null (Nothing). By declaring something as a Maybe rather than a reference, you have said that this is a thing that might be Nothing, so it is necessary to test the type to see if it is a reference before using it as one. But with T* foo, there is no way to declare which pointers might hold null ... – Jim Balter Apr 27 '15 at 17:35
  • ... and which might not, so it takes programmer discipline to always check for null in the right places, and invariably this discipline fails. With Maybe, the compiler/language semantics enforces the discipline. The generated code is the same, but Maybe is safe while nullable references are not. This is what type systems can do for you. You should go learn about it. – Jim Balter Apr 27 '15 at 17:35
  • @JimBalter: What would you think of having distinct kinds of assignment operators, and allowing parameters to indicate which type of "assignment" should be used when passing them? One kind of assignment operator would allow null to be regarded like any other reference, and the other would assert that the reference in question is not null and trap if it is? – supercat Apr 27 '15 at 17:36
  • Who wants traps? There are far more general mechanisms ... a precondition on a function/method that asserts that a parameter isn't null lets you take whatever action you want if it is. But for nullable references, the combination of Maybe and non-nullable references is far better because it makes it semantically impossible to have a null where it shouldn't be. And Maybe is just a specific example of a disjunctive type ... take a look at Ceylon, which makes powerful use of them. – Jim Balter Apr 27 '15 at 17:39
  • http://ceylon-lang.org/documentation/spec/html/typesystem.html – Jim Balter Apr 27 '15 at 17:44
  • @JimBalter: Traps are useful for the scenario where one wishes to ensure that a piece of code *will not run* when a particular invariant doesn't hold, but where one has no particular means of handling the situation beyond indicating that an operation couldn't be performed because of a broken invariant. Otherwise, I'd say that while it may be useful to have types that must be initialized when created and could only be written using a trap-if-null assignment (or, for parameters, trapping if null when they are passed), for places where a variable, although nullable, is supposed to be non-null... – supercat Apr 27 '15 at 17:54
  • ...having an assignment operator which would trap immediately if that isn't the case, in addition to the one which would let nulls pass through, would help make such intention clear, and provide fail-fast behavior if the intention is violated. – supercat Apr 27 '15 at 17:55
  • @JimBalter: The intersection and union types sound very powerful, though at a brief glance I'm not quite sure how they handle binding in case of overlapped functionality. Were they not required to be commutative, ordering could be used to establish ranking, but otherwise I'm not quite clear how conflicts would get resolved. – supercat Apr 27 '15 at 17:59