In retrospect, has it been a good idea to use three-valued logic for SQL NULL comparisons?

Question

In SQL, NULL means "unknown value". Thus, every comparison with NULL yields NULL (unknown) rather than TRUE or FALSE.

From a conceptional point of view, this three-valued logic makes sense. From a practical point of view, every learner of SQL has, one time or another, made the classic WHERE myField = NULL mistake or learned the hard way that NOT IN does not do what one would expect when NULL values are present.

It is my impression (please correct me if I am wrong) that the cases where this three-valued logic helps (e.g. WHERE myField IS NOT NULL AND myField <> 2 can be shortened to WHERE myField <> 2) are rare and, in those cases, people tend to use the longer version anyway for clarity, just like you would add a comment when using a clever, non-obvious hack.

Is there some obvious advantage that I am missing? Or is there a general consensus among the development community that this has been a mistake?

PS: I know that this borders on "too broad" and "opinion-based", but I've tried to make it as answerable as possible. It is not meant as a rant, but as a genuine question. — Heinzi, Nov 07 '13 at 22:08
A relational model for databases is based on relational algebra. Its not that its a mistake - its just the way the math under the system works. — , Nov 07 '13 at 22:14
@MichaelT: If you expand on that, it could be a good answer to this question. — FrustratedWithFormsDesigner, Nov 07 '13 at 22:26
@FrustratedWithFormsDesigner I'll think about it... while I have a reasonable grasp of SQL and have touched on relational algebra - the deeper implications of the math are lost on me. — , Nov 07 '13 at 22:30
@MichaelT I don't see anything in relation algebra that would require `NULL = NULL` to be `NULL`. — svick, Nov 07 '13 at 23:39
@svick "Are these two unknown values the same?" "I don't know, they're unknown." — Izkata, Nov 12 '13 at 03:36
The point above by @Izkata is quite important; according to the mathematical definition, it's perfectly correct, and makes sense. Perhaps the naming of `NULL` is misleading (or rather, we have different connotations to null from programming), and it should be something like `UNKNOWN`. E.g. "What is two plus an unknown value?" should now start making more sense. — Daniel B, Nov 12 '13 at 06:32
@DanielB: That was exactly the point of my question: From a mathematical point of view, it all makes perfect sense. However, from a practical point of view, does it really make life easier for developers? — Heinzi, Nov 12 '13 at 06:41
@Heinzi I understand, I wrote a comment instead of an answer specifically because it was slightly tangential to your question (which is a good question, BTW). Clearly, this is a slightly counter-intuitive topic for developers, but my guess is that having the logic firmly grounded in mathematics outweighs the negatives (e.g. proving the correctness of a DB implementation, or allowing the query optimiser to formally reason about execution plan equivalences, etc). — Daniel B, Nov 12 '13 at 07:46

score 3 · Accepted Answer · answered Nov 08 '13 at 01:14

I think the crux of the problem is that as well as "UNKNOWN" it is also used to mean "NOT APPLICABLE" or "ABSENT" e.g. You have a PERSONS table with a SPOUSE_ID. What do you put in their for a single person? In most cases a designer will make this field NULLable to be filled with the partners ID when available and left blank for sad singles and happy bachelors.

In my experience this is actually the most common use for NULLs. So while a comparison of two UNKNOWN values should result in another UNKNOWN; a comparison between two ABSENT values should result in equality -- but SQL does not allow for this.

It would have been trivial to add another extra operator (say "==") to the mass of SQL keywords and operators which would indicate you want 2 nulls to be considered equal.

While I think the relational model is sound and has a long future ahead, I think the mess that is SQL is due for a total rethink. It would be nice if we could start again from the very beginning and have an API based on Codd's original relational algebra.

So if you join PERSONS through SPOUSE_ID in order to get all couples, you will have get all married couples and *every single married to every other single*? I'm not sure how useful that would be. The best way to represent an absent value in a relational database is by just not representing it. E.g. you have a junction table for marriages with two foreign keys to persons. An unmarried person would just not have an entry in this table. — JacquesB, Apr 23 '22 at 22:25

score 1 · Answer 2 · answered Nov 08 '13 at 03:14

The problem here is that NULL isn't a value--it's a nebulous set of values, and you don't know which one it is. Setting equality here is meaningless, because then the values wouldn't be NULL. It enforces the mathematical underpinnings of relational databases. In many ways, it's like asking why infinity = infinity isn't valid.

JacquesB · Answer 3 · 2022-04-23T13:40:29.187

1

In your particular example, it doesn't make a big difference, but in general, this behavior is necessary for queries to give valid results.

Three-valued logic is just an instance of the more general principle that any expression which contains a NULL evaluates to NULL. E.g. NULL - 17 yields NULL.

This happens because NULL means unknown or not-applicable. Let's say you don't know the balance on a given bank account. Then you withdraw $17. What is the balance now? The only reasonable answer is "I still don't know". Giving any number would be wrong.

Even if a database does not contain any NULL's in the stored data, SQL queries can easily return unknown results. A valid query can return an empty set, and a lot of operations, like taking the MAX of a given column, will not have a meaningful result for an empty set. So NULL's are unavoidable, and you need some way to handle them which does not give you misleading output.

Eliminating three-valued logic would mean that any boolean expression would either be true or false, even if the answer is unknowable or meaningless. So you would not be able to distinguish between a valid result and an invalid result, which would make the database pretty useless (if not dangerous).

edited Apr 23 '22 at 13:40

answered Apr 23 '22 at 13:16

JacquesB

57,310
21
127
176

Nearly all functions and operators in SQL should be seen as working only on values, and being lifted for handling absence, a common operation for functional languages and others having an optional type. Thus any absence means the result is absence. – Deduplicator Apr 23 '22 at 19:16
What makes the three-valued logic so vexatious in SQL is the inconsistency of its application. Naturally in 3VL you'd think Null = Null would be Null. Naively (especially to seasoned programmers accustomed to other languages), you might even think it would be True. But in fact it evaluates to False! As does the comparison of any non-null value to Null. But Null <> Null (the inverse of the equality comparison) also evaluates to False! – Steve Apr 23 '22 at 19:27
@Deduplicator, myself I don't find that conception - of Null being the absence of a value - generalises properly. Rather, Null should be seen as a special value (used for various purposes in the system, including missing data, inapplicable data, and default data), and every operator should be regarded as having some sort of special (and often idiosyncratic) handling for that special value. – Steve Apr 23 '22 at 19:50
@Steve: NULL = NULL should indeed evaluate to NULL. In which database do you see it being false? – JacquesB Apr 23 '22 at 19:56
@JacquesB, in any context where a condition is evaluated! I can't recall that NULL = NULL is even valid syntax, in a context that would allow the resulting value to be assigned. – Steve Apr 23 '22 at 20:13
@JacquesB: Indeed NULL = NULL evaluates to NULL. I assume Steve refers to the fact that `WHERE someExpressionEvaluatingToNull` has the same effect *as if* that expression evaluated to False. – Heinzi Apr 23 '22 at 20:29
@JacquesB: About your answer: I disagree with your conclusion (in C#, for example, `null == null` evaluates to true, without the language being either useless or dangerous), but you raise an important point with respect to *consistency*: If `NULL - 17` yields `NULL`, having `NULL = NULL` yield True would special-case the equality operator (which the C# designers deliberately did, but which is by no means an "obvious right choice"). – Heinzi Apr 23 '22 at 20:34
@Heinzi, which database technology are you referring to where that is valid syntax (with the intended meaning, I mean)? And what, in your view, is the difference between evaluating to Null and being treated as False, and evaluating directly to False? – Steve Apr 23 '22 at 20:45
@Heinzi, C# doesn't have special handling for Null = Null, because Null is treated as a value in the first place. For example, a Null pointer is merely a pointer to memory address zero, and if two pointers point to zero then they are equal. It's actually memory address zero that has special handling, in that any attempt to access the memory there (rather than merely referencing its address) causes an exception to be thrown. – Steve Apr 23 '22 at 20:49
@Heinzi: Nulls in C# is completely different than null in SQL (despite having the same name). Nulls in C# does not mean "unknown", it is a distinct value, different from any non-null value. C# does not have any built-in support for unknown values. For example taking Max of an empty list will just throw an exception. – JacquesB Apr 23 '22 at 21:57
@Steve: *And what, in your view, is the difference between [a] evaluating to Null and being treated as False, and [b] evaluating directly to False?* Here's an example where it makes a difference: `WHERE NOT (someExpressionEvaluatingToNull)` yields no rows in case [a] (= what SQL currently does) and all rows in case [b] (because NOT False is True). – Heinzi Apr 24 '22 at 19:37
@JacquesB (and Steven): It's not so clear-cut in C#. Consider, for example, .NET's nullable value types (e.g. `int?` aka `Nullable`). It can be null, but it's not a "null pointer", it's something *very* similar to SQL's nullable data types. Now, in C#, `a == b` will yield `Boolean` true if `a` and `b` are "null" `Nullable`s; whereas in VB.NET `a = b` will yield a `Nullable` with value null. Both are valid choices - VB.NET decided to do it the "SQL way", whereas C# decided otherwise. [It was not an easy decision](https://stackoverflow.com/a/4400231/87698). – Heinzi Apr 24 '22 at 19:49
1

@Heinzi, yes that's a fair point; it's only the overall expression that gets treated as False when evaluated to Null. – Steve Apr 24 '22 at 20:57
@Heinzi, on the C# nullable value types, you're right the picture becomes unclear, and certainly inconsistent with comparing reference types to null. – Steve Apr 24 '22 at 21:08

score 0 · Answer 4 · edited May 23 '17 at 12:40

0

This seems to be a duplicate question... https://stackoverflow.com/questions/7078837/why-doesnt-sql-support-null-instead-of-is-null

I don't think it's a mistake - this behavior is described in the ANSI standard.

Most databases allow you to change behavior of the equality operator.

set ansi_nulls on
if null = null
   print 'this will not print' 
set ansi_nulls off
if null = null  
   print 'this should print'

Also, you can note that many programming languages expect similar semantics when comparing objects against null.

edited May 23 '17 at 12:40

Community

1

answered Nov 07 '13 at 23:56

Bruce H

5
3

1

What other programming languages have similar semantics? In all the other languages I know `something == null` works as expected, you don't need something like `something is null` in them. – svick Nov 08 '13 at 02:12
@svick: Python uses `something is None` because the meaning of `something == None` could be overridden by the type of `something`. – Greg Hewgill Nov 08 '13 at 03:34
@svick Also in Visual Basic you would use "is null" cf http://msdn.microsoft.com/en-us/library/system.string.isnullorempty(v=vs.110).aspx?cs-save-lang=1&cs-lang=vb#code-snippet-1 – Bruce H Nov 12 '13 at 00:28
In Swift, nil == nil, nil != non-nil and non-nil != nil. >, >= etc compare as if all nil values come first (or last, can’t remember). Then you have floating-point with “interesting” semantics for NaN. – gnasher729 Apr 24 '22 at 10:06

score 0 · Answer 5 · answered Nov 08 '13 at 00:27

0

TRUE and FALSE make statements about the content of a value. NULL indicates the complete absence of any value at all. If NULL behaved the same way as FALSE, then you would be unable to account correctly for the absence of data in a query, something that is very important in a database. I think that alone makes NULL different enough from TRUE/FALSE that it merits being handled specially.

answered Nov 08 '13 at 00:27

Evicatos

662
6
12

4

The question isn't “why aren't `NULL` and `FALSE` the same thing?”. It's basically “why was SQL designed so that `myField = NULL` doesn't work the way most people would expect?” – svick Nov 08 '13 at 02:10
1

@svick: a c programmer would expect that to return NULL and for myfield to be NULL afterwards. But since SQL was supposed to be understandable (and written) by random people with no programming experience, one could argue that NULL means "unknown" in the sense of "I don't know". Is this "I don't know" the same as that "I don't know"? I don't know! – Móż Nov 12 '13 at 02:08

In retrospect, has it been a good idea to use three-valued logic for SQL NULL comparisons?

5 Answers5