75

I read an article on BBC. One of the examples they said was that people with surname 'Null' are having problems with entering their details in some websites.

No explanation is given about the error they are facing.

But as far as I know the string 'Null' and the actual Null value is completely different (from a database point of view).

Why would this cause problems in a database?

Nitish
  • 779
  • 1
  • 5
  • 9
  • 1
    [This article](https://medium.com/@blakeross/mr-fart-s-favorite-colors-3177a406c775#.brhmo58nb) is an article that is quite relevant to this and good, too. – enderland Mar 25 '16 at 13:40
  • 2
    This is a somewhat famous blog article about assumptions that programmers make about names, written by one of the people quoted in that BBC article: http://kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ – Jörg W Mittag Mar 25 '16 at 17:00
  • 12
    [Relevant xkcd](https://xkcd.com/327/) – David says Reinstate Monica Mar 25 '16 at 18:03
  • 2
    See also: http://stackoverflow.com/questions/4456438/how-do-i-correctly-pass-the-string-null-an-employees-proper-surname-to-a-so – Foon Mar 25 '16 at 18:55
  • 5
    The first time I saw [this guy](http://i.imgur.com/RW1F8b4.jpg) on TV I assumed it was a database bug. Then I found out it's actually his name. – Nate Eldredge Mar 25 '16 at 23:20
  • I'm voting to close this question as off-topic because the entire premise is false, plain and simple. `null != "null"` Barring a deliberate sql injection attack on a poorly written application it is not a problem and is definitely not a ubiquitous database problem. –  Mar 26 '16 at 18:16
  • 3
    @JarrodRoberson How can you say the "entire premise is false", given the description of the issues faced by "Jennifer Null" and the like-named in the link the OP posted? It's a real issue that faces real end-users. – Gort the Robot Mar 26 '16 at 20:44
  • 1
    @StevenBurnap - if you have to ask this question you will not understand the answer, which is in my comment. the keyword `null` is not the same thing as the `String` with the 4 bytes `"null"`. But here goes; That is not how databases compare things. `null` is a special value that **requires** a special keyword to compare against in `SQL`. `IS NULL` vs using the `=` which compares non-null things. So there is not a single existing RDBMS that uses the SQL language that would have the problem with someone's name being entered in as a `String` with the value of `"Null'". –  Mar 26 '16 at 21:02
  • Meet Uncle Dev. He's looking after our finances so just send all the bills to him... – user_1818839 Mar 26 '16 at 21:07
  • 2
    @JarrodRoberson Right, but there are many processes that happen before the value gets there that "cause problems in many databases". This is a big issue that clearly some writing production software do not understand, causing bugs that real people suffer from. As such, it is something that ought to be answered in a way that people searching google can hit. " if you have to ask this question you will not understand the answer" is a patronizing way of looking at new users. – Gort the Robot Mar 26 '16 at 23:14
  • **irregardless** of what happens before a value gets to the database this is **not** a problem in **any** databases period. Sql injection is not what they are talking about either. So this is a fundamentally flawed question because it is based on a precondition that just does not exist. –  Mar 27 '16 at 03:34
  • @enderland, A nice article. I should add that the biological principle of recapitulation applies here, too. What an individual engineer learns about user inputs during his lifetime (ontogeny), the industry has learned during its (phylogeny). The answers and comments here tend to implicitly assume the availability of all modern techniques and technologies and imply that any confusions between 'Null' and NULL are the result of bad engineering. What they ignore is the fact that SABRE, for instance, emerged quite long before SQL and RDBMSes; and I suppose some old subsystems are still with us. – ach Mar 28 '16 at 14:24
  • Related: https://www.reddit.com/r/ProgrammerHumor/comments/4c8h98/i_legally_changed_my_last_name_to_null_about_a/ – Angelo.Hannes Mar 28 '16 at 16:42
  • 1
    The bottom line is that if the program is written correctly, there should be no problem with a user named `smith` or `Null` or `'; DROP TABLE *; --`. But there are poorly written programs and they choke on all sorts of data, such as people with names like `O'Leary`. `Null` is not inherently bad. It just excites bugs in poorly-written code. – Andy Lester Mar 31 '16 at 18:16

7 Answers7

103

It doesn't cause database problems. It causes problems in applications written by developers that don't understand databases. At the root of the problem is that much database-related software displays a NULL record as the string NULL. When an application then relies on the string form of a NULL record (likely also using case-insensitive comparison operations), then such an application will consider any "null" string to be NULL. Consequently a name Null would be considered to not exist by that application.

The solution is to declare non-null columns as NOT NULL in the database, and to not apply string operations to database records. Most languages have excellent database APIs that make string-level interfaces unnecessary. They should always be preferred, also since they make other mistakes such as SQL injection less likely.

amon
  • 132,749
  • 27
  • 279
  • 375
  • 31
    In this case, however, if you read the article in question, making a last name field `NOT NULL` will cause a whole set of problems for other people. _"Some individuals only have a single name, not a forename and surname."_ – Mike G Mar 25 '16 at 13:27
  • 1
    @mikeTheLiar Empty strings might do the trick in that case. – Darkhogg Mar 25 '16 at 13:52
  • 42
    @Darkhogg lots of people disagree with me about this but I think that names are like email addresses - don't bother validating them, give the user a single text box and let them put whatever they want. This is information that if I _really_ need it I will get it from you in a way that is certain to be correct. – Mike G Mar 25 '16 at 13:57
  • 2
    @mikeTheLiar Yeah, I'm of that opinion too, but sometimes because of application restrictions or administrative hassle you *have* to use multiple fields. – Darkhogg Mar 25 '16 at 14:00
  • @mikeTheLiar I agree completely. This is known as the Robustness Principle https://en.wikipedia.org/wiki/Robustness_principle: "Be conservative in what you do, be liberal in what you accept from others" – JimmyJames Mar 25 '16 at 14:02
  • 8
    @mikeTheLiar I don't know the name for this but there is a whole class of errors that come out of creating overly restrictive rules on data. Often you'll see postal codes and telephone numbers defined as numeric in applications and databases. They aren't really numbers because it makes no sense to do mathematical operations on them. So when someone tries to enter a Canadian address, they are stuck. – JimmyJames Mar 25 '16 at 14:11
  • 19
    @JimmyJames yeah, zip codes stored as numerics and suddenly [anyone living here](http://www.findazip.com/zip-codes.html?letter=0) has an base-8 zip code. "If you're not doing math with it, it's a string, Full Stop." – Mike G Mar 25 '16 at 14:13
  • 8
    @mikeTheLiar. The problem with treating names as a single string (usually preferable, I agree) is when there's a requirement for alphabetical sorting by surname. – TRiG Mar 25 '16 at 17:45
  • @TRiG an entirely valid point. I would counter that it would be easy enough to work around with a reasonable level of success which has a much smaller chance of blowing up the system. I agree however that there is no 100% complete solution. – Mike G Mar 25 '16 at 17:51
  • 8
    @TRIG : "alphabetical sorting"... ROFL. Sorry for the delay; had to go catch my breath. Not all names are written in sortable glyph sets. Not all glyph sets having pairwise sorting have a linear sort order. The sort order of a name can depend on where it originated, not where it is being used (see regional spelling variations). "alphabetical sorting"... Heh. There really needs to be a "Falsehoods programmers believe about character sets"... – Eric Towers Mar 26 '16 at 22:31
  • 6
    @EricTowers I think that's overly pedantic. There are plenty of applications for sorted names in which accuracy is *not* paramount; a spelling variation or 'unsortables' listed last causing no real problems. – OJFord Mar 27 '16 at 01:53
  • 2
    @EricTowers Any list of strings can be sorted in deterministic way, it's just bits underneath anyway. If there is not a single correct sorting order, that only makes the sorting easier, not harder. Also, rows in a table are always in some order, there is no "unordered table". – hyde Mar 27 '16 at 14:42
  • 3
    @hyde : Consider "東海林", which must be sorted according to its pronunciation, which is either "Tōkairin" or "Shōji", which are bitwise identical in the glyph set and don't sort anywhere near each other. Also, "庄司", "庄子", "東海林", and "小路" are all "Shōji" and don't bitwise sort adjacent. – Eric Towers Mar 27 '16 at 16:00
  • 4
    @hyde : Your "rows in a table are always in some order" is true (in written form) and wildly false for database tables. For database tables, in the problem domain, tables are unsorted, and in the solution domain it may be impossible to "sort" tables split across multiple servers. Without a "SORT BY", even an SQL system returning results from a table split on multiple servers may/will return results in an order depending on the order in which the server responses were received. Table compaction can change the order of records returned. Et c. Apparent implementation ordering is illusory. – Eric Towers Mar 27 '16 at 16:07
  • 2
    @EricTowers I stand corrected about SQL table default sort order (somewhat related [link](http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/05/20/without-order-by-there-is-no-default-sort-order.aspx)). About your example "東海林", if the written form does not specify correct way to sort, then there has to be a default way. Or does it depend on first name? Sex? Something else, which should be apparent from the person's information (talking about sorting by surname still)? – hyde Mar 27 '16 at 16:45
  • 5
    @hyde : The typical method of dealing with this in the source country (Japan) is to provide two name fields -- the first contains "東海林" and the second a phonetic spelling in kana. This is because the only way to know which it is, is to ask. (It doesn't depend on personal name, sex, source region, ... It's largely parental/familial whim.) – Eric Towers Mar 27 '16 at 16:51
13

To answer your specific question there are many steps along the chain of events between a web form and the database. If the last name Null is erroneously interpreted as a NULL value then the system may reject a perfectly valid name as being invalid. This can happen at the database layer as explained by amon. Incidentally if this is the specific issue then the database is also probably open to SQL injection AKA the Bobby Tables attack. Another step in the chain that could be causing problems is the serialization process.

Overall the article was about a bigger problem. The world is a big messy place that doesn't always conform to our assumptions. This is especially apparent when you try to internationalize your application. At the end of the day we need to ensure our applications handle and encode our data properly. It is up to the business to decide how many resources we dedicate towards supporting increasingly complicated edge cases. While I fully support being inclusive, I will understand if the business decides that "the artist formally known as Prince" needs to use a Unicode character to represent his name in our database.

Erik
  • 238
  • 2
  • 8
  • It's hard to imagine this being caused by the sort of unsafe string interpolation that can lead to SQL injection. If you forget to quote user input in an SQL query (e.g. `INSERT INTO users (first, last) VALUES($first, $last)` evaluates to `INSERT INTO users (first, last) VALUES(Jennifer, Null)`) everyone whose names *aren't* valid SQL keywords or column names are just going to throw errors and not have their records inserted either. The cause must be more complex. – nobody Apr 19 '16 at 23:07
  • @AndrewMedico in your straw man example yes but there are lots of ways to do things wrong. Never underestimate the power of stupidity<\strike> ignorance. The bottom line is we have no idea what is the actual problem because we can't review the code in question – Erik Apr 19 '16 at 23:13
8

Well, before it's entered into the database, it's a DOM element, then a javascript variable passed around, validated, and manipulated, then a JSON value, then a variable in whatever backend JSON library you're using, then a variable passed around, validated, and manipulated in your backend programming language, then an element of some sort of DAO, then part of a SQL string. Then to get the value back out, you do it all in reverse. That's a lot of places for programmers to make mistakes, and usually a lot of it without the benefit of static typing.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
2

Most likely its a programming issue. If you look at this answer here on how NULLs are being passed you could easily cause some undesired behavior if you were "Mr. Null".

https://stackoverflow.com/questions/4620391/mysql-and-php-insert-null-rather-than-empty-string

You can see that if some data element was passed as NULL the data would be interpolated as a database null in the database.

"NULL" != Database Null

Some use cases and related behavior...

Let's say last name was marked in the database as not null, now when data is inserted it will be interpreted as a NULL and fail the insert.

Another case is let's say the last name was nullable in the database. Mr. NULL is inserted and is transformed into DBNull.Value which is not the same as "NULL". After the insert we can't find Mr. Null because his last name is not "NULL" but in reality a database null value.

So, those would be 2 cases of problems. As @Amon points out, databases themselves have no issues with nulls, although one should understand how nulls are handled in each RDMS instance as there will be differences between different vendors.

Jon Raynor
  • 10,905
  • 29
  • 47
  • "You can see that if some data element was passed as NULL the data would be interpolated as a database null in the database." - the linked SO question/accepted-answer does not appear to show this? – MrWhite Apr 21 '16 at 11:00
2

I would attribute the problem to sloppy programming and poor design of some implementations of SQL. "Null" the name should always be presented and interpreted with quotes. null, the database value, should always be presented without quotes; but when writing ad-hoc code, it's easy to slip into the "anything will do" paradigm and accept things believed to be a string in unquoted form.

This is compounded by the fact that other types of data; numbers for example, can and are accepted in either form because the interpretation is unambiguous.

ddyer
  • 4,060
  • 15
  • 18
  • You mean poor implementations of _applications using_ SQL, surely? No serious implementation of an RDBMS itself would be vulnerable to this (just as no serious application is!) – underscore_d Apr 19 '16 at 10:39
0

A problem, fundamentally, is that the term "null" is applied two different database concepts, sometimes using context to distinguish between them:

  1. Something doesn't have a known value
  2. Something is known to have no value

While context can sometimes suffice to distinguish between those concepts, there are times when it really doesn't. If one is using a record to hold a search query, for example, there should be a difference between saying "I want someone by the name of [whatever], with no last name", versus "I want someone whose first name is [whatever] but whose last name is unknown." Many database engines have a bias toward one meaning or the other, but they're not all the same. Code which is expecting a database engine to work one way may malfunction if run on a different engine that runs differently.

supercat
  • 8,335
  • 22
  • 28
0

Most of the existing answers focus on the non-SQL parts of an application, but there may be a problem in SQL too:

If instructed to filter out records where a user's last name is not available, someone who doesn't understand SQL very well may write a filter WHERE u.lastname != 'NULL'. Because of the way SQL works, this will appear to check whether u.lastname IS NOT NULL: all NULL records get filtered out. All non-NULL records remain.

Except of course for records where u.lastname == 'NULL', but there may not have been any such record available during testing.

This becomes more likely if the SQL is generated by some sort of framework, where that framework doesn't expose an easily accessible way to check for non-NULL-ness with parameters, and someone notices "hey, if I pass in the string NULL, it does exactly what I want!"

hvd
  • 486
  • 3
  • 9