3

I've recently joined a project that uses a database model which is, to me at least, unconventional. Every column, except the primary key, is nullable. Instead, the constraint is enforced in the code used to access the database.

The reasoning of the developers is that it's more flexible to do so, and since the REST API exposing this data is supposed to be the single point of data access, there is no risk of corrupt data entering the database.

It's something I hadn't heard of before, and I didn't manage to find any resources championing it. I can appreciate their reasoning and I can't really poke any holes in it either, but my gut feeling says this method is absolutely wrong.

So, does it make sense to place constraints in the code instead of in the database? The database product of choice is SQL Server.

Stijn
  • 351
  • 1
  • 13

3 Answers3

9

Of course it's more flexible to allow everything. Arguing against this is a losing game. The point of constraints, types, information hiding etc. etc. is that we've slowly and painfully realized that total flexibility isn't good for us. We're too error-prone to assume that much responsibility, and we're too bad about judging our own abilities to make the right trade-off ourselves.

In this case, a database with nullable columns cannot guarantee a lot of the things which constitute the advantage of having a relational database in the first place. REST is a nice way of accessing things, but it simply can't deliver the kind of assurances (like atomicity, durability etc.) that people assume from using databases. Maybe all that is of no importance in your particular system, but I really doubt it.

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
  • 1
    To be fair, there has been a movement towards No-SQL type databases which are generally more flexible than their traditional database counterparts. There are certainly disadvantages to this approach, but it does make the program the controller of its data. That said, I do think it is wrong to attempt to make a traditional database act like a NO-SQL database. – Neil Sep 29 '17 at 13:54
  • 1
    Sure, non-relational DBs have their place. But they weren't invented because the relational model was too limiting. They were used first for extremely (Yahoo, Facebook...) big data sets when relational DBs could not scale enough *at all*. – Kilian Foth Sep 29 '17 at 13:57
  • 1
    @Neil - I agree, but if you're going with a relational database with any kind of power, let it do what it does best. Otherwise, you have to deal with the disadvantages and get little upside. – JeffO Sep 29 '17 at 15:14
  • 1
    @kilianFoth Not to be pedantic, but technically SQL databases were invented after non-relational databases. They have come back into vogue order to solve scalability (and other) problems that SQL databases have a harder time with. – Paul Sep 29 '17 at 16:00
3

This line of reasoning is valid if you never ever have mistakes or bugs in the code.

You only need a single bug in the application code to cause a NULL to be saved in the database. Even if this bug is quickly discovered or fixed, the NULL remains in the database and will cause other parts of the code to fail in obscure ways, since the rest of the code is written with the expectation that this field is never NULL. You can't really get rid of the NULL again, since you don't know what the correct value was supposed to be. So now you have to rewrite all the code to handle the special case where this field is NULL.

Obviously there could also be mistakes in the schema/constraints design, but this would only lead to data corruption in the unlucky case where there is a bug in the schema and a matching bug in the application code. So you are still much better protected. In any case, application logic is typically much more complex and faster changing that the database schema, and hence more error prone.

Of course you should still validate and enforce the business constraints at the application level. I'm just pointing out the risk of only doing it at the application level and not have the database reject invalid data.

JacquesB
  • 57,310
  • 21
  • 127
  • 176
  • I'm not going to downvote, but I think this is a little bit of a strawman argument, as currently phrased. It presumes that the team writing the DB schema is better/more correct/has a better understanding of the requirements than the application developers. – Paul Sep 29 '17 at 15:58
  • 3
    The key here is that the DB schema provides a single possible point of failure. Bugs in code are effectively infinite possible points of failure since an error may exist now or may be introduced in the future. – 17 of 26 Sep 29 '17 at 16:44
  • 2
    @Paul: Not at all, I assume they are the same people, since this is typically the case with this style of architecture. The point is that even if there is an error or oversight in the schema design, you would only get data corruption if there is an *exactly* similar error or oversight in the application code. And even then you are not worse off than without constraints. In any case, a schema is typically a lot less complex than application logic and therefore less error prone. – JacquesB Sep 29 '17 at 16:46
  • @17of26 and JacquesB I agree that there needs to be some form of schema enforcement, I just disagree that it has to be in the DB in all cases. The major advantage in my experience of having it in the DB is if the DB is shared by multiple applications. But there are plenty of ways to enforce a correct schema (and in some cases more complex rules than SQL is good at) in the application code as well, and also provides that single point of reference. There are arguments about the performance of schema enforcement in the application layer, but that's a race-your-horses problem to me. – Paul Sep 29 '17 at 16:57
  • 1
    @Paul: Of course the data constraints should be enforced at the application level also. That is a given. I'm just pointing out the risk of *only* doing it at the application level. – JacquesB Sep 29 '17 at 17:11
2

The reason that constraints in the database are usually the best place to start, is because of concurrency.

If your application supports more than a single user, or is a web application where you can have multiple people updating data, or a single person with multiple browser tabs open, then constraints at the application level are subject to race conditions: Multiple people could be updated the same record at the same time. Even pre-checks at the application level require milliseconds to occur, and in that small span of time two people could commit their changes at the same time, thus bypassing your constraints at the application level.

Databases have long ago solved race conditions affecting inserting, updating and deleting the same data, which they call concurrency.

Let's also not forget that data can end up in the database without going through your application. A DBA can log in and run a SQL script.

We put constraints in the database simply because we don't have to program around concurrency and race conditions in the application tier, and data can be inserted or updated without going through your application.

Greg Burghardt
  • 34,276
  • 8
  • 63
  • 114