62

In my experience, many of the projects I have read in the past didn't have relationship definitions in the database, instead they only defined them in the source code. So I'm wondering what are the advantages/disadvantages of defining relations between tables in the database and in source code? And the broader question is about other advanced features in modern databases like cascade, triggers, procedures... There are some points in my thoughts:

In the database:

  • Correct data from design. Prevent application errors which can cause invalid data.

  • Reduce network round trip to application when inserting/updating data as application has to make more query(s) to check data integrity.

In source code:

  • More flexible.

  • Better when scaling to multiple databases, as sometimes the relation can be cross-database.

  • More control over data integrity. The database doesn't have to check every time the application modifies data (complexity can be O(n) or O(n log n) (?)). Instead, it's delegated to application. And I think handling data integrity in the application will lead to more verbose error messages than using the database. Eg: when you create an API server, if you define the relations in the database, and something goes wrong (like the referenced entity doesn't exist), you will get an SQL Exception with a message. The simple way will be to return 500 to the client that there is an "Internal server error" and the client will have no idea what is going wrong. Or the server can parse the message to figure out what's wrong, which is an ugly, error-prone way in my opinion. If you let the application handle this, the server can generate a more meaningful message to client.

Is there anything else?

Edit: as Kilian points out, my point about performance & data integrity is very misguided. So I edited to correct my point there. I totally understand that letting the database handle it will be a more efficient and robust approach. Please check the updated question and give some thoughts about it.

Edit: thank you everyone. The answers I received all point out that the constraints/relations should be defined in the database. :). I have one more question, as it is quite out of scope of this question, I've just posted it as a separate question: Handle database error for API server. Please leave some insights.

Yoshi
  • 747
  • 1
  • 5
  • 7
  • 4
    "as application has to make more query(s) to check data integrity." Not necessarily. If the database is fully under the control of your application, extra checks of the data integrity may be overly defensive programming. You don't necessarily need them; just test your application appropriately to ensure it makes only valid changes to the database. –  Oct 26 '16 at 06:53
  • Sometimes I feel it is need for application to handle data integrity for more verbose error message. Please check the updated question for details. – Yoshi Oct 26 '16 at 07:24
  • 9
    There's one thing you never should forget: Unless everybody involved writes perfect software, if the checks are in the software, one of these checks will fail and lead to constraints not being enforced. It's not a question of if, but of when. This leads to hard to reproduce errors and long hours of massaging the data to fit the software enforced constraints again. – Dabu Oct 26 '16 at 08:43
  • 10
    Something worth mentioning... once you introduce integrity problems to your database it is a kin to opening Pandora's box. It is a nightmare to reintroduce integrity to a anomaly-ridden database. Keeping tight controls on your database may be a hassle but it will save you a lot of pain in the long run. –  Oct 26 '16 at 13:30
  • 3
    In source code: You eventually end up writing most of a database. – Blrfl Oct 26 '16 at 14:45
  • You are quite correct -- the user shouldn't be seeing SQLException error messages; data changes that cause such errors should be checked in the application code before trying to update the database, or at least these errors should be handled and presented with a better UI. But that doesn't mean you don't need to define relations in your database. – Zev Spitz Oct 26 '16 at 15:32
  • 7
    I once asked a very talented programmer a similar question he told me "Its like brakes on a car. The point isn't to make the car go slower, but to allow it to go faster safer." Sure its possible to run without constraints but if bad data somehow gets in, it can cause a serious crash – mercurial Oct 26 '16 at 20:55
  • Related question: [Business logic: Database vs code](http://softwareengineering.stackexchange.com/questions/314490/business-logic-database-vs-code) – Radu Murzea Oct 27 '16 at 07:20
  • 1
    Confusingly, in database terminology, _relation_ is a synonym for _table_. In your question title, I think you mean _association_, (aka _foreign key constraint aka _foreign key relationship_). It may help to rename your question. – dcorking Oct 27 '16 at 13:21
  • Read [this](http://thedailywtf.com/articles/Directive-595). Then go put basic constraints to your DB. (Some of the more fine grained ones may be left out, if the database is protected by an application of some kind.) – jpmc26 Oct 27 '16 at 22:01
  • I am just curious if you are self thought programmer, or you have some academia background. I am (currently at uni) and I would never think of making my own RDBMS - so many things can go wrong, + reinventing the wheel. – Kyslik Oct 27 '16 at 23:06
  • **Note:** You tagged this with _mysql_ and _relational-database_, although you don't mention it in the question itself. The answers would be completely different, when we're talking about other database types, e.g. document database MongoDB. – Christian Strempfer Oct 28 '16 at 10:06
  • In some cases, adding integrity checks to the database can *increase* performance. Some modern RDBMSs are able to use constraint information to optimise queries, by choosing plans that are faster, but would be incorrect if the integrity constraints did not hold. – James_pic Oct 28 '16 at 10:16
  • ***The database doesn't have to check every time application modify data***. No, but any respectable database implementation for integrity handling is probably decades ahead of what you would write yourself. – sampathsris Oct 28 '16 at 10:57
  • I'm writing an ETL for a large database where the creators did not ask this question. They created no FK constraints. The data quality is just awful--there are gobs of FK's that point to nowhere. If I had a time machine, I'd use it to go back in time and offer them money to add FK constraints. – Wayne Conrad Oct 28 '16 at 21:53
  • Verbosity of error messages is irrelevant. Degree of verbosity is always under control of the app (which may choose to let it pass), not the DB... except when the DB is accessed outside of the app, at which points all constraints must be in the DB. – user2338816 Oct 29 '16 at 08:53
  • 1
    Analyzed a bad app years ago that enforced constraints fairly well. Forced the user to make record corrections until **all** errors were gone, then only after displaying the finally correct data would the app allow the 'Enter' key to signal writing the record to the DB. The astonishing bug? That "finally correct" record image wasn't rechecked after the final 'Enter' key was pressed. I.e., once the no_Errors condition was set, the input could be changed in any way and would be written to the DB. Massive integrity checks; then 'anything goes'. – user2338816 Oct 29 '16 at 09:08
  • "many of the projects I have read in the past didn't have relationship definitions in the database," - How far back are you going? Years ago RDBMS didn't have DRI - maintaining RI in the application was the only option. – peterG Oct 29 '16 at 22:12

11 Answers11

119

The database doesn't have to check for data integrity every time application modify data.

This is a deeply misguided point. Databases were created for precisely this purpose. If you need data integrity checks (and if you think you don't need them, you're probably mistaken), then letting the database handle them is almost certainly more efficient and less error-prone than doing it in application logic.

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
  • 3
    "if you think you don't need them, you're probably mistaken". I could be mistaken (I'm wrong about lots of things) but I often think I don't need additional data integrity checks if the database is under my application's control and the application has been appropriately tested. Of course simple constraints should be implemented in the database, but more complex integrity requirements can add a lot of complexity and overhead to implement, and they may not be needed. –  Oct 26 '16 at 07:47
  • 5
    @dan1111 I don't understand your comment... are you saying: simple constraints are enforced by the database, so they are not a problem for the application code, more complex constraints are too hard to implement so just give up on them? Or are you saying that implementing complex constraints using database triggers (and similar mechanism) is too complex and so it's better to implement them in the application code? – Bakuriu Oct 26 '16 at 08:43
  • 48
    You can't even do integrity or constraints in non DB code. Transactions can't see the results of others until they are committed (and even then maybe not). You may get the illusion of integrity but it's subject to timing or serious scalability issues due to locks. Only the database can do this correctly. – LoztInSpace Oct 26 '16 at 12:04
  • 17
    Anecdotally, to follow on from @LoztInSpace's comment, I once worked for a (terrible) company where one of that tables was set up in such a way that rather than letting the DB auto increment the ID, the application took the last rows ID, added one to it and used that as the new ID. Unfortunately, about once a week duplicate IDs were inserted bringing the application crashing to a halt.. – James T Oct 26 '16 at 14:16
  • 9
    @dan1111 You never write bugs in the application, right? – user253751 Oct 26 '16 at 19:40
  • 1
    I am on a verge here. Transactions are a huge part of regular businesses, but they are usually present to prevent data inconsistency. If you require your username to be unique that's part of business rule and shall be within the application layer. Obviously your database may act as a safety net but it should not be the database layer enforcing the rules. – Andy Oct 27 '16 at 07:05
  • 4
    @DavidPacker I might agree, however once you have multiple clients accessing the database, you can _only_ enforce constraints in the database. Unless, that is, you start locking tables wholesale instead of by rows, with the performance hit that carries. – Iker Oct 27 '16 at 07:16
  • 1
    @immibis, of course I make mistakes, but at some point double-checking everything you did becomes overly defensive and not a good use of resources. I'm saying that in some cases, database integrity checks may be overly defensive. –  Oct 28 '16 at 12:20
  • 1
    @dan1111 The thing is that in most code, if you accidentally corrupt a data structure, it'll be "fixed" once the program ends, but if you accidentally corrupt your database, you have to deal with it for the rest of the lifetime of your application (or business!) - which could be very short depending on what you broke. – user253751 Oct 29 '16 at 01:15
  • MyIsam doesn't even have foreign key constraints, and MySQL with MyISAM was at one point *the* web database storage option. Not state of the art, but it's not like data was always corrupt. – John Oct 29 '16 at 10:16
  • @Iker Why would you give someone write-access to a database directly? It is 2016, middlewares are the way to go. – Andy Oct 29 '16 at 14:54
  • `The database doesn't have to check for data integrity every time application modify data.` — `This is a deeply misguided point. Databases were created for precisely this purpose.` — *cough*Mongo*cough* – Mark K Cowan Oct 29 '16 at 20:55
  • @John popularity does not mean correct or even good decisions a lot of people would not touch MySql (and still won't) because of that design decision – mmmmmm Oct 29 '16 at 22:14
71

TL;DR: Relationship constraints should go in the database.


Your application ain't big enough.

You are correct, indeed, that enforcing relationships across databases may require enforcing them in the application.

I would point out, however, that you should first check the documentation of the database software you are using, and check existing product offers. For example, there are clustering offers on top of Postgres and MySQL.

And even if you end up needing to have some validation in the application, don't throw out the baby with the bath water. After all, the less you have to do, the better off you are.

Finally, if you are worried about future scalability issues, I am afraid that your application will have to undergo significant changes before it can scale anyway. As a rule of thumb, every time you grow 10x, you have to re-design... so let's not sink too much money into failing to anticipate scalability issues, and instead use money to actually reach the point where you have those issues.

Your application ain't correct enough.

What is the chance that the database you use have a faulty implementation of the check compared to the chance that your application has a faulty implementation of the check?

And which one do you alter most often?

I'd bet on the database being correct, any time.

Your developers ain't thinking distributed enough.

Reduce network round trip to application when insert/update data as application has to make more query(s) to check data integrity.

Red Flag!1

If you are thinking:

  • check if the record exists
  • if not, insert record

then you failed the most basic concurrency issue: another process/thread might be adding the record as you go.

If you are thinking:

  • check if the record exists
  • if not, insert record
  • check if the record was inserted as a duplicate

then you failed to account for MVCC: the view of the database that you have is a snapshot at the time your transaction started; it does not show all the updates that are occurring, and maybe not even committed.

Maintaining constraints across multiple sessions is a really hard problem, be glad it's solved in your database.

1 Unless your database properly implements the Serializable property; but few actually do.


Last:

And I think, handle data integrity in application will let to more verbose error message than using database. Eg: when you create an API server. If you define relations in database, and something go wrong(like the referenced entity doesn't exist), you will get an SQL Exception with message.

Do not parse error messages, if you use any production-grade database it should return structured errors. You'll have some error code, at least, to indicate what is possibly wrong, and based on this code you can craft a suitable error message.

Note that most of the times the code is enough: if you have an error code telling you that a referenced foreign key does not exist, then it's likely that this table only has one foreign key, so you know in the code what the problem is.

Also, and let's be honest here, most of the times you will not handle errors that gracefully anyway. Just because there are so many of them and you'll fail to account for them all...

... which just ties in to the correctness point above. Each time you see a "500: Internal Server Error" because a database constraint fired and was not handled, it means the database saved you, since you just forgot to handle it in the code.

Matthieu M.
  • 14,567
  • 4
  • 44
  • 65
  • 3
    Haha, you wrote this as I was writing my comment, ironically emphasising the point we are both making. I totally agree: You can't even do integrity or constraints in non DB code. Transactions can't see the results of others until they are committed (and even then maybe not). You may get the illusion of integrity but it's subject to timing or serious scalability issues due to locks. Only the database can do this correctly. – LoztInSpace Oct 26 '16 at 12:08
  • @LoztInSpace: I've had to implement (and optimize) the lock approach for some seriously weird constraints before; it's not funny :( – Matthieu M. Oct 26 '16 at 12:15
  • 8
    All good points. Another is that relationships in a database are self documenting. If you've ever had to reverse engineer a database that had its relationships defined only in the code querying it, you will hate anyone that does it that way. – Kat Oct 27 '16 at 18:57
  • 1
    @Kat11: That's true. And self-describing also has the advantage that tools can easily understand the data and act on it, which can be useful sometimes. – Matthieu M. Oct 28 '16 at 06:22
  • 1
    Your argument about MVCC is not accurate in DBs that implement SERIALIZABLE isolation correctly (modern versions of PostgreSQL do, for example - although many major RDBMSs do not). In such a DB, even the first, naïve, approach would work correctly - if writes conflict, they will be rolled back as a serialization failure. – James_pic Oct 28 '16 at 09:37
  • @James_pic: It is not clear to me what you mean by "conflict" here, and I do not have time to check the SERIALIZABLE guarantees of postgres right now. If you do not have a simple explanation, I'll check it up later tonight to signal the potential special-case of postgres. – Matthieu M. Oct 28 '16 at 09:42
  • 1
    In DBs that implement SERIALIZABLE correctly, if you take all the successfully committed transactions, then there is some ordering (which may not be the same as wall-clock ordering), such that if you had run all of them serially (with no concurrency) in that order, all results would have been exactly the same. It's tricky to get right, and the SQL specs are vague enough that you could convince yourself that it's OK to allow write skew at SERIALIZABLE level, so many DB vendors treat SERIALIZABLE as SNAPSHOT ISOLATION (I'm looking at you Oracle). – James_pic Oct 28 '16 at 09:59
  • @James_pic: I dread to think of the difficulty in implementing this correctly, to be honest, and wonder at the runtime cost. Thanks for chiming in, I'll edit the answer. – Matthieu M. Oct 28 '16 at 10:36
  • Good answer but I think the consistency and concurrency issues should be more front-and-center. Unless this application is only accessed by one instance of the application at a time, trying to prevent race conditions across instances of an applications is a Sisyphean task You could succeed but you'd most likely end up reinventing what the database gives you. The other thing is that in my experience it's rare for a database to only be used by one application over it's lifetime. YMMV. – JimmyJames Oct 28 '16 at 15:43
  • If you're curious about the gory details of SERIALIZABLE support under MVCC, [the Postgres wiki](https://wiki.postgresql.org/wiki/Serializable) has a nice walkthrough of the issues, and they also published a [whitepaper](https://drkp.net/papers/ssi-vldb12.pdf) (money quote: "Our experiments with a transaction processing and a web application benchmark show that our serializable mode has a performance cost of less than 7% relative to snapshot isolation, and outperforms two-phase locking significantly on some workloads."). – Kevin Oct 29 '16 at 01:36
  • What does "structured error message" mean? I know Sql Server only ever gives me a string. What do you get from others? – John Oct 29 '16 at 10:09
  • @John: Oracle has an error code (code 1 is [unique constraint violated](https://www.techonthenet.com/oracle/errors/ora00001.php)), so [has sqlite](https://www.sqlite.org/rescode.html). They are generally rough, but point you to the right direction. [Sql Server seems to have such as well](https://msdn.microsoft.com/en-us/library/windows/desktop/ms716256(v=vs.85).aspx), you may find [their meaning in the database itself](http://stackoverflow.com/questions/13584124/is-there-an-overview-of-all-sql-server-2012-error-codes). – Matthieu M. Oct 29 '16 at 11:01
  • @MatthieuM. Yes, they have a number. Not enough for creating error messages for the user, and usually even less information than the error message contains. I thought maybe some were actually telling you *what* constraint was violated in a machine-evaluatable way. – John Oct 29 '16 at 11:14
  • @John: Oh, I have wished for that for so long... but no. However, I find relying on parsing the message + matching the table/index/constraint name extremely brittle in practice. If you have a foreign key (parent absent) error and a single parent, you know which it is; if you have 2+ parents and really need to know which it is, you can always query the database... but in most cases I've seen, even on hairy schemas, the application should generally not even get in that situation in the first place so the error message would end up being "Oops, we have a bug" anyway. – Matthieu M. Oct 29 '16 at 11:53
  • @MatthieuM. There are also scenarios where it means "someone else beat you to it" (more commonly for simple uniqueness constraints than foreign key constraints). Usually my own software just gives an ugly error originating for the constraint violation that one can live with as it's so rare - and I think most software behaves like that. Sadly, sql databases are not designed for coders, although coders have few real alternatives. – John Oct 29 '16 at 13:26
  • @John: *Sadly, sql databases are not designed for coders* => and let's not talk about operators. I've seen entire database clusters going from busy to frozen up with a mere 10% increase because of lock contention, badly written queries of a single application monopolizing all resources, ... – Matthieu M. Oct 29 '16 at 15:55
52

The constraints should lie within your database, as (with the best will in the world), your application will not be the only thing to ever access this database.

At some point, there may need to be a scripted fix within the database, or you may need to migrate data from one table to another on deployment.

Additionally you may get other requirements e.g. "Big customer X really needs this excel sheet of data imported into our application database this afternoon", where you will not have the luxury of adapting your application code to suit when a dirty SQL script will get it done in time.

This is where database level integrity will save your bacon.


Additionally, picture the developer who takes your role at this company after you leave and is then tasked with making database changes.

Will he hate you if there are no FK constraints within the database so that he can tell what relationships a table has before he changes it? (Clue, the answer is yes)

Paddy
  • 2,623
  • 16
  • 17
  • 33
    Oh brother. I can't tell you _how_ many times I've had to explain to people that a database has **more than one client!** Even if _right now_ there is only one client and only one avenue for data to enter the system, designing your application and schema based on this assumption is the best way for Future Yoshi to hate Past Yoshi. – Greg Burghardt Oct 26 '16 at 12:46
  • How about the case when you must load the customer's excel to DB ASAP alghough it breaks foreign key constraints? If the top priority is getting it loaded it is easier without FKs. You'll pray nothing breaks but you were just following orders ;) – NikoNyrh Oct 26 '16 at 20:30
  • 9
    @nikonyrh I wouldn't be doing that. The constraints are there so that applications can rely on consistent data. To disable FK 'just to get it in' is madness. – Paddy Oct 27 '16 at 07:10
  • 5
    Agreed. Also, even if your application is the only client, you could have different versions of your application attempting to enforce slightly different constraints. Usually, hilarity ensues (well, not really. It's more like 'chaos and utter frustration' than 'hilarity'). – Iker Oct 27 '16 at 07:18
  • 5
    I can absolutely attest to this. In my case I was stuck on MyISAM which does not actually support foreign keys, so I ended up with 250GB of data with integrity enforced by the application. When it came to start pruning data to get the backlog to a more manageable size, and when it became clear that the application itself wasn't going to be able to handle this at all, chaos ensued. I don't know why I'm using the past tense; this is still happening _now_ and the problem (two years on) still has yet to be resolved. \*sniff\* – Lightness Races in Orbit Oct 27 '16 at 15:25
  • 1
    I would argue that a decent code base should make it easy to write a one-off script using the persistence layer from your application at least as quickly as to write raw SQL. 'Modifying your application's code' should *never* be necessary for one-off scripts. – Jonathan Cast Oct 27 '16 at 20:58
  • 1
    @NikoNyrh: If you disable foreign keys just to load data, you are almost always going to painstakingly do a data repair down the road. And it's not going to be pretty. – sampathsris Oct 28 '16 at 11:01
  • True, but ideally the software would be robust enough to handle these issues "transparently". Large-scale databases don't support foreign keys or joins anyway so more and more of this logic has to be build to the application. But comments are not for chat bla bla, and in most cases traditional dbs have good enough performance as not everybody is Facebook, Twitter or Uber. – NikoNyrh Oct 28 '16 at 14:10
  • *"your application will not be the only thing to ever access this database"* Practically every modern dbms has a command-line interface and a GUI interface, and someone will use them at some point. – Mike Sherrill 'Cat Recall' Oct 28 '16 at 14:32
17

You should have relations in the database.

As the other answer notes, performance of constraint checking will be far better inside that database than inside your application. Database constraint checks are one of the things that databases are good at.

If you ever need additional flexibility - e.g. your noted cross database references - then you can remove the constraints deliberately and with consideration. Having consistency within your database means that you have the option of modifying those constraints, and certainty of referential integrity.

Kirk Broadhurst
  • 4,199
  • 18
  • 27
  • 1
    True. I should have said that performance of constraint checking will be better handled in the database than in the application. – Kirk Broadhurst Oct 26 '16 at 07:50
13
  • We no longer live in one back-end <-> one front-end world.
  • Most solutions involve a web front-end, a mobile front-end, a batch-front-end, and iPad front-end, etc.
  • Database engines already have thousands of tested lines of code optimized to enforce referential integrity.

Can you really afford writing and testing referential integrity enforcing code when you have domain problem solving code to write?

Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
  • 2
    *"We no longer live in one back-end <-> one front-end world."* Did we ever? A few years ago, I worked on a database system that had programs written in at least two dozen different languages accessing it. Some of the programs had their first release in the 1970s. – Mike Sherrill 'Cat Recall' Oct 28 '16 at 14:35
2

If you don't validate your data integrity, constraints, relationships etc. at the database level that means it is much easier for anyone with production database access (through any other client including a DB access tool) to mess up your data.

It is great practice to enforce the strictest possible data integrity at the database level. Trust me, this will save you enormous headaches over time in any non-trivial system. You will also pick up application logic errors or business requirement errors and inconsistencies faster if careful thought is put into this.

As a side note to this, design your database in a way that is as normalized and atomic as possible. No "God" tables. Spend a lot of effort designing your database to be as simple as possible, ideally with many small tables that are individually very well defined, with a single responsibility and carefully validated on all columns. The database is the last guardian of your data integrity. It represents the Keep of the Castle.

Bradley Thomas
  • 5,090
  • 6
  • 17
  • 26
1

Most people are essentially saying "yes, in general thou shalt always define the relations in the database". But if disciplines in computer science were so easy, we would be called "Software Manual Readers" instead of "Software Engineers". I do actually agree that the constraints should go in the database, unless there is a good reason they shouldn't, so let me just provide a couple reasons that might be considered good in certain situations:

Duplicate Code

Sometimes a certain amount of functionality that could be handled by the database will naturally exist in application code. If adding something like constraints to the database would be redundant, it might be better not to duplicate the functionality, because you are violating DRY principles, and you might worsen the juggling act of keeping the database and application code in sync.

Effort

If your database is already doing what it needs to do without using advanced features, you might want to evaluate where your time, money, and effort should be placed. If adding constraints would prevent a catastrophic failure and thus save your company a lot of money, then it is probably worth it. If you are adding constraints that should hold, but are already guaranteed to never be violated, you are wasting time and polluting your code base. Guaranteed is the operative word here.

Efficiency

This is normally not a good reason but in some cases you might have a certain performance requirement. If application code can implement a certain functionality in a faster way than the database, and you need the extra performance, you might need to implement the feature in application code.

Control

Somewhat related to efficiency. Sometimes you need extremely fine grained control about how a feature is implemented, and sometimes having the database handle it hides it behind a black box that you need to open.

Closing Points

  • Databases are written in code. There's nothing magic they do that you can't do in your own code.
  • Nothing is free. Constraints, relations, etc. all use CPU cycles.
  • People in the NoSQL world get along just fine without traditional Relational features. In MongoDB for example, the structure of JSON documents is good enough to support an entire database.
  • Blind and ignorant use of advanced database features can't guarantee any benefits. You might accidentally make something work only to break it later.
  • You asked a very general question without listing specific requirements or constraints. The real answer to your question is "it depends".
  • You didn't specify if this was an enterprise scale problem. Other answers are talking about things like customers and data integrity, but sometimes those things aren't important.
  • I'm assuming you are talking about a traditional SQL Relational database.
  • My perspective comes from having moved away from using tons of constraints and foreign keys in small (up to 50 tables) projects, and not noticing any drawbacks.

The last thing I will say is that you will know if you shouldn't be placing the functionality in the database. If you're not sure, you are probably better off using the database features, because they usually work really well.

  • 1
    If people downvote well-thought answers because it disagrees with their dogma, the SE StackExchange becomes a worse place. – JounceCracklePop Oct 27 '16 at 22:57
  • 5
    This answer's premise that there are occasions where you might leave constraints out of the DB is valid, but the explanation is poor and misleads. While I concur that the database isn't the best place for *some* constraints, *no* relational database should go without **basic key and referential integrity constraints**. *None*. There is zero exception to this. Every database is going to need primary keys, and the vast majority will need foreign keys. Those should *always* be enforced by the database, even if it duplicates logic. The fact that you gloss over this is why I downvoted. – jpmc26 Oct 27 '16 at 23:57
  • 1
    *"Databases are written in code. There's nothing magic they do that you can't do in your own code."*, no, you can't enforce referential integrity in application code (and if you don't need to enforce it, why are you using a database server at all?). It's not about what code can do, it is about *where* it can be done. – hyde Oct 28 '16 at 05:31
0

As always, there are many answers. For me I found a simple rule (well it only works for a model-centric approach). Usually, I only focus on the different layers of applications.

If the model consists of several entities and there are dependencies between the entities, the persistance layer should reflect those dependencies with its possibilities. So if you're using a RDBMS, then you should also use foreign keys. Reason is simple. That way the data is always valid structurewise.

Any instance doing work on this persistance layer can rely on it. I'm assuming, that you're encapsulating this layer via interface (service). So here is the point where the design ends and the real world begins.

Looking at your points, especially cross-database references. In that case, yes there should not be a reference implemented in the RDBMS itself, but in the service. But before following this way, wouldn't it be better to consider this already during the design?

Means, if I know already, that there are parts which need to be stored in a different DB, then I can put them already there and define it as seperate model. Right?

You're also pointing out that implementing this in code is more flexible. Right, but doesn't that sound like you're dealing with an incomplete design? Ask yourself, why do you need more flexibility?

The performance issue, due to the integrity checks in DB is not real. The RDBMS can check such things much faster than any implementation by you. Why? Well, you have to deal with the media disruption, the RDBMS doesn't. And it can optimize such checks by using its statistics a.s.o.

So you see, it all comes back to design. Of course you can say now, but what if an unknown requirement is appearing, a game changer? Yes it might happen, but such changes should be designed and planned a.s.o.. ;o)

DHN
  • 149
  • 3
0

You have some very good answers but some more points

Data integrity is what a database is designed to do

Doing proper concurrency of like a FK delete at the application level would be horrendous

Expertise in data integrity is with a DBA

At the program level you insert, update, bulk update, bulk insert, bulk delete ...
Thin client, thick client, mobile client ....
Data integrity is not the expertise of a programmer - lots of duplicate code and someone will mess it up

Say you get hacked - you are in trouble either way but a hacker can do a lot of damage via a small hole if there is no integrity protection at the database

You may need to manipulate data directly via SQL or TSQL
No one is going to remember all the data rules

paparazzo
  • 1,937
  • 1
  • 14
  • 23
0

Your question doesn't make sense: if you can change the database, it's code, if you can't change the database, you'll have to create your constraints elsewhere.

A database which you can change is every bit as much code as any line of ruby, javascript, c# or ada.

The question about where to put a constraint in your system should boil down to reliability, cost and ease of development.

jmoreno
  • 10,640
  • 1
  • 31
  • 48
0

There are tons of good answers here. I'll add that if you have an app written in language Y, you can create database-constraint-like code in Y. And then someone wants to access your database using language Z, you'll have to write the same code again. God help you if the implementations aren't exactly the same. Or when a knowledgeable business user connects to your database using Microsoft Access.

My experience tells me that when people don't want to use database constraints, it's because they're actually trying to do something the wrong way. For example, they're trying to bulk load data, and they want to leave not-null columns null, for a while. They intend to "fix that later" because the situation that made the not-null constraint critical "can't possibly happen in this case." Another example could be when they're trying to shoe-horn two different types of data into the same table.

More experienced people will take a step back and find a solution that does not involve attempting to bypass a constraint. The solution could simply be the constraint is no longer proper because the business changed, of course.

Tony Ennis
  • 211
  • 1
  • 6