Constraints in a relational databases - Why not remove them completely?

Question

Is there any reason to build constraints between tables (inside SQLserver) nowadays? If so, when? Most applications in my area are built on object principles and tables are joined on demand. Demand are based on the need from the application. I won't load a bunch of constrained tables for a simple look-up, which in turn (after action) require one another simple lookup.

ORM tools such EntityContext, Linq2Data, NHibernate are also handle the constraints by themselves, at least you know what tables that need each other. Doing constraints inside the server is just about making (forcing) same changes twice?

This is usually not a question up for decision, but this database is designed quite different. The design looks regular good, mostly mirrored the objects used by the applications. What disturbing me is all constraints configured inside SQLserver with "not cascade". Which means that you have to play "seek and find" when coding new database queries. Some cases requires up to 10 levels of an exact order to make a single delete.

This surprises me and I'm not sure how to handle it.

In my simple world, that setting makes the constraints losing most of the purpose. OK if the database were accessed from hosts without knowledge of the design.

How would you act in this scenario?
Why not just remove all constrains from db and keep them at application level?

You were planning on always accessing the data through a single ORM tool? Or were you planning on having “fun” replicating all the constraints correctly across each ORM tool in use? — Donal Fellows, May 13 '11 at 09:30
Per my latest comment to Peter i have to agree. The point of relying all constraints to the code base (and remove them from db) was very narrow and are probably fully applicable to short-lived applications. Probably also for some RAD developers / projects. — Independent, May 13 '11 at 10:39
Minor nitpick: I think it gets a bit confusing when you call the foreign key connections between the tables "relations". The "relations" in a relational database are the tables themselves, not the connections. Especially when we then go on and talk about "relational design" -- does that mean tables, or does it mean foreign keys? — Thomas Padron-McCarthy, May 13 '11 at 11:47
Thank's. I call the "connections between the tables" for constraints. Hence, you problably right that I see "relational database" for the principles of table design (structure of tables). An even more precise description would be "design pattern", when relating to "relation versus object" database. — Independent, May 13 '11 at 11:52
Your database is going to outlive your application code. Also, your ORM is hurting your application performance and there's a good chance you'll end up wanting to bypass it at least in certain use cases. If you don't know it now, you'll know it eventually. http://samsaffron.com/archive/2011/03/30/How+I+learned+to+stop+worrying+and+write+my+own+ORM. Also, removing all constraints leaves your database completely unable to protect its own integrity when it is abused by apps other than yours, which could be anything from another actual app to an exec down the hall with Excel. — Craig Tullis, Apr 06 '14 at 01:15

Péter Török · Accepted Answer · 2011-05-13T11:55:18.020

47

Two general reasons not to remove contraints from DB:

It may be accessed by more apps, now or in the future, which may or may not use ORM. Even if the developers of those apps faithfully duplicate all the constraints there (which may be significantly more difficult using lower level non-ORM solutions), it is always extra work. And if not, even one small omission is enough to break schema integrity... which is something you don't want to risk. In most companies, the data stored in their DB is the lifeblood of their business, so its integrity must be ensured by any means. And the tried and proven best means to achieve this is to implement as many constraints in the DB as possible.
The query optimizer relies a lot on the constraints known on the DB level. If you remove constraints, query performance may start deteriorating. You may not immediately notice it, but one day it is going to hit you, and by then it may be too late to fix it easily. The nature of things is that DB performance tends to break down at peak load time, when there is the least possibility to make careful, well thought out design improvements, backed by exact performance measurements and detailed analysis to pinpoint the root causes.

Your concrete case sounds like the DB schema may have been originally generated by an ORM tool (or designed by someone not very experienced with the relational world), so it is suboptimal from the relational point of view. It is probably better to analyse and improve it towards a more "natural" relational design, while keeping it consistent with the ORM views. It may be useful to involve a DB expert in this analysis.

edited May 13 '11 at 11:55

answered May 13 '11 at 07:51

Péter Török

46,427
16
160
185

Thank's. We may talk about different things. This database is originally created by a (at least by himself) so called database expert. He was hired for that work. Why would a relational design be a improvement against object design? Relational has been "the thing" for descades, but the interactivity with applications has changed dramatically. – Independent May 13 '11 at 08:06
5

@Jonas, then talk to the guy about the perceived issues with his DB design. Relational and object oriented are two different worlds - neither is an "improvement" over the other per se, and both have their own place. Designing a C# app on relational principles is as big a mistake as designing a DB the OO way. – Péter Török May 13 '11 at 08:18
3

@Jonas, reflecting to your updates: if you need to write overly complex queries to achieve seemingly simple things against the DB schema, it is either a sign that the DB design is inadequate for its purpose - or that you are not skilled enough (please don't take offense, it is not obvious from your post how experienced you are with SQL. As a disclaimer, I myself am far from being an expert.) – Péter Török May 13 '11 at 08:31
Knowledge is always a hammer! Looking at this system there definitively ben a problem build tight queries. Not because of database design, more of software design. We had our discussions already. We're end in working with different philosophies. Like you kan be republican or democrat. – Independent May 13 '11 at 10:14
1

I probably have some expressions to learn, to make myself perceivable :). I re-readed the question and answers and have to reverse. There are definitively a strong point have DB as a master for all constraint. All systems need to be designed from that. A very narrow view to say that the code base would do the work. If every system can have thier own decision about constraints, there going to end in a high-chapparral with wrong suggested relations and whole tables orphaned. If not now, then it occurs later with other coders. – Independent May 13 '11 at 10:36
8

"It may be accessed by more apps, now or in the future." Not to mention some database administrator, running raw SQL queries to fix a problem with the database, while the users are waiting... – Thomas Padron-McCarthy May 13 '11 at 11:52
5

+1: if db is storing business data (not just app config etc), then the probability that the database will out live/ or be extended outside the current app is approaching 100% – Binary Worrier May 13 '11 at 15:15

score 28 · Answer 2 · answered May 13 '11 at 08:12

Applications can come and go but data lives forever. In my company the DB is over 30-40 years old, it will live on as long as the company exists. The applications change, developers come and go. It is better to have integrity and a good logical data model. That way someone can look at data and get meaningful understanding without having to go through a complex codebase. This also helps in reporting significantly. Also applications can and will have bugs and DB constraint is a guard against that. My default position is to have as much constraint (FK and check) as possible.
The only reason not to have a constraint would be if your design pattern doesn't allow that e.g. Table-per-hierarchy or performance problems.

I will say, you doing a very wise advice here. My opinion may match better to RAD development or whatever developing where applications have short life-time - Just for the sake of minimized maintanance while developing. — Independent, May 13 '11 at 11:46

score 15 · Answer 3 · answered May 13 '11 at 14:51

What disturbing me is all constraints configured inside SQLserver with "not cascade".

That is not disturbing to me, that means someone has shown good sense. Cascading deletes are often very bad for the database. Inthe first place, sometimes you want a delete to fail if you have data in related tables. For instance, if youhave a customer who has an order inthe past, you don't want him to be deleted or you lose the data about who the order was for and a cascade delete woudl get rid of the record which would messup you financial reporting.

You seem to to think that ease of developemeant is the most important thing. In the database world this is simply not true. Data integrity is the first most critical thing followed closely by performance and data security. If it takes longer to write the queries then so be it.

Database are typically acted on by many applications = one or more web sites or desktop applications, a reporting application, web-services, the query window, ETL processes, etc. If you do not enforce contraints at the database level you first lose the integrity of the data as one of those applications may not follow all the rules. Second, you have to code those contraints multiple times and rewrite them if you decide to use a different application later. Third, you cannot control in advance whether there will be a need to to do some sort of data maintenance task that will not happen through the application(fixing the data from a bad customer data import for instance or changing all the 10,000,000 records from one client to another client when the company is bought by a competitor). Typically application developers don't think about how the data may need to be manipulated over time and thus forget to add important constraints at the database level which is the first, most important place where they belong.

Thank's for reply. All processes and and application types you talk about, should talk to a DAL (which in turn would contain the constraints). BUT! Your point is perfect and your comment is good. Sidenote: Yes. I tend to try ways to ease development. To me, less complexity can stand less ways to do wrong. This isn't "want develope easier/faster", even if it could be - if it's handled wrong. Hence why i'm post this question! I would also see someones good sense if this non-cascade was chosed with sense, not 100% as in this scenario. I have to find out reasons. — Independent, May 13 '11 at 15:19
@Jonas, there can be performance reasons also. Depends onteh number of child records. OK if you are deleting small groups but if millions of records could get triggered you are better off doing batches and not locking up all the tables while the whole process happens. In general many dbas will not permit cascading deletes just for that reason as it can lock up a prod system if a delete affects too many records. — HLGEM, May 13 '11 at 16:34
No all the processes should not talk to a DAL. ETL processes typically don't nor things that need to happen at the database level that affect many records when some large change happens (like the client being bought out). Nor can you forbid anyone to ever use the query window to do a one-time change. I have never seen a database that didn't enforce constraits at the database level that did not have integrity problems over time. — HLGEM, May 13 '11 at 16:38

score 10 · Answer 4 · answered May 13 '11 at 12:32

I read somewhere once that said basically: The data is the key of your application. If you will only EVER access data through your UI (and I mean ever, as in now and forever, for all eternity... or the lifetime of your application, anyway) then you don't need database constraints. But there's ever a chance that something other than the app itself will need to touch data, for example a web service, public API, rake task/SQL job/cron/automated script, then you will save yourself a lot of potential trouble down the road by keeping the DB constraints.

I firmly believe this is the one area of software development where you should not apply DRY (and I'm fully expecting a bevy of downvotes for that statement). Your data is the heart and soul of your application - if it's ever corrupted beyond repair, that it: game over. It's worth it IMO to enforce the constraints everywhere they are needed. If that means in the form of Triggers and constraints on the DB level, server-side validations on the middleware, and client-side Javascript on the UI (for web apps), then it's IMO a necessary evil to ensure the data is always pristine.

score 6 · Answer 5 · answered May 13 '11 at 08:42

6

Do you know what ORM means? Object-relational mapping. Quoting Wikipedia "technique for converting data between incompatible type systems". Yup, relational and object models don't fit together. ORMs do a pretty good conversion, by respecting rules of both type systems. RDBMS are organized in such way, that you achieve data integrity by using constraints. In general, integrity is very nice thing to have, so ORMs tend to use them when creating data model for storing object data. Your ORM probably has a good reason to use "non cascading" constraints. And if this forces you to make complicated queries instead of just create/update/drop certain objects, then something is wrong with your ORM setup.

If you consider relational concept to be annoying, then why you don't use object database? Some time ago they were slow (which is why most people still use RDBMSs) but from what I've heard things changed a bit. You would get rid of all the relational nitpicks. Simply objects in, objects out.

answered May 13 '11 at 08:42

Jacek Prucia

2,264
1
17
15

The topic are about move out the constraint functionality from DB and rely on settings/developing within the code base (eg .net speaking: Entity/Linq2Sql). – Independent May 13 '11 at 10:04
Yes I know, but my point is that you need first understand why constraints are there in the first place and then why it might be a bad idea to drop them. – Jacek Prucia May 13 '11 at 10:28
Moved! Not dropped. I understand you regret the knowledge of the qustion, which it wasn't about. – Independent May 13 '11 at 10:42
You can't really move anything between incompatible systems. You are going to drop DB constraints, introduce application constraints and merely hope they will work the same (which might turn out to be true as well as false). Anyway my sincere appology if I misunderstood your question. – Jacek Prucia May 13 '11 at 11:55
Thank's! "Move" means litterary "move". Which means you create (good expression) application constraints at every system. At least each system that can't share same DAL. A very nice example was direct queries from a db admin that "fix something". No db constraints and lack of design knowledge can result in orphaned data or whorse, completely mocked up data. – Independent May 13 '11 at 12:01

score 6 · Answer 6 · 2011-05-13T12:55:49.683

6

Well that is what eBay did and they probably have one of the largest databases in the world:

http://www.dba-oracle.com/oracle_news/news_ebay_massive_oracle.htm http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

Despite what has been said above about performance being increased by referential integrity, it can actually be degraded; which is why massive databases have been dropping their constraints and doing the work in the application layer. And as far as I can tell it is the only really good reason.

By dropping those constraints you essentially loose your safety net that keeps the data clean, and that brings about its own problems. So as with everything it is a balancing act. I guess that in general maintaining referential integrity is the right thing to do.

Having worked in a development environment with strong referential integrity, I know that from a developer's point of view that it can be a total pain; often in a development environment a bit of dirty data doesn't matter and working out how to delete a row might take an hour or more. However, it can also be very useful, as the constraints make the schema explicit.

edited May 13 '11 at 12:55

answered May 13 '11 at 12:46

Finally someone that understand me :-). You completely right, balance is a really big point here. Moving out constraints to application level may be a safe alternative, if done as a strategic point. It would be nice with some URL's to sites proven degraded performance due to strong constraints/integrity. – Independent May 13 '11 at 13:04
10

Yes and don't forget - **don't forget** - that Ebay, like Facebook and Amazon - is a gazillion times bigger than 99.99% of databases, and what is good for them is probably very different from what is good for your database. – Tony Andrews May 13 '11 at 13:24
2

ANd ebay, Facebook, Amazon probably do not use databases without constraints for their financial and accounting software or their inventory software or their HR data or anywhere where not losing data is critical. – HLGEM May 13 '11 at 16:59
2

If you have enough time, expertise and money you can eventually out program any RDBMS, web server or operating system to fit a specific need. – JeffO May 13 '11 at 17:31
1

eBay didn't do that until the sheer volume of data churn they were dealing with essentially outpaced the ability of the database servers to cope, and they had the millions to invest in their new architecture. If you're doing billions of transactions a day, then by all means get worked up about removing constraints and going to a totally queue-based, transactionless, massively scalable system like eBay. Otherwise, don't underestimate your database server, and don't leave your database subject to data corruption by removing all your constraints. – Craig Tullis Apr 06 '14 at 01:29

sunwukung · Answer 7 · 2011-05-14T11:26:12.863

First - my answer: No, you should not rely on the application alone to look after your data.

This points to a larger debate: ORMs have encouraged a culture of disdain for "direct" DB interaction, often at the expense of normalisation/referential integrity. Tables are forcibly mapped to arbitrary object hierarchies, at the expense of the design implicit in the relational model. The decoupling favoured by OOP is arguably sacrificed here as the application makes it's design felt in the data structure. While ORM has demonstrated great utility, it seems to be based on the abuse or mistrust of SQL.

New paradigms are (re)emerging, take Functional programming for example. If the dev team decides to embrace a new programming methodology then what implications will this have for data that has been structured according to the ORM's requirements?

I agree with @Jacek Prucia - I think ORM is a bad match for RDBMS, I'd personally opt for a DBAL on RDBMS, or go for an OODB with ORM.

+1 for speaking alternatives to the topic. The other side of the debate is of course: "How bad would some data be?" and the answer can be cancellation or billion insertion of money to someones million dollar bank account. As well as some orphaned data that are removed with good cleaning routines. The summary of this topic, looks like consistency to the cost of flexibility. Which in turn fully depends on the severity of the db's content and use. — Independent, May 13 '11 at 16:54

score 3 · Answer 8 · answered May 13 '11 at 15:43

Constraints are your only guarantee that you have consistency and data integrity at the database level. Sure, you can enforce constraints using your application code, but what if, in the future, you need to modify the data directly? You might understand how to maintain data integrity, but someone else might not. Keeping the constraints at the data level ensures that the integrity is ensured even when someone is monkeying around in places they don't understand.

Furthermore, let's say your application needs to be rewritten, but with the same database in place. All those constraints in code are just begging for bugs that prevent some entry whilst allowing erroneous data through.

When developing, keep it simple. Constraints let you do that. (That said, when a constraint throws an error, don't spit the same error back at the user. Make the error understandable.)

(As to the cascade issue: that's a good thing. I'd prefer to throw an error that certain other records have to be deleted first, rather than rely upon the cascade to get everything right. Cascades are nice in theory, but not necessarily so in practice.)

score 2 · Answer 9 · answered May 13 '11 at 15:18

One problem with constraints in a database are that they give the program limited information on what failed and how to fix it. This means that, for smooth handling, it is often necessary to repeat the constraint check in the application, and therefore the database constraint check is wasted effort.

This runs the risk of compromising data integrity, so we've got tradeoffs here. For important data, ensuring data integrity is almost always more important than performance, and it's far better to fail a transaction even if it looks arbitrary than to mess up the data.

To safely remove constraints, it is therefore vital to secure database access so that nothing can change the database without checking the constraints. This is not reliable when writing new applications or coming up with ad hoc ways of dealing with the data, since all it takes is one mistake and the database is corrupt.

Therefore, to dispense with database constraints, it is necessary to establish what can and what cannot be done with the database up front, so that all applications can be written, reviewed, and tested extensively. All database requirements must be established up front, and any change to database requirements will require extensive work. This is something of a frozen waterfall methodology, which works only in very specific cases. (Designing, implementing, and maintaining to requirements is much like walking on water. Something has to be frozen first, and if it isn't frozen quite enough the results can be disastrous.)

One case where it does work is the massive enterprise applications like PeopleSoft and SAP, where the application already does virtually everything, and there are carefully defined ways to extend it. There are other, very rare, possibilities.

So, unless you work on a very large enterprise project (and I wouldn't want to) or can walk on liquid water, leave those constraints in the database.

Thank's for reply. The constrains will be in the db for this project! I'm completely convinced :). I Will also have wider eyes when decide this on future projects and in discussions with other parts. — Independent, May 13 '11 at 17:06
Also consider that without the constraints, you're leaving it up to the application code itself to detect that it screwed up. That's the *same* application code that violated the constraint in your example, by the way, the constraint that saved your database from data inconsistency or corruption. Using constraints does not automatically mean lower performance either, by the way, and not using constraints leaves your database exposed so it can't protect itself. — Craig Tullis, Apr 06 '14 at 01:39

Constraints in a relational databases - Why not remove them completely?

9 Answers9

Linked