21

In my 10+ years of experience in the IT field, I have never used foreign keys in any of my projects and I never felt the need. I did work with professional databases that had foreign keys constraints.

I am now at a position where we are building a new application/database and I am thinking should I use foreign keys or not? This is going to be a professional product. I will also consider implementing this in my existing projects if I get a satisfactory answer.

This article on why to use foreign keys exactly addresses my concerns. It main crux is

  1. It maintains referential integrity (yes but can be maintained without it too)
  2. Easier Detective work (of course)
  3. Better Performance (I am not quite sure)

My question is, should I use foreign keys or can I live without them. What are strong pros and cons from a developer who worked in such scenarios.


Example: Now an important part of using foreign keys is added complexity that is added to design. For example, a simple delete may not work, or it may delete other records you are not aware of. Let's consider that scenario.

I have a database with user and user_comments tables.

create table user(
user_id int not null identity,
user_name varchar(50),
...
)

create table user_comment(
comment_id int not nul identity,
user_id int,
CONSTRAINT FK_USER_USERID FOREIGN KEY (user_id)     
    REFERENCES user (user_id)
    ON DELETE CASCADE    
    ON UPDATE CASCADE 
)

Here, if I delete a user, all his comments will automatically be deleted. I know I can change that behavior, but my question is, are Foreign keys worth using with their added complexity? What are pros and cons from SE.stackexchange users? Am I potentially looking at some horror stories? Can someone comment on how it improves performance?

jeffhale
  • 103
  • 4
TheTechGuy
  • 1,035
  • 1
  • 8
  • 16
  • What do you mean by added complexity? If you are unsure about cascading you could put "RESTRICT" there so that it only rejects queries which violate the integrity. – max630 Jul 23 '18 at 06:34
  • The same thing that you mentioned above, a new user does not know 'these things' on his own. In fact I stumbled upon some information only from reading the reference article (did not know delete can automatically delete other orphan records) – TheTechGuy Jul 23 '18 at 06:37

2 Answers2

35
  1. It maintains referential integrity (yes but can be maintained without it too)

You are technically correct that if you're able to maintain referential integrity yourself, you don't need the constraint to exist. But by that same logic, you don't need fire insurance as long as your house doesn't burn down, and you don't need health insurance as long as you don't get sick.

While technically correct, the underlying assertion than you can do it all flawlessly is simply a failure to recognize the possibility of you (or any other developer) making a mistake.

Accidentally breaking referential integrity without foreign keys works without any issue. But later on, when you want to retrieve the data, it blows up in your face.

  • Who set this data?
  • When did they set it?
  • Why did they set it to this value?

These question become very hard to answer.

Accidentally breaking referential integrity with foreign keys blows up in your face immediately.

  • Who set this data? You did.
  • When did they try to set it? Right now.
  • Why did they set it to this value? Since you're doing it right now, you're logically the best source to know what it is you're trying to do.

Troubleshooting the issue becomes so much easier when you're at the source of the problem already.

  1. Easier Detective work (of course)

I assume you mean the thing I just described.

  1. Better Performance (I am not quite sure)

Can someone comment how how it improves performance?

Foreign keys don't improve performance, at least not directly. The performance gain is achieved by the use of indexes. It just so happens that PKs and FKs are automatically indexed because they are very frequently used for searching, making them prime targets for search optimization.

In here, if I delete a user, all his comments will automatically be deleted.

This is not inherent to a foreign key. This is inherent to setting ON DELETE CASCADE on the foreign key. Cascaded deletes are a nice-to-have feature but they are not the core use case of foreign keys. The core use case is maintaining referential integrity.

My question is should I use foreign keys or can I live without it. What are strong pros and cons from a developer who worked in such scenarios.

my question is, is Foreign keys worth using with its added complexity

I'm not seeing the complexity you're talking about.

If you claim to already be capable of handling referential integrity, that means that I should be able to sneakily put a FK on your FK-less column, and you would be unable to notice that I put an FK on your column. There is no complexity from having the FK.

Setting up the FK is trivial. Yes, it requires an explicit SQL command, but the command is very copy/pastable:

CONSTRAINT unique_name FOREIGN KEY fk_column_name REFERENCES pk_table (pk_column_name)

While the lazy developer in me does wonder if naming a constraint is really necessary, the other information you need to add is logically always required to set up a relation between two columns. Other than the name, it's about as simple as it can be.

The performance gain from having an index on the column is inherent to having an FK on the column. Setting an index without a FK is about as complex as setting a FK:

CREATE INDEX unique_name ON fk_table_name (fk_column_name)

So again, I'm not seeing the added complexity from actually using a foreign key.

Flater
  • 44,596
  • 8
  • 88
  • 122
  • Should I be always be using it, is it good practice, can it get me into nightmare positions 'locked rows', 'rows cant be delete', etc etc. I found this relevant question [Why use Foreign Key constraints in MySQL?](https://stackoverflow.com/questions/3433975/why-use-foreign-key-constraints-in-mysql) and the last comment on accepted answer is interesting. – TheTechGuy Jul 23 '18 at 07:32
  • 5
    @Noname: Reading the linked article in that comment, this is a fringe case that is unique to MySQL/Linux. In all my (9) years of software development using SQL on Windows, I have never encountered issues that were caused by an FK (this includes working on bulk-data government projects). "Rows can't be deleted" is not an issue, it is a _good thing_. It enforces that you can't delete an entry while there are other entries who depend on it (which is what referential integrity is all about). The issue here is not in blocking the delete, but in thinking that the delete shouldn't be blocked. – Flater Jul 23 '18 at 07:38
  • 1
    @downvoter: care to elaborate? – Flater Jul 23 '18 at 12:23
  • Foreign keys can improve performance to, in the case of join elimination. – Richard Ward Nov 14 '18 at 11:29
  • 'But by that same logic, you don't need fire insurance as long as your house doesn't burn down, and you don't need health insurance as long as you don't get sick.' Great analogy! – Chen Ni May 31 '21 at 13:25
4

If you delete a user, why would you possibly want to keep their now-orphaned comments around?

The central advantage of a relational data store is being able to guarantee that such anomalies never happen. If you like and want such guarantees, then it's a good idea to implement them at the lowest possible level, i.e. build them into the database engine, because this is faster and safer than doing it yourself.

If you don't want them, then you are better off with a non-relational data store. But using one model of engine and then refuse to take advantage of one of its major benefits seems oddly pointless.

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
  • You are making sense of course because I had spammer user and I had to use join queries to delete their spam comments. But later I decided to keep the users in the database and mark them `spammer` so they can't comment again with the same ID. But my question is broader than this scenario. – TheTechGuy Jul 23 '18 at 06:34
  • 1
    Lots of forum software keeps posts by deleted users around. This is hardly a rare scenario. – Sebastian Redl Jul 23 '18 at 07:56
  • @Noname - you're the one who brought up that scenario!? – JᴀʏMᴇᴇ Jul 23 '18 at 08:54
  • 2
    @SebastianRedl While this is true, the preferred mechanism is to flag the user as deleted rather than have fragmented entities scattered across the database. – Robbie Dee Jul 23 '18 at 09:17
  • @RobbieDee by having 'close relationship' like in the example I quoted, one looses a bit of freedom I guess. That is my concern. – TheTechGuy Jul 23 '18 at 09:21
  • 1
    @Noname You lose freedom that you shouldn't have in the first place. Having full freedom implies you also have full freedom to break everything horribly. Foreign key constraints are a tool to avoid letting everything break - they force you to deal with any issues that will crop up instead of letting them slide now and failing later. – Delioth Jul 23 '18 at 18:59
  • I think @RobbieDee comment is to the point. We have to learn to do things slightly differently. May be another approach is to move deleted records to a different database, in case you want to check them later. I don't want to rely on 'computer logic' that since this business is deleted, all its reviews should also be deleted. – TheTechGuy Jul 24 '18 at 04:58
  • @Noname - you're not listening at all. Somebody has already mention delete cascade, and the reliance on computer logic to stop your 'freedom' of breaking things is recommended. – JᴀʏMᴇᴇ Jul 24 '18 at 08:05