10

Looking at DB tables created by a different developer I have noticed that whenever a table had a forein_key_id field/column, it was always an INDEX/KEY.

I am not sure if it was manually created, or automatically by some 3rd party software. However, I myself usually created keys using different principles. i.e. Just because it is a foreign key, I don't use that fact to also make it an INDEX. I would typically look at JOIN considerations and JOIN performance individually for a particular query first.

So I am curious to find out more about this -- do foreign keys have to be made into an INDEX? If yes, what is the rationale behind it?

Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
Dennis
  • 8,157
  • 5
  • 36
  • 68
  • 1
    MySQL [automatically creates an index on a foreign key](https://dev.mysql.com/doc/refman/5.7/en/create-table-foreign-keys.html)... not sure why you think it's not happening. The normal reason is that cascade deletes from the parent table can be blocking if there's no index and even updates/inserts on the child table in certain circumstances (no idea about MySQL for that last). – Ben Aug 22 '16 at 20:02
  • http://sqlblog.com/blogs/greg_low/archive/2008/07/29/indexing-foreign-keys-should-sql-server-do-that-automatically.aspx – Adrian Iftode Aug 22 '16 at 20:44

2 Answers2

12
  • Most, if not all, RDBMS automatically create an index on a FK because a FK means you will be doing joins using those columns.

  • It's standard RDBMS behavior based on the fact that you will be using those columns in searches.

  • A query with a join in it, even when absent a WHERE clause, is doing searches on the joined columns and comparing then with the PK of another table-

  • RDBMSs like to speed up joins. So they create an index on FK.

  • The optimizer, based on statistics, or based on access costs will decide whether or not to use such an index when executing a query.

Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
  • Thanks. One of my concerns was that i.e. I have 5-6 foreign keys, the table will have 5-6 indices. All of them are updated on each `INSERT`. If I only use 3 indices 90% of the time, it may be more beneficial to tweak or DROP the indices that are not being utilized. It's a performance/clutter consideration on my side – Dennis Aug 22 '16 at 21:00
  • 6
    Dropping the indexes also will slow down inserts, updates and deletes because then the engine will have to check referential integrity by sequentially scanning instead of index scanning or index look up. If you have 6 FK in a table that means it shouldn't be a transactional one but some kind of historical summary table. Maybe you should normalize a little bit more. – Tulains Córdova Aug 22 '16 at 21:08
  • 18
    It is most certainly _not_ standard RDBMS behavior. Oracle, PostgreSQL, SQL Server and Sybase don't, and I'm pretty sure DB2 doesn't, either. There are valid reasons not to index a foreign key column, and MySQL is the only RDBMS I know of that forces it. – Blrfl Aug 24 '16 at 13:56
  • 1
    SQLite also doesn't create index on a FK column automatically. Most of the sentences in this answer are false. – iwis Apr 30 '21 at 17:12
  • [When did SQL Server stop putting indexes on foreign key columns](https://www.sqlskills.com/blogs/kimberly/when-did-sql-server-stop-putting-indexes-on-foreign-key-columns/) TL;DR: it never did – MMalke Nov 12 '21 at 21:49
5

First, there is no automatic indexing of FOREIGN KEY in RDBMS except MySQL, which is a stupid behavior.

Second, in some cases indexing a FK create an included redundant index, especially when :

  • the table is an associating table
  • the table cames from an inherited design

Third, indexing systematically FK is not a good pratice, except if all your queries is just a join without having any other column used in any other part of the query for the children table.

Some explanations...

POINT 2

First example, associating table :

CREATE TABLE Orders (ord_id int PRIMARY KEY, ...)
CREATE TABLE Product (prd_id int PRIMARY KEY, ...)
CREATE TABLE order_details (ord_id int NOT NULL REFERENCES Orders (ord_id), prd_id int NOT NULL REFERENCES Product (prd_id), PRIMARY KEY (ord_id, prd_id), ...)

Adding an index on ord_id for order_details table is stupid, because the PRIMARY KEY has already an index (ord_id, prd_id) that can be use instead of the simple (ord_id) FK. index.

Second example, inherited table :

CREATE TABLE vehicles (vhc_id int PRIMARY KEY, ...)
CREATE TABLE vehicle_cars (vhc_id int PRIMARY KEY REFERENCES  vehicles (vhc_id , ...)

Adding an index on vhc_id is even more stupid, because the PRIMARY KEY has already an index (vh_idc) that is strictly the same.

POINT 3

CREATE TABLE customers (ctm_id int PRIMARY KEY, ...)
CREATE TABLE Orders (ord_id int PRIMARY KEY, ctm_id int NOT NULL REFERENCES customers (ctm_id), ...)

Adding an index on ctm_id column in the orders table will be used only in two cases :

  1. the index itself is sufficient to retrieve all the data for the join
  2. there is few rows returned from the orders table

Subpoint 2.1 example :

SELECT ctm_name, COUNT(*)
FROM   customers AS c
       JOIN Orders AS o 
          ON c.ctm_id = o.ctm_id
GROUP BY ctm_name;

Subpoint 2.2 example :

SELECT *
    FROM   customers AS c
           JOIN Orders AS o 
              ON c.ctm_id = o.ctm_id
WHERE ctm_id = 123; 

In any other query, there is great "luck" that the FK index won't be used because the cost of using the index in a "seek" way, then joining the index to the table to retrieve all other columns that are not in the index, will be much more than scanning the table.

SO, SYTEMATICALLY INDEXING FOREIGN KEY IS MOSTLY STUPID !

Remember that indexes is very costly in terms of transactions when the data is modified (INSERTs, UPDATEs, DELETEs, MERGEs, TRUNCATEs...). So, do not create useless indexes !

To be clear, the best choice for every index is to have a COVERING index, which means that the one and only index itself is sufficient for the whole query, including WHERE, ON (from JOINs), HAVING, GROUP BY, ORDER BY and SELECT. This being greatly facilitated by the introduction of the INCLUDE clause for indexes in some RDBMSs (SQL Server since version 2008 and PostGreSQL since version 11 in 2018)

Of course, the choice of indexing must result from the exploitation of the database and not from a dogma!

Some RDBMS (Microsoft SQL Server since the 2008 version) systematically offers a full diagnostic of indexes to be create, and it is very rare that indexes concerning only the foreign key column either reported by the diagnostic system... As an example, this query for Microsoft SQL Server :

SELECT * FROM sys.dm_db_missing_index_details;

Give the list of all indexes needed to help performances for queries really executed since the start of the SQL Server instance and some more details, like the potential gain, if the index is created...

SQLpro
  • 159
  • 1
  • 3
  • It's worth keeping in mind that if you ever delete a parent record, the database engine will need to check all the child tables for usage of that key value to enforce referential integrity. Without a FK index on those child tables (or an alternative index leading with that column or columns), this will result in table scans rather than an index seek. In some cases, this may be an acceptable tradeoff, but in other situations it could result in very poor performance. – G-Mac Apr 21 '23 at 22:29
  • Yes you are right... but deleting rows is not a common operation in a database. In general we keep the information, or else we delete it in large masses with other conditions. Using CASCADE mode is heresy! – SQLpro Apr 24 '23 at 08:23