Are Foreign Keys Bad?
The Problem
Mike Simpson's post on foreign keys raises some good points: http://www.slipjig.org/Mike/post/2007/11/Are-Foreign-Keys-Bad--You-Decide!.aspx. The main issue raised is how foreign keys cause deadlocks. In order to avoid deadlocks, you have to acquire locks on records in the same order, always. When you insert and update records related by foreign keys, you lock records from the parent tables to the child tables. To delete records, you lock records in the opposite order (child to parent tables) due to foreign key constraints, leading to potential deadlocks.
The Usual Solutions
There are three general approaches to deal with deadlocks:
- Add retry code to handle deadlocks. Typically this is a lot of work and error prone -- not to mention difficult to test. You generally don't see many developers doing this due to the effort involved.
- DELETE in the opposite order and allow deadlocks to occur. This isn't as bad as it seems. It is common to have a little validation in the database layer -- for instance for "The username must be unique" type validation. So you treat the deadlock like a validation error, report it to the user, and let the user hit the Save button again. Keep in mind that you may have a backend processes that can deadlock, which isn't ideal -- these backend processes don't have to DELETE in order to deadlock. Even if it just has INSERTs and UPDATEs, it still can deadlock with a user who is DELETing in the opposite order (or more generally, locking in a different order).
- "Logically delete" in the same order as INSERTs and UPDATEs to avoid deadlocks. Logical deletes are really updates where you set a field such as "IsDeleted" to true. The downside to this approach is all your SELECT statements have to filter out the "deleted" records, which could be error prone. The difference between this approach and the first approach though, is that this approach is much easier to test.
Don't Use Foreign Keys!?!?
Mike proposes another idea -- don't use foreign keys! Mike's good about coming up with ideas no one else thinks of, but in this case, is this going too far? Can you justify the tradeoff of data integrity for avoiding deadlocks? Personally, data integrity is more important. Despite this, I want to argue Mike's side a bit more, b/c well, there are never hard and fast rules in software, like "you must always use foreign keys".
Foreign keys causes additional locks that you may not be aware of -- beyond dictating the order you modify records. Let's say you have the following two tables: Player and Team. There is a foreign key from the Player table to the Team table. The Team table has the following records:
| TeamID | TeamName |
|---|---|
| 1 | Manchester United |
| 2 | Liverpool |
The Player table has the following record:
| PlayerID | TeamID | PlayerName |
|---|---|---|
| 1 | 2 | Gerrard |
So Gerrard plays for Liverpool. Two impossible things are about to happen: a) Manchester United is going to be relegated (so we need to delete the team), and Gerrard is going to play for Manchester United.
User A executes the following statement:
BEGIN TRAN;
DELETE FROM Team WHERE TeamID = 1;
Internally in SQL Server, the table looks like:
| TeamID | TeamName | |
|---|---|---|
| 1 | Manchester United | marked to be deleted and locked |
| 2 | Liverpool |
When User B executes the following statement, it will block, b/c it is attempting to read Team 1 (Man U), but the record is locked and can't be read.
BEGIN TRAN;
UPDATE Player SET TeamID = 1 WHERE PlayerID = 1;
The statement is blocked and waiting for the first user to commit the transaction. Foreign keys cause additional locks to be made. Not only that, but the locking goes from a child table to a parent table! This is in the opposite order we modify the records which can lead to deadlocks! (Even though the example above includes a DELETE on the team table, an UPDATE would lock exactly the same way in case you are thinking about doing logical deletes).
Mike's post talks about a potential new feature in SQL Server where constraints are checked when the transaction is committed, not when individual records are modified, but is it really the answer? It will delay the foreign checking until the transaction is committed, but while it is checking the constraint, it will lock the records. This causes the locking for the whole transaction to be in random order, which will cause deadlocks.
SELECTs Lock Too!
Another thing on deadlocks, SELECTs lock records too! And therefore can deadlock. With joins, it's anyone's guess the order in which it locks records (parents first, or children first). As it turns out, most of the deadlocks I've seen have come from SELECT statements. The best way to avoid SELECT statements that lock in SQL Server 2005 is to use READ_COMMITTED_SNAPSHOT. To enable it, run the following code:
ALTER DATABASE MyDatabase
SET READ_COMMITTED_SNAPSHOT ON
It works very much like READ_COMMITTED, however without locking records. The downside is that READ_COMMITTED_SNAPSHOT uses more I/O than READ_COMMITTED.
Personal Preference
To avoid deadlocks I have a bias towards the following approach. After reading the above, you know there is no silver bullet, but this is a good balance of deadlock avoidance, data integrity, and ease of programming:
- Use foreign keys
- Do logical deletes
- INSERT, UPDATE and Logically delete tables in the same order
- Use READ_COMMITTED_SNAPSHOT isolation level.
Excellent post Clay! In defense of my proposing removing foreign key constraints, I did state that “To date I have not found it necessary to go to this extreme.” Generally I prefer the approach Clay described at the end of his post. But I thought this was a good topic of discussion - it’s not something many developers consider in much detail.