Sorting uniqueidentifiers in SQL Server 2005

I had an issue recently where I needed to sort on a uniqueidentifier column and read the data in .Net. I found that .Net sorts Guids differently than SQL Server.

You can see for yourself. :) Run the following code.


DECLARE @t TABLE (
   g uniqueidentifier
); 

INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000000001' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000000010' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000000100' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000001000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000010000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000000100000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000001000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000010000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-000100000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-001000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-010000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0000-100000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0001-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0010-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-0100-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0000-1000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0001-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0010-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-0100-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0000-1000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0001-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0010-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-0100-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000000-1000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000001-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000010-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00000100-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00001000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00010000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '00100000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '01000000-0000-0000-0000-000000000000' );
INSERT INTO @t ( g ) VALUES ( '10000000-0000-0000-0000-000000000000' ); 

SELECT * FROM @t order by g ;

It returns the data in the following bizarre order. Keep in mind the first row is the “smallest” number.

g
01000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
00010000-0000-0000-0000-000000000000
00100000-0000-0000-0000-000000000000
00000100-0000-0000-0000-000000000000
00001000-0000-0000-0000-000000000000
00000001-0000-0000-0000-000000000000
00000010-0000-0000-0000-000000000000
00000000-0100-0000-0000-000000000000
00000000-1000-0000-0000-000000000000
00000000-0001-0000-0000-000000000000
00000000-0010-0000-0000-000000000000
00000000-0000-0100-0000-000000000000
00000000-0000-1000-0000-000000000000
00000000-0000-0001-0000-000000000000
00000000-0000-0010-0000-000000000000
00000000-0000-0000-0001-000000000000
00000000-0000-0000-0010-000000000000
00000000-0000-0000-0100-000000000000
00000000-0000-0000-1000-000000000000
00000000-0000-0000-0000-000000000001
00000000-0000-0000-0000-000000000010
00000000-0000-0000-0000-000000000100
00000000-0000-0000-0000-000000001000
00000000-0000-0000-0000-000000010000
00000000-0000-0000-0000-000000100000
00000000-0000-0000-0000-000001000000
00000000-0000-0000-0000-000010000000
00000000-0000-0000-0000-000100000000
00000000-0000-0000-0000-001000000000
00000000-0000-0000-0000-010000000000
00000000-0000-0000-0000-100000000000

In the end, I decided to SELECT two bigint columns that indicate how SQL Server is sorting the data. This is CPU intensive, so it isn’t ideal, however it shows SQL Server’s strange sorting behaviour of the uniqueidentifier column.

CREATE FUNCTION dbo.GuidHigh
(
	@g uniqueidentifier
)
RETURNS bigint
AS
BEGIN

	DECLARE @s varchar(40);
	SET @s = @g;
	-- @s is in the format 3B3A8D04-5D0C-4E0C-AC69-EFC14EE7D849

	SET @s = REPLACE(@s, '-', '');
	-- @s is in the format 3B3A8D045D0C4E0CAC69EFC14EE7D849

	DECLARE @highA varchar(40);
	DECLARE @highB varchar(40);

	SET @highA = SUBSTRING(@s, 21, 12);
	SET @highB = SUBSTRING(@s, 17, 4);

	DECLARE @high varchar(40);
	SET @high = @highA + @highB;

	DECLARE @MinBigInt numeric(21,0);
	SET @MinBigInt = 9223372036854775808;

	RETURN CAST(dbo.[HexStrToNumeric](@high) - @MinBigInt as bigint);

END
GO

CREATE FUNCTION dbo.[GuidLow]
(
	@g uniqueidentifier
)
RETURNS bigint
AS
BEGIN

	DECLARE @s varchar(40);
	SET @s = @g;
	-- @s is in the format 3B3A8D04-5D0C-4E0C-AC69-EFC14EE7D849

	SET @s = REPLACE(@s, '-', '');
	-- @s is in the format 3B3A8D045D0C4E0CAC69EFC14EE7D849

	DECLARE @lowA varchar(40);
	DECLARE @lowB varchar(40);
	DECLARE @lowC varchar(40);
	DECLARE @lowD varchar(40);
	DECLARE @lowE varchar(40);
	DECLARE @lowF varchar(40);
	DECLARE @lowG varchar(40);
	DECLARE @lowH varchar(40);

	SET @lowA = SUBSTRING(@s, 15, 2);
	SET @lowB = SUBSTRING(@s, 13, 2);
	SET @lowC = SUBSTRING(@s, 11, 2);
	SET @lowD = SUBSTRING(@s, 9, 2);
	SET @lowE = SUBSTRING(@s, 7, 2);
	SET @lowF = SUBSTRING(@s, 5, 2);
	SET @lowG = SUBSTRING(@s, 3, 2);
	SET @lowH = SUBSTRING(@s, 1, 2);

	DECLARE @low varchar(40);
	SET @low = @lowA + @lowB + @lowC + @lowD + @lowE + @lowF + @lowG + @lowH;

	DECLARE @MinBigInt numeric(21,0);
	SET @MinBigInt = 9223372036854775808;

	RETURN CAST(dbo.[HexStrToNumeric](@low) - @MinBigInt as bigint);

END
GO

-- do not include "0x" in the parameter, just a string like "8E75EF35FF75A977"

CREATE FUNCTION dbo.[HexStrToNumeric](@hexstr varchar(16))
RETURNS numeric(21, 0) -- enough for 2^64
AS
BEGIN
    DECLARE @hex char(2), @i int, @count int, @result numeric(21, 0), @power numeric(21, 0);
    SET @result = 0;
    SET @count = LEN(@hexstr)
    SET @i = 1
    SET @power = 1;
    WHILE (@i <= @count)
    BEGIN
	SET @power = @power * 16;
        SET @i = @i + 1
    END;

    SET @i = 1
    WHILE (@i <= @count)
    BEGIN
 	SET @power = @power / 16;
        SET @hex = SUBSTRING(@hexstr, @i, 1)
        SET @result = @result + @power *
                CASE WHEN @hex LIKE '[0-9]'
                    THEN CAST(@hex as int)
                    ELSE CAST(ASCII(UPPER(@hex))-55 as int)
                END
        SET @i = @i + 1
    END
    RETURN @result
END
GO
Posted in SQL Server Development | 12 Comments

Are Foreign Keys Bad?

The Problem

Mike Simpson’s post on foreign keys raises some good points: http://www.slipjig.org/Mike/post/2007/11/Are-Foreign-Keys-Bad–You-Decide!.aspx. The main issue raised is how foreign keys cause deadlocks. In order to avoid deadlocks, you have to acquire locks on records in the same order, always. When you insert and update records related by foreign keys, you lock records from the parent tables to the child tables. To delete records, you lock records in the opposite order (child to parent tables) due to foreign key constraints, leading to potential deadlocks.

The Usual Solutions

There are three general approaches to deal with deadlocks:

  • Add retry code to handle deadlocks. Typically this is a lot of work and error prone — not to mention difficult to test. You generally don’t see many developers doing this due to the effort involved.
  • DELETE in the opposite order and allow deadlocks to occur. This isn’t as bad as it seems. It is common to have a little validation in the database layer — for instance for “The username must be unique” type validation. So you treat the deadlock like a validation error, report it to the user, and let the user hit the Save button again. Keep in mind that you may have a backend processes that can deadlock, which isn’t ideal — these backend processes don’t have to DELETE in order to deadlock. Even if it just has INSERTs and UPDATEs, it still can deadlock with a user who is DELETing in the opposite order (or more generally, locking in a different order).
  • “Logically delete” in the same order as INSERTs and UPDATEs to avoid deadlocks. Logical deletes are really updates where you set a field such as “IsDeleted” to true. The downside to this approach is all your SELECT statements have to filter out the “deleted” records, which could be error prone. The difference between this approach and the first approach though, is that this approach is much easier to test.

Don’t Use Foreign Keys!?!?

Mike proposes another idea — don’t use foreign keys! Mike’s good about coming up with ideas no one else thinks of, but in this case, is this going too far? Can you justify the tradeoff of data integrity for avoiding deadlocks? Personally, data integrity is more important. Despite this, I want to argue Mike’s side a bit more, b/c well, there are never hard and fast rules in software, like “you must always use foreign keys”.

Foreign keys causes additional locks that you may not be aware of — beyond dictating the order you modify records. Let’s say you have the following two tables: Player and Team. There is a foreign key from the Player table to the Team table. The Team table has the following records:

TeamID TeamName
1 Manchester United
2 Liverpool

The Player table has the following record:

PlayerID TeamID PlayerName
1 2 Gerrard

So Gerrard plays for Liverpool. Two impossible things are about to happen: a) Manchester United is going to be relegated (so we need to delete the team), and Gerrard is going to play for Manchester United.

User A executes the following statement:

BEGIN TRAN;

DELETE FROM Team WHERE TeamID = 1;

Internally in SQL Server, the table looks like:

TeamID TeamName
1 Manchester United marked to be deleted and locked
2 Liverpool  

When User B executes the following statement, it will block, b/c it is attempting to read Team 1 (Man U), but the record is locked and can’t be read.

BEGIN TRAN;

UPDATE Player SET TeamID = 1 WHERE PlayerID = 1;

The statement is blocked and waiting for the first user to commit the transaction. Foreign keys cause additional locks to be made. Not only that, but the locking goes from a child table to a parent table! This is in the opposite order we modify the records which can lead to deadlocks! (Even though the example above includes a DELETE on the team table, an UPDATE would lock exactly the same way in case you are thinking about doing logical deletes).

Mike’s post talks about a potential new feature in SQL Server where constraints are checked when the transaction is committed, not when individual records are modified, but is it really the answer? It will delay the foreign checking until the transaction is committed, but while it is checking the constraint, it will lock the records. This causes the locking for the whole transaction to be in random order, which will cause deadlocks.

SELECTs Lock Too!

Another thing on deadlocks, SELECTs lock records too! And therefore can deadlock. With joins, it’s anyone’s guess the order in which it locks records (parents first, or children first). As it turns out, most of the deadlocks I’ve seen have come from SELECT statements. The best way to avoid SELECT statements that lock in SQL Server 2005 is to use READ_COMMITTED_SNAPSHOT. To enable it, run the following code:

ALTER DATABASE MyDatabase
SET READ_COMMITTED_SNAPSHOT ON

It works very much like READ_COMMITTED, however without locking records. The downside is that READ_COMMITTED_SNAPSHOT uses more I/O than READ_COMMITTED.

Personal Preference

To avoid deadlocks I have a bias towards the following approach. After reading the above, you know there is no silver bullet, but this is a good balance of deadlock avoidance, data integrity, and ease of programming:

  • Use foreign keys
  • Do logical deletes
  • INSERT, UPDATE and Logically delete tables in the same order
  • Use READ_COMMITTED_SNAPSHOT isolation level.
Posted in SQL Server Development | 19 Comments

Hello world!

I thought that blogs would be a fad, but it seems to have out lasted the fad stage   I use blogs everyday in my work looking for answers that other people have already seen and answered.  I hope with this blog to either add to the knowledge on the web, or link to good explanations so that others can find them more easily.

Posted in Uncategorized | 5 Comments