SQL Server Hash Indexes
There are two problems with indexes on large nvarchar columns:
- You will likely hit the 900 byte limit in your index
- Indexing large data isn't efficient anyway.
A neat feature of SQL Server is the CHECKSUM() function which hashes your varchar/nvarchar values into a 4 byte number. You can then use this value in an index. For example if you have a Site table, add a calculated column, URLChecksum.
CREATE TABLE Site ( SiteID int NOT NULL, URL nvarchar(2083) NOT NULL, URLChecksum AS (checksum([URL])), CONSTRAINT [PK_Site] PRIMARY KEY CLUSTERED (SiteID) );
Next create an index on the hash and include the URL:
CREATE INDEX IX_Site ON Site (URLChecksum) INCLUDE (URL);
This index will make the following query faster:
SELECT SiteID FROM Site WHERE URLChecksum = CHECKSUM(N'http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19&displaylang=en') AND URL = N'http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19&displaylang=en';
This query will first "seek" the hash value in the index very quickly, since the hash values are just ints. Once it finds one or more matching hash values, it will check that the URLs match. Since the URLChecksum, URL, and SiteID values are included in the index, this query does not need to touch the Site table.