I had this case today when I had to add a new column to a table with a gazillion rows and also create an index on this column. In my mind, having a column with NULL values would make the creation of the index instantaneous. I was wrong! In order to create the index, the SQL engine still scans all the rows and for tables with a lot of rows it takes a long while. But it felt really stupid. I knew that the column was filled with NULL values, I didn't need the computing of any index when I create it, instead only on INSERT/UPDATE/DELETE operations. So I started to look into solutions.

Enter filtered indexes! In Microsoft SQL Server 2008 an option for filters on indexes was introduced. The index must not be clustered and the definition it just the same as before, only with a WHERE clause that applies the filter. This seems to be the right solution to my scenario.

In order to test this I created a table called Test with two columns, a and b, both nvarchar(255). I filled the table with ten million rows having values for a and no values for b. Then I created an index on b; it took about 30 seconds. Then I created an index on a; it took 50 seconds. I removed the indexes and created a filtered index on b on the condition that b is not null. The operation was instantaneous. Success!
And here are the actual operations with more exact values (check out the comments for extra SQL tips on speed):
CREATE TABLE Test(a NVARCHAR(255),b NVARCHAR(255))

DECLARE @i INT = 625000 -- ten million divided by 16

BEGIN TRAN -- if not using a transaction, SQL will create a transaction per insert!

WHILE @i>0
BEGIN

SET @i=@i-1
INSERT INTO Test(a) VALUES(CAST(@i AS NVARCHAR(255)))

END

COMMIT TRAN -- 625000 rows with a filled and b empty [8 seconds]

-- insert selects are a lot faster than the while above
INSERT INTO Test(a)
SELECT a+'x1' FROM Test -- 1250000 total rows [3 seconds to create another 625000 rows]

INSERT INTO Test(a)
SELECT a+'x2' FROM Test -- 2500000 total rows [7 seconds]

INSERT INTO Test(a)
SELECT a+'x3' FROM Test -- 5000000 total rows [18 seconds]

INSERT INTO Test(a)
SELECT a+'x4' FROM Test -- 10000000 total rows [46 seconds]

CREATE INDEX IXa ON Test(a) -- [27 seconds]

DROP INDEX IXa ON Test

CREATE INDEX IXb ON Test(b) -- [25 seconds, approximately the same]

DROP INDEX IXb ON Test

CREATE INDEX IXa ON Test(a) -- there are no null values in the a column [56 seconds, double for a filtered index with a useless filter]
WHERE a IS NOT NULL

DROP INDEX IXa ON Test

CREATE INDEX IXb ON Test(b) -- [0 seconds - instant!]
WHERE b IS NOT NULL

A lot of good things can come from using filtered indexes, like for example a unique index on non-null values (which was pretty much a pain in the ass to do otherwise), but there are also gotchas. One of the things you have to watch out for is using filtered indexes on numeric columns. In this case the SET ARITHABORT ON command must be used (or insure in some other way that the option is on for all SQL sessions - SQL Management Studio and code both!, otherwise errors might occur or the index might be ignored. Also there seem to be some bugs that have not been addressed even in SQL 2012, like when using MERGE or when using filtered indexes on column being or not being null.

An more detailed article on this feature can be found here: SQL University: Advanced Indexing – Filtered Indexes.

Comments

Be the first to post a comment

Post a comment