Index Performance archives - SQL Server with Mr. Denny

SQL Server with Mr. Denny:

Index Performance

Sep 17 2009   11:00AM GMT

Dates can easily be the hardest datatype to work with.



Posted by: mrdenny
T/SQL, Index Performance, Index Scan, Index Seek, Indexing

The datetime data type can be one of the hardest to work with when it comes to index optimization.  Most queries that use a datetime data type for filtering (part of the where clause) only want to match the date portion of the value.  Most people handle this via a convert function around the date column.  This causes the index that you create to become useless as the convert function causes the index to be scanned not seeked.

Continued »

Sep 9 2009   1:55AM GMT

Whitepaper on why database maintenance is important



Posted by: mrdenny
SQL Server, Index Performance, Statistics, Performance Problems, SQL Server 2005, SQL Server 2008, SQL Server 2000, White Paper

Today I’m publishing a white paper which I wrote for our Customer Service department to give to our customers. It explains why database maintenance is so important in keeping your database happy and healthy. Continued »


Mar 12 2009   11:00AM GMT

Back To Basics: Reading an Execution Plan



Posted by: mrdenny
SQL, SQL Server 2000, SQL Server 2005, Execution Plan, Back To Basics, SQL Server 2008, Index Performance

All to often when helping people look at query performance problems I’ll ask them to look at the execution plan, and see what it says.  Most query performance problems can be resolved simply by looking at the execution plan and seeing where you need to add an index.

All to often I (and others) then get asked, how to I look at the execution plan, and what does it mean?

Continued »


Jan 18 2009   1:04PM GMT

January 2009 Code Camp Presentations



Posted by: mrdenny
SoCal Code Camp, In Person Events, Index Performance, Service Broker

With the SoCal Code Camp this weekend, I’ve managed to get my slide decks put together early enough to get them posted well in advance this time. Continued »


Nov 24 2008   1:00PM GMT

Back To Basics: What’s the difference between a Scan and a Seek?



Posted by: mrdenny
SQL Server, Back To Basics, Index Performance, Index Scan, Index Seek, Table Scan

There are a few basic operations which SQL will perform when looking for the data that you need.  Here they are listed in the order of worst to best.

  • Table Scan
  • Clustered Index Scan
  • Index Scan
  • Clustered Index Seek
  • Index Seek

The basic rule to follow is Scans are bad, Seeks are good.

Continued »


Jul 21 2008   11:00AM GMT

A better way to index text data



Posted by: mrdenny
SELECT statement, T/SQL, Uni-code, Query tuning, Tables, Index Performance

Indexing text data (varchar, nvarchar, char, etc) is a good way to make it faster to find the data you are looking for.  However these indexes can end up being very hard on the disks behind the index, as well as the memory of the server.  This is because of the large amount of data being put in the index.

As an example, let’s say that we have a table like this.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255))

Now assume that you want to be able to search by the EmailAddress field.  We will then want to index the EmailAddress field with a non-clustered index.  If we work for a company like AMD, then our email addresses will be pretty short (f.lastname@amd.com).  However if we work for a company like I work for then the email addresses are a bit longer (flastname@awarenesstechnologies.com).  Now when we index this column we will be putting the entire email address into the index, taking up a lot of space within the index; especially compared to a numeric value such as an integer.  This becomes doubly true if you are using a uni-code data type as each character requires two bytes of storage instead of the usual one.

This also becomes a problem if you are working on a system with URLs in the field to be indexes.  Depending on the length of the URL, the values may be longer than is allowed in an index which could then give you sorting problems on the indexes.

There are a couple of variations on this technique which I’ve seen.  The one I’ve used the most is to use the CHECKSUM function as part of a calculated column, and then index the calculated column.  This way you simply get the CHECKSUM of the value you want to find, and search the calculated column.  As we are now have an index made up of integers the index can fit a lot more data on each physical data page reducing the IO cost of the index seek as well as saving space on the disk.

So doing this turns our table into something like this.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255),
EmailAddressCheckSum AS CHECKSUM(EmailAddress))

Now I wouldn’t recommend using this technique for each table you create.   I usually only recommend a technique like this when the value to be indexes won’t fit within the bounds of the index, or the table will be very large and searched often so the memory saved is worth the extra CPU time of having to hash the values before doing the lookup.

Now there are a couple of gotchas with this technique.  If you are check summing domain names, some characters don’t check sum correctly.  Also check summing a Unicode version of a string will give you a different result than the non-unicode version of the same string.

You can see that with these three SELECT statements.

SELECT CHECKSUM(’google.com’), CHECKSUM(’g-oogle.com’)
SELECT CHECKSUM(’google.com’), CHECKSUM(N’google.com’)
SELECT CHECKSUM(N’google.com’), CHECKSUM(N’g-oogle.com’)

As you can see the first one you get two different values as you would expect ( 1560309903 and 1560342303 respectively).  With the second query you get two very different values between the Unicode and character strings (1560309903 and -1136321484 respectively).  Based on the first query you would expect to get two different values for the third query, but you don’t.  With the Unicode strings the - appears to not count as part of the CHECKSUM giving you the same CHECKSUM value for both strings (-1136321484).

Another version of this technique which Kevin Kline talked about recently uses the HASHBYTES function of SQL Server 2005 to get the hash of a column and use that.  In his blog he’s talking about using it for partitioning a table, but that same technique can be used here as well.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255),
EmailAddressCheckSum AS HASHBYTES(’SHA1′, EmailAddress)

This will however give you a longer string, therefor taking up more space within the index. However if working with long Unicode strings this may be a better option for you to use.

Denny