SQL Server with Mr. Denny:

SELECT statement

Nov 27 2008   12:15PM GMT

Back To Basics: Creating Indexes



Posted by: mrdenny
SELECT statement, Back To Basics, CREATE INDEX

After you have created your tables, and stored procedures you will have a basic idea of what queries are going to be running against your database.  If you haven’t already done so, its now time to start adding indexes to your database.  The catch with index is that there is such a thing as to much or a good thing.  As you add more and more indexes to your database your INSERT, UPDATE and DELETE commands will being to slow down, as each time you insert a record the insert is committed to the table, as well as to each index on the table. Continued »

Jul 21 2008   11:00AM GMT

A better way to index text data



Posted by: mrdenny
SELECT statement, T/SQL, Uni-code, Query tuning, Tables, Index Performance

Indexing text data (varchar, nvarchar, char, etc) is a good way to make it faster to find the data you are looking for.  However these indexes can end up being very hard on the disks behind the index, as well as the memory of the server.  This is because of the large amount of data being put in the index.

As an example, let’s say that we have a table like this.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255))

Now assume that you want to be able to search by the EmailAddress field.  We will then want to index the EmailAddress field with a non-clustered index.  If we work for a company like AMD, then our email addresses will be pretty short (f.lastname@amd.com).  However if we work for a company like I work for then the email addresses are a bit longer (flastname@awarenesstechnologies.com).  Now when we index this column we will be putting the entire email address into the index, taking up a lot of space within the index; especially compared to a numeric value such as an integer.  This becomes doubly true if you are using a uni-code data type as each character requires two bytes of storage instead of the usual one.

This also becomes a problem if you are working on a system with URLs in the field to be indexes.  Depending on the length of the URL, the values may be longer than is allowed in an index which could then give you sorting problems on the indexes.

There are a couple of variations on this technique which I’ve seen.  The one I’ve used the most is to use the CHECKSUM function as part of a calculated column, and then index the calculated column.  This way you simply get the CHECKSUM of the value you want to find, and search the calculated column.  As we are now have an index made up of integers the index can fit a lot more data on each physical data page reducing the IO cost of the index seek as well as saving space on the disk.

So doing this turns our table into something like this.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255),
EmailAddressCheckSum AS CHECKSUM(EmailAddress))

Now I wouldn’t recommend using this technique for each table you create.   I usually only recommend a technique like this when the value to be indexes won’t fit within the bounds of the index, or the table will be very large and searched often so the memory saved is worth the extra CPU time of having to hash the values before doing the lookup.

Now there are a couple of gotchas with this technique.  If you are check summing domain names, some characters don’t check sum correctly.  Also check summing a Unicode version of a string will give you a different result than the non-unicode version of the same string.

You can see that with these three SELECT statements.

SELECT CHECKSUM(’google.com’), CHECKSUM(’g-oogle.com’)
SELECT CHECKSUM(’google.com’), CHECKSUM(N’google.com’)
SELECT CHECKSUM(N’google.com’), CHECKSUM(N’g-oogle.com’)

As you can see the first one you get two different values as you would expect ( 1560309903 and 1560342303 respectively).  With the second query you get two very different values between the Unicode and character strings (1560309903 and -1136321484 respectively).  Based on the first query you would expect to get two different values for the third query, but you don’t.  With the Unicode strings the - appears to not count as part of the CHECKSUM giving you the same CHECKSUM value for both strings (-1136321484).

Another version of this technique which Kevin Kline talked about recently uses the HASHBYTES function of SQL Server 2005 to get the hash of a column and use that.  In his blog he’s talking about using it for partitioning a table, but that same technique can be used here as well.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255),
EmailAddressCheckSum AS HASHBYTES(’SHA1′, EmailAddress)

This will however give you a longer string, therefor taking up more space within the index. However if working with long Unicode strings this may be a better option for you to use.

Denny


Mar 10 2008   10:00AM GMT

Back To Basics: The SELECT Statement



Posted by: mrdenny
SQL, SQL Server 2000, SQL Server 2005, SELECT statement, T/SQL, Back To Basics, SQL Server 2008

There are four basic commands in databases.  They are SELECT, INSERT, UPDATE and DELETE.  Probably the most important of these is the SELECT command.  The SELECT command is how the data in the database is retrieved and displayed.

(All these code samples can be run on all versions of Microsoft SQL Server from 7.0 up.)

Like a regular sentence there are a few basic parts of the SELECT statement.  First there is a list of columns that you want to see.  Then there is the FROM portion of the statement.  This is the table or tables which you will be getting the data from.  Then is the WHERE portion of the statement which is the data filter.

When you put all this together the SELECT statement will look something like this.

SELECT id, name
FROM sysobjects
WHERE type = ‘U’

If you were going to read this as a normal sentence it would read something like:
I want to see the "id" and "name" columns, from the "sysobjects" table, where the "type" column has a value of "U".

Now if we need to look at data from two different tables which is combined together we add a JOIN statement between the FROM and WHERE portions of the SELECT statement.

SELECT sysobjects.namesyscolumns.name
FROM sysobjects
JOIN syscolumns on sysobjects.id = syscolumns.id
WHERE sysobjects.type = ‘U’

You can see from the above statement that we are getting the table names and column names for all the user tables. When you look at the JOIN command you see that we are matching up the id column from both of the tables. Now in this example the column name is the same in both tables. This isn’t always the case. If you don’t know how the tables relate to each other, you can usually look at the foreign key constraints to see how the tables relate to each other.

If you want to get counts of data you can use the GROUP BY clause along with the COUNT() function. In this next statement we will count the number of columns in each table.

SELECT sysobjects.name, count(*)
FROM sysobjects
JOIN syscolumns on sysobjects.id = syscolumns.id
WHERE sysobjects.type = ‘U’
GROUP BY sysobjects.name

When ever you use a mathematical function such as MIN(), MAX(), AVG(), COUNT(), etc you have to add the GROUP BY clause to the SELECT statement so that SQL Server knows how to roll up the data.

Now as a last part of the SELECT statement to look at today, we are going to filter our GROUP BY query further, by showing all the tables that have over 5 columns. To do this we use the HAVING clause. Without the HAVING clause we would have to do a very complex query as a sub-query in the WHERE clause which would be very inefficient.

SELECT sysobjects.name, count(*)
FROM sysobjects
JOIN syscolumns on sysobjects.id = syscolumns.id
WHERE sysobjects.type = ‘U’
GROUP BY sysobjects.name
HAVING count(*) > 5

The HAVING phrase is very simple in it’s syntax. You put the math function that you want to use (in this case COUNT(*)) then <, >, <> or = and what ever you want to compare it to.

I hope that you find this post useful. I’ve barely scratched the surface of the SELECT statement, and what it can do. I encourage everyone to open up Books OnLine and read through the information on the SELECT statement. It includes more examples, and some of the other options which are available to you.

Denny