SQL Server with Mr. Denny


July 21, 2008  11:00 AM

A better way to index text data

Denny Cherry

Indexing text data (varchar, nvarchar, char, etc.) is a good way to make it faster to find the data you are looking for.  However, these indexes can end up being very hard on the disks behind the index, as well as on the server's memory, because of the large amount of data being put into the index.

As an example, let’s say that we have a table like this.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255))

Now assume that you want to be able to search by the EmailAddress field, so we index it with a non-clustered index.  If we work for a company like AMD, our email addresses will be pretty short (f.lastname@amd.com).  However, if we work for a company like the one I work for, the email addresses are a bit longer (flastname@awarenesstechnologies.com).  When we index this column we put the entire email address into the index, taking up a lot of space within the index, especially compared to a numeric value such as an integer.  This is doubly true if you are using a Unicode data type, as each character requires two bytes of storage instead of the usual one.

This also becomes a problem if the field to be indexed holds URLs.  Depending on the length of the URL, the values may be longer than is allowed in an index key (900 bytes in SQL Server 2005), which could then give you problems with the indexes.

There are a couple of variations on this technique which I’ve seen.  The one I’ve used the most is to add a calculated column that uses the CHECKSUM function, and then index the calculated column.  This way you simply get the CHECKSUM of the value you want to find and search the calculated column.  As we now have an index made up of integers, the index can fit a lot more data on each physical data page, reducing the I/O cost of the index seek as well as saving space on the disk.

So doing this turns our table into something like this.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255),
EmailAddressCheckSum AS CHECKSUM(EmailAddress))
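
To put the calculated column to work, we index it and then search on both the checksum and the original value.  The second predicate matters because CHECKSUM can collide: two different strings can produce the same integer.  A sketch against the table above (the index name is just an example):

```sql
CREATE NONCLUSTERED INDEX IX_Employee_EmailAddressCheckSum
ON Employee (EmailAddressCheckSum)
GO

-- Seek on the small integer index first, then compare the actual
-- email address to filter out any checksum collisions.
SELECT EmployeeID, FirstName, LastName
FROM Employee
WHERE EmailAddressCheckSum = CHECKSUM('flastname@awarenesstechnologies.com')
  AND EmailAddress = 'flastname@awarenesstechnologies.com'
```

The optimizer can seek on the narrow integer index and only touch the wide string column for the handful of rows that share the checksum.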

Now I wouldn’t recommend using this technique for every table you create.   I usually only recommend a technique like this when the value to be indexed won’t fit within the bounds of the index, or when the table will be very large and searched often, so the memory saved is worth the extra CPU time of hashing the values before doing the lookup.

Now there are a couple of gotchas with this technique.  If you are checksumming domain names, some characters don’t checksum correctly.  Also, checksumming the Unicode version of a string will give you a different result than the non-Unicode version of the same string.

You can see this with these three SELECT statements.

SELECT CHECKSUM('google.com'), CHECKSUM('g-oogle.com')
SELECT CHECKSUM('google.com'), CHECKSUM(N'google.com')
SELECT CHECKSUM(N'google.com'), CHECKSUM(N'g-oogle.com')

As you can see, the first query returns two different values, as you would expect (1560309903 and 1560342303 respectively).  The second query returns two very different values for the Unicode and character versions of the same string (1560309903 and -1136321484 respectively).  Based on the first query you would expect two different values from the third query, but you don’t get them.  With the Unicode strings the hyphen appears not to count as part of the CHECKSUM, giving you the same CHECKSUM value for both strings (-1136321484).

Another version of this technique, which Kevin Kline talked about recently, uses the HASHBYTES function in SQL Server 2005 to get the hash of a column and index that.  In his blog he’s talking about using it for partitioning a table, but the same technique can be used here as well.

CREATE TABLE Employee
(EmployeeID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255),
EmailAddressCheckSum AS HASHBYTES('SHA1', EmailAddress))

This will, however, give you a longer value (HASHBYTES with SHA1 returns a 20-byte varbinary), therefore taking up more space within the index.  However, if you are working with long Unicode strings this may be the better option for you to use.

Denny

July 17, 2008  11:00 AM

Book: The Windows Manager’s Guide to iSCSI SANs


That’s right, I said a book.  Well, it’s actually an eBook, but that’s good enough, right?  The best part is that it’s available for free.  It’s a three-chapter eBook called “The Windows Manager’s Guide to iSCSI SANs”.  Each chapter was posted separately, so you have to click through to each one.

Registration is required to view the chapters (don’t blame me).  I wrote chapter 2, I hope you find it useful.

Chapter 1: Leveraging SANs in Windows Servers

Chapter 2: Gearing up for SANs on a SQL Server

Chapter 3: Reducing Exchange Server Complexity with SANs

Denny


July 14, 2008  11:00 AM

Using batching to do large operations without filling the transaction log


Deleting large amounts of data from a table is usually a simple task, but if you don’t have a maintenance window to work within, you can end up causing locking and blocking for the other processes which are accessing the database.  Not to mention you will write a very large amount of data to the transaction log, no matter what recovery model your database is set to.

Say you have a table with a date column and you need to delete a million-plus records.  Doing this in a single transaction will put all million deletes into your transaction log, plus cause any other processes trying to access the table to be blocked.

However, if we batch the work into many smaller transactions, our transaction log will not fill up: we can back up the log using our normal log backup methods throughout the process, or, if the database uses the SIMPLE recovery model, committed transactions will be cleared from the log automatically.

In SQL 2000 and below you have to set ROWCOUNT to a number above 0, which causes SQL to delete only the first n records that it comes across.  In SQL 2005 we can use the TOP clause as part of our DELETE command to the same effect, but without having to reset the session setting afterward.

In SQL 2000 or below you can use a syntax like this one.

DECLARE @c BIT, @d DATETIME
SET @d = DATEADD(yy, -1, GETDATE())
SET @c = 0
SET ROWCOUNT 1000
WHILE @c = 0
BEGIN
DELETE FROM Table
WHERE CreateDate < @d
IF @@ROWCOUNT = 0
SET @c = 1
END
SET ROWCOUNT 0

If you are using SQL 2005 you can use this very similar syntax.

DECLARE @c BIT, @d DATETIME
SET @d = DATEADD(yy, -1, GETDATE())
SET @c = 0
WHILE @c = 0
BEGIN
DELETE TOP (1000) FROM Table
WHERE CreateDate < @d
IF @@ROWCOUNT = 0
SET @c = 1
END

Both pieces of code are very similar: declare a variable which tells the loop when to exit, then start deleting the data; if no records are deleted, set the variable to 1, causing us to exit the loop. This will usually take a little longer to complete than a single delete statement, but the system will remain responsive during the process.

The number of records you delete per batch should be adjusted based on the width of your records, the load on the database at the time of the deletion, and the speed of the drives which hold the data files and transaction logs. I usually start at 1000 records and see how the system responds. For tables with just a few numeric columns I'll put it up as high as 50k or 100k records. For very wide tables I'll drop it down to 100 or 500 records if the system can't handle 1000.

Denny

Update:
Sorry this post didn't look very good at first. The blog site seems to have eaten it, and I didn't notice until just now.


July 10, 2008  11:00 AM

My Experience with putting a CLR Procedure into SQL


A little while ago I came upon a situation where I needed to put a CLR procedure into SQL Server 2005.  Now, I’ve been using SQL Server 2005 since the day it went RTM, and this is the first occasion that I’ve needed to put a CLR procedure into SQL.  Could I have done what I needed to without a CLR procedure?  Yeah, probably, but the .NET code was already written and had been working.

My code is written in C#, as that’s what the .NET guys here use.  There is one method within the C# class called HashPassword.  It takes a text string and makes a SHA1 hash of it, which we then store.  We had to make a couple of changes to the C# code to make it work within SQL Server.

The original code looked like this.

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
 

namespace Rapid.Database.Security
{
    public class User
    {
        public static void HashPassword(String password, out String hash)
        {
            SHA1 sha1 = new SHA1CryptoServiceProvider();
            hash = BitConverter.ToString(sha1.ComputeHash(UnicodeEncoding.Default.GetBytes(password.ToLower()))).Replace("-", "");
        }
    }
}

The changed C# code looks like this.

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using Microsoft.SqlServer.Server;

namespace Rapid.Database.Security
{
    public class User
    {
        [SqlProcedure(Name="HashPassword")]
        public static void HashPassword(String password, out String hash)
        {
            SHA1 sha1 = new SHA1CryptoServiceProvider();
            hash = BitConverter.ToString(sha1.ComputeHash(UnicodeEncoding.Default.GetBytes(password.ToLower()))).Replace("-", "");
        }
    }
}

Once these changes are made and the DLL recompiled, we can attach the DLL to the SQL Server database as an assembly.  This is done with the CREATE ASSEMBLY command.  I have to use the EXTERNAL_ACCESS permission set instead of SAFE because my DLL requires the System.IO assembly, which can’t be run as SAFE under SQL Server 2005.

CREATE ASSEMBLY [Rapid.Database.Security]
AUTHORIZATION [dbo]
FROM 'D:\Rapid.Database.Security.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS
GO

We can now create our procedure, which needs one input and one output parameter to match the parameters of the C# method.  This is done with the CREATE PROCEDURE command, just like creating any normal procedure.

CREATE PROCEDURE [dbo].[HashPassword]
@Password [nvarchar](50),
@PasswordHash [nvarchar](50) OUTPUT
WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [Rapid.Database.Security].[Rapid.Database.Security.User].[HashPassword]
GO

The three parts of the C# method’s name are:

  1. Assembly name which SQL Server knows about (basically the name of the DLL without the file extension).
  2. The full name to the class, in our case the namespace then the class.  Our namespace is Rapid.Database.Security with the name of the class after that.
  3. The name of the method within the C# class.

Don’t forget to enable CLR within SQL Server using sp_configure before trying to use the CLR procedure.
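
Enabling CLR is a one-time server-level change:

```sql
-- Turn on CLR integration for the whole instance
-- (off by default in SQL Server 2005).
EXEC sp_configure 'clr enabled', 1
RECONFIGURE
GO
```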

At this point I am able to run the procedure, pass in text, and get back the hash.

declare @PasswordHash varchar(50)
exec HashPassword @password='test', @passwordHash=@PasswordHash OUTPUT
SELECT @PasswordHash

The value returned is: A94A8FE5CCB19BA61C4C0873D391E987982FBBD3
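
As a quick sanity check outside SQL Server, the same hash can be reproduced with Python's hashlib; for plain ASCII input like this, the bytes hashed match what the C# code above produces:

```python
import hashlib

def hash_password(password: str) -> str:
    # Mirror the C# method: lowercase the input, SHA-1 it, and
    # format the digest as uppercase hex with no separators.
    return hashlib.sha1(password.lower().encode("ascii")).hexdigest().upper()

print(hash_password("test"))  # A94A8FE5CCB19BA61C4C0873D391E987982FBBD3
```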

I hope that you find this useful. It took me and a .NET developer a couple of hours to get all of this working and loaded into the SQL Server correctly. I know that I’ll be referring back to this if I ever need to attach a CLR method to SQL again.

Do keep in mind that we created this as a separate DLL for the SQL Server, because when we tried to load the original DLL into SQL Server, SQL wanted us to load up a bunch of other assemblies that some of the other methods required.

Denny


July 10, 2008  11:00 AM

SQL 2008 RC0 Intellisense doesn’t work against SQL 2005


I’ve been working with SQL 2008 RC0 for a couple of weeks now, and I’ve noticed something that really sucks: IntelliSense no longer works against a SQL 2005 server.  I’ve been using the SQL 2008 UI exclusively for several months now (since the November CTP at least), and I’ve gotten quite used to IntelliSense.

While jumping around the net looking to see who else was talking about this (and several people are), I found this Connect bug about it.  The notes from Eric @ Microsoft say that if the vote count shows that enough people want this fixed, they’ll see what they can do.   The comments on the Connect bug indicate that pretty much everyone wants it back, even if it’s not perfect.

The only other person I really see complaining about the lack of downlevel support is Aaron Bertrand.

Be sure to vote on the connect bug and leave a comment here or there so that the developers get the idea that we want it back.

Denny


July 7, 2008  11:00 AM

Back To Basics: The RECEIVE command (and END CONVERSATION)


The RECEIVE command is the last of the Service Broker items to cover in the Back To Basics series.  RECEIVE is what actually gets the message out of the queue so that you can process the data within the message_body column.  For the most part RECEIVE works just like a SELECT statement: it has a FROM clause and a WHERE clause (which most people don’t use, but it’s there if you need it).  After you receive the message you will usually want to run the END CONVERSATION command, which tells the remote machine that you have received the message and that no more messages should be sent on this conversation.  I usually only send a single message per conversation, so I don’t put in any logic to check whether this is the last message in the conversation.

The basic syntax is very straight forward.

DECLARE @handle UNIQUEIDENTIFIER
DECLARE @message_body XML
RECEIVE TOP (1) @handle=conversation_handle, @message_body=CAST(message_body AS XML)
FROM [QueueName]
END CONVERSATION @handle

It is required that you cast the message body to XML, as it's transmitted as binary data.  (If you are sending a single value or a binary blob, change the cast as needed.)  You don't have to write a second command to delete the message from the queue after you receive it; the RECEIVE command handles the SELECT and DELETE in a single command.

If you haven't been able to make it to one of my sessions, here is the sample source code which I use for the Service Broker session.  The ZIP file contains not only the sample code, but also the slide deck from the presentation, which covers much of what I've covered in these blog posts.  If you have any questions about the sample code, please feel free to post them here so that everyone can see the answers.
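
In practice you usually wrap the RECEIVE in a WAITFOR so the call blocks until a message arrives instead of returning immediately when the queue is empty.  A sketch, assuming the same [QueueName] as above:

```sql
DECLARE @handle UNIQUEIDENTIFIER
DECLARE @message_body XML

-- Block for up to 5 seconds waiting for a message to arrive.
WAITFOR (
    RECEIVE TOP (1)
        @handle = conversation_handle,
        @message_body = CAST(message_body AS XML)
    FROM [QueueName]
), TIMEOUT 5000

-- @handle stays NULL if the wait timed out with no message.
IF @handle IS NOT NULL
    END CONVERSATION @handle
```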

I may take a little break from the Back To Basics posts, but I haven’t really decided yet; so if you see more popping up on the site, I guess that means I’ve decided to keep the series going for a bit.  If I do take a break, I will start back up shortly.

I promise.

Denny


July 3, 2008  4:59 PM

My Back To Basics session was way more popular than I was expecting


At the last SoCal Code Camp I did several new sessions, one of which was my Back To Basics session.  When I was setting up the session I figured that a few people would show, mostly because they wouldn’t have anything else to do during the first session slot.

What I got instead was a massive turnout.  There were even people standing at the back of the room (and the room held about 40 people).  The questions were all excellent, and there were so many of them that we ended up only getting about halfway through the material.

Because of this, I wanted to announce that next time Code Camp happens (Woody tells me there will be one in October this year up at USC) I’m going to schedule it as a two-part session.  That way we can be sure to get through all the material I want to cover, and still have plenty of time for questions.  As soon as Woody opens up the site for scheduling I’ll be sure to get both halves of the session posted so that we can get a good gauge of what size room is needed.

Check back here and on the SoCal Code Camp website for additional scheduling details.

Denny


July 3, 2008  5:39 AM

Installing SQL Server on Windows Server 2008


I ran across an excellent post from Shawn Hernan of the SQL Server development team (the guys that write SQL Server) on what needs to be done to get SQL Server 2005 or SQL Server 2008 installed on Windows Server 2008.

 You can find the post here.

Denny


June 30, 2008  11:00 AM

New Article: Secure SQL Server from SQL injection attacks


An article that I wrote about securing your SQL Server from SQL injection attacks was just published.  It covers information from both a DBA’s point of view and a .NET developer’s point of view.

Denny


June 28, 2008  8:57 PM

SoCal Code Camp Slide Decks and Sample Code


As promised to the folks at the Code Camp, here are the slide decks and sample code from this weekend’s Code Camp.

Back To Basics: Getting Back To The Basics of SQL Server
SQL Server 2008 Resource Governor
What Do All These Undocumented Stored Procedures Do
Storage for the DBA
Virtual SQL Servers Should I or Shouldn’t I
SQL Server Clustering 101

Thanks for attending the Code Camp, and see you next time.

Denny

