I was recently performance tuning the process we use to delete data from our database. We delete massive amounts of data on a daily basis, so our data deletion process is pretty important to us. As we've grown our customer base the amount of data to be deleted every day has also grown, and the time it takes to delete that data has gone up as well.
Recently it started taking stupid amounts of time to delete the data, so I started digging into the data deletion procedures.
The first thing I looked at was the actual delete statements. Those seemed OK; the actual deletes were happening very quickly (we process deletes in batches of 1000 records per batch to minimize locking of data). So next I looked at the part of the code where we select the records to be deleted. Looking at the execution plan, everything looked OK.
But this little chunk of code took about 50 minutes to run. Pretty bad when it's only returning 1000 numbers back from the database.
SELECT TOP (@BatchSize) a.PolicyIncidentId
FROM PolicyIncident a WITH (NOLOCK)
JOIN #ComputersToProcess ComputersToProcess
    ON a.ComputerId = ComputersToProcess.ComputerId
WHERE CaptureTimestamp < ComputersToProcess.StartDeleteAt
The first thing that I did was put a primary key on the @ComputersToProcess table variable. That turned the table scan into a Clustered Index Scan, but didn’t do anything for performance.
The next thing I did was switch the table variable to a temp table (without a primary key). This really didn't do anything to speed up the process, as there were still no statistics on the data. However, this time the execution plan actually shows you that there are no statistics on the temp table.
Now, I didn't want to put a non-clustered index on the table and keep the table as a heap, and a clustered index that wasn't a primary key wasn't going to be any more effective than a primary key, so I put a primary key on the table. While the query cost percentage went up from 2% to 7%, the actual run time went down from 50 minutes to just 1 second.
I didn't make any other code changes to the procedures; just changing from the table variable to the temp table and adding a primary key took this one little three-line query from an hour to a second. It's amazing how much smoother such a small change can make things run.
Now obviously this isn't going to fix every problem. But in my case I'm putting a little over 190k rows into the table variable (now temp table), and that is just too much for a table variable to take. Keep in mind that table variables have no statistics, so SQL Server assumes only a single row per table variable, no matter how much data is actually in it.
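Put together, the change boils down to something like this (a simplified sketch; the column data types and the population step are assumptions on my part, but the query is the one shown above):

```sql
-- A temp table instead of a table variable, with a primary key so the
-- optimizer gets a clustered index and real statistics to work with.
CREATE TABLE #ComputersToProcess
(
    ComputerId    INT      NOT NULL PRIMARY KEY,  -- clustered by default
    StartDeleteAt DATETIME NOT NULL
);

-- ... populate #ComputersToProcess with the ~190k computers to clean up ...

DECLARE @BatchSize INT = 1000;

SELECT TOP (@BatchSize) a.PolicyIncidentId
FROM PolicyIncident a WITH (NOLOCK)
JOIN #ComputersToProcess ComputersToProcess
    ON a.ComputerId = ComputersToProcess.ComputerId
WHERE a.CaptureTimestamp < ComputersToProcess.StartDeleteAt;
```

With the temp table the optimizer sees the real row count (190k-ish) instead of the single-row guess it makes for table variables, so it can pick a sensible join strategy.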
Last year the SQL PASS conference had an official bloggers table set up at the keynotes so that the people who were tapped as official bloggers could blog and tweet in real time during the keynotes. Somehow I got on this list last year. Apparently I didn't show up hung-over enough, and I was asked to sit at the table again.
So once again I'll be blogging and tweeting live from the keynotes at PASS for your education/entertainment/sheer horror.
On Tuesday I will be cutting out a little early to get set up for my spotlight session, which starts right after the keynotes, as I have a couple of different VMs that I need to power up for the demo.
A little while ago I went to the SSWUG office and recorded some sessions for the upcoming SSWUG vConference. One additional thing which I recorded was a short little interview video which they’ve already got posted up on Facebook.
Sometimes a project comes around that requires knowledge beyond the normal SQL Server knowledge. This is where having that extra knowledge can really make you stand out. Recently I was talking to Allen Kinsel (blog | twitter) about IPv6 on a Windows cluster which was being blocked by Symantec, which was causing all sorts of problems.
I then mentioned that this would create all sorts of problems for Direct Access, as it requires IPv6 to function. That led to a quick back and forth about what Direct Access was and how it worked. Suddenly Wendy Pastrick (blog | twitter) came into the conversation asking specifically about Direct Access. Apparently she has a new client which has many remote SQL instances installed on people's laptops, and those laptops use merge replication to sync up data with the central database. This is a perfect situation for Direct Access to be deployed.
What’s Direct Access?
Direct Access is a feature of Windows Server 2008 R2 and Windows 7 where client computers can create an automatic, SSL-protected connection to the company network on demand, without the user needing to initiate the connection.
How can it help?
The current solution that the company has deployed requires that the user initiate a VPN connection and then start the SQL replication job to begin the data transfer (or have the distribution agent set up to retry over and over until it succeeds). With Direct Access, when the SQL Server attempts to connect to the distributor (I'm assuming a pull subscription here), the computer sees the attempt to reach an internal server and connects to the Direct Access server, effectively making a VPN connection, which then allows the data transfer to complete without the user even knowing that the connection was needed.
Obviously Direct Access isn't a feature that most DBAs would know about. Now that you know about it, you can pitch it if you're in need of a distributed merge replication solution that allows for automatic replication of data without the remote user knowing that the replication needs to take place.
On Tuesday October 26, 2010 at 6:30pm (Pacific time) I'll be speaking at a Meet Up in Irvine, CA at the WorkBridge Associates offices (where I was originally going to be speaking this week). Because I'm going to be in Atlanta, GA next week, I'll be presenting over the web via Live Meeting instead of presenting from Irvine.
Since I'll be presenting over Live Meeting, I'm able to invite anyone else who would like to attend to connect via Live Meeting as well. I'll be presenting a new slide deck on the 26th called "Where should I be encrypting my data?". In the deck I talk about all the various ways that data can be encrypted within your database application.
Hopefully I’ll see everyone there.
P.S. I know that a few people headed over to the WorkBridge office this week, sorry about that. The date was originally this week, but then it was moved, and not everyone was updated with the new date.
Sorry if you didn't get the word; I thought that I had put out a post about it, but this week's Meet Up that I was going to speak at was rescheduled for next week. Same time, same place; at least for you. I'll be in Atlanta, so I'll be giving the session via Live Meeting. The host company will be setting up a web cam so I can see you guys, and I've got my web cam so that you can see me.
See you next week.
So you want to install Cisco Fabric Manager and/or Cisco Device Manager on a Windows 7 x64 computer. Awesome, good for you. Unfortunately, much like with its x64 VPN client, Cisco has in its infinite wisdom not released an x64 version of Fabric Manager or Device Manager. This makes installing under x64 a lot harder.
Apparently some people (myself included) have found that Windows Vista and Windows Server 2008 are losing their default gateway settings after installing Service Pack 2 onto the machine. The basic symptom is that after putting in the default gateway everything works fine, until you reboot. The kicker is that after changing the default gateway the computer prompts you to reboot for basically no reason.
Apparently this weekend a small company which does something very important discovered the upper bound of the INT datatype (or the equivalent in whatever database platform they are using). This makes it very clear that whoever designed the database for them didn't do a very good job, because if they had, they would have found this little problem a while ago and fixed it well in advance.
In case you didn't click through to the Slashdot article, or past it to the actual article: the company which holds the contracts with 49 states' parole agencies for parolee GPS monitoring wasn't able to record where the people being monitored were for about 12 hours. The /. article says that they had a little over 2 billion records in the table. A little thinking and that sounds an awful lot like the upper bound of a 32-bit integer (aka the INT data type, if you start at 1 instead of the lower bound of the data type). If the database designer had selected a 64-bit integer (aka BIGINT), the table would have been able to store 9,223,372,036,854,775,807 records (assuming they started at 1 and not the lower bound of the data type).
Now I’ve got no idea how long the database has been collecting data, but how ever long it was, it probably wasn’t all that long (maybe 5 years tops) and using the 64bit integer would have let the system last for much, much longer.
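If you want to see the ceiling for yourself, it only takes a couple of lines of T-SQL (the table at the end is purely hypothetical, just to show what a BIGINT key would have bought them):

```sql
-- 2^31 - 1 is the largest value an INT can hold.
DECLARE @i INT = 2147483647;

-- One more than that fails with an arithmetic overflow error.
SET @i = @i + 1;

-- A BIGINT key would have pushed the limit out to 2^63 - 1
-- (9,223,372,036,854,775,807). Hypothetical table:
CREATE TABLE ParoleeLocation
(
    ParoleeLocationId BIGINT IDENTITY(1,1) PRIMARY KEY,
    ParoleeId         INT          NOT NULL,
    Latitude          DECIMAL(9,6) NOT NULL,
    Longitude         DECIMAL(9,6) NOT NULL,
    CapturedAt        DATETIME     NOT NULL
);
```

At 2 billion rows over a few years, the BIGINT column would have cost a few extra gigabytes of storage and bought them billions of years of headroom.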
Thus endeth the rant.
This year I've got the pleasure of being accepted as a speaker at the SQL Connections conference in Las Vegas, NV the first week of November. Registration is still open for the conference. When you sign up for SQL Connections you also get access to ASP & Silverlight Connections, Visual Studio Connections, SharePoint Connections, Windows Connections, Exchange Connections and DOTNETNUKE Connections, all with the one registration.
There will be some top notch speakers at the conference this year (there are every year) including Todd McDermid (Blog | Twitter), Paul Randal (Blog | Twitter), Kimberly Tripp (Blog | Twitter), Allen White (Blog | Twitter), Buck Woody (Blog | Twitter) and Glenn Berry (Blog | Twitter) among others so it should be a great week.
While I'm pretty sure it'll be a little smaller crowd than the PASS Summit (which is being held the next week), I'm sure it'll be a great time with the group that will be there.