Using batching to do large operations without filling the transaction log
Posted by: mrdenny
Deleting large amounts of data from a table is usually an important task, but if you don’t have a maintenance window to work within then you can end up causing locking and blocking against the other processes which are accessing the database. Not to mention you will cause a very large amount of data into the transaction log no matter what your transaction logging level is set to.
Say you have a table with a date column and you need to delete a million plus records. Doing this in a single transaction will put all million transactions into your transaction log, plus cause any other processes which are trying to access the table to be blocked.
However if we batch the transaction into many smaller transactions our transaction log will not fill up as we can backup the log using our normal log backup methods throughout the process, or if we use SIMPLE recovery on our database then transactions will be removed from the log automatically.
In SQL 2000 and below you have to set the ROWCOUNT session variable to a number above 0, which would cause SQL to delete the first n records that it comes across. In SQL 2005 we can use the TOP parameter as part of our DELETE command having the same effect, but without having to reset the session variable.
In SQL 2000 or below you can use a syntax like this one.
DECLARE @c BIT, @d DATETIME
SET @d = dateadd(yy, -1, getdate())
SET @c = 0
SET ROWCOUNT = 1000
WHILE @c = 0
BEGIN
DELETE FROM Table
WHERE CreateDate
If you are using SQL 2005 you can use this very similar syntax.
DECLARE @c BIT, @d datetime
SET @d = dateadd(yy, -1, getdate())
SET @c = 0
WHILE @c = 0
BEGIN
DELETE TOP (1000) FROM Table
WHERE CreateDate
Both pieces of code are very similar. Declare a variable which tells the loop when to exit. Then start deleting the data. If no records are deleted, then set the variable to 1 causing us to exit the loop. Now this will usually take a little bit longer to complete than a single delete statement, but the system will continue to be responsive during the process.
The number of records which you are deleting should be adjusted based on the width of your records, the load on the database at the time of deletion, and the speed of your hard drives which hold the data files and transaction logs. I usually start at 1000 records and see how the system responds. For tables which a just a few numbers I’ll put it up as high as 50k or 100k records. For very wide tables I’ll drop it down to 100 or 500 records if the system can’t handle 1000 records.
Denny
Update:
Sorry this post didn’t look very good at first. The blog site seams to have eaten it, and I didn’t notice until just now.


