DELETE Statement archives - SQL Server with Mr. Denny

SQL Server with Mr. Denny:

DELETE statement

May 4 2009   11:00AM GMT

How can I remove duplicate records in my tables?



Posted by: mrdenny
T/SQL, CTE, DELETE statement, SQL Server 2008, SQL Server 2005

All to often we end up with duplicate rows in a table.  The best way to keep duplicate rows out of the database is to not let them in.  But assume that they are there.  This bit of sample code shows how to delete those duplicate rows quickly and easily in a single statement.  No temp tables required (I use a temp table to put the data into for example purposes).  This code is for SQL 2005 and up as it uses some features which were introduced in SQL Server 2005.  SQL Server 2000 would require a totally different technique.

CREATE TABLE #DuplicateRows /*Create a new table*/
(Col1 INT,
Col2 INT,
Col3 INT)

INSERT INTO #DuplicateRows /*Load up duplicate rows*/
SELECT 1,1,1
UNION ALL
SELECT 1,1,1
UNION ALL
SELECT 1,1,1
UNION ALL
SELECT 2,2,2
UNION ALL
SELECT 2,2,2
UNION ALL
SELECT 2,2,2

SELECT *
FROM #DuplicateRows; /*Check that the data is actually hosed*/

WITH Cleaning AS (SELECT ROW_NUMBER() OVER(ORDER BY Col1, Col2, Col3) as row,
Col1,
Col2,
Col3
FROM #DuplicateRows)

DELETE FROM Cleaning /*Delete the rows which are duplicates*/
WHERE Row NOT IN (SELECT row FROM  (SELECT Col1, Col2, Col3, MIN(row) row
FROM Cleaning a
GROUP BY Col1, Col2, Col3) b)

SELECT * /*Check the table to see that it is clean*/
FROM #DuplicateRows

DROP TABLE #DuplicateRows /*Clean up the table*/

Hopefully you find this code useful.

Denny

Jul 14 2008   11:00AM GMT

Using batching to do large operations without filling the transaction log



Posted by: mrdenny
SQL, DELETE statement, Query tuning, Tables

Deleting large amounts of data from a table is usually an important task, but if you don’t have a maintenance window to work within then you can end up causing locking and blocking against the other processes which are accessing the database.  Not to mention you will cause a very large amount of data into the transaction log no matter what your transaction logging level is set to.

Say you have a table with a date column and you need to delete a million plus records.  Doing this in a single transaction will put all million transactions into your transaction log, plus cause any other processes which are trying to access the table to be blocked.

 However if we batch the transaction into many smaller transactions our transaction log will not fill up as we can backup the log using our normal log backup methods throughout the process, or if we use SIMPLE recovery on our database then transactions will be removed from the log automatically.

In SQL 2000 and below you have to set the ROWCOUNT session variable to a number above 0, which would cause SQL to delete the first n records that it comes across.  In SQL 2005 we can use the TOP parameter as part of our DELETE command having the same effect, but without having to reset the session variable.

In SQL 2000 or below you can use a syntax like this one.

DECLARE @c BIT, @d DATETIME
SET @d = dateadd(yy, -1, getdate())
SET @c = 0
SET ROWCOUNT = 1000
WHILE @c = 0
BEGIN
DELETE FROM Table
WHERE CreateDate

If you are using SQL 2005 you can use this very similar syntax.

DECLARE @c BIT, @d datetime
SET @d = dateadd(yy, -1, getdate())
SET @c = 0
WHILE @c = 0
BEGIN
DELETE TOP (1000) FROM Table
WHERE CreateDate

Both pieces of code are very similar. Declare a variable which tells the loop when to exit. Then start deleting the data. If no records are deleted, then set the variable to 1 causing us to exit the loop. Now this will usually take a little bit longer to complete than a single delete statement, but the system will continue to be responsive during the process.

The number of records which you are deleting should be adjusted based on the width of your records, the load on the database at the time of deletion, and the speed of your hard drives which hold the data files and transaction logs. I usually start at 1000 records and see how the system responds. For tables which a just a few numbers I’ll put it up as high as 50k or 100k records. For very wide tables I’ll drop it down to 100 or 500 records if the system can’t handle 1000 records.

Denny

Update:
Sorry this post didn’t look very good at first. The blog site seams to have eaten it, and I didn’t notice until just now.


Apr 21 2008   9:00AM GMT

Back To Basics: The DELETE Statement



Posted by: mrdenny
SQL, DELETE statement

When you have data in your table that you need to remove the DELETE statement is the way to do that.  Using the DELETE statement without any WHERE clause will remove all the data from your table.  The WHERE clause works the same way as the WHERE clause for SELECT and UPDATE statements.

DELETE FROM Employee
WHERE EmployeeId = 4

You can also JOIN to a table when you delete data from a table which allows you to use the second table within the filter.  This requires a slightly different syntax than your normal DELETE syntax.

DELETE e
FROM Employee e
JOIN Department ON Employee.DepartmentId = Department.DepartmentId
  AND Department.DepartmentName = ‘Sales’

As you can see you need to set an alias for the table you are deleting from, and put that alias between the DELETE and FROM words.

Denny

P.S. Sorry this didn’t come up sooner, I could have sworn that I wrote it and published it back when I started the series.