Posted by: Denny Cherry
CTE, DELETE statement, SQL Server 2005, SQL Server 2008, T/SQL
All to often we end up with duplicate rows in a table. The best way to keep duplicate rows out of the database is to not let them in. But assume that they are there. This bit of sample code shows how to delete those duplicate rows quickly and easily in a single statement. No temp tables required (I use a temp table to put the data into for example purposes). This code is for SQL 2005 and up as it uses some features which were introduced in SQL Server 2005. SQL Server 2000 would require a totally different technique.
CREATE TABLE #DuplicateRows /*Create a new table*/
INSERT INTO #DuplicateRows /*Load up duplicate rows*/
FROM #DuplicateRows; /*Check that the data is actually hosed*/
WITH Cleaning AS (SELECT ROW_NUMBER() OVER(ORDER BY Col1, Col2, Col3) as row,
DELETE FROM Cleaning /*Delete the rows which are duplicates*/
WHERE Row NOT IN (SELECT row FROM (SELECT Col1, Col2, Col3, MIN(row) row
FROM Cleaning a
GROUP BY Col1, Col2, Col3) b)
SELECT * /*Check the table to see that it is clean*/
DROP TABLE #DuplicateRows /*Clean up the table*/
Hopefully you find this code useful.