Deduplicating data with a CTE

Ask SSC logoTypical.  You answer a question only to find that the answer you provided isn’t what the original poster was looking for by reading the dozen or so comments that arrived on the question while you were figuring out your answer…  So I’ve taken my answer out and will put it here instead.  🙂

Question

How can I deduplicate data?

My Answer

Use a CTE (Common Table Expression), and the Row_Number() function, and delete from the CTE where that row number is greater than 1.

In more detail…

Let’s work this through with a sample. A ludicrously simple sample.

DECLARE @DeDupe TABLE (f1 VARCHAR(100), i1 INT)
INSERT INTO @DeDupe
SELECT 'Fred', 1 UNION ALL
SELECT 'Fred', 1 UNION ALL
SELECT 'John', 1 UNION ALL
SELECT 'John', 2
SELECT * FROM @DeDupe

At this point, we have a (very) small table with a duplicate row. So now we’ll use a CTE and that ROW_NUMBER function mentioned above and add the following to our query above:

WITH DeDupeTemp AS (
SELECT f1, i1, ROW_NUMBER() OVER (
       PARTITION BY f1,i1 ORDER BY f1,i1) AS RowNum
       FROM @DeDupe)

SELECT * FROM DeDupeTemp

As you can see, that final SELECT gives us an extra column, the row number of the line within that particular combination of fields.

We now have an easy way of deleting the data we don’t need. Replace the final line of the above with:

DELETE FROM DeDupeTemp WHERE RowNum > 1
SELECT * FROM @DeDupe


Hurrah! No duplicate data.  Manipulating the CTE like this has directly affected the base table.

Homework

The next step is to wrap that code up into a stored procedure that takes the name of a table as a parameter, prepares dynamic T-SQL statements and executes them to deduplicate the named table.

Advertisements
This entry was posted in SQLServerPedia Syndication and tagged , , . Bookmark the permalink.

2 Responses to Deduplicating data with a CTE

  1. Boneist says:

    Surely the homework is to add a unique constraint to prevent duplicates from being added in the first place?! *{;-)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s