Question

  Asked: May 6 2008   1:47 PM GMT
  Asked by: Harleq


Search for duplicates in a collumn and delete the row on which it occurs


Excel formulas, Excel functions, Excel macros, Excel help, Microsoft Excel

Hi,

I have a spreadsheet with list of identifiers in collumn 1, this is auto generated and due to the source info some of them duplicated, what I want to do is find the duplicates and remove the entire row they appear on.

I know this is possible using clipboard and filters but I would rather avoid user error and use a macro.

Any help will be appreciated

Thanks

Subscribe to Alerts! Get questions and answers delivered to your Inbox.


E-mail me updates on this question



   SUBSCRIBE

hidden modal window

Answer Wiki (Improve, edit or add to this answer)


 RATE THIS ANSWER
0
Click to Vote:
  •   0
  •  0



Hi Harleq,

What I would probably do is use VBA to sort the rows, then start at cell (1,1). For each cell, look to see if the next row has the same value. If it does, delete the row (the current row or the next one, depending on how you want to do it). If it doesn't, increment to the next row. (Don't increment if you deleted a row; the deletion acts as an increment anyway).

If you don't want the information re-sorted, I think your best bet is to increment over the rows, storing each row's identifier cell's value in an array. For each row, if the ID is already in that array, delete the row; if not, add the ID to the array and go to the next row. If you have a large data set you're working on, you may want to keep the array sorted or hashed to make all those lookups quicker.

HTH, let me know if you need more specific help.

OR, Assume A = ID column Then -(1) Insert another column (B)
(2) Sort on column (A)
(3) In column row 2 - enter equation = A2-A1 and copy down
Any row with a zero in Column B is a duplicate.
Then, if a massive spreadsheet (1) Save as a .CSV file (2) Load the .CSV file (3) Sort on Column B and all the bad rows will sort to the top
  • AddThis Social Bookmark Button

Browse more Questions and Answers on Development and Microsoft Windows.

Looking for relevant Development Whitepapers? Visit the SearchSQLServer.com Research Library.


Discuss This Answer


You must be logged-in to discuss a question. Log-in/Register

YuvalShavit  |   May 12 2008  4:34PM GMT

That’s a great idea, Nozlop. If the ID column is not purely numeric (ie, if an ID could be something like “A81BL”) then you can replace the =A2-A1 with something like =IF(A2=A1,1,0). That will enter a 1 if the ID column is equal to the one above it (ie, there’s a duplicate), and 0 if not.

Also, instead of saving as a separate file and sorting, you can select the whole of column B (where the new equation is) and copy. Then paste special and have it paste values (instead of “all”). Column B will now have the values it had before, but without the associated equations behind them. Now you can sort on column B and delete all the duplicates.