Duplicate records

0 pts.
Tags:
Oracle 8i
What is the best way to detect duplicate records within a large database where the primary key is different?
ASKED: May 24, 2004  6:45 AM
UPDATED: May 24, 2004  3:43 PM

Answer Wiki

Thanks. We'll let you know when a new response is added.

What do you mean by “the primary key is different”?

Duplicates are defined with respect to some key definition. The standard way of identifying duplicates is:

SELECT … FROM table t1
WHERE EXISTS
(SELECT 1 FROM table t2
WHERE t2.key_col1 = t1.key_col1
……………
AND t2.key_colN = t1.key_colN
AND t2.rowid != t1.rowid)
ORDER BY
t1.key_col1, …, t1.key_colN
/

This assumes that the “key” columns aren’t defined as a (primary or unique) key constraint in the database (but there should be defined an index on the columns if the table is large!)

If the “key” columns are defined as a key constraint (and you’re getting duplicate errors when trying to INSERT), CREATE a temp table AS SELECT from the table in question (plus CREATE an index on the key columns), and perform the query above on the temp table.

Regards,
ActorJack

Discuss This Question: 4  Replies

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when members answer or reply to this question.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
  • 17051966
    I am assuming that the table with duplicate rows does not have a primary key. If a PK exists, there cannot be duplicate rows. The following code using correlated subquery identifies the duplicate rows and then deletes all of them except one. I first create a table without a PK, insert four duplicate rows into it, identify them, and then delete all but one from the table. 08:33:34 TIMS.DBA8:SQL>create table employees 08:33:53 2 (empno number, 08:34:08 3 last_name varchar2 (20), 08:34:21 4 deptno number) 08:34:32 5 tablespace userdata; Table created. Elapsed: 00:00:01.27 08:34:40 TIMS.DBA8:SQL>insert into employees values (123, 'Smith', 22); 1 row created. Elapsed: 00:00:00.10 08:35:14 TIMS.DBA8:SQL>insert into employees values (123, 'Smith', 22); 1 row created. Elapsed: 00:00:00.01 08:35:24 TIMS.DBA8:SQL>insert into employees values (123, 'Smith', 22); 1 row created. Elapsed: 00:00:00.01 08:35:27 TIMS.DBA8:SQL>insert into employees values (123, 'Smith', 22); 1 row created. Elapsed: 00:00:00.01 08:35:30 TIMS.DBA8:SQL>select * from employees; EMPNO LAST_NAME DEPTNO ---------- -------------------- ---------- 123 Smith 22 123 Smith 22 123 Smith 22 123 Smith 22 4 rows selected. Elapsed: 00:00:00.02 08:35:41 TIMS.DBA8:SQL>select rowid, empno, last_name, deptno from employees; ROWID EMPNO LAST_NAME DEPTNO ------------------ ---------- -------------------- ---------- AAAKKeACRAAABIbAAA 123 Smith 22 AAAKKeACRAAABIbAAB 123 Smith 22 AAAKKeACRAAABIbAAC 123 Smith 22 AAAKKeACRAAABIbAAD 123 Smith 22 4 rows selected. Elapsed: 00:00:00.01 08:36:42 TIMS.DBA8:SQL>select empno, rowid from employees a 08:37:19 2 where rowid > 08:37:25 3 (select min (rowid) from employees 08:37:43 4 where empno = a.empno); EMPNO ROWID ---------- ------------------ 123 AAAKKeACRAAABIbAAB 123 AAAKKeACRAAABIbAAC 123 AAAKKeACRAAABIbAAD 3 rows selected. Elapsed: 00:00:00.05 08:37:56 TIMS.DBA8:SQL>delete from employees a 08:38:21 2 where 08:38:25 3 rowid > 08:38:31 4 (select min (rowid) from employees 08:38:51 5 where empno = a.empno); 3 rows deleted. Elapsed: 00:00:00.02 08:39:05 TIMS.DBA8:SQL>select rowid, empno, last_name, deptno from employees; ROWID EMPNO LAST_NAME DEPTNO ------------------ ---------- -------------------- ---------- AAAKKeACRAAABIbAAA 123 Smith 22 1 row selected. Elapsed: 00:00:00.01
    0 pointsBadges:
    report
  • Qzx7lz
    Try this... select c1 from t1 where c1 in (select c1 from t1 group by c1 having count(*) > 1);
    0 pointsBadges:
    report
  • Qzx7lz
    Try this... select c1 from t1 where c1 in (select c1 from t1 group by c1 having count(*) > 1);
    0 pointsBadges:
    report
  • MrGriz
    If you are "SURE" you have duplicates and there is no way to distinguish the data in the two records then use a GROUP BY statement to find records HAVING a COUNT(*) > 1 (identical info While you are at it, find the MAX(ROWID) of those records and DELETE that ROWID - no more duplicate!
    0 pointsBadges:
    report

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

To follow this tag...

There was an error processing your information. Please try again later.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Thanks! We'll email you when relevant content is added and updated.

Following