What do you mean by "the primary key is different"?
Duplicates are defined with respect to some key definition. The standard way of identifying duplicates is:
SELECT ... FROM table t1
WHERE EXISTS
(SELECT 1 FROM table t2
WHERE t2.key_col1 = t1.key_col1
...............
AND t2.key_colN = t1.key_colN
AND t2.rowid != t1.rowid)
ORDER BY
t1.key_col1, ..., t1.key_colN
/
This assumes that the "key" columns aren't defined as a (primary or unique) key constraint in the database (but there should be defined an index on the columns if the table is large!)
If the "key" columns are defined as a key constraint (and you're getting duplicate errors when trying to INSERT), CREATE a temp table AS SELECT from the table in question (plus CREATE an index on the key columns), and perform the query above on the temp table.
Regards,
ActorJack
Last Wiki Answer Submitted: May 24, 2004 8:09 am by ActorJack0 pts.
If you live outside the United States, by submitting your email address you consent to having your personal data transferred to and processed in the United States.
I am assuming that the table with duplicate rows does not have a primary key. If a PK exists, there cannot be duplicate rows. The following code using correlated subquery identifies the duplicate rows and then deletes all of them except one. I first create a table without a PK, insert four duplicate rows into it, identify them, and then delete all but one from the table.
Elapsed: 00:00:01.27
08:34:40 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.10
08:35:14 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.01
08:35:24 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.01
08:35:27 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.01
08:35:30 TIMS.DBA8:SQL>select * from employees;
EMPNO LAST_NAME DEPTNO
———- ——————– ———-
123 Smith 22
123 Smith 22
123 Smith 22
123 Smith 22
4 rows selected.
Elapsed: 00:00:00.02
08:35:41 TIMS.DBA8:SQL>select rowid, empno, last_name, deptno from employees;
ROWID EMPNO LAST_NAME DEPTNO
—————— ———- ——————– ———-
AAAKKeACRAAABIbAAA 123 Smith 22
AAAKKeACRAAABIbAAB 123 Smith 22
AAAKKeACRAAABIbAAC 123 Smith 22
AAAKKeACRAAABIbAAD 123 Smith 22
4 rows selected.
Elapsed: 00:00:00.01
08:36:42 TIMS.DBA8:SQL>select empno, rowid from employees a
08:37:19 2 where rowid >
08:37:25 3 (select min (rowid) from employees
08:37:43 4 where empno = a.empno);
Elapsed: 00:00:00.05
08:37:56 TIMS.DBA8:SQL>delete from employees a
08:38:21 2 where
08:38:25 3 rowid >
08:38:31 4 (select min (rowid) from employees
08:38:51 5 where empno = a.empno);
3 rows deleted.
Elapsed: 00:00:00.02
08:39:05 TIMS.DBA8:SQL>select rowid, empno, last_name, deptno from employees;
If you are “SURE” you have duplicates and there is no way to distinguish the data in the two records then
use a GROUP BY statement
to find records HAVING a COUNT(*) > 1 (identical info
While you are at it, find the MAX(ROWID) of those records and DELETE that ROWID –
I am assuming that the table with duplicate rows does not have a primary key. If a PK exists, there cannot be duplicate rows. The following code using correlated subquery identifies the duplicate rows and then deletes all of them except one. I first create a table without a PK, insert four duplicate rows into it, identify them, and then delete all but one from the table.
08:33:34 TIMS.DBA8:SQL>create table employees
08:33:53 2 (empno number,
08:34:08 3 last_name varchar2 (20),
08:34:21 4 deptno number)
08:34:32 5 tablespace userdata;
Table created.
Elapsed: 00:00:01.27
08:34:40 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.10
08:35:14 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.01
08:35:24 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.01
08:35:27 TIMS.DBA8:SQL>insert into employees values (123, ‘Smith’, 22);
1 row created.
Elapsed: 00:00:00.01
08:35:30 TIMS.DBA8:SQL>select * from employees;
EMPNO LAST_NAME DEPTNO
———- ——————– ———-
123 Smith 22
123 Smith 22
123 Smith 22
123 Smith 22
4 rows selected.
Elapsed: 00:00:00.02
08:35:41 TIMS.DBA8:SQL>select rowid, empno, last_name, deptno from employees;
ROWID EMPNO LAST_NAME DEPTNO
—————— ———- ——————– ———-
AAAKKeACRAAABIbAAA 123 Smith 22
AAAKKeACRAAABIbAAB 123 Smith 22
AAAKKeACRAAABIbAAC 123 Smith 22
AAAKKeACRAAABIbAAD 123 Smith 22
4 rows selected.
Elapsed: 00:00:00.01
08:36:42 TIMS.DBA8:SQL>select empno, rowid from employees a
08:37:19 2 where rowid >
08:37:25 3 (select min (rowid) from employees
08:37:43 4 where empno = a.empno);
EMPNO ROWID
———- ——————
123 AAAKKeACRAAABIbAAB
123 AAAKKeACRAAABIbAAC
123 AAAKKeACRAAABIbAAD
3 rows selected.
Elapsed: 00:00:00.05
08:37:56 TIMS.DBA8:SQL>delete from employees a
08:38:21 2 where
08:38:25 3 rowid >
08:38:31 4 (select min (rowid) from employees
08:38:51 5 where empno = a.empno);
3 rows deleted.
Elapsed: 00:00:00.02
08:39:05 TIMS.DBA8:SQL>select rowid, empno, last_name, deptno from employees;
ROWID EMPNO LAST_NAME DEPTNO
—————— ———- ——————– ———-
AAAKKeACRAAABIbAAA 123 Smith 22
1 row selected.
Elapsed: 00:00:00.01
Try this…
select c1 from t1
where c1 in (select c1 from t1
group by c1
having count(*) > 1);
Try this…
select c1 from t1
where c1 in (select c1 from t1
group by c1
having count(*) > 1);
If you are “SURE” you have duplicates and there is no way to distinguish the data in the two records then
use a GROUP BY statement
to find records HAVING a COUNT(*) > 1 (identical info
While you are at it, find the MAX(ROWID) of those records and DELETE that ROWID –
no more duplicate!