Duplicate record deletion using SQL/400

2460 pts.
Tags:
AS/400
SQL
SQL/400
Hi,
How do I delete duplicate records using SQL/400 with SQL query please?
Thanks!


Software/Hardware used:
AS400
0

Answer Wiki

Thanks. We'll let you know when a new response is added.
Send me notifications when members answer or reply to this question.

Discuss This Question: 16  Replies

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when members answer or reply to this question.
  • philpl1jb

    This should get you there:

    https://search400.techtarget.com/tip/Delete-duplicate-records-using-SQL-in-iSeries-400

    the trick is in getting the where clause to truly identify identical records.

    54,090 pointsBadges:
    report
  • ToddN2000
    Try it using the info that philp1jb provided. If you are still having problems, post your code here and we will take a look. The key is making sure you are identifying a duplicate on enough columns. If not you may end up deleting more data than you want.
    126,110 pointsBadges:
    report
  • 6r
    SQL query mentioned in link seems bit complex not understandable
    are there 2 files f1 and f2?

    Delete from File1 F1 where RRN(f1) > (select MIN(RRN(F2)) from File1 F2 where F2.Field1 = F1.Field1)

    If there is more than one field as key add them as part of the WHERE clause. Note that F1 and F2 are pointing to the same file.

    2,460 pointsBadges:
    report
  • SimonD
    This page might help -> Removing duplicate records from a file
    80 pointsBadges:
    report
  • philpl1jb

    To do this process in SQL requires accessing the file with duplicates twice so you can match up dups.  When it says File1 F1

    It's giving that access the working name of F1 .. and fields in that access of the file are qualified as F1.fldname

    File1 F2 is the second path into the data and any fields from this access are named F2.fldname

    So the logic says

    Delete from File1 F1

    What a record is found in File1 F2 that isn't the same record (where RRN(f1) > min(RRN(F2)

    and

    where the fields of interest in access F1 = F2

    Delete from File1 F1 where RRN(f1) > (select MIN(RRN(F2)) from File1 F2 where F2.Field1 = F1.Field1)

    54,090 pointsBadges:
    report
  • CharlieBrowne

    Here is a simple SQL that I use. All you need to do is change the file name and the WHERE clause that is specify the data elements that identify duplicate records.

    DELETE FROM CBROWNE/U$LNMACHIN A                     
    WHERE RRN(A) >                                       
    (SELECT MIN(RRN(B)) FROM CBROWNE/U$LNMACHIN B        
    WHERE A.MI_PCORD# = B.MI_PCORD#)                     

    62,385 pointsBadges:
    report
  • 6r
    Let's say there is a file ID6,in jkjk2 lib.
    and Part ID and Quantity are it's two fields.
    and below is the set of records:-
     PARTID  QUANTITY                 
         1     400                    
         2     380                    
         3     100                    
         1     10                     
         1     20                     
         2     10                     
         2     20                     
         2     20                     
         1     10                     
         1     10                     
     ********  End of data  ********  
                                      
    When running the SQL query  to duplicate these records:-
    DELETE FROM jkjk2/id6 A              
    WHERE RRN(A) >                       
    (SELECT MIN(RRN(B)) FROM jkjk2/ID6 B 
    WHERE A.PARTID = B.PARTID)           

    Result:-

                                                 
     Position to line  . . . . .                 
     ....+....1....+.                            
     PARTID  QUANTITY                            
         1     400                               
         2     380                               
         3     100                               
     ********  End of data  ********             
           
    so why is it deleting other records for part id 1 whose quantities are like 10,20.
    where as in PF definition have not specified any unique key.
    main intension is only duplicate records should be deleted but seems this query is not working as desired.

                                                                            
    Thanks 
    2,460 pointsBadges:
    report
  • 6r
    any updates on same please?

    Thanks.
    2,460 pointsBadges:
    report
  • 6r
    and for below sql query:-

    Delete from File1 F1 where RRN(f1) > (select MIN(RRN(F2)) from File1 F2 where F2.Field1 = F1.Field1)

    where should this file 2 F2 come from with respect to current scenarion where there is only one file id6 (which just have 2 fields part id and quantity so or this where does this query fit to delete duplicate records ?
    2,460 pointsBadges:
    report
  • CharlieBrowne

    Try this

    DELETE FROM jkjk2/id6 A              
    WHERE RRN(A) >                       
    (SELECT MIN(RRN(B)) FROM jkjk2/ID6 B 
    WHERE A.PARTID = B.PARTID and a.quantity = b.quantity)           

    Result:-
    62,385 pointsBadges:
    report
  • ToddN2000
    When checking for duplicates you need to compare all the columns that you want to eliminate. AS an example if you were check for duplicates in a warehouse as well the it would be..

    WHERE A.PARTID = B.PARTID and a.quantity = b.quantity and a.WH = b.WH 

    If also by a company...
    WHERE A.PARTID = B.PARTID and a.quantity = b.quantity and a.WH = b.WH and a.Company = b.Company
    126,110 pointsBadges:
    report
  • 6r
    Thanks a lot below SQL query giving perfect result:-

    DELETE FROM jkjk2/id6 A              
    WHERE RRN(A) >                       
    (SELECT MIN(RRN(B)) FROM jkjk2/ID6 B 
    WHERE A.PARTID = B.PARTID and a.quantity = b.quantity)           

    Thanks.


    2,460 pointsBadges:
    report
  • 6r
    one generic query here suppose if there are many fields let's say 40-50 fields then each time do we need to a.field1= b.field1 and b.field1=b.field2 will habe to do when A,B,C,D will stop after 26 alphabet how wil make equal these what next after z.field26=?.field26 and after 26th whilch letter to make dot after to equal these fields?

    Thanks
    2,460 pointsBadges:
    report
  • PGMBOB
    I tried an experiment.
    The last step would be to link the duplicates for the deletion.
    I created a file and inserted the following.
    you could also run the delete removing the maximum RRN.
    Run it several times and no duplicates will be left.


    FIELD1 FIELD2 FIELD3 FIELD4

    a a a a
    a a a a
    a a a a
    a a a a
    b b b b
    b b b b
    b b b b
    b b b b
    b b b b
    c c c c
    d d d d

    Duplicates:
    SELECT field1,field2,field3,field4 ,count(*) FROM FILEA GROUP BY
    field1,field2,field3,field4 ORDER BY field1,field2,field3,field4

    FIELD1 FIELD2 FIELD3 FIELD4 COUNT ( * )
    a a a a 4 dup
    b b b b 5 dup
    c c c c 1 not dup
    d d d d 1 not dup

    Only duplicates
    SELECT field1,field2,field3,field4 ,count(*) FROM FILEA GROUP BY
    field1,field2,field3,field4
    HAVING count(*) > 1
    ORDER BY
    field1,field2,field3,field4

    FIELD1 FIELD2 FIELD3 FIELD4 COUNT ( * )
    a a a a 4
    b b b b 5

    Here are the minimum RRN and maximum RRN of the duplicates
    SELECT min(rrn(filea)),max(rrn(filea)), field1,field2,field3,field4
    FROM FILEA GROUP BY field1,field2,field3,field4 HAVING count(*) > 1
    ORDER BY field1,field2,field3,field4
    Good Luck
    PGMBOB
    1,235 pointsBadges:
    report
  • philpl1jb

    6r .. you are only comparing 2 access paths to the same file, path a and path b.  you can compare as many fields as you need to determine if it's unique.

    DELETE FROM jkjk2/id6 A              
    WHERE RRN(A) >                       
    (SELECT MIN(RRN(B)) FROM jkjk2/ID6 B 
    WHERE A.PARTID = B.PARTID and a.quantity = b.quantity)   
    and A.Shelf = B.Shelf and A.Color = B.Color and A.Size = B.Size and A.Width = B.Width and A.DateCode = B.DateCode


    54,090 pointsBadges:
    report
  • philpl1jb

    But if you were working with an SQL that needed 27 files, that would be no problem .. we named the path a and b but you could name the paths Bob and Harry

    From MyFile1.Bob Join MyFile2.Harry

    on Bob.ID = Harry.ID

    54,090 pointsBadges:
    report

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

To follow this tag...

There was an error processing your information. Please try again later.

Thanks! We'll email you when relevant content is added and updated.

Following

Share this item with your network: