Removing duplicate rows with unique ids

Discussion in 'MySQL' started by Baz@rr, May 12, 2006.

  1. #1
    Hi guys.

    Being a complete know-nowt when it comes to databases I'm looking for a bit of help here.

    I have a table with 81,000 entries, but I know there are a lot of duplicates in there. What I'm looking to do is root out the duplicates and keep just one instance of the row.

    The only problem is that the rows all have unique ids, but it doesn't matter if these remain intact or if they're reset or what have you.

    I can manage to pull up a list of the duplicates using this:

    SELECT title, postcode, count(*)
    FROM idx_link
    GROUP BY title, postcode
    HAVING count(*) > 1

    I have about 12 fields in the table. Dunno if that makes a difference!
     
    Baz@rr, May 12, 2006 IP
  2. TheGuy

    TheGuy Peon

    Messages:
    138
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #2
    What Database are you using?
    I assume it to be MySQL.

    Do you have multiple instances of title and postcode both?

    The problem with this is, say if you have same title for 3 different rows. Ex:

    'this is a title', 'somepostcode', 'somefield','someanotherfield' ...
    'this is a title', 'somepostcode1', 'somefield1','someanotherfield1' ...
    'this is a title', 'somepostcode2', 'somefield2','someanotherfield2' ...

    Now, if you wanna keep only one row, you will delete rest of the 2 rows. If this is OK with you and you don't care which row out of all the multiple row is saved, try this:

    select max(postcode) from table group by title;

    This would give you all the unique rows from the table. You can write a temporary php/perl/python/or whatever script, which would get the rows and save them in another temp table. Then you can drop the original table and rename the new table as idx_link. Note: moving about 80,000 records might take a while, depending on your server capabilities.

    I don't think there is any way to do this in MySQL directly, but it wouldn't hurt checking mysql documentation.
     
    TheGuy, May 12, 2006 IP
  3. Baz@rr

    Baz@rr Well-Known Member

    Messages:
    140
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    123
    #3
    Er...my head just melted.

    Lots of records have some duplicate information, but are unique entries. For example there are dozens with the same postcode, and dozens with the same title, but many of these may well be unique. If something matches both the title and postcode, then it's a duplicate entry and can be removed, leaving one instance of it in the table.

    Make sense?
     
    Baz@rr, May 12, 2006 IP
  4. rosytoes

    rosytoes Peon

    Messages:
    230
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #4
    backup your database and try this:
    delete t1 FROM idx_link t1,idx_link t2 WHERE t1.title=t2.title AND t1.postcode=t2.postcode AND t1.id<t2.id
    Code (markup):
    RC
     
    rosytoes, May 13, 2006 IP
  5. Baz@rr

    Baz@rr Well-Known Member

    Messages:
    140
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    123
    #5
    Tried that on my test db which is identical to my live one. It first of all asked me to confirm if I really wanted to do it, then when I said yes I got:

    #1054 - Unknown column 't1.id' in 'where clause'

    :(
     
    Baz@rr, May 14, 2006 IP
  6. rosytoes

    rosytoes Peon

    Messages:
    230
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #6
    replace id with the actual field name of your idx_link table id (the autoincrement one)

    RC
     
    rosytoes, May 14, 2006 IP
  7. Baz@rr

    Baz@rr Well-Known Member

    Messages:
    140
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    123
    #7
    Duuuh. Even I should have worked that one out!

    Excellent, RC, thank you - that worked a treat.
     
    Baz@rr, May 14, 2006 IP