Hi guys. Being a complete know-nowt when it comes to databases I'm looking for a bit of help here. I have a table with 81,000 entries, but I know there are a lot of duplicates in there. What I'm looking to do is root out the duplicates and keep just one instance of the row. The only problem is that the rows all have unique ids, but it doesn't matter if these remain intact or if they're reset or what have you. I can manage to pull up a list of the duplicates using this: SELECT title, postcode, count(*) FROM idx_link GROUP BY title, postcode HAVING count(*) > 1 I have about 12 fields in the table. Dunno if that makes a difference!
What Database are you using? I assume it to be MySQL. Do you have multiple instances of title and postcode both? The problem with this is, say if you have same title for 3 different rows. Ex: 'this is a title', 'somepostcode', 'somefield','someanotherfield' ... 'this is a title', 'somepostcode1', 'somefield1','someanotherfield1' ... 'this is a title', 'somepostcode2', 'somefield2','someanotherfield2' ... Now, if you wanna keep only one row, you will delete rest of the 2 rows. If this is OK with you and you don't care which row out of all the multiple row is saved, try this: select max(postcode) from table group by title; This would give you all the unique rows from the table. You can write a temporary php/perl/python/or whatever script, which would get the rows and save them in another temp table. Then you can drop the original table and rename the new table as idx_link. Note: moving about 80,000 records might take a while, depending on your server capabilities. I don't think there is any way to do this in MySQL directly, but it wouldn't hurt checking mysql documentation.
Er...my head just melted. Lots of records have some duplicate information, but are unique entries. For example there are dozens with the same postcode, and dozens with the same title, but many of these may well be unique. If something matches both the title and postcode, then it's a duplicate entry and can be removed, leaving one instance of it in the table. Make sense?
backup your database and try this: delete t1 FROM idx_link t1,idx_link t2 WHERE t1.title=t2.title AND t1.postcode=t2.postcode AND t1.id<t2.id Code (markup): RC
Tried that on my test db which is identical to my live one. It first of all asked me to confirm if I really wanted to do it, then when I said yes I got: #1054 - Unknown column 't1.id' in 'where clause'