What strategies for deduping the queue

Discussion in 'Directories' started by sarahk, Aug 28, 2009.

  1. #1
    I'm just working through some of my old and somewhat neglected directories and deduping them. After deleting 4000+ spam records (simple database queries did the trick there), I'm up to the more complex and subtle dupes: the same site submitted once with the www and once without, or once with the http and once without. I've even got .net and .com dupes. That's real estate directories for you :)

    My submission form will stop some of this in the future, but are there any good tricks for the cleanup that I'm facing?
     
    sarahk, Aug 28, 2009
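
For the www/no-www and http/no-http cases described above, one common approach is to reduce every submitted URL to a comparison key and then group rows that share a key. The following is only a minimal sketch: the function names `normalize_url` and `group_duplicates` and the `(id, url)` row shape are assumptions for illustration, not anything taken from a particular directory script.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(raw: str) -> str:
    """Reduce a submitted URL to a comparison key (hypothetical rules)."""
    raw = raw.strip()
    # urlsplit only recognises the host when a scheme is present,
    # so add a placeholder scheme to bare submissions like "example.com".
    if "://" not in raw:
        raw = "http://" + raw
    parts = urlsplit(raw)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    # Treat "example.com" and "example.com/" as the same listing.
    path = parts.path.rstrip("/")
    return host + path


def group_duplicates(rows):
    """rows: iterable of (id, url). Return only keys shared by 2+ ids."""
    groups = defaultdict(list)
    for row_id, url in rows:
        groups[normalize_url(url)].append(row_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Feeding it `(id, url)` pairs pulled from the links table gives a dict of candidate groups to review by hand. The .net/.com dupes need a looser key (for instance, the host minus its final label), and those matches are worth checking manually, since the two domains can belong to different sites.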
  2. Obelia
    #2
    Have you used a DISTINCT query on most of the fields? Dupes can have similar URLs and titles, but keywords, descriptions and meta descriptions might also throw up a few you haven't noticed yet.
     
    Obelia, Aug 28, 2009
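
A plain DISTINCT shows how many unique values a field has; to see which rows actually collide, a GROUP BY with HAVING COUNT(*) > 1 is a common variant of the same idea. Here is a rough sketch using Python's built-in `sqlite3` module. The `links` table, its columns and the sample rows are made up for the example, so the names would need adjusting to the real schema and database.

```python
import sqlite3

# Columns we allow in the query; names are hypothetical.
CHECKABLE_FIELDS = {"title", "description", "keywords"}


def make_demo_db():
    """Build a small in-memory table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT, "
        "title TEXT, description TEXT, keywords TEXT)"
    )
    conn.executemany(
        "INSERT INTO links (id, url, title, description, keywords) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "http://a.com", "Acme Realty", "Homes in Austin", "homes"),
            (2, "http://www.a.com", "Acme Realty", "Homes in Austin", ""),
            (3, "http://b.net", "Best Homes", "Homes in Austin", "austin"),
            (4, "http://c.com", "Casa", "", ""),
        ],
    )
    return conn


def find_field_dupes(conn, field):
    """Return (value, count, comma-separated ids) for repeated values."""
    # Column names cannot be bound as parameters, so check a whitelist.
    if field not in CHECKABLE_FIELDS:
        raise ValueError(f"unexpected field: {field}")
    sql = (
        f"SELECT {field}, COUNT(*) AS n, GROUP_CONCAT(id) AS ids "
        f"FROM links WHERE {field} <> '' "  # skips empty and NULL values
        f"GROUP BY {field} HAVING COUNT(*) > 1 ORDER BY n DESC"
    )
    return conn.execute(sql).fetchall()
```

Running `find_field_dupes(make_demo_db(), "description")` lists each repeated description with the ids that share it, which makes it easier to spot dupes whose URLs look different.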
  3. robjones
    #3
    Sarah, you might try asking Brad (bldarter) over at DMOZ. He spent an inordinate amount of time doing the same thing over there, as I recall.
     
    robjones, Aug 28, 2009