I'm working through some of my old and somewhat neglected directories and deduping them. After deleting 4000+ spam records (simple database queries did the trick there), I'm up to the more complex and subtle dupes: things like submitting once with the www and once without, or once with the http and once without. I've even got .net and .com dupes. That's real estate directories for you. My submission form will stop some of this in the future, but are there any good tricks for the cleanup I'm facing?
Have you run a DISTINCT query on most of the fields? Dupes usually show up as similar URLs and titles, but comparing keywords, descriptions and meta descriptions might also turn up a few you haven't noticed yet.
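To make the field-by-field idea concrete, here is a rough sketch in Python with SQLite. It is only an illustration: the `links` table, its columns and the sample rows are invented, and `url_key` and `url_stem` are hypothetical helpers, not part of any directory script. It groups rows on a normalised URL (so the www and http variants collapse), on a crude "domain without the extension" key (to catch the .net/.com pairs), and on the description field.

```python
import sqlite3
from urllib.parse import urlsplit


def url_key(url):
    """Collapse scheme and www variants of a URL into one comparison key."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host when a scheme is present
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lower-cased
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def url_stem(url):
    """Crude key for .com/.net pairs: the host minus its last label.

    Assumption: single-label extensions only; hosts like example.co.uk
    will not be handled correctly, so treat matches as candidates.
    """
    host = url_key(url).split("/", 1)[0]
    return host.rsplit(".", 1)[0]


def duplicate_groups(conn, expr):
    """Return {key: [ids]} for every value of `expr` shared by 2+ rows."""
    sql = (
        f"SELECT {expr} AS k, GROUP_CONCAT(id) AS ids "
        "FROM links GROUP BY k HAVING COUNT(*) > 1"
    )
    return {k: sorted(int(i) for i in ids.split(","))
            for k, ids in conn.execute(sql)}


# Hypothetical sample data standing in for a real directory table.
conn = sqlite3.connect(":memory:")
conn.create_function("url_key", 1, url_key)
conn.create_function("url_stem", 1, url_stem)
conn.execute(
    "CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT, title TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?, ?)",
    [
        (1, "http://www.example.com/", "Example Realty", "Homes in Austin"),
        (2, "example.com", "Example Realty Inc", "Austin homes for sale"),
        (3, "http://example.net", "Example", "Homes in Austin"),
        (4, "http://other.org/listings", "Other", "Condos"),
    ],
)

by_url = duplicate_groups(conn, "url_key(url)")
by_stem = duplicate_groups(conn, "url_stem(url)")
by_description = duplicate_groups(conn, "description")
print(by_url, by_stem, by_description)
```

The groups that come back are candidates rather than certain dupes, especially the extension-stripped ones, so it is probably safer to eyeball each group before deleting anything.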
Sarah - You might try asking Brad (bldarter) over at Dmoz... he spent an inordinate amount of time doing the same thing over there as I recall.