I have a table in which I'd like a column to have no duplicate data. I'm familiar with the old standard of using GROUP BY and HAVING to identify exact matches, and with adding a hard UNIQUE constraint for future data. That's proving not to be good enough, though. What I'd like is a query/algorithm to identify extremely similar data (i.e., if only 2-3 characters differ in a 200-character string, I need to kill one of those rows) so I can scrub my data a bit better. It's making my brain spin. Any hints?
You could try to build your query to compare, say, every 10th, 15th, or 20th word of the two strings and check whether they match, as a cheap heuristic for similarity. Or write a stored procedure/function that does the comparison for you.
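Another common approach (different from the word-sampling idea above) is to compute a string-similarity ratio between each pair of rows and flag pairs above a cutoff. Here's a minimal Python sketch using the standard library's `difflib.SequenceMatcher`; the function name, the sample rows, and the 0.97 threshold are my own choices, not anything from the question, so tune the threshold for your data:

```python
from difflib import SequenceMatcher

def near_duplicates(rows, threshold=0.97):
    """Return (i, j, ratio) for pairs of rows whose similarity >= threshold.

    For a ~200-char string with 2-3 characters off, the ratio is roughly
    1 - 3/200 = 0.985, so 0.97 catches those pairs with some slack.
    autojunk=False avoids difflib discounting frequently repeated
    characters in long strings.
    """
    dupes = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            ratio = SequenceMatcher(None, rows[i], rows[j],
                                    autojunk=False).ratio()
            if ratio >= threshold:
                dupes.append((i, j, ratio))
    return dupes

# Hypothetical sample data: rows 0 and 1 differ by a single character.
rows = [
    "The quick brown fox jumps over the lazy dog near the riverbank every morning.",
    "The quick brown fox jumps over the lazy dog near the riverbank every mornings.",
    "An entirely unrelated description of something else altogether.",
]
flagged = near_duplicates(rows)  # only the (0, 1) pair is flagged
```

Note this is O(n²) pairwise comparison, fine for scrubbing a modest table but not for millions of rows; for large tables you'd first group candidates by some blocking key (e.g. a common prefix or length bucket) and only compare within groups.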