I have a table in which I'd like a column to have no duplicate data. I'm familiar with the old standard of using GROUP BY and HAVING to identify exact matches, and with adding a hard UNIQUE constraint for future data. That's proving not to be good enough, though. What I'd like is a query/algorithm to identify extremely similar data (i.e., if only 2-3 characters differ in a 200-character string, I need to kill one of those rows) so I can scrub my data a bit better. It's making my brain spin. Any hints?
You could try to build your query to compare, say, every 10th, 15th, or 20th word of the two strings and check whether they match, as a cheap heuristic for similarity. Or write a stored procedure/function that does the comparison for you.
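Another common approach (different from the word-sampling idea above) is to compute a string-similarity ratio between each pair of rows and flag pairs above a cutoff. Here's a minimal Python sketch using the standard library's `difflib.SequenceMatcher`; the function name, the sample rows, and the 0.97 threshold are my own choices, not anything from the question, so tune the threshold for your data:

```python
from difflib import SequenceMatcher

def near_duplicates(rows, threshold=0.97):
    """Return (i, j, ratio) for pairs of rows whose similarity >= threshold.

    For a ~200-char string with 2-3 characters off, the ratio is roughly
    1 - 3/200 = 0.985, so 0.97 catches those pairs with some slack.
    autojunk=False avoids difflib discounting frequently repeated
    characters in long strings.
    """
    dupes = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            ratio = SequenceMatcher(None, rows[i], rows[j],
                                    autojunk=False).ratio()
            if ratio >= threshold:
                dupes.append((i, j, ratio))
    return dupes

# Hypothetical sample data: rows 0 and 1 differ by a single character.
rows = [
    "The quick brown fox jumps over the lazy dog near the riverbank every morning.",
    "The quick brown fox jumps over the lazy dog near the riverbank every mornings.",
    "An entirely unrelated description of something else altogether.",
]
flagged = near_duplicates(rows)  # only the (0, 1) pair is flagged
```

Note this is O(n²) pairwise comparison, fine for scrubbing a modest table but not for millions of rows; for large tables you'd first group candidates by some blocking key (e.g. a common prefix or length bucket) and only compare within groups.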