Here is my situation: a rather large software company that uses Google for its internal search has asked me to help build a "learning" database that will allow results to be returned even when the query is not an exact match. We currently have a database of many common misspellings (those typed in often enough to get a lot of attention), but they want a much more extensive list of common misspellings, plus synonyms, so that they can match queries and offer results much more comprehensively. We are looking for any sites, techniques, software and suggestions for harvesting these common misspellings and synonyms, and your help is greatly appreciated. If anyone knows of such resources, please email me, post or IM. Moving forward, they want to build a much larger database that incorporates some sort of "learning" function so that we can constantly and dynamically update it, and we hope to have it all (or mostly) automated. I am thinking of programming it so that once a keyword has been entered more than a threshold number of times it is flagged, but my system would still require manual review of the flagged terms in order to match them up with the appropriate product, and this could be daunting as there are thousands of products. Any thoughts on streamlining this process would be appreciated as well. Cheers, SEO Guy
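For what it's worth, the threshold idea in the post above could look roughly like the sketch below. It counts unmatched queries, flags the ones seen often enough, and then uses fuzzy string matching from Python's standard `difflib` to pre-fill likely product matches so the reviewer only has to confirm or reject. The function names, the product names, the threshold and the cutoff are all made up for illustration, and fuzzy matching is just one possible way to shrink the manual review, not something the poster described.

```python
from collections import Counter
import difflib


def flag_unmatched_queries(queries, known_terms, threshold=3):
    """Return {query: count} for queries that match no known term
    and were entered at least `threshold` times."""
    known = {term.lower() for term in known_terms}
    counts = Counter(q.strip().lower() for q in queries)
    return {q: n for q, n in counts.items() if n >= threshold and q not in known}


def suggest_products(flagged, product_names, max_suggestions=3, cutoff=0.7):
    """For each flagged query, list the closest product names so a human
    reviewer confirms a suggestion instead of searching the whole catalogue."""
    by_lower = {name.lower(): name for name in product_names}
    suggestions = {}
    for query in flagged:
        close = difflib.get_close_matches(
            query, list(by_lower), n=max_suggestions, cutoff=cutoff
        )
        suggestions[query] = [by_lower[c] for c in close]
    return suggestions


# Hypothetical catalogue and query log, for illustration only.
products = ["Photo Studio", "Invoice Manager", "Backup Wizard"]
query_log = [
    "photo studoi", "photo studoi", "photo studoi",
    "backup wizrd",            # seen once, below the threshold
    "invoice manager",         # exact match, never flagged
]

flagged = flag_unmatched_queries(query_log, products, threshold=3)
review_queue = suggest_products(flagged, products)
```

Flagged terms with an empty suggestion list would still need a fully manual look, but those should be a much smaller pile than the whole flagged set.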
MSN's search has this capability and handles it pretty well. I'd try to piggy-back on their technology, perhaps writing a script that starts with all of the words in the English dictionary and creates misspellings based on nearby keys, additional letters, and other similar ideas. Any way you do it, it will be a chore. HTH
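A minimal sketch of the script idea above, assuming a QWERTY layout and treating only the left and right neighbours on the same keyboard row as "nearby" keys (a simplification). Doubled letters cover the "additional letters" case; dropped letters and swapped adjacent letters are my own guess at the "other similar ideas". All names here are hypothetical.

```python
# Letter rows of a QWERTY keyboard; only same-row neighbours are used here.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the keys immediately left and right of it."""
    neighbors = {}
    for row in rows:
        for i, key in enumerate(row):
            neighbors[key] = row[max(i - 1, 0):i] + row[i + 1:i + 2]
    return neighbors


NEIGHBORS = build_neighbors(ROWS)


def generate_misspellings(word):
    """Return a set of plausible typos for `word` (the word itself excluded)."""
    word = word.lower()
    variants = set()
    for i, ch in enumerate(word):
        # Nearby-key substitution: "search" -> "aearch"
        for near in NEIGHBORS.get(ch, ""):
            variants.add(word[:i] + near + word[i + 1:])
        # Additional (doubled) letter: "search" -> "ssearch"
        variants.add(word[:i] + ch + word[i:])
        # Dropped letter: "search" -> "earch"
        if len(word) > 1:
            variants.add(word[:i] + word[i + 1:])
        # Swapped adjacent letters: "search" -> "serach"
        if i + 1 < len(word):
            variants.add(word[:i] + word[i + 1] + ch + word[i + 2:])
    variants.discard(word)
    return variants


typos = generate_misspellings("search")
```

Run over a whole dictionary this produces a lot of candidates, many of which nobody ever types, so it probably makes sense to keep only the variants that also show up in real query logs.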
Another source, BESIDES site owners with their access_log as a source of misspelled words entered to find their site, is the many spell checkers available on the market. Whenever a spellcheck utility offers a selection of corrected words, it has a matching misspelling in its db, and that includes spell checkers from the Linux world and browsers, of course! If that company goes PUBLIC with its name and URL and offers a tool for EASY submission of common website-relevant misspellings (by email?), then I may also submit, and other site owners may as well. It depends on WHO owns the db/SE and whether inclusion is paid or FREE.
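As a rough illustration of the access_log idea above, the sketch below pulls search phrases out of the referrer field and counts them. It assumes the log is in the common "combined" format and that the referring search engine puts the phrase in a `q` query parameter; both are assumptions, and the sample lines and the `search.example` host are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Combined log format ends with: "request" status bytes "referer" "user-agent".
# This captures the quoted referer that follows the status and byte count.
REFERER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')


def search_terms_from_log(lines, param="q"):
    """Count the search phrases found in referrer URLs of access_log lines."""
    counts = Counter()
    for line in lines:
        match = REFERER_RE.search(line)
        if not match:
            continue
        query = parse_qs(urlsplit(match.group(1)).query)
        for phrase in query.get(param, []):
            counts[phrase.strip().lower()] += 1
    return counts


# Invented sample lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Oct/2005:13:55:36 +0000] "GET / HTTP/1.0" 200 2326 '
    '"http://search.example/results?q=acomodation+guide" "Mozilla/4.0"',
    '203.0.113.9 - - [10/Oct/2005:14:02:11 +0000] "GET / HTTP/1.0" 200 2326 '
    '"http://search.example/results?q=Acomodation+guide" "Mozilla/4.0"',
    '203.0.113.7 - - [10/Oct/2005:14:05:40 +0000] "GET /about HTTP/1.0" 200 512 '
    '"-" "Mozilla/4.0"',
]

terms = search_terms_from_log(sample_log)
```

Phrases that come up repeatedly but are not in the known-terms list would be the candidates to submit as misspellings.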
Two places you can look for help with such matters are a pair of related but distinct Usenet groups: alt.usage.english and alt.english.usage. Their memberships overlap a fair bit, but each is worth trying. I have seen all sorts of strange language-related data and databases that one or another regular there knew of.