Misspellings and synonyms

Discussion in 'Programming' started by SEO Guy, May 1, 2004.

  1. #1
    Here is my situation: a rather large software company that uses Google for its internal search function has asked me to help build a "learning" database that will allow results to be returned even when the query is not an exact match.

    We currently have a database of many of the common misspellings (those typed in often enough to get lots of attention), but they want a much more extensive list of common misspellings and also synonyms, so that they can match queries and offer results much more comprehensively. We are looking for any sites, techniques, software and suggestions for harvesting these common misspellings and synonyms, and your help is greatly appreciated. If anyone knows of such resources, please email me, post or IM.

    Moving forward, they want to build a much larger database that incorporates some sort of "learning" function, so that we can constantly and dynamically update the database, and they hope to have it all (or mostly) automated. I am thinking of programming it so that once a keyword has been entered more than a threshold number of times it is flagged, but my system would still require manual review of the flagged terms in order to match them with the appropriate product, and this could be daunting as there are thousands of products. Any thoughts on streamlining this process would be appreciated as well.
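
    The threshold idea above could be sketched roughly as follows. This is only an illustration, not the company's system: the product names, the sample query log, the threshold of 3 and the similarity cutoff of 0.7 are all made-up assumptions. To cut down on manual review, each flagged query is paired with its closest product names using Python's standard difflib module, so a reviewer only has to confirm or reject a short list.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical catalogue and threshold; the real ones are not given in the thread.
PRODUCTS = ["spreadsheet", "database", "compiler", "debugger"]
THRESHOLD = 3


def flag_queries(queries, known_terms, threshold=THRESHOLD):
    """Return unknown queries seen at least `threshold` times, with their counts."""
    counts = Counter(q.strip().lower() for q in queries)
    return {q: n for q, n in counts.items()
            if n >= threshold and q not in known_terms}


def propose_matches(flagged, products, cutoff=0.7):
    """Attach up to three candidate products to each flagged query for review."""
    return {q: get_close_matches(q, products, n=3, cutoff=cutoff)
            for q in flagged}


# Made-up search log: one misspelling is typed three times, the rest once.
log = ["spredsheet", "Spredsheet", "spredsheet ", "databse", "compiler"]
flagged = flag_queries(log, set(PRODUCTS))
review_queue = propose_matches(flagged, PRODUCTS)
```

    Here only "spredsheet" crosses the threshold, and it arrives in the review queue with "spreadsheet" already suggested. Flagged terms with no candidate above the cutoff would still need a fully manual look.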
    Cheers
    SEO Guy
     
    SEO Guy, May 1, 2004 IP
  2. schlottke

    #2
    MSN's search has this capability, and it works pretty well. I'd try to piggyback on their technology, perhaps writing a script that starts with all of the words in the English dictionary and creates misspellings based on nearby keys, addditional letters ;) , and other similar ideas. Any way you do it, it will be a chore.
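
    A script along those lines might look like the sketch below. The adjacency map is a small, partial QWERTY table written for this example (an assumption, not a complete keyboard layout), and the dropped-letter and swapped-letter variants are my reading of "other similar ideas".

```python
# Partial QWERTY adjacency map (illustrative assumption; extend as needed).
NEARBY_KEYS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "o": "ipkl",
    "r": "etdf", "s": "awedxz", "t": "ryfg", "n": "bhjm",
}


def misspellings(word):
    """Generate single-error variants of `word`."""
    variants = set()
    for i, ch in enumerate(word):
        # Nearby-key substitution, for letters present in the map
        for near in NEARBY_KEYS.get(ch, ""):
            variants.add(word[:i] + near + word[i + 1:])
        # Additional (doubled) letter
        variants.add(word[:i] + ch + word[i:])
        # Dropped letter
        variants.add(word[:i] + word[i + 1:])
        # Swap with the following letter
        if i + 1 < len(word):
            variants.add(word[:i] + word[i + 1] + ch + word[i + 2:])
    variants.discard(word)  # swapping identical letters reproduces the word
    return variants


variants = misspellings("search")
```

    Running every dictionary word through this builds a misspelling-to-word lookup. Variants that happen to be real words themselves should be filtered out against the dictionary before being stored.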

    HTH
     
    schlottke, May 1, 2004 IP
  3. hans

    #3
    Another source, BESIDES site owners (whose access_log records the misspelled words people entered to find their site), is the many spell checkers available on the market.
    Whenever a spell-check utility offers a selection of corrected words, it has a matching misspelling in its db.
    That includes spell checkers from the Linux world and browsers, of course!

    If that company goes PUBLIC with its name and URL and offers a tool for EASY submission (by email?) of common website-relevant misspellings, then I may also submit, and other site owners may as well.

    It depends on WHO owns the db/SE and whether inclusion is paid or FREE.
     
    hans, May 1, 2004 IP
  4. Owlcroft

    #4
    Two places you can look for help with such matters are a pair of related but distinct Usenet groups:

    alt.usage.english

    and

    alt.english.usage
    Their memberships overlap a fair bit, but each is worth trying. I have seen all sorts of strange language-related data and databases that one or another regular there knew of.
     
    Owlcroft, May 3, 2004 IP
  5. nlopes

    #5
    One common option is to use PHP with Pspell, which uses a dictionary to make corrections and suggestions.
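
    To show the general idea of dictionary-based suggestions, here is a minimal sketch in Python. It is not Pspell and does not use its API: it simply ranks dictionary words by Levenshtein edit distance from the query. The tiny word list and the maximum distance of 2 are assumptions chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within `max_distance` edits, closest first."""
    word = word.lower()
    if word in dictionary:
        return []  # spelled correctly, nothing to suggest
    scored = [(edit_distance(word, w), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]


# Made-up dictionary for illustration only.
WORDS = {"suggestion", "correction", "dictionary"}
suggestions = suggest("sugestion", WORDS)
```

    Scanning the whole dictionary for every query is slow for large word lists, so a real spell checker would use an index or precomputed candidates instead.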
     
    nlopes, May 9, 2004 IP