Google gets Patent for "Similarity-Engine"

Discussion in 'Google' started by saintdw, Jan 7, 2007.

  1. #1
    Don't search harder, search smarter

    According to a patent filing with the U.S. Patent and Trademark Office, patent 7,158,961, Google is working on deploying a "similarity-engine."

    Google's patent filing says:

    "From the search engine's perspective, one problem in cataloging the large number of available web pages is that multiple ones of the web documents are often identical or nearly identical. Separately cataloging similar documents is inefficient and can be frustrating for the user if, in response to a request, a list of nearly identical documents is returned. Accordingly, it is desirable for the search engine to identify documents that are similar or 'roughly the same' so that this type of redundancy in search results can be avoided."

    According to Google, the similarity-engine will be based on creating and calculating differences and sums in vectors. Using hashes and what Google calls "sketches," its engine will be able to compare differences in text as well as images. The similarity-engine will take an object, create a vector for it, and compare the vector to that of another object.
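
    To make the "hashes, vector sums and sketches" idea a bit more concrete, here is a rough sketch of one well-known way to do this kind of comparison (a simhash-style fingerprint). This is only an illustration and is not taken from the patent text: the 64-bit width, the MD5-based hash, and the names token_hash, sketch, difference and similarity are all assumptions made up for the example.

```python
import hashlib

BITS = 64  # sketch width; an arbitrary choice for this illustration


def token_hash(token: str) -> int:
    """Stable 64-bit hash of one element (built-in hash() is salted per run)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(elements) -> int:
    """Sum a +1/-1 vote per bit over all elements, then keep only the signs."""
    totals = [0] * BITS
    for element in elements:
        h = token_hash(element)
        for bit in range(BITS):
            totals[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, total in enumerate(totals):
        if total > 0:
            result |= 1 << bit
    return result


def difference(a: int, b: int) -> int:
    """Number of bit positions where two sketches disagree."""
    return bin(a ^ b).count("1")


def similarity(a_elements, b_elements) -> float:
    """1.0 means identical sketches; lower values mean more differing bits."""
    return 1.0 - difference(sketch(a_elements), sketch(b_elements)) / BITS


if __name__ == "__main__":
    page_a = "google gets a patent for a similarity engine".split()
    page_b = "google gets a patent for a similarity search engine".split()
    print(similarity(page_a, page_b))
```

    The elements here are words, but any series of discrete elements (pixels blocks, spreadsheet cells, and so on) could be fed in the same way. Two objects with mostly the same elements should tend to end up with sketches that differ in only a few bits, which is what would let an engine flag them as "roughly the same" without comparing the full documents.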

    Further into Google's filing, the search giant also describes the use of its similarity-engine in other applications. Besides web documents, the engine can be used to compare regular text documents, spreadsheets, presentations and other commonly used office productivity data.

    "The concepts described could also be implemented based on any object that contains a series of discrete elements," the filing emphasized.

    I wonder how this will affect search in the near future. :eek:

    Added link:
     
    saintdw, Jan 7, 2007 IP
  2. 2003m2003

    2003m2003 Well-Known Member

    Messages:
    863
    Likes Received:
    17
    Best Answers:
    0
    Trophy Points:
    138
    #2
    Other search engines won't be able to use this so-called 'similarity engine' feature, and I think that is not good for search engine development.
     
    2003m2003, Jan 7, 2007 IP
  3. thegypsy

    thegypsy Peon

    Messages:
    1,348
    Likes Received:
    109
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Sounds interesting.. have a link to the document?

    1. Seems to be further duplicate content filtering
    2. Just because a patent is filed doesn't mean it gets used. That is an assumption
     
    thegypsy, Jan 7, 2007 IP
  4. thegypsy

    thegypsy Peon

    Messages:
    1,348
    Likes Received:
    109
    Best Answers:
    0
    Trophy Points:
    0
  5. adnan

    adnan Peon

    Messages:
    1,614
    Likes Received:
    82
    Best Answers:
    0
    Trophy Points:
    0
    #5
    I can tell u this much.

    No matter what u do.

    Humans are better.

    There will always be some type of issue going on, imo.
     
    adnan, Jan 7, 2007 IP
  6. aaron_nimocks

    aaron_nimocks Im kind of a big deal Staff

    Messages:
    5,563
    Likes Received:
    627
    Best Answers:
    0
    Trophy Points:
    420
    #6
    aaron_nimocks, Jan 7, 2007 IP
  7. sunmoon

    sunmoon Peon

    Messages:
    466
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Google is always learning and getting new patents. I think this 'similarity' thing is very true. So content is king, I guess. If your site has unique content, then your vector will be different. BUT what happens if somebody duplicates you and creates hundreds of copies of YOU? You get washed away by your own content.
     
    sunmoon, Jan 7, 2007 IP