Don't search harder, search smarter

According to a filing with the U.S. Patent and Trademark Office, patent 7,158,961, Google is working on deploying a "similarity-engine." Google's patent filing says:

"From the search engine's perspective, one problem in cataloging the large number of available web pages is that multiple ones of the web documents are often identical or nearly identical. Separately cataloging similar documents is inefficient and can be frustrating for the user if, in response to a request, a list of nearly identical documents is returned. Accordingly, it is desirable for the search engine to identify documents that are similar or 'roughly the same' so that this type of redundancy in search results can be avoided."

According to Google, the similarity-engine will be based on creating and calculating differences and sums in vectors. Using hashes and what Google calls "sketches," its engine will be able to compare differences in text as well as images. The similarity-engine will take an object, create a vector for it, and compare the vector to that of another object.

Further into Google's filing, the search giant also describes the use of its similarity-engine in other applications. Besides web documents, the engine can be used to compare regular text documents, spreadsheets, presentations and other commonly used office productivity data. "The concepts described could also be implemented based on any object that contains a series of discrete elements," the filing emphasized.

I wonder how this will affect search in the near future.

Added link:
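For anyone curious what "create a vector for each object and compare the vectors" could look like in practice, here is a rough sketch. To be clear, this is NOT the method from the patent, whose details are not quoted in this thread. It is a generic MinHash-style approach to near-duplicate detection, and every name in it (`shingles`, `sketch`, `similarity`, the `k` and `num_hashes` parameters) is made up for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle, salted with a seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, num_hashes=64, k=3):
    """Build a fixed-length vector: for each seed, the minimum hash over all shingles.

    Assumption: this stands in for the "sketch" idea only loosely; the
    patent's actual vector construction is not described in this thread.
    """
    items = shingles(text, k)
    if not items:
        return [0] * num_hashes
    return [min(_hash(s, seed) for s in items) for seed in range(num_hashes)]


def similarity(sketch_a, sketch_b):
    """Fraction of positions where two sketches agree (estimates shingle overlap)."""
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return matches / len(sketch_a)


if __name__ == "__main__":
    page = "separately cataloging similar documents is inefficient and frustrating"
    print(similarity(sketch(page), sketch(page)))  # identical text gives 1.0
```

The point of a scheme like this is that the engine only has to compare two short vectors instead of two full documents. Pages that share most of their text end up agreeing in most vector positions, while unrelated pages agree in almost none.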
Other search engines won't be able to use this so-called 'similarity engine' feature, and I think that is not good for search engine development.
Sounds interesting. Do you have a link to the document? 1. This seems to be further duplicate content filtering. 2. Just because a patent is filed doesn't mean it gets used. That is an assumption.
I can tell you this much: no matter what you do, humans are better. There will always be some type of issue going on, IMO.
Google is always learning and getting new patents. I think this 'similarity' thing is very real. So content is king, I guess. If your site has unique content, then your vector will be different. BUT what happens if somebody duplicates you and creates hundreds of YOU? You get washed away by your own content.