I recently wrote a few articles on LSI (latent semantic indexing) and found it quite fascinating. Does anyone know how much Google cares about it? Is it extremely important?
Although Google's algorithm patent DOES NOT specifically mention LSI, it does describe a similar system. Here is a paragraph from their patent: "The system is further adapted to identify phrases that are related to each other, based on a phrase's ability to predict the presence of other phrases in a document. More specifically, a prediction measure is used that relates the actual co-occurrence rate of two phrases to an expected co-occurrence rate of the two phrases. Information gain, as the ratio of actual co-occurrence rate to expected co-occurrence rate, is one such prediction measure. Two phrases are related where the prediction measure exceeds a predetermined threshold. In that case, the second phrase has significant information gain with respect to the first phrase. Semantically, related phrases will be those that are commonly used to discuss or describe a given topic or concept, such as 'President of the United States' and 'White House.' For a given phrase, the related phrases can be ordered according to their relevance or significance based on their respective prediction measures."
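To make the quoted prediction measure concrete, here is a rough sketch of how such a ratio could be computed. This is not Google's implementation. The function name, the toy documents, and the threshold value are made up for illustration, and the "expected" rate assumes the two phrases occur independently, which the quoted paragraph does not spell out.

```python
def information_gain(docs, phrase_a, phrase_b):
    """Ratio of actual to expected co-occurrence rate for two phrases.

    Assumption: the expected rate is what we would see if the two
    phrases appeared independently of each other, i.e. P(a) * P(b).
    """
    n = len(docs)
    count_a = sum(1 for d in docs if phrase_a in d)
    count_b = sum(1 for d in docs if phrase_b in d)
    count_ab = sum(1 for d in docs if phrase_a in d and phrase_b in d)

    actual = count_ab / n
    expected = (count_a / n) * (count_b / n)
    # If either phrase never occurs, there is nothing to compare.
    return actual / expected if expected else 0.0


# Toy collection: each document is the set of phrases found in it.
phrase_docs = [
    {"president of the united states", "white house"},
    {"president of the united states", "white house", "congress"},
    {"white house", "press briefing"},
    {"banana bread", "baking"},
    {"banana bread", "congress"},
    {"baking", "oven"},
]

THRESHOLD = 1.5  # hypothetical cut-off, not taken from the patent

for other in ("white house", "banana bread"):
    gain = information_gain(phrase_docs, "president of the united states", other)
    print(other, round(gain, 3), "related" if gain > THRESHOLD else "not related")
```

In this toy collection "white house" co-occurs with "president of the united states" about twice as often as chance would predict, so it clears the threshold, while "banana bread" never co-occurs with it and does not.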
Latent Semantic Indexing (LSI) is an information retrieval method that improves your ability to find relevant information. Using fully automatic statistical algorithms, LSI can retrieve relevant documents even when they do not share any words with your query: concepts replace keywords to improve retrieval. Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant.
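For anyone who wants to see the idea in action, below is a minimal sketch of LSI on a four-document toy collection. It assumes NumPy is installed, uses raw word counts (real systems usually apply a weighting such as tf-idf first), and keeps only k=2 "concepts". The documents, the function name, and the choice of k are illustrative assumptions, not part of any particular product.

```python
import numpy as np

toy_docs = [
    "car engine repair",
    "automobile engine repair",   # shares no words with the query "car"
    "car automobile dealer",
    "banana fruit smoothie",
]


def lsi_similarities(docs, query, k=2):
    """Cosine similarity between a query and each document in a rank-k LSI space."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw counts: one row per term, one column per document.
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0

    # Truncated SVD: keep only the k strongest "concepts".
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T  # V_k has one row per document

    # Fold the query into the same k-dimensional space.
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:
            q[index[w]] += 1.0
    q_hat = (U_k.T @ q) / s_k

    # Cosine similarity between the folded-in query and each document.
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return (V_k @ q_hat) / norms


sims = lsi_similarities(toy_docs, "car")
for doc, sim in zip(toy_docs, sims):
    print(f"{sim:+.3f}  {doc}")
```

With this toy data, "automobile engine repair" ends up very close to the query "car" even though the two share no words, because "automobile" and "car" keep company with the same other terms. The banana document scores essentially zero.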
Google does not use LSI because LSI does not give good results on large non-homogeneous document collections. You will find that those who believe that Google is using LSI do not know what LSI actually is... Information retrieval experts like Dr Garcia and others have been debunking the 'Google uses LSI' myth for years. - Michael