Latent Semantic Indexing

Discussion in 'Search Engine Optimization' started by hirengohel, Nov 22, 2010.

  1. #1
    How to Use Latent Semantic Indexing (LSI) Technology Principles to Improve Your Website Search Engine Ranking

    Please reply.
     
    hirengohel, Nov 22, 2010 IP
  2. nitivation

    #2
    LATENT SEMANTIC INDEXING


    Regular keyword searches approach a document collection with a kind of accountant mentality: a document contains a given word or it doesn't, with no middle ground. We create a result set by looking through each document in turn for certain keywords and phrases, tossing aside any documents that don't contain them, and ordering the rest based on some ranking system. Each document stands alone in judgement before the search algorithm - there is no interdependence of any kind between documents, which are evaluated solely on their contents.
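
    As a rough illustration of the plain keyword search described above, here is a minimal Python sketch. The three documents, the function name keyword_search and the ranking rule (total keyword occurrences) are all assumptions made for this example; the text only says "some ranking system".

```python
# Hypothetical three-document collection, for illustration only.
docs = {
    "a": "topology of n-dimensional manifolds",
    "b": "an introduction to topology and more topology",
    "c": "golf tournament results",
}

def keyword_search(query, docs):
    """Return names of documents containing every query word, best match first."""
    terms = query.lower().split()
    scored = []
    for name, text in docs.items():
        words = text.lower().split()
        # Accountant mentality: a document either contains each word or it doesn't.
        if all(term in words for term in terms):
            # Assumed ranking rule: total number of keyword occurrences.
            score = sum(words.count(term) for term in terms)
            scored.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]

print(keyword_search("topology", docs))   # ['b', 'a']
print(keyword_search("manifold", docs))   # [] because "manifolds" is not an exact match
```

    Each document is judged only on its own contents, which is why the second query finds nothing even though document "a" is clearly about manifolds.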

    Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.
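
    The "many words in common" idea can be sketched with a cosine similarity over word counts. This shows only the word-overlap intuition, not full LSI (there is no SVD step here), and the three sample texts are made up for the example.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    shared = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return shared / norm if norm else 0.0

# Hypothetical sample texts.
d0 = "n-dimensional manifold topology"
d1 = "manifold topology"
d2 = "golf tournament"

print(round(cosine(d0, d1), 3))  # 0.816 : two shared words, so "close"
print(cosine(d0, d2))            # 0.0   : no shared words, so "distant"
```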

    When you search an LSI-indexed database, the search engine looks at similarity values it has calculated for every content word, and returns the documents that it thinks best fit the query. Because two documents may be semantically very close even if they do not share a particular keyword, LSI does not require an exact match to return useful results. Where a plain keyword search will fail if there is no exact match, LSI will often return relevant documents that don't contain the keyword at all.

    For example, suppose we use LSI to index a collection of mathematical articles. If the words n-dimensional, manifold and topology appear together in enough articles, the search algorithm will notice that the three terms are semantically close. A search for n-dimensional manifolds will therefore return a set of articles containing that phrase (the same result we would get with a regular search), but also articles that contain just the word topology. The search engine understands nothing about mathematics, but examining a sufficient number of documents teaches it that the three terms are related. It then uses that information to provide an expanded set of results with better recall than a plain keyword search.
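
    The mathematical-articles scenario can be reproduced on a toy scale. The sketch below is a simplified stand-in, not a production LSI implementation: the five documents are invented, the number of concepts (k = 2) is an arbitrary choice, and power iteration with deflation is used in place of a library SVD routine so that the example needs only the standard library. All function and variable names are hypothetical.

```python
import math

# Hypothetical toy collection: three mathematics notes and two golf notes.
docs = [
    "n-dimensional manifold topology",
    "manifold topology",
    "topology",
    "golf tournament",
    "golf tournament player",
]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})
# Term-document count matrix A: one row per term, one column per document.
A = [[t.count(w) for t in tokens] for w in vocab]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def top_eigenpairs(S, k, iters=500):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric positive
    semidefinite matrix, via power iteration with deflation."""
    S = [list(row) for row in S]
    n = len(S)
    pairs = []
    for _component in range(k):
        v = [1.0] * n
        for _step in range(iters):
            w = [dot(row, v) for row in S]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in S])
        pairs.append((lam, v))
        # Deflate: subtract the component just found before the next pass.
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

# Eigenpairs of A^T A give the squared singular values and the
# right singular vectors (document side) of A.
cols = list(zip(*A))
gram = [[dot(ci, cj) for cj in cols] for ci in cols]
k = 2  # assumed number of latent concepts
pairs = top_eigenpairs(gram, k)
sigmas = [math.sqrt(lam) for lam, _ in pairs]
V = [v for _, v in pairs]                                        # k x documents
U = [[dot(row, v) / s for row in A] for v, s in zip(V, sigmas)]  # k x terms

def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0

def lsi_scores(query):
    """Fold the query into concept space and compare it with every document."""
    q = [query.split().count(w) for w in vocab]
    q_hat = [dot(u, q) / s for u, s in zip(U, sigmas)]
    return [cosine(q_hat, [v[d] for v in V]) for d in range(len(docs))]

scores = lsi_scores("n-dimensional manifold")
for text, score in zip(docs, scores):
    print(f"{score:6.3f}  {text}")
```

    In this toy collection the third document, which contains only the word topology, scores close to 1 for the query even though it shares no word with it, while the golf documents score close to 0. With a numerical library available, the power-iteration helper would normally be replaced by a standard SVD call.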

    Ignorance is Bliss

    Teaching a computer to organize data into concepts and demonstrate understanding is difficult. One great advantage of LSI is that it is a strictly mathematical approach, with no insight into the meaning of the documents or words it analyzes. This makes it a powerful, generic technique able to index any cohesive document collection in any language. It can be used in conjunction with a regular keyword search, or in place of one, with good results.

    Before we discuss the theoretical underpinnings of LSI, it's worth citing a few actual searches from some sample document collections. In each search, a red title or asterisk indicates that the document doesn't contain the search string, while a blue title or asterisk indicates that the search string is present.

    In an AP news wire database, a search for Saddam Hussein returns articles on the Gulf War, UN sanctions, the oil embargo, and documents on Iraq that do not contain the Iraqi president's name at all.
    Looking for articles about Tiger Woods in the same database brings up many stories about the golfer, followed by articles about major golf tournaments that don't mention his name. Constraining the search to days when no articles were written about Tiger Woods still brings up stories about golf tournaments and well-known players.
    In an image database indexed with LSI, a search on Normandy invasion shows images of the Bayeux tapestry (the famous tapestry depicting the Norman invasion of England in 1066) and of the town of Bayeux, followed by photographs of the Allied invasion of Normandy in 1944.
    In all these cases LSI is 'smart' enough to see that Saddam Hussein is somehow closely related to Iraq and the Gulf War, that Tiger Woods plays golf, and that Bayeux has close semantic ties to invasions and England. As we will see in our exposition, all of these apparently intelligent connections are artifacts of word use patterns that already exist in our document collection.
     
    nitivation, Nov 23, 2010 IP
  3. hirengohel

    #3
    hirengohel, Nov 23, 2010 IP
  4. contentboss

    #4
    How words are related to each other is very important when trying to deduce the 'sense' of an article. Our own product is the only context-sensitive rewriter on the market, and it is VERY important.
     
    contentboss, Nov 23, 2010 IP
  5. T.J.

    #5
    In simpler terms, you can think of it as using terms that are similar to your main keyword in the context of your webpage/article.

    For example, if money is your main keyword, similar keywords could include: cash, revenue, income, monetary, etc.
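
    One simple way to act on this is to check which related terms a draft already uses. The sketch below is only a presence check: the sample page text and the function name related_term_coverage are made up, and nothing here reflects how any search engine actually scores a page.

```python
import re

def related_term_coverage(text, related_terms):
    """Map each related term to True/False depending on whether it occurs in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {term: term in words for term in related_terms}

# Hypothetical page copy; the related terms are the ones listed in the post.
page = "Grow your income: simple ways to turn spare time into extra cash."
related = ["cash", "revenue", "income", "monetary"]

coverage = related_term_coverage(page, related)
missing = [term for term, found in coverage.items() if not found]
print(missing)   # ['revenue', 'monetary']
```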
     
    T.J., Nov 23, 2010 IP
    Jim4767 likes this.
  6. andykeating

    #6
    I found a good article; check it out: latentsemanticindexing[dot]com
     
    andykeating, Nov 24, 2010 IP
  7. keyideas

    #7
    Latent Semantic Indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called Singular Value Decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.
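
    For reference, the Singular Value Decomposition step is usually written as below. The symbols are not from the post and are introduced here for illustration: A is the term-document matrix (t terms by n documents), k is the chosen number of concepts, U_k and V_k hold the first k left and right singular vectors, Sigma_k is the diagonal matrix of the k largest singular values, q is a query written as a term-count vector, and d_j-hat is the j-th row of V_k.

```latex
A \;\approx\; A_k \;=\; U_k \, \Sigma_k \, V_k^{\top},
\qquad
\hat{q} \;=\; \Sigma_k^{-1} U_k^{\top} q,
\qquad
\operatorname{sim}(\hat{q}, \hat{d}_j)
  \;=\; \frac{\hat{q} \cdot \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

    Keeping only the k largest singular values is what merges terms that occur in similar contexts into shared concept dimensions.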
     
    keyideas, Nov 24, 2010 IP
  8. chestercaldwel

    #8
    Google, as a leading search engine provider, works to improve its search results by continuously updating its algorithms to provide its visitors with relevant, useful and fresh information. One of the technologies Google reportedly uses in its algorithm to provide users with relevant information is Latent Semantic Indexing (LSI).
    Latent Semantic Indexing (LSI) is an information retrieval technology that retrieves information from websites and information repositories based on the vector space model of document classification. Through the use of LSI technology, relevant information can be retrieved from a collection of documents based on its concept, even if a document does not contain the search word. LSI helps search engines retrieve data based on the context of the search query.
    Google is said to have long used Latent Semantic Indexing for AdSense to display relevant advertisements on AdSense publishers' websites. It is also widely believed in SEO circles that Google is giving LSI more emphasis in its search algorithm to provide its visitors with relevant information.
     
    chestercaldwel, Nov 24, 2010 IP
  9. Rajeev123

    #9
    Latent Semantic Indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called Singular Value Decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.
     
    Rajeev123, Nov 24, 2010 IP
  10. M&A_Advisory

    #10
    Great contribution, nitivation.
     
    M&A_Advisory, Dec 6, 2010 IP