With Google's newest algorithm, is true LSI possible?

Discussion in 'Google' started by jsolochek, Jan 21, 2007.

  1. #1
    LSI is something that needs human editors, fuzzy logic and a whole lot more. Do you think Google, without human intervention, will be able to successfully integrate LSI when looking at sites and deciding whether or not the content is original?

    Software can be set up so that it appears to be pretty smart, and I think it may be possible someday, but the technology will have to be much greater than it is now.
     
    jsolochek, Jan 21, 2007 IP
  2. thetafferboy83 (Active Member)

    #2
    I'm pretty sure Google uses LSI tech now. I don't think it really needs human intervention.
     
    thetafferboy83, Jan 22, 2007 IP
  3. MattUK (Notable Member)

    #3
    LSI doesn't have anything to do with original content; it's about the relationships between words. That's something that could be done automatically, and I'm sure it's already being used.
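    For the curious, this is roughly what automatic LSI looks like: build a term-document matrix and take a truncated SVD, so documents get compared in a reduced "concept" space. A minimal sketch in plain NumPy; the toy documents and the rank k=2 are made up for illustration, not anything Google has published:

    ```python
    import numpy as np

    docs = [
        "the cat sat on the mat",
        "a cat chased a mouse",
        "stocks rose as the market rallied",
    ]

    # Build a vocabulary and a term-document count matrix A (terms x docs).
    vocab = sorted({w for d in docs for w in d.split()})
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[vocab.index(w), j] += 1

    # Truncated SVD: keep only the top-k singular values/vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row = one doc in concept space

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # The two cat documents should come out more similar to each other
    # than to the finance document, even though they share only "cat".
    print(cos(doc_vecs[0], doc_vecs[1]))
    print(cos(doc_vecs[0], doc_vecs[2]))
    ```

    No human editor in the loop anywhere: it's all linear algebra on word counts.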
     
    MattUK, Jan 22, 2007 IP
  4. thetafferboy83 (Active Member)

    #4
    I think he meant that duplicate content is a problem when looking at relationships between words at the scale of the web? The short answer is I agree with you: Google has its duplicate-content detection sorted, so I don't see how it would affect any LSI algos.
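    For what it's worth, a common textbook approach to duplicate detection (Google's actual method isn't public) is w-shingling: compare the sets of word n-grams two pages share. A quick sketch with made-up pages and an arbitrary shingle width:

    ```python
    def shingles(text, w=3):
        """Return the set of w-word shingles in the text."""
        words = text.lower().split()
        return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

    def jaccard(a, b):
        """Set overlap: 1.0 means identical shingle sets."""
        return len(a & b) / len(a | b) if (a | b) else 0.0

    page1 = "the quick brown fox jumps over the lazy dog"
    page2 = "the quick brown fox jumps over the lazy cat"

    print(jaccard(shingles(page1), shingles(page2)))  # 0.75: near-duplicates
    ```

    Where to set the "same page" threshold (0.8? 0.9?) would be a tuning choice.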
     
    thetafferboy83, Jan 22, 2007 IP
  5. jsolochek (Peon)

    #5
    This stuff could only be automated if we put together a database of just nouns, another one for adverbs, then adjectives, then verbs, and then a program that knows sentence structure. And this would only solve the LSI question in English, and would not take into account the way different people pronounce words differently.

    Can you imagine how large a hard drive system would have to be in place just to do this for the English language?
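    For scale, a quick back-of-envelope on the lexicon itself; the figures here are guesses, not measurements:

    ```python
    word_forms = 500_000          # roughly OED scale, an assumption
    bytes_per_word = 100          # spelling + part of speech + frequency
    print(word_forms * bytes_per_word / 1e6, "MB")  # 50.0 MB
    ```

    The lexicon alone is tiny; the real storage cost would be the term-document data over billions of pages.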

    The only real way to do this effectively is with humans, one for each type of accent or pronunciation.
     
    jsolochek, Jan 22, 2007 IP
  6. saneinsight (Guest)

    #6
    jsolochek, you are way off track. Ever heard of distributed computing, load balancing, etc.? It's not a case of just having one hard drive; you'd have hundreds of machines to balance the load. That's what makes scaling possible.

    Have you ever heard of database tables? I'm guessing not, as you think a new database for each part of speech will do the job.
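    To make the point concrete: one table with a part-of-speech column does the job. A minimal sketch with Python's built-in sqlite3 (the schema is illustrative):

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE words (word TEXT, pos TEXT)")
    con.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [("run", "verb"), ("run", "noun"),
         ("quickly", "adverb"), ("red", "adjective")],
    )

    # One query per part of speech; no separate database needed for each.
    for (word,) in con.execute("SELECT word FROM words WHERE pos = 'verb'"):
        print(word)  # run
    ```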

    Sentence structure, known as syntax, is only a small part of linguistics, and it has been studied for decades. See http://en.wikipedia.org/wiki/Linguistics

    You ain't got a cat in hell's chance of using humans to replace an LSI system. Google do use LSI in their algo. With billions of pages out there, do you really think a team of people would be able to replace an LSI system? There is no human intervention.
     
    saneinsight, Jan 22, 2007 IP
  7. jsolochek (Peon)

    #7
    I am not some newbie with a limited understanding of what I talk about. I used to be a 4GL programmer back in the days of dBASE III and Informix SQL. I know what a table is, and I know the original concept of what latent semantic indexing was meant to do.

    There are so many sites out there these days with junk content and duplicate content that there had to be a way to differentiate them from the sites that provide good information.

    I remember all those AdSense sites that were given away by marketers, and the software site generators that produced pages and pages of text. If you read them, you could tell they were sites put together using keyword spamming; they were not readable by humans, even stupid ones.

    LSI was to be a way of distinguishing good content from junk.

    With LSI you just can't have one DBF with a lot of tables. It's not possible when you consider that sentence structure must be observed.
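    One wrinkle worth noting, though: classic LSI's input is a bag-of-words term-document matrix that discards word order entirely, so any structure checking would have to happen in a separate layer:

    ```python
    from collections import Counter

    # LSI starts from word counts, so these two sentences produce
    # identical term vectors despite having opposite meanings:
    print(Counter("dog bites man".split()) == Counter("man bites dog".split()))  # True
    ```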
     
    jsolochek, Jan 22, 2007 IP
  8. andre75 (Peon)

    #8
    WTH is LSI?

    In my mind, Fuzzy Logic is BS.

    The way to go would be a combination of:

    Artificial Intelligence: this is what we have right now. The data is supplied by humans and the algorithms are written by humans; the machine cannot learn. A somewhat misleading term (not that intelligent).

    Neural Networks / Machine Learning: the network adapts itself to a new problem and can learn. A human tells the machine which sites are good and which are bad (the learning process), and the network can then identify those by itself. A feedback function (punishment and reward) keeps it updated over time and improves its performance.
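    As a toy version of that learning loop: humans label example sites good or bad, and a simple perceptron adjusts its weights from the feedback. The two features (unique-word ratio and keyword density) are made-up stand-ins for real quality signals:

    ```python
    import numpy as np

    X = np.array([[0.90, 0.02],   # varied vocabulary, low keyword density
                  [0.80, 0.03],
                  [0.20, 0.30],   # repetitive, keyword-stuffed
                  [0.30, 0.25]])
    y = np.array([1, 1, 0, 0])    # human-supplied labels: 1 = good site, 0 = bad

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(100):                      # simple perceptron updates
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi        # the punishment/reward feedback
            b += lr * (yi - pred)

    new_site = np.array([0.85, 0.02])
    print(1 if new_site @ w + b > 0 else 0)   # 1: classified as good
    ```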

    Fuzzy logic is nothing but shades of gray where you previously only had black and white.
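    In code, that's just a membership function that returns a degree between 0 and 1 instead of a hard yes/no; the 5%-15% keyword-density ramp here is an arbitrary illustration:

    ```python
    def spamminess(keyword_density):
        """Degree of membership in 'spammy', from 0.0 to 1.0."""
        return min(1.0, max(0.0, (keyword_density - 0.05) / 0.10))

    print(spamminess(0.03))  # 0.0: clearly fine
    print(spamminess(0.10))  # 0.5: the gray area a boolean rule can't express
    print(spamminess(0.20))  # 1.0: clearly spam
    ```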

    All of these approaches require human input (to say what is a good result and what is not). Otherwise the machine will, over time, learn what it likes best, but we want it to do what we like best.
    So human input is always the key, until you can have machines that truly understand what they read.
     
    andre75, Jan 22, 2007 IP
  9. edr (Guest)

    #9
    Interesting conversation.

    The bottom line really always seems to remain the same: produce original, quality content and the engines will reward you.
     
    edr, Jan 22, 2007 IP
  10. jsolochek (Peon)

    #10
    True LSI and/or fuzzy logic, in our current state of technology, is still going to need human intervention.
     
    jsolochek, Jan 22, 2007 IP
  11. thegypsy (Peon)

    #11
    Interesting stuff… OK, for me, LSI, regardless of whether it's in use or not (outside of its original intentions = AdWords), really isn't up to the task. It's more of a relevance filter/scoring system if anything.

    All the stuff around Phrase Based Indexing and Retrieval has my attention. Even the recent 'Similarity Engine' patent drew a great deal of influence from it…

    Anyways, at the bottom of that article is a TON of links to the various Phrase Based optimization stuff… worth thinking about in CONJUNCTION with the LSI technologies…
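    The flavor of it, for anyone who hasn't dug into those patents: index documents by multi-word phrases rather than single terms, so shared phrases (not just shared words) drive relevance. A minimal bigram sketch; the documents are made up:

    ```python
    from collections import defaultdict

    def phrases(text, n=2):
        words = text.lower().split()
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    index = defaultdict(set)
    docs = {1: "latent semantic indexing finds related words",
            2: "phrase based indexing finds related phrases"}
    for doc_id, text in docs.items():
        for p in phrases(text):
            index[p].add(doc_id)

    print(index["indexing finds"])  # {1, 2}: both documents share this phrase
    ```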

    I see a mixture/layering of the two approaches, IMHO.
     
    thegypsy, Jan 22, 2007 IP
  12. Jack Squat (Peon)

    #12
    I would just like to run a one- or two-word search in Google and not be greeted with a Wikipedia result in the top 5. I even tried "adult" keywords and was served wiki pages. :mad:
    Try it; I bet Wikipedia is in 95% of all one-word searches.
     
    Jack Squat, Jan 22, 2007 IP