how search engines work

Discussion in 'All Other Search Engines' started by dcole07, Jul 25, 2006.

  1. #1
    How does a search engine take the search term and find it in its DB?

    Small search engines do it by looking through the whole DB for the set of words, and whichever line matches most closely wins. But for Google, searching billions of URLs with hundreds of words each that way wouldn't be practical! So how do the big ones do it?

    A small search engine's DB looks like:
    URL --- all the text...
    URL --- all the text...
    URL --- all the text...
    URL --- all the text...
    ...
    Then it finds the line with the greatest match. With that method, Google would have to go through something like 400 GB each time you searched! That would take a long time, so it can't be done like that!
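
    To make the small-engine approach concrete, here is a minimal sketch of a full scan. The `rows` data, the example URLs, and the word-count scoring are assumptions made up for illustration, not any real engine's method. The point is that every query touches every row, so the cost grows with the size of the DB.

```python
def linear_search(rows, query):
    """Scan every (url, text) row and return the URL with the best score."""
    terms = query.lower().split()
    best_url, best_score = None, 0
    for url, text in rows:
        words = text.lower().split()
        # Hypothetical scoring: count how often each query term occurs.
        score = sum(words.count(term) for term in terms)
        if score > best_score:
            best_url, best_score = url, score
    return best_url


# Made-up sample data in the "URL --- all the text" layout.
rows = [
    ("http://example.com/a", "red widgets for sale"),
    ("http://example.com/b", "blue widgets and blue gadgets"),
    ("http://example.com/c", "paint in every colour"),
]
best = linear_search(rows, "red widgets")
```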
     
    dcole07, Jul 25, 2006
  2. websiteideas

    websiteideas Well-Known Member

    #2
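    Half of the data is thrown out at each step of a lookup because the data is sorted by value, so one comparison tells you which half can't contain the match. Look up "divide and conquer algorithm" (binary search in particular) on Google for more information.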
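
    A minimal sketch of that halving idea, assuming the terms are kept in a sorted list (the `binary_search` name and the step counter are only for illustration). With about 1,000 sorted terms it needs at most 11 comparisons instead of up to 1,000.

```python
def binary_search(sorted_terms, term):
    """Return (index, steps); index is -1 when the term is absent."""
    lo, hi = 0, len(sorted_terms) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_terms[mid] == term:
            return mid, steps
        if sorted_terms[mid] < term:
            lo = mid + 1   # the match can only be in the upper half
        else:
            hi = mid - 1   # the match can only be in the lower half
    return -1, steps
```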
     
    websiteideas, Jul 25, 2006
  3. wheel

    wheel Peon

    #3
    I don't think that's quite how it's done. SEs have to "build the index", which boils down to starting with what you suggested (URL -> text), then building a reverse index (usually called an inverted index) that maps text to URL instead of URL to text. So if you search on 'red widgets', it simply looks up 'red widgets' in its index and is given a list of URLs that contain that term, because that list is already built.
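
    Roughly, building and querying such an index could look like the sketch below. It is an illustration under assumptions: it indexes single words and intersects their URL sets rather than storing whole phrases like 'red widgets', and the function names and sample pages are invented.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Turn {url: text} into {word: set of urls containing that word}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search_all(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample pages.
pages = {
    "http://example.com/a": "red widgets for sale",
    "http://example.com/b": "blue widgets",
    "http://example.com/c": "red paint",
}
index = build_inverted_index(pages)
hits = search_all(index, "red widgets")
```

    The expensive work happens once at index time; a query is then a few dictionary lookups plus a set intersection.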

    This can then be distributed, so the query is sent to a bunch of different computers that each handle a set of URLs. Each computer then simultaneously returns all the URLs that it knows of that contain the term 'red widgets'. Then just sort all of the URLs by score and you're done. As easy and as difficult as that.
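
    A toy version of that fan-out, with threads standing in for separate machines. The `SHARDS` layout, the URLs, and the scores are all made up for illustration; how a real engine partitions and scores its index isn't covered here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each one maps a term to (url, score) pairs
# for the subset of URLs that shard is responsible for.
SHARDS = [
    {"red widgets": [("http://example.com/a", 0.9), ("http://example.com/b", 0.4)]},
    {"red widgets": [("http://example.org/c", 0.7)]},
    {"blue widgets": [("http://example.net/d", 0.8)]},
]


def query_shard(shard, term):
    """Look the term up in one shard's prebuilt index."""
    return shard.get(term, [])


def distributed_search(shards, term):
    """Query all shards in parallel, then merge and sort by score."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: query_shard(shard, term), shards)
        hits = [hit for part in partials for hit in part]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


results = distributed_search(SHARDS, "red widgets")
```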
     
    wheel, Jul 26, 2006
  4. dcole07

    dcole07 Peon

    #4
    Well, I'm making a PHP search engine script, and my search engine has a reverse index (not a normal index, which would be URL --> text).

    Now I'm on the last step where I get the search term and find the data in the DB. I'm having trouble coming up with a script that will understand (), OR, AND...
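
    One common way to handle (), OR and AND is a small recursive-descent parser that evaluates the query directly against the reverse index. The sketch below is in Python rather than PHP, and it makes some assumptions: the index maps each lowercase word to a set of URLs, operators are written in uppercase, AND binds tighter than OR, and two terms side by side without an operator are rejected. All names are hypothetical.

```python
import re


def evaluate(query, index):
    """Evaluate a query with (), AND, OR against {word: set of urls}."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_atom()
        while peek() == "AND":
            take()
            result = result & parse_atom()
        return result

    def parse_atom():
        tok = peek()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError("expected a term or '('")
        take()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return result
        # A plain term: look up its URL set (empty if unknown).
        return set(index.get(tok.lower(), ()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

    For example, with an index where "red" maps to pages a and c, "blue" to b, and "widgets" to a and b, the query `(red OR blue) AND widgets` yields a and b, while `red OR blue AND widgets` yields a, b and c because the AND is evaluated first.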

    So I just wanted to ask about other search engines while I thought on it a bit.

    Does anyone know how to open part of a file with PHP? Like open the first 10 lines or 100 bytes. (I don't want to open the whole file and then take the first 10 lines. I want to read only part of the file!)
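
    The usual answer is that opening a file doesn't load it; only the reads you issue pull data in. A minimal sketch in Python (helper names are made up for illustration); PHP's `fopen`, `fseek`, `fread` and `fgets` follow the same open, seek, read-a-little pattern.

```python
from itertools import islice


def read_bytes(path, offset, length):
    """Read `length` bytes starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)          # jump to the offset without reading before it
        return f.read(length)   # read at most `length` bytes


def read_first_lines(path, count):
    """Read only the first `count` lines; stops as soon as they are read."""
    with open(path, "r", encoding="utf-8") as f:
        return list(islice(f, count))
```

    If each index entry's byte offset is stored somewhere small, a lookup can seek straight to the entry instead of scanning the whole file.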
     
    dcole07, Jul 26, 2006