How does a search engine take the search term and find it in its DB? Small search engines do it by scanning the whole DB for the set of words, and whichever line matches most closely wins. But for Google, which has to search billions of URLs with hundreds of words each, that wouldn't be the best way! So how do the big ones do it? A small search engine's DB looks like:
URL --- all the text...
URL --- all the text...
URL --- all the text...
URL --- all the text...
...
Then it finds the line with the greatest match. Google would have to go through something like 400 GB with that method each time you searched! That would take a long time... so it can't be done like that!
Half of the remaining data is thrown out after each comparison, because the data is kept sorted by value. Look up "divide and conquer algorithm" on Google for more information.
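To make the "throw out half each time" idea concrete, here is a minimal binary search sketch in Python. It assumes the terms are already sorted; the function name `find_term` and the sample word list are made up for illustration, and this is not a claim about how any real search engine stores its data.

```python
def find_term(sorted_terms, target):
    """Binary search: return (position, comparisons), or (-1, comparisons) if absent."""
    lo, hi = 0, len(sorted_terms) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_terms[mid] == target:
            return mid, steps
        if sorted_terms[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps


# Hypothetical sample data; the list must be sorted for this to work.
terms = ["blue", "gadget", "green", "red", "search", "widget", "yellow"]
print(find_term(terms, "widget"))
```

Because the candidate range halves on every comparison, a sorted list of a billion terms needs only about 30 comparisons, instead of a scan over everything.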
I don't think that's quite how it's done. Search engines have to "build the index," which boils down to starting with what you suggested (URL -> text), then building a reverse index (usually called an inverted index) that maps text to URLs instead of URLs to text. So if you search for 'red widgets', it simply looks up 'red widgets' in its index and gets back a list of URLs that contain that term, because that list has already been built. This can then be distributed: the query is sent to a bunch of different computers that each handle a set of URLs. Each computer simultaneously returns all the URLs it knows of that contain the term 'red widgets'. Then just sort all of the URLs by score and you're done. As easy and as difficult as that.
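A rough sketch of that pipeline in Python, under several assumptions: words are indexed one at a time and a multi-word query keeps only the URLs containing every word, the "score" is just a word count, and the shards are queried in a plain loop rather than simultaneously. The names `build_index`, `search_shard`, `search`, and the example URLs are all hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Turn {url: text} into an inverted index {word: {url: occurrences}}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search_shard(index, query):
    """Return {url: score} for URLs in one shard that contain every query word."""
    words = query.lower().split()
    if not words:
        return {}
    postings = [index.get(w, {}) for w in words]
    urls = set(postings[0])
    for p in postings[1:]:
        urls &= set(p)  # keep only URLs present in every posting list
    # Assumed scoring: total occurrences of the query words on the page.
    return {u: sum(p[u] for p in postings) for u in urls}


def search(shards, query):
    """Ask each shard, merge the partial results, and sort by score (highest first)."""
    merged = {}
    for index in shards:  # a real system would query the shards in parallel
        merged.update(search_shard(index, query))
    return sorted(merged, key=lambda u: (-merged[u], u))


# Two hypothetical shards, each responsible for its own set of URLs.
shard_a = build_index({
    "a.example/1": "red widgets and more red widgets",
    "a.example/2": "blue widgets",
})
shard_b = build_index({
    "b.example/1": "cheap red widgets",
    "b.example/2": "red paint",
})
print(search([shard_a, shard_b], "red widgets"))
```

The point of the design is that the expensive work (reading every page) happens once at indexing time; at query time each shard only touches the posting lists for the words in the query.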
Well, I'm making a PHP search engine script, and my search engine has a reverse index (not a normal index, i.e. URL --> text). Now I'm on the last step, where I take the search term and find the data in the DB. I'm having trouble coming up with a script that will understand (), OR, AND... so I just wanted to ask about other search engines while I thought about it a bit. Does anyone know how to open part of a file with PHP? Like opening only the first 10 lines or 100 bytes. (I don't want to open the whole file and then take the first 10 lines... I want to open just part of the file!)
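For the (), OR, AND part, one common approach is a small recursive-descent parser that evaluates the query directly against the reverse index using set operations. The sketch below is in Python rather than PHP, and it makes some assumptions: the index maps each lowercase word to a set of URLs, operators must be written in upper case, AND binds tighter than OR, and two words side by side without an operator are treated as an error. The names `tokenize` and `evaluate` and the sample index are hypothetical.

```python
import re


def tokenize(query):
    """Split a query into '(' and ')' tokens plus whitespace-separated words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate(query, index):
    """Evaluate a boolean query against index = {word: set_of_urls}."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():            # or_expr := and_expr ("OR" and_expr)*
        result = parse_and()
        while peek() == "OR":
            advance()
            result = result | parse_and()   # union of URL sets
        return result

    def parse_and():           # and_expr := atom ("AND" atom)*
        result = parse_atom()
        while peek() == "AND":
            advance()
            result = result & parse_atom()  # intersection of URL sets
        return result

    def parse_atom():          # atom := word | "(" or_expr ")"
        tok = advance()
        if tok is None:
            raise ValueError("unexpected end of query")
        if tok == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token: {tok}")
        return set(index.get(tok.lower(), ()))

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()}")
    return result


# Hypothetical reverse index: word -> set of URLs containing it.
index = {
    "red": {"u1", "u2"},
    "blue": {"u3"},
    "widgets": {"u1", "u3"},
    "gadgets": {"u2"},
}
print(sorted(evaluate("(red OR blue) AND widgets", index)))
```

Each grammar rule becomes one function, so parentheses fall out naturally: an opening parenthesis just restarts parsing at the lowest-precedence rule. The same structure should carry over to PHP with arrays in place of sets.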