Or more generically, any number of different data mining or machine learning algorithms. But you are correct - gathering the data is the most painful part.
Manish, it seems we know a lot more than you might think - remember, this is all programmed by humans, and we are actually discussing replicating what they have done. If we were now able to apply a weighting to all the suggestions, programme it, and it worked to a certain degree, we would truly have back-engineered Google. And that is where it would become very interesting: webmasters would be able to fine-tune their websites to achieve optimum rankings.
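As a rough illustration of what "applying a weighting to all the suggestions" could look like in practice, here is a minimal sketch in Python. The factor names, weights, and page data are invented for the example - they are not Google's actual signals or values.

    # Hypothetical illustration: score pages by a weighted sum of ranking factors.
    # Factor names and weights are made up for this example; they are not
    # Google's real signals or values.

    FACTOR_WEIGHTS = {
        "keyword_in_title": 3.0,
        "keyword_in_body": 1.0,
        "inbound_links": 2.5,
        "page_age_years": 0.5,
    }

    def score_page(factors):
        """Combine a page's factor values into a single ranking score."""
        return sum(FACTOR_WEIGHTS.get(name, 0.0) * value
                   for name, value in factors.items())

    pages = {
        "coca-cola.com": {"keyword_in_title": 1, "keyword_in_body": 12,
                          "inbound_links": 9, "page_age_years": 8},
        "pepsi.com":     {"keyword_in_title": 0, "keyword_in_body": 10,
                          "inbound_links": 7, "page_age_years": 6},
    }

    # Rank pages by descending score - exactly the knob-turning a webmaster
    # would try to reverse: tweak factors until the score beats the competition.
    ranking = sorted(pages, key=lambda p: score_page(pages[p]), reverse=True)
    print(ranking)

If you could recover weights like these that reproduce Google's ordering reasonably well, that would be the "back-engineering" being discussed.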
The answer to what a search engine (PageRank) looks for is easy to answer if you've seen the answer. Like this: which 7 of the following do you HAVE to give your POWs?

- Separate hygiene facilities and sleeping quarters for men and women
- Luxurious office for the camp commander
- Room for religious services
- Canteen
- Infirmary
- Extra secure isolation cells
- Guard towers
- Gallows
- Basketball court
- Library
- Torture chamber
- Extra barbed wire
- Gun factory

The Geneva Convention has the answer. There are many you would get right, but some are just things you wouldn't think of...

I have my own search engine, but I'm revamping it. I was searching the internet for things I could include in the ranking and I ended up here. My old search engine only had simple ranking and page scanning, so any page could end up on any term's list... Right now I'm finishing my layout plan and starting the final sketch of how each part will work. I have 3 methods I have not seen in any open source search engine, and in theory they should improve the total size of the index, along with handling script malfunctions and whatnot. The methods should allow the index to hold millions of pages; my old search engine could only hold 8-12 thousand, but the methods are a lot different. Ideas on any part of a search engine would help. I should be close to done with the script by the end of June.
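For context on why a simple "page scanning" approach tops out at a few thousand pages, here is a minimal inverted-index sketch in Python. This is the standard structure most open source engines use to scale; it is not the poster's three undisclosed methods, and the sample pages are invented.

    from collections import defaultdict

    # Minimal inverted index: maps each term to the set of page IDs containing it,
    # so queries touch only the postings for their terms instead of scanning pages.
    index = defaultdict(set)

    def add_page(page_id, text):
        """Tokenise a page and record which terms it contains."""
        for term in text.lower().split():
            index[term].add(page_id)

    def search(query):
        """Return pages containing every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = index[terms[0]].copy()
        for term in terms[1:]:
            results &= index[term]
        return results

    add_page(1, "cola drinks and soft drink reviews")
    add_page(2, "history of cola brands")
    print(search("cola drinks"))   # {1}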
And we haven't even begun to discuss the other side of the coin... evaluating the results. How does G (and others) determine how close to the target they are? What sort of feedback mechanisms are in place? How do they "know" what counts as relevant results for the surfer?
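One common feedback mechanism of the kind being asked about is an offline relevance metric such as precision at k, scored against human relevance judgments. The query, ranking, and judgments below are invented purely for illustration; this is a sketch of the general idea, not Google's actual evaluation pipeline.

    # Hypothetical illustration of one feedback mechanism: precision@k measured
    # against a set of human relevance judgments. Query and judgments are made up.

    def precision_at_k(ranked_results, relevant, k):
        """Fraction of the top-k results that human raters judged relevant."""
        top_k = ranked_results[:k]
        return sum(1 for r in top_k if r in relevant) / k

    ranked = ["coca-cola.com", "virgindrinks.com", "pepsi.com", "randomblog.com"]
    judged_relevant = {"coca-cola.com", "pepsi.com"}

    print(precision_at_k(ranked, judged_relevant, 3))  # ~0.67: two of the top three judged relevant

Tracking a metric like this before and after an algorithm change is one way an engine can tell whether a tweak helped or caused one of those "blips".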
I remember reading that Google does not like to manually adjust their results, so if they see something they don't like, they work on their search algorithm instead. But I guess this can have a wide-ranging ripple effect, which sometimes causes blips in results. Even when they see the results for a particular search, how do they know that the best is listed first? What makes a site the best - the most Google friendly? Take the search 'cola': Google shows Coca-Cola, then virgindrinks, then Pepsi. Is that how you would list them? Who is to decide which is best - surely Pepsi should be ahead of Virgin? Complicated - maybe there is not just one right way of doing things.
Back engineering sounds interesting. It's like finding out what Coke is made of - but then, once you know, you'll probably stop drinking it, lol. I prefer to drink without knowing.