To be more specific, I have heard of folks engineering and reverse engineering to get at least a shot in the dark at what Google's algo is, so they can build better sites that rank. Anyone have good suggestions, tools, or a few tablespoons of help in this area?
Make two sites on nearly identical domains: www.QFKJSJLQIOSSDK2R1.com and www.QFKJSJLQIOSSDK2R2.com. Do X to one of them and nothing to the other. Keep everything else the same. Get a link to both of them from the same page, get them indexed, and see which one ranks #1 for QFKJSJLQIOSSDK2R.
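To make checking the winner repeatable, here is a minimal sketch of that last step in Python. It assumes you have already saved the SERP HTML for the nonsense keyword to a file yourself (fetching it is left out on purpose, since scraping Google is against their ToS and the markup changes constantly); the filename and function names are hypothetical.

```python
# Rough sketch: given raw SERP HTML for the test keyword, report which of the
# two test domains appears first. The fetch step is assumed to happen elsewhere.

def first_ranking_domain(serp_html, domains):
    """Return whichever test domain appears earliest in the SERP HTML."""
    positions = {}
    for domain in domains:
        idx = serp_html.lower().find(domain.lower())
        if idx != -1:
            positions[domain] = idx
    if not positions:
        return None  # neither test domain is indexed/ranking yet
    return min(positions, key=positions.get)


if __name__ == "__main__":
    test_domains = ["qfkjsjlqiossdk2r1.com", "qfkjsjlqiossdk2r2.com"]
    with open("serp_for_QFKJSJLQIOSSDK2R.html") as f:  # hypothetical saved SERP
        html = f.read()
    winner = first_ranking_domain(html, test_domains)
    print("Variant ranking first:", winner or "neither indexed yet")
```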
Now this is definitely a good start, but what can you monitor to get a deeper look? As in logging different aspects of what the spiders picked up, etc. In my head I want to say that, just from looking at search results, the spiders sometimes pick up text from different areas of a page for certain keywords. Highly competitive keywords just show the title tag data for the site, while for some keywords the search results show a snippet of text the spider picked up with the keywords in it. Any thoughts on that?
There are many factors in Google's algorithm, like meta tags, backlinks, content, domain, etc. You just keep one of these factors different and the others the same, and then wait for the rankings. Wow, that takes a lot of time. Anyway, Google's algorithm is not so easy to research, or someone else could just copy Google.
No one knows Google's algo. That's why they change it frequently, so that no one can figure it out. Just optimize your site well and get more backlinks.
We've created a genetic algorithm that can estimate how much weight a factor carries in Google's ranking algo. We're still building the tools for gathering all the learning data it needs. However, even if you were able to find these numbers, it won't make it easier to rank for competitive terms, since there are certain factors, like aging links and domain age, that can't be easily manipulated. (WH style)
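Not the poster's actual tool, obviously, but here is a toy sketch of the general approach, assuming you already have factor values for a set of results in their observed ranking order (the factor names and sample numbers below are invented): a population of weight vectors evolves toward weights that reproduce the observed ordering.

```python
# Toy genetic algorithm: evolve factor weights so that the weighted score of
# each page reproduces the observed SERP order. All data here is made up.
import random

FACTORS = ["backlinks", "title_kw", "domain_age", "content_len"]

# one dict per result, already sorted by observed ranking (index 0 = position #1)
observed = [
    {"backlinks": 900, "title_kw": 1, "domain_age": 8, "content_len": 1200},
    {"backlinks": 400, "title_kw": 1, "domain_age": 3, "content_len": 2500},
    {"backlinks": 150, "title_kw": 0, "domain_age": 6, "content_len": 800},
    {"backlinks": 60,  "title_kw": 1, "domain_age": 1, "content_len": 400},
]

def fitness(weights):
    """Count how many pairwise orderings the weighted score gets right."""
    scores = [sum(w * page[f] for w, f in zip(weights, FACTORS)) for page in observed]
    correct = 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            if scores[i] >= scores[j]:  # page i really does outrank page j
                correct += 1
    return correct

def evolve(generations=200, pop_size=30, mutation=0.2):
    pop = [[random.uniform(0, 1) for _ in FACTORS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # crossover
            if random.random() < mutation:                       # mutation
                k = random.randrange(len(child))
                child[k] = random.uniform(0, 1)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(dict(zip(FACTORS, (round(w, 3) for w in best))))
```

With only a handful of results this will happily overfit; the point is just the structure (fitness on observed rankings, selection, crossover, mutation), not the numbers it spits out.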
Definitely makes sense. But there is always a way. What's this mystical tool you speak of, Seo Wannabe? It sounds like you're on the right track. I was talking to some computer science guys who aren't in the SEO game, but they were talking about creating a mimic of Googlebot and researching what Googlebot does via server logs, etc. Once you see the pattern, you can "re-create" the process the bot takes to index you, rank you, etc., and go from there. I know it sounds like I am answering myself, but in reality I still don't know jack about computer science. Anyone else care to chime in?
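For the server-log angle, a rough sketch of the first step, assuming a standard Apache/Nginx combined log format (the filename is hypothetical, and keep in mind the user-agent string can be spoofed, so serious analysis should verify Googlebot by reverse DNS):

```python
# Count Googlebot hits per URL and per day from a combined-format access log.
import re
from collections import Counter

LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

hits_per_path = Counter()
hits_per_day = Counter()

with open("access.log") as f:                      # hypothetical log file
    for line in f:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue                               # ignore non-Googlebot traffic
        hits_per_path[m.group("path")] += 1
        hits_per_day[m.group("ts").split(":")[0]] += 1  # day portion of timestamp

print("Most-crawled pages:", hits_per_path.most_common(10))
print("Crawl volume by day:", sorted(hits_per_day.items()))
```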
I'll worship whoever figures out Google's algo... Kidding aside, I think with Google's algo it all comes down to what subzero said: quality backlinks!
Anyone interested in a ZIP with a TON of Google search patents dating back to 2003? (I have earlier ones, but don't trust them much anymore.) It is a series of layers. Each new addition to the architecture/algos further refines what was in place before. The best you can do is understand the mindset of the search engineer by studying technical docs and, go figure, search engineering and document handling (indexing and retrieval) theory... Me ol M8 Matt is the world's most famous librarian.
I don't really see how researching how Googlebot spiders the site can give you an idea about the ranking algo. Basically, the way we do it is take a 100-keyword set and make a Google snapshot of the first xxx results for each keyword. Then you have, let's say, 100 factors involved in Google's ranking method. The hard part is collecting data for those 100 factors, for example all backlinks for each of the domains included in your snapshot. Most factors can be fetched through Google, Yahoo and MSN, plus some other web services. Once you have all this data, it's just a matter of hours until you have a good view of the importance of each factor in rankings. This is not going to be 100% accurate, though, but it's good enough to get you into the top 5 if you can manipulate all the factors.
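To make that last step concrete, here is a minimal sketch under the assumption that you have already pulled the factor values for each result in a snapshot (factor names and numbers are invented): it just computes a rank correlation per factor as a crude importance measure.

```python
# Crude factor-importance check: rank-correlate each factor against SERP position.
# All sample data here is made up; a real run would use your scraped snapshot.

def rank(values):
    """Convert raw values to ranks (1 = largest); ties broken by order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling - fine for a rough view)."""
    n = len(xs)
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n**2 - 1))

# one row per result in the SERP snapshot, ordered by position (#1 first)
snapshot = [
    {"backlinks": 1200, "kw_in_title": 1, "domain_age_years": 9},
    {"backlinks": 700,  "kw_in_title": 1, "domain_age_years": 4},
    {"backlinks": 300,  "kw_in_title": 0, "domain_age_years": 7},
    {"backlinks": 90,   "kw_in_title": 1, "domain_age_years": 2},
    {"backlinks": 40,   "kw_in_title": 0, "domain_age_years": 1},
]

positions = list(range(1, len(snapshot) + 1))  # 1 = top result
for factor in snapshot[0]:
    values = [row[factor] for row in snapshot]
    # a strongly negative value means "more of this factor = better position"
    print(factor, round(spearman(values, positions), 2))
```

Correlation obviously isn't causation, and factors interact, but over a few hundred keywords this kind of pass is enough to see which factors even deserve a closer look.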
I most definitely like what you're saying, but I still think the spider is important, because when you think about it, that is the tool they are using to grab all of these "factors". There has to be a pattern of some sort, as sites that stay atop good keywords get spidered almost every hour, whereas the ones that don't get spidered every other month or so. Using your method, have you seen anything that would be of use other than SEO basics?
I'm curious how you can deduce a ranking algo just from spidering patterns. Your spidering logs will tell you only which pages Googlebot crawls, nothing about keywords, keyword placement, backlinks, or age. The fact that top sites are spidered more often is an indirect result of other factors: they're not spidered because they're at the top, nor are they at the top because they're spidered often.
Please note: Google has a dynamic algo. They use a mixture of different algo sets, so it's very difficult to duplicate or research beyond what they generally tell you. They study what webmasters are doing to artificially inflate rankings and develop algo changes to neutralize it, protecting searchers' and advertisers' interests, which benefits everyone, since webmasters only make money if people keep using Google and advertisers keep placing ads. It has been rumored that they even take the registration length of a domain name into account, since spammers like buying one-year-expiry domains. If this is true, it doesn't mean anyone who buys a one-year domain will be penalized. Know that there is also always a ranking system, with grades and points scored for each factor such as links, content, interlinking of pages, etc. They are also looking into building semantics into their content spidering system, so that sentences that don't make sense will be looked at critically. All of these are rumors.
dOOd .. that's SOOOOOO 2004. Aging filters (link maturity and age/authority maturity) are not really a 'new' thing; this started (likely) in 2003. Look into Latent Semantic Indexing. You may now want to look into 'Phrase Based Indexing and Retrieval', or read this article on Phrase Based Optimization.
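Since LSI came up: here's a toy sketch of the core trick (a term-document matrix reduced with SVD), with made-up terms and counts, just to show how documents get compared in a reduced "concept" space rather than by raw keyword overlap. It illustrates the general technique, not anything Google has published about its own implementation.

```python
# Toy Latent Semantic Indexing demo: build a tiny term-document matrix,
# reduce it with SVD, and compare documents by cosine similarity in the
# reduced concept space. Terms and counts are invented.
import numpy as np

terms = ["engine", "ranking", "backlink", "recipe", "flour"]
docs = {
    "seo_page_a":  [2, 3, 1, 0, 0],
    "seo_page_b":  [1, 2, 2, 0, 0],
    "baking_page": [0, 0, 0, 3, 2],
}

A = np.array(list(docs.values()), dtype=float).T       # terms x documents
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                                    # keep the top-2 concepts
doc_vectors = (np.diag(S[:k]) @ Vt[:k]).T                # one row per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

names = list(docs)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], "vs", names[j],
              round(cosine(doc_vectors[i], doc_vectors[j]), 2))
```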