Can anybody tell me what factors determine how often search engine bots crawl a site? I'm talking about a plain HTML site with no dynamic content that is updated perhaps once a week. Can changes to meta tags or robots.txt make a difference?
Meta tags and robots.txt have no additional influence on how often Googlebot crawls. The main factor that determines Googlebot crawling is links, since that's what it uses to get to your website. Changing content often would make it crawl more often, I believe, but links are the main and biggest factor.
Past a certain level of incoming links, say PR4 or so, the frequency of on-page changes becomes the most important factor I've noticed in how often a page is crawled. Another thing to think about with links is how often the pages linking to you are updated (and crawled).
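If you want to check crawl frequency on your own site rather than guess, one rough way is to count crawler hits per day in your server's access log. A minimal sketch, assuming a log in the common/combined format (date inside square brackets, user agent somewhere on the line); the function name, the `bot_token` parameter, and the sample lines are made up for illustration:

```python
import re
from collections import Counter

# Matches the date part of a common/combined log timestamp, e.g. [10/Oct/2005:13:55:36 +0000]
LOG_DATE = re.compile(r"\[(\d{2}/\w{3}/\d{4})")

def crawler_hits_per_day(log_lines, bot_token="Googlebot"):
    """Count how many requests per day mention bot_token (hypothetical helper)."""
    hits = Counter()
    for line in log_lines:
        # Case-insensitive substring match on the whole line; user agents can be
        # spoofed, so treat the counts as an estimate only.
        if bot_token.lower() not in line.lower():
            continue
        match = LOG_DATE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '66.249.0.1 - - [10/Oct/2005:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
    '66.249.0.1 - - [10/Oct/2005:18:02:10 +0000] "GET /about.html HTTP/1.1" 200 1200 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [10/Oct/2005:19:00:00 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '66.249.0.1 - - [11/Oct/2005:02:14:51 +0000] "GET / HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for day, count in crawler_hits_per_day(SAMPLE_LOG).items():
        print(day, count)
```

Run it against a few weeks of logs before and after a change (new links, more frequent updates) and compare the daily counts.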
The more links, the more ways for the bots to find you. Also, if you do a lot of updates, they tend to come around more. The only time I would recommend using a high-ranking paid link is if you want to be crawled quickly. If you buy a PR7 link, you will get crawled right away and more frequently.
I've also heard rumors that a really high-PR, on-topic link will get you out of the sandbox more quickly, but I haven't seen anything to back that up.
It's a computer program, and all programs have bugs, so there has to be an "exploitable" bug in the sandbox code.
Hehe, will do. Although I bet if I found a real way to avoid the sandbox, I could make myself a rich man.
Everyone forgot one key ingredient: page freshness.
- Robots.txt: This is important. If you have it set to disallow all crawlers, they won't crawl your pages.
- Links: This is semi-important. You can have a single PR1 link to your site and still have the crawlers come back daily.
- Page freshness: Adding new content to your pages daily is really what keeps the crawlers coming back. There are several ways of doing this, like adding new articles, article rotation, RSS feed rotation, etc.
I have several sites where I just got one link, set up an RSS feed via Carp, and the crawlers now come to the site every day. True, you don't need rotating content; you can just get some massive incoming links to your site, like from news-related or high-profile sites, and the crawlers will come quite often too, but the latter is easy. I've done both methods.
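On the robots.txt point, it is easy to verify that your file isn't shutting crawlers out by accident. A minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_crawlable`, the two sample robots.txt bodies, and the example.com URL are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser:

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "Disallow: /" blocks the whole site for every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# An empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    page = "http://example.com/index.html"
    print("block-all:", is_crawlable(BLOCK_ALL, "Googlebot", page))
    print("allow-all:", is_crawlable(ALLOW_ALL, "Googlebot", page))
```

This only tells you whether crawling is permitted. It says nothing about how often a bot will actually come back, which is down to links and freshness as discussed above.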