I analyzed a few months of logs, filtered to include only Googlebot identifying itself as Googlebot/2. All other visitors were excluded. This is a clean HTML site with no JavaScript (other than Urchin), one CSS file, and no 301 redirects. Here are some basic observations:

- Gbot hits the domain root / one to four times every day.
- Gbot hits about 10% of the inside pages every day.
- Over a 10-day cycle, Gbot hits about half of the inside pages (a semi-deep crawl).
- The inside page Gbot hits the most gets no referrals from Google search.
- Most paths are one file only.
- Gbot visits at all times of day, spread evenly.
- Gbot comes more on Sundays than on other days, but not a lot more.
- Gbot never requests image files.
- Gbot reads robots.txt at least once per day, but not on every visit.
- Gbot reads PDF, DMG, and REG files, but not CSS or EXE files.
- Gbot does not send a referrer.
- HTTP errors are limited to 404s.
- Gbot keeps looking for a file called googlesyndication.com in several directories.

No magic revelations, but maybe someone can combine this with other observations to learn something.
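For anyone who wants to repeat this with their own logs, here is a minimal Python sketch of the same kind of tally. It assumes the Apache/NCSA "combined" log format and keeps only lines whose user-agent contains "Googlebot/2", like the filter described above. The function names and the choice of counters are illustrative assumptions, not the tool used to produce these observations.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes Apache/NCSA "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, agent_token="Googlebot/2"):
    """Yield (timestamp, path, status, referer) for lines whose user-agent contains agent_token."""
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or agent_token not in m.group("agent"):
            continue  # unparseable line or a different visitor
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        yield ts, m.group("path"), int(m.group("status")), m.group("referer")


def extension(path):
    """Lower-case file extension of the last path segment, or '(none)'."""
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[1].lower() if "." in name else "(none)"


def summarize(lines):
    hits = list(googlebot_hits(lines))
    return {
        "total": len(hits),
        # How often the domain root and robots.txt are fetched per day
        "root_per_day": Counter(ts.date() for ts, path, _, _ in hits if path == "/"),
        "robots_per_day": Counter(ts.date() for ts, path, _, _ in hits if path == "/robots.txt"),
        # Spread over the days of the week and the hours of the day
        "by_weekday": Counter(ts.strftime("%A") for ts, _, _, _ in hits),
        "by_hour": Counter(ts.hour for ts, _, _, _ in hits),
        # Which file types are requested (pdf, css, exe, images, ...)
        "extensions": Counter(extension(path) for _, path, _, _ in hits),
        # HTTP status codes returned, e.g. to spot 404s
        "statuses": Counter(status for _, _, status, _ in hits),
        # Requests that carried a referrer at all
        "with_referer": sum(1 for _, _, _, ref in hits if ref not in ("", "-")),
    }
```

Usage would be something like `with open("access.log") as f: stats = summarize(f)`, where "access.log" is a placeholder for your own log file. Since the counters are plain `Counter` objects, questions like "does it ever fetch CSS?" become `stats["extensions"]["css"]`.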
Well-collected data. It would be good to see someone take this further, maybe by creating a site that documents the bots and their activities, where people can submit their own data as well.
Did you see this post? http://forums.digitalpoint.com/showthread.php?t=68988 The writer says that Gbot is now crawling his CSS files, and that when he made his <h> tags smaller in the CSS, Gbot noticed and his site was penalized. I wonder if that is really true?
How should I interpret that? Is it really looking for a file called "googlesyndication.com"? Or is it looking for files from googlesyndication.com that are included via JavaScript (i.e., is it looking for AdSense)?
I found out something else about Googlebot/2. At times it fetches the same page (I observed this on my home page) several times within about 5 minutes. I think this is some kind of update-frequency test. When it happened, many members were on the site (it's a forum) and they were all posting like crazy, so the bot probably classified the site as one that updates very frequently. From the next day on, Gbot started visiting the whole site, fetched the sitemap, and has generally been very active on my site. I don't think that is a coincidence.
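To check your own logs for the same pattern, a minimal Python sketch could group Googlebot fetches by path and report runs where one path was fetched several times within a short window. The five-minute window matches the observation above; the threshold of three fetches and the name `find_refetch_bursts` are assumptions for illustration. The input is assumed to be (timestamp, path) pairs already extracted from the Googlebot lines of an access log.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import timedelta


def find_refetch_bursts(hits, window=timedelta(minutes=5), min_fetches=3):
    """Return {path: [(first, last, count), ...]} for runs where a path was
    fetched at least min_fetches times within `window`.

    hits: iterable of (timestamp, path) pairs (hypothetical input format).
    """
    by_path = defaultdict(list)
    for ts, path in hits:
        by_path[path].append(ts)

    bursts = {}
    for path, times in by_path.items():
        times.sort()
        found = []
        i = 0
        while i < len(times):
            # Index of the last fetch no later than `window` after times[i]
            j = bisect_right(times, times[i] + window) - 1
            count = j - i + 1
            if count >= min_fetches:
                found.append((times[i], times[j], count))
                i = j + 1  # resume after this burst so bursts don't overlap
            else:
                i += 1
        if found:
            bursts[path] = found
    return bursts
```

Bursts are reported without overlap: once a run is recorded, the scan resumes after its last fetch. Comparing the dates of any bursts against the start of deeper crawling would be one way to test whether the two are really linked.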