On an average website of mine, nothing special, no new links, about 200 visitors per day, Googlebot has started a real war: it has read over 70k pages today, whereas it normally rarely reads more than 500 pages per day. It runs in bursts of 10-20 simultaneous requests (see the other thread about this behaviour) and then stops for minutes. It follows a peculiar page-crawling pattern, reading pages that are, as far as I know, unique in the industry. Besides the usual "you're lucky / this is good news" answers, I wonder if you have experienced such spikes and what could justify such abnormal behaviour. Would somebody speculate that Google has somehow found a way to spider "unique content only"? Or is there something really wrong with it? I'd appreciate an answer. Oh yes, and the homepage PR of that website is 2, if that matters anymore.
You might want to double check the IP address(es) being used by the spider. I've never seen a real Googlebot suck down more than 1 page per second. They are pretty good about throttling the spider back so it doesn't kill people's servers.
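A rough way to check whether a hit really is Googlebot (just a sketch in Python: reverse lookup, then forward-confirm so a faked PTR record doesn't fool you; the googlebot.com / google.com suffixes are what the real crawler's hostnames resolve to):

    import socket

    def is_real_googlebot(ip):
        # reverse lookup, e.g. 66.249.66.205 -> crawl-66-249-66-205.googlebot.com
        try:
            host = socket.gethostbyaddr(ip)[0]
        except (socket.herror, socket.gaierror):
            return False
        if not host.endswith(('.googlebot.com', '.google.com')):
            return False
        # forward-confirm: the hostname has to resolve back to the same IP
        try:
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False

    print(is_real_googlebot('66.249.66.205'))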
There are other bots that look like Googlebot. They claim to be Googlebot/2.1 compatible, but they're not the real Googlebot.
Actually, this is Googlebot: http://www.whois.sc/66.249.66.205 and they run in batches of up to 20 queries per second. Could be something wrong with one machine there. I emailed them about this; I'm sure they'll never read it.

----- added: concluding the day, here's my spider report for today:

173832 Googlebot
754
358 msnbot
25 ia_archiver
11 Yahoo! Slurp
4 DigExt
1 NaverBot

Googlebot visited the website almost a thousand times more than usual. Let me know if you see this happening somewhere else.
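If anyone wants to pull the same kind of report, it's just a count of hits per user agent from the access log. A rough sketch (Python; the 'access.log' path, the Apache combined log format, and the bot names are my assumptions, adjust to your own setup):

    import re
    from collections import Counter

    bots = ['Googlebot', 'msnbot', 'ia_archiver', 'Slurp', 'DigExt', 'NaverBot']
    counts = Counter()

    with open('access.log') as log:
        for line in log:
            # in combined log format the user agent is the last quoted field
            m = re.search(r'"([^"]*)"\s*$', line)
            agent = m.group(1) if m else ''
            for bot in bots:
                if bot in agent:
                    counts[bot] += 1
                    break

    for bot, hits in counts.most_common():
        print(hits, bot)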
My site had 11k pages crawled yesterday as well, which is slightly higher than I'm used to. (Last month was 130k for the entire month.) I will be watching it tonight to see if the spider revisits, since last month it ate 3GB of bandwidth. DS
In the last 24 hours one of my sites received 102,000 page loads from Googlebot, shattering the record from two days ago, which was 50k. Before this week I had never seen more than 30k in a single day, averaging about 10k over the last two months.
It's all over WebmasterWorld too... some people are actually banning Google for the time being. I've had an increase (it's over a gig a day at the moment), but I haven't had to do anything that drastic yet.
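If it ever gets that bad, I'd guess the "ban" people are doing is just the standard robots.txt rule (Google does honour it, though it can take a while for the bot to notice the change):

    User-agent: Googlebot
    Disallow: /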
I looked a little closer, and it's just the new version of Googlebot that is doing it, the one that supports HTTP/1.1. As noted in this thread, it spiders differently than the old one (instead of lots of different IPs at once, it spiders from a single IP address in a more constant manner). But I think it's only recently that they cranked up its speed.
Perhaps it's like the new employee who doesn't trust that his predecessor did the job correctly and is re-checking all his old work...
This is happening to a site of mine too. The Googlebot/2.1 HTTP/1.1 version is reading a large site of mine in explosive bursts, then pauses for 20 seconds and repeats. Fortunately it's using HTTP/1.1 with gzip compression enabled, so bandwidth use isn't too extreme. I can imagine a lot of database-driven websites will buckle under this onslaught. But if any of these thousands of pages get into the index, I'm happy.
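If anyone is curious how much the gzip side actually helps, here's a rough way to check on one of your own pages (Python; 'page.html' is just a placeholder for a real page from your site):

    import gzip

    html = open('page.html', 'rb').read()
    packed = gzip.compress(html)
    saved = 100 - 100.0 * len(packed) / len(html)
    print('%d bytes raw -> %d bytes gzipped (%.0f%% saved)' % (len(html), len(packed), saved))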
Some have been saying that google appears to be rebuilding their index from the ground up. This may be part of that process if true. I personally don't see that happening, but nowadays who knows.
I doubt it as well... It's just the new bot (different spidering pattern as well as supporting zlib compression via HTTP/1.1).
I doubt they have to rebuild their index. Some people are saying that they're trying to crawl deeper and faster because Yahoo and MSN (and even some others) will be competing even harder soon... I'd buy that; it seems at least moderately likely.
Perhaps related: I have a Googlebot that is getting "stuck" and keeps revisiting a URL (with a parameter) that doesn't exist. The IP address varies, but it reverse-resolves as coming from googlebot.com, so I wonder if something "burped" a little in their code. I usually see a few of these (when files move, etc.), but this has been going on for a few days now. I try to keep my web error logs fairly clean, so it jumps right out.
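In case anyone wants to look for the same thing, this is roughly how I spot it; just a sketch, assuming an Apache combined-format access log at 'access.log':

    import re
    from collections import Counter

    not_found = Counter()
    with open('access.log') as log:
        for line in log:
            if 'Googlebot' not in line:
                continue
            # combined log format: "GET /some/url?param=x HTTP/1.1" 404 ...
            m = re.search(r'"(?:GET|HEAD) (\S+) [^"]*" 404 ', line)
            if m:
                not_found[m.group(1)] += 1

    for url, hits in not_found.most_common(10):
        print(hits, url)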
Hmmm... I've seen Slurp do that, especially when Yahoo first started spidering after it dumped Google... but never Googlebot, personally...
Google Bot! We love you
This seems to be the new bot's behavior. Most Google forums have reported the same extended crawls since the new bot was released. Of course, none of us knows the true reasons why.

My theory is that there are some new page attributes that will ultimately play a role in overall ranking, and that there is insufficient existing data on those attributes. I would not categorize it as a "whole new" index; rather, if true, I suspect it could be categorized as an enhancement to the current index.

The reason I feel this theory is a good candidate is that the whole link-popularity thing is obviously out of control, with every website trying to secure thousands of links: link farms, link managers, link lists of link swaps... undermining the popularity concept. Lots of folks are saying that, as a result, Google is now focusing on related (themed) links. But it seems to me that they can't go much further with links beyond checking relevancy. So where do they go to find better ways to rank? If it's not off-page attributes, it seems only logical that they would look at on-page once again. If they did that, and if they found some new page attributes that were measurable and valuable, they would need to re-crawl all of the pages in the index and gather stats for these new attributes.

Just a theory... but as the thread author requested, it is a possible explanation.