On an average website of mine, nothing special, no new links, about 200 visitors per day, Googlebot has started a real war: it has read over 70k pages today, whereas it normally rarely reads more than 500 pages per day. It runs in bursts of 10-20 simultaneous requests (see the other thread about this behaviour) and then stops for minutes. It follows a peculiar page-crawling pattern, reading pages that are, as far as I know, unique in the industry. Besides the usual "you're lucky / this is good news" answers, I wonder if you have experienced such spikes and what could justify such abnormal behaviour. Would somebody speculate that Google has somehow found a way to spider "unique content only"? Or is there something really wrong with it? I'd appreciate an answer. Oh yes, and the homepage PR of that website is 2, if that matters anymore.
You might want to double check the IP address(es) being used by the spider. I've never seen a real Googlebot suck down more than 1 page per second. They are pretty good about throttling the spider back so it doesn't kill people's servers.
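A rough way to check whether a hit really is Googlebot (just a sketch in Python: reverse lookup, then forward-confirm so a faked PTR record doesn't fool you; the googlebot.com / google.com suffixes are what the real crawler's hostnames resolve to):

    import socket

    def is_real_googlebot(ip):
        # reverse lookup, e.g. 66.249.66.205 -> crawl-66-249-66-205.googlebot.com
        try:
            host = socket.gethostbyaddr(ip)[0]
        except (socket.herror, socket.gaierror):
            return False
        if not host.endswith(('.googlebot.com', '.google.com')):
            return False
        # forward-confirm: the hostname has to resolve back to the same IP
        try:
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False

    print(is_real_googlebot('66.249.66.205'))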
There are other bots that look like Googlebot. They claim to be Googlebot/2.1 compatible, but they're not the real Googlebot.
Actually, this is Googlebot: http://www.whois.sc/66.249.66.205 and they run in batches of up to 20 queries per second. Could be something wrong with one machine there. I emailed them about this; I'm sure they'll never read it.

----- added: concluding the day, here's my spider report for today:

173832 Googlebot
754
358 msnbot
25 ia_archiver
11 Yahoo! Slurp
4 DigExt
1 NaverBot

Googlebot visited the website almost a thousand times more than usual. Let me know if you see this happening somewhere else.
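If anyone wants to pull the same kind of report, it's just a count of hits per user agent from the access log. A rough sketch (Python; the 'access.log' path, the Apache combined log format, and the bot names are my assumptions, adjust to your own setup):

    import re
    from collections import Counter

    bots = ['Googlebot', 'msnbot', 'ia_archiver', 'Slurp', 'DigExt', 'NaverBot']
    counts = Counter()

    with open('access.log') as log:
        for line in log:
            # in combined log format the user agent is the last quoted field
            m = re.search(r'"([^"]*)"\s*$', line)
            agent = m.group(1) if m else ''
            for bot in bots:
                if bot in agent:
                    counts[bot] += 1
                    break

    for bot, hits in counts.most_common():
        print(hits, bot)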
My site had 11k pages crawled yesterday as well, which is slightly higher than I'm used to. (Last month was 130k for the entire month.) I will be watching it tonight to see if the spider revisits, since last month it ate 3GB of bandwidth. DS
In the last 24 hours one of my sites received 102,000 page loads from Googlebot, shattering the record from two days ago, which was 50k. Before this week I had never seen more than 30k in a single day, averaging about 10k over the last two months.
It's all over WebmasterWorld too... some people are actually banning Google for the time being. I've had an increase (it's over a gig a day at the moment), but I haven't had to do anything that drastic yet.
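If it ever gets that bad, I'd guess the "ban" people are doing is just the standard robots.txt rule (Google does honour it, though it can take a while for the bot to notice the change):

    User-agent: Googlebot
    Disallow: /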
I looked a little closer, and it's just the new version of Googlebot that is doing it, the one that supports HTTP/1.1. As noted in this thread, it spiders differently than the old one (instead of lots of different IPs at once, it spiders from a single IP address in a more constant manner). But I think it's only recently that they cranked up its speed.
Perhaps it's like the new employee who doesn't trust that his predecessor did the job correctly and is re-checking all his old work...
This is happening to a site of mine too. The Googlebot/2.1 HTTP/1.1 version is reading a large site of mine in explosive bursts, then pauses for 20 seconds and repeats. Fortunately it's using HTTP/1.1 with gzip compression enabled, so bandwidth use isn't too extreme. I can imagine a lot of database-driven websites will buckle under this onslaught. But if any of these thousands of pages get into the index, I'm happy.
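If anyone is curious how much the gzip side actually helps, here's a rough way to check on one of your own pages (Python; 'page.html' is just a placeholder for a real page from your site):

    import gzip

    html = open('page.html', 'rb').read()
    packed = gzip.compress(html)
    saved = 100 - 100.0 * len(packed) / len(html)
    print('%d bytes raw -> %d bytes gzipped (%.0f%% saved)' % (len(html), len(packed), saved))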
Some have been saying that google appears to be rebuilding their index from the ground up. This may be part of that process if true. I personally don't see that happening, but nowadays who knows.
I doubt it as well... It's just the new bot (different spidering pattern as well as supporting zlib compression via HTTP/1.1).
I doubt they have to rebuild their index. Some people are saying that they're trying to crawl deeper and faster because Yahoo and MSN (and even some others) will be competing even harder soon... I'd buy that; it seems at least moderately likely.
Perhaps related: I have a Googlebot that is getting "stuck" and keeps revisiting a URL (with a parameter) that doesn't exist. The IP address varies, but it reverse-resolves as coming from googlebot.com, so I wonder if something "burped" a little in their code. I usually see a few of these (when files move, etc.), but this has been going on for a few days now. I try to keep my web error logs fairly clean, so it jumps right out.
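In case anyone wants to look for the same thing, this is roughly how I spot it; just a sketch, assuming an Apache combined-format access log at 'access.log':

    import re
    from collections import Counter

    not_found = Counter()
    with open('access.log') as log:
        for line in log:
            if 'Googlebot' not in line:
                continue
            # combined log format: "GET /some/url?param=x HTTP/1.1" 404 ...
            m = re.search(r'"(?:GET|HEAD) (\S+) [^"]*" 404 ', line)
            if m:
                not_found[m.group(1)] += 1

    for url, hits in not_found.most_common(10):
        print(hits, url)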
Hmmm... I've seen Slurp do that, especially when Yahoo first started spidering after it dumped Google... but never Googlebot, personally...
Google Bot! We love you
This seems to be the new bot's behavior. Most Google forums have reported the same extended crawls since the new bot was released. Of course, none of us knows the true reasons why.

My theory is that there are some new page attributes that will ultimately play a role in overall ranking, and that there is insufficient existing data on those attributes. I would not categorize it as a "whole new" index; rather, if true, I suspect it could be categorized as an enhancement to the current index.

The reason I feel this theory is a good candidate is that the whole link-popularity thing is obviously out of control, with every website trying to secure thousands of links: link farms, link managers, link lists of link swaps... undermining the popularity concept. Lots of folks are saying that, as a result, Google is now focusing on related (themed) links. But it seems to me that they can't go much further with links beyond checking relevancy. So where do they go to find better ways to rank? If it's not off-page attributes, it seems only logical that they would look at on-page once again. If they did that, and if they found some new page attributes that were measurable and valuable, they would need to re-crawl all of the pages in the index and gather stats for these new attributes.

Just a theory... but as the thread author requested, it is a possible explanation.