hoopla! googlebot running crazy

Discussion in 'Google' started by kuhleen, Oct 29, 2004.

  1. #1
    On an average website of mine (nothing special, no new links, about 200 visitors per day), Googlebot has gone to war: today it has read over 70k pages, whereas normally it rarely reads more than 500 pages per day.
    It runs in bursts of 10-20 simultaneous requests (see the other thread about this behaviour) and then stops for minutes at a time. It follows an unusual crawling pattern, reading pages that are, as far as I know, unique in the industry.

    Besides the usual "you're lucky/this is good news" answers, I wonder whether any of you have experienced such spikes and what could explain such abnormal behaviour. Would anyone speculate that Google has somehow found a way to spider "unique content only"? Or is something really wrong with it? I'd appreciate an answer.

    Oh yes, and the homepage PR of that website is 2, if this matters anymore.
     
    kuhleen, Oct 29, 2004
  2. ian_ok

    #2
    Congrats... I wish it would crawl more than 1-3 pages of my site.

    Ian
     
    ian_ok, Oct 29, 2004
  3. digitalpoint

    #3
    You might want to double check the IP address(es) being used by the spider. I've never seen a real Googlebot suck down more than 1 page per second. They are pretty good about throttling the spider back to not kill people's servers.
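A minimal sketch of the IP check suggested above, assuming Python 3 and the reverse-then-forward DNS test that Google recommends for verifying its crawler: resolve the IP to a hostname, confirm the hostname ends in googlebot.com or google.com, then resolve that hostname back and make sure the result includes the original IP. The `reverse`/`forward` parameters are hypothetical hooks added only so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Crawler hostname suffixes accepted here (assumption: Google's current docs may list more).
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only
    return socket.gethostbyname_ex(hostname)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-confirm it."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False  # a spoofed User-Agent fails here
    try:
        return ip in forward(host)  # the PTR record alone can be faked
    except OSError:
        return False
```

Calling `is_real_googlebot("66.249.66.205")` performs live DNS lookups, so the answer reflects what that address resolves to today, not in 2004.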
     
    digitalpoint, Oct 29, 2004
  4. dejaone

    #4
    There are other bots that look like Googlebot. They claim to be Googlebot/2.1 compatible, but they're not the real Googlebot.
     
    dejaone, Oct 29, 2004
  5. kuhleen

    #5
    Actually, this is Googlebot:

    http://www.whois.sc/66.249.66.205

    They run in batches of up to 20 queries per second, so something could be wrong with one machine there.
    I emailed them about this; I'm sure they'll never read it.

    ----- added:

    Concluding the day, here's my spider report for today:

    173832 Googlebot
    754
    358 msnbot
    25 ia_archiver
    11 Yahoo! Slurp
    4 DigExt
    1 NaverBot

    Googlebot visited the website almost a thousand times more than usual. Let me know if you see this happening anywhere else.
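A report like the one above can be produced by counting user-agent strings in the access log. A minimal sketch, assuming Apache/NCSA combined log format (user agent in the last double-quoted field) and a hypothetical signature list built from the bot names in the report; the sample log lines are made up. Substring matching counts anything that claims to be a bot, spoofers included, so it says nothing about whether a "Googlebot" hit is genuine.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical signature list, taken from the bot names in the report above.
BOT_SIGNATURES = ["Googlebot", "msnbot", "ia_archiver", "Yahoo! Slurp", "DigExt", "NaverBot"]

# In combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def spider_report(lines: Iterable[str]) -> Counter:
    """Count requests per known bot signature (first matching signature wins)."""
    counts: Counter = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match is None:
            continue  # not a combined-format line
        user_agent = match.group(1)
        for sig in BOT_SIGNATURES:
            if sig in user_agent:
                counts[sig] += 1
                break
    return counts


# Made-up sample lines for illustration only.
sample = [
    '66.249.66.205 - - [29/Oct/2004:10:00:00 -0500] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.205 - - [29/Oct/2004:10:00:01 -0500] "GET /b HTTP/1.1" 200 498 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [29/Oct/2004:10:05:00 -0500] "GET /a HTTP/1.0" 200 512 "-" '
    '"msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
]

for bot, hits in spider_report(sample).most_common():
    print(hits, bot)
```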
     
    kuhleen, Oct 29, 2004
  6. kuhleen

    #6
    kuhleen, Nov 2, 2004
  7. darqSHADOW

    #7
    My site had 11k pages indexed yesterday as well, which is slightly higher than I am used to. (Last month it was 130k for the entire month.) I will be watching it tonight to see if the spider revisits, since last month it ate 3GB of bandwidth.
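The figures in the post above can be put on a per-day and per-page basis with quick arithmetic. This sketch assumes a 30-day month and decimal gigabytes (10^9 bytes); the post states neither, so both are assumptions.

```python
MONTH_PAGES = 130_000      # pages crawled last month (from the post)
MONTH_BYTES = 3 * 10**9    # "3GB", assumed to be decimal gigabytes
DAYS_IN_MONTH = 30         # assumption: the post doesn't name the month
YESTERDAY_PAGES = 11_000   # pages crawled yesterday (from the post)

avg_pages_per_day = MONTH_PAGES / DAYS_IN_MONTH         # daily average last month
ratio_vs_average = YESTERDAY_PAGES / avg_pages_per_day  # yesterday vs that average
bytes_per_page = MONTH_BYTES / MONTH_PAGES              # average transfer per page
yesterday_bytes = YESTERDAY_PAGES * bytes_per_page      # estimated bytes for yesterday

print(f"average pages/day last month: {avg_pages_per_day:,.0f}")
print(f"yesterday vs average: {ratio_vs_average:.1f}x")
print(f"bytes per page: {bytes_per_page:,.0f}")
print(f"estimated transfer yesterday: {yesterday_bytes / 10**6:,.0f} MB")
```

Under these assumptions, yesterday's 11k pages are roughly two and a half times the previous month's daily average of about 4,300 pages, at around 23 KB per page.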

    DS
     
    darqSHADOW, Nov 2, 2004
  8. digitalpoint

    #8
    I have a Googlebot going crazy now too... different IP... 66.249.65.112
     
    digitalpoint, Nov 3, 2004
  9. jontelofot

    #9
    In the last 24 hours one of my sites received 102,000 page loads from Googlebot, shattering the record from two days ago, which was 50K.
    Before this week I had never seen more than 30K in a single day, and it has averaged about 10K per day over the last two months.
     
    jontelofot, Nov 4, 2004
  10. disgust

    #10
    It's all over WebmasterWorld too... some people are actually banning Google for the time being.

    I've had an increase (it's over a gig a day at the moment), but I haven't had to do anything that drastic yet.
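"Banning" a crawler is usually done with a robots.txt rule. A minimal sketch using Python's standard `urllib.robotparser` to show how a file that shuts out Googlebot while leaving other bots alone is interpreted; the rules and the example.com URL are hypothetical, and this is Python's parser, not Google's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "msnbot"):
    allowed = parser.can_fetch(agent, "http://www.example.com/page.html")
    print(agent, "allowed" if allowed else "blocked")
```

Python's parser uses the first record whose `User-agent` matches, falling back to the `User-agent: *` record, so Googlebot is blocked and msnbot is allowed here.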
     
    disgust, Nov 4, 2004
  11. digitalpoint

    #11
    I looked a little closer, and it's just the new version of Googlebot that is doing it: the one that supports HTTP/1.1. As noted in this thread, it spiders differently from the old one (instead of lots of different IPs at once, it spiders from a single IP address in a more constant manner). But I think they only recently cranked up its speed.
     
    digitalpoint, Nov 4, 2004
  12. minstrel

    #12
    Perhaps it's like the new employee who doesn't trust that his predecessor did the job correctly and is re-checking all his old work...
     
    minstrel, Nov 4, 2004
  13. xml

    #13
    This is happening to a site of mine too. The Googlebot 2.1 HTTP/1.1 version is reading a large site of mine in explosive pulses, then pausing for 20 seconds and repeating.
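The "explosive pulses, then a pause" pattern can be confirmed from log timestamps by starting a new group wherever the gap between consecutive requests exceeds a threshold. A minimal sketch; the 5-second gap threshold and the sample timestamps are hypothetical.

```python
from typing import List


def split_into_bursts(times: List[float], max_gap: float = 5.0) -> List[List[float]]:
    """Group request timestamps (in seconds) into bursts.

    A new burst starts whenever the gap since the previous request
    exceeds max_gap seconds.
    """
    bursts: List[List[float]] = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)  # still inside the current pulse
        else:
            bursts.append([t])    # pause detected: start a new pulse
    return bursts


# Hypothetical timestamps: three pulses separated by roughly 20-second pauses.
requests = [0.0, 0.1, 0.2, 0.3, 20.5, 20.6, 20.7, 41.0, 41.2]
for burst in split_into_bursts(requests):
    span = burst[-1] - burst[0]
    print(f"{len(burst)} requests in {span:.1f}s starting at t={burst[0]:.1f}s")
```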

    Fortunately it's using HTTP/1.1 with gzip compression enabled, so bandwidth use isn't too extreme.
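Why compression keeps the bandwidth down: when the crawler advertises `Accept-Encoding: gzip`, the server may send a gzip-compressed body, and repetitive HTML shrinks considerably. A minimal sketch using Python's standard `gzip` module; the `encode_response` helper, the sample page, and the simplified header check (which ignores q-values) are assumptions, and real savings depend on the content.

```python
import gzip
from typing import Dict, Tuple


def encode_response(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the body if the client lists gzip in Accept-Encoding.

    Simplified: q-values (e.g. "gzip;q=0") are stripped and ignored.
    """
    codings = [c.split(";")[0].strip().lower() for c in accept_encoding.split(",")]
    if "gzip" in codings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


# Hypothetical page with highly repetitive table markup.
page = b"<html><body>" + b"<tr><td>item</td><td>price</td></tr>\n" * 500 + b"</body></html>"

compressed, headers = encode_response(page, "gzip,deflate")
print(len(page), "bytes raw ->", len(compressed), "bytes gzipped", headers)
```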

    I can imagine a lot of database-driven websites will be crippled by this onslaught.

    But if any of these thousands of pages get into the index, I'm happy :D.
     
    xml, Nov 4, 2004
  14. WilliamC

    #14
    Some have been saying that Google appears to be rebuilding its index from the ground up. If that's true, this may be part of the process. I personally don't see that happening, but nowadays who knows.
     
    WilliamC, Nov 4, 2004
  15. digitalpoint

    #15
    I doubt it as well... It's just the new bot (different spidering pattern as well as supporting zlib compression via HTTP/1.1).
     
    digitalpoint, Nov 4, 2004
  16. disgust

    #16
    I doubt they have to rebuild their index. Some people are saying that they're trying to crawl deeper and faster because Yahoo and MSN (and even some others) will soon be competing even harder... I'd buy that; it seems at least moderately likely.
     
    disgust, Nov 4, 2004
  17. hulkster

    #17
    Perhaps related: I have a Googlebot that is getting "stuck" and keeps revisiting a URL (with a parameter) that doesn't exist. The IP address varies, but it reverse-resolves to googlebot.com, so I wonder if something "burped" a little in their code. I usually see a few of these (when files move, etc.), but this has been going on for a few days now. I try to keep my web error logs fairly clean, so it jumps right out.
     
    hulkster, Nov 4, 2004
  18. minstrel

    #18
    Hmmm... I've seen Slurp do that, especially when Yahoo first started spidering after it dumped Google... but never Googlebot, personally...
     
    minstrel, Nov 4, 2004
  19. Tid

  20. longcall911

    #20
    This seems to be the new bot's behavior. Most Google forums have reported the same extended crawls since the new bot was released. Of course, none of us knows the true reasons why. My theory is that there are some new page attributes that will ultimately play a role in overall ranking, and that there is insufficient existing data on those attributes.

    I would not categorize it as a ‘whole new’ index. Rather, if true, I suspect it could be categorized as an enhancement to the current index. The reason I feel this theory is a good candidate is that the whole link popularity thing is obviously out of control, with every website trying to secure thousands of links through link farms, link managers, lists of link swaps... undermining the popularity concept.

    Lots of folks are saying that as a result, Google is now focusing on related (themed) links. But, it seems to me that they can’t go too much farther with links, beyond checking relevancy. So, where do they go to find better ways to rank? If it’s not off-page attributes, it seems only logical that they would look at on-page once again.

    If they did that, and if they found some new page attributes that were measurable and valuable, they would need to re-crawl all of the pages in the index and gather stats for these new attributes.

    Just a theory... but, as the thread author requested, it is a possible explanation.
     
    longcall911, Nov 5, 2004