Googlebot gone haywire

Discussion in 'Site & Server Administration' started by Bliss, Feb 3, 2006.

  1. #1
    For the last two weeks I've been having a very hard time with Googlebot. Every couple of hours, the bot from 64.233.178.136 starts accessing one single URL on one of my sites with anywhere from 60 to 150 threads at once. It literally brings down my server for 5 to 10 minutes. I've tried denying access to that one URL via both robots.txt and .htaccess, and redirecting it; nothing works. Today I'll add its IP to the firewall so it can't reach the sites at all. Does anybody know how this will impact Google's overall indexing? Will it have a chain reaction on the other Googlebots as well?
    Any help or suggestion will be greatly appreciated. TIA
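    To confirm the "60 to 150 threads at once" pattern from the raw logs before firewalling anything, one option is to count requests per client IP per second. A minimal sketch, assuming Apache's default common/combined log layout; the function name `find_bursts`, the threshold of 50, and the sample log line are hypothetical, not taken from the server in question:

```python
import re
from collections import Counter

# Start of an Apache common/combined log line (layout assumed):
# 64.233.178.136 - - [03/Feb/2006:10:15:32 +0200] "GET /path HTTP/1.1" ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def find_bursts(lines, threshold=50):
    """Return {(ip, second): count} for clients that made more than
    `threshold` requests within the same one-second timestamp."""
    per_second = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't look like access-log entries
        ip, stamp = m.group(1), m.group(2)
        second = stamp.split()[0]  # drop the timezone offset, keep dd/Mon/yyyy:hh:mm:ss
        per_second[(ip, second)] += 1
    return {key: n for key, n in per_second.items() if n > threshold}


if __name__ == "__main__":
    # Made-up sample: three identical requests from one IP in the same second.
    sample = ['64.233.178.136 - - [03/Feb/2006:10:15:32 +0200] '
              '"GET /forum/ HTTP/1.1" 200 512'] * 3
    print(find_bursts(sample, threshold=2))
    # {('64.233.178.136', '03/Feb/2006:10:15:32'): 3}
```

    Run over the real access log (for example `find_bursts(open(path))`, with `path` pointing at your log), it would show whether the spikes really come from a single address and how large they get.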
     
    Bliss, Feb 3, 2006
  2. rehash

    #2
    Google never visited me from that IP.
    You have two choices:
    1) firewall
    2) contact Google
     
    rehash, Feb 3, 2006
  3. NetMidWest

    #3
    I assume you mean one specific URL, and that Google keeps coming back to that one URL...

    Interesting. I don't suppose you would post it? Unlinked, perhaps?
     
    NetMidWest, Feb 3, 2006
  4. Interlogic

  5. Interlogic

    #5
    Here are the last few hops of the tracert, too:

    7 10 ms 9 ms 10 ms 213.242.106.37
    8 20 ms 21 ms 21 ms so-4-1-0.bbr1.London1.Level3.net [4.68.128.113]
    9 91 ms 91 ms 92 ms as-3-0.bbr1.Washington1.Level3.net [64.159.3.254]
    10 203 ms 214 ms 204 ms ae-22-56.car2.Washington1.Level3.net [4.68.121.179]
    11 93 ms 92 ms 93 ms 4.79.228.26
    12 93 ms 92 ms 91 ms 66.249.95.123
    13 107 ms 105 ms 105 ms 66.249.95.149
    14 105 ms 105 ms 106 ms 72.14.238.153
    15 108 ms 108 ms 109 ms 72.14.238.178
    16 106 ms 105 ms 106 ms 64.233.178.136
     
    Interlogic, Feb 3, 2006
  6. nddb

    #6
    No, it is a Google IP. RIPE doesn't show anything because RIPE handles European IP addresses, not American ones, and your traceroute looks weird because it apparently hops from London to Washington. From ARIN:
    ----------------
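    Whois only tells you who owns an address block. A different check, which Google itself describes for verifying Googlebot, is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check that the hostname is in a Google domain, then resolve that hostname back and make sure it returns the same IP. A minimal sketch; the accepted domain suffixes are an assumption to check against Google's current documentation, and the resolvers are passed in as parameters only so the logic can be exercised without network access:

```python
import socket

# Domains assumed to belong to Google's crawlers; verify against current docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a crawler IP."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:                        # socket.herror / gaierror subclass OSError
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False                       # PTR points outside the assumed Google domains
    try:
        addresses = forward(hostname)[2]   # A lookup: hostname -> list of IPs
    except OSError:
        return False
    return ip in addresses                 # must resolve back to the same IP
```

    Called as `is_verified_googlebot("64.233.178.136")` with the default resolvers, it performs two live DNS lookups; a bot that merely fakes the Googlebot user agent fails either the domain check or the forward lookup.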

    Is this a phpBB forum? Is it requesting a URL like this every time it hits: www.url.com/forum/index.php?SID=<stuff here>?

    The SID is always unique: a new one is generated every time Googlebot hits the page, and it is also appended to every URL on the page. Googlebot therefore sees each of those URLs as new and hits them over and over. I have had Googlebot hit a forum index thousands of times in a row. If this is your case, there's a phpBB patch for it.
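    The effect is easy to see if you canonicalize the URLs: once the session ID is stripped, all the "different" index pages collapse into one. A minimal sketch, assuming the parameter is named `sid` (matched case-insensitively, since the example above shows `SID`); `strip_session_id` is a hypothetical helper, not part of phpBB or its patch, and the SID values below are made up:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="sid"):
    """Return `url` with the session-id query parameter removed
    (case-insensitive), leaving every other parameter in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() != param.lower()]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


if __name__ == "__main__":
    crawled = [
        "http://www.url.com/forum/index.php?SID=a1b2c3",
        "http://www.url.com/forum/index.php?SID=d4e5f6",
        "http://www.url.com/forum/viewtopic.php?t=5&sid=a1b2c3",
    ]
    for u in crawled:
        print(strip_session_id(u))
    # http://www.url.com/forum/index.php
    # http://www.url.com/forum/index.php
    # http://www.url.com/forum/viewtopic.php?t=5
```

    Re-encoding with `urlencode` can change how some characters are escaped in the remaining parameters, which is harmless for comparing URLs but worth knowing if you reuse the output.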

    Also, Googlebot may take a while, possibly a few days, to re-read robots.txt; it should stop crawling the URL after that. I would search your logs for requests to robots.txt to see whether it has fetched the new version yet. If Googlebot can bring down your server, you may also want to set Crawl-delay in robots.txt. The other day Googlebot hit a page every 0.8 seconds on average over a 24-hour period, something like 109k pages, so it can be fast. From what I understand it obeys Crawl-delay, but a slower crawl also means your pages are fetched, and indexed, less quickly than they normally would be; that trade-off is up to you.
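    One way to double-check what a robots.txt actually tells a crawler is Python's standard `urllib.robotparser`. The sketch below assumes a hypothetical robots.txt that disallows the single problem URL and asks for a 10-second Crawl-delay (the value is arbitrary). Crawl-delay is a non-standard directive and each crawler decides whether to honor it (Google's current documentation says Googlebot does not), so the parser only shows how the file reads, not what any particular bot will do:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the one problem URL, request a 10 s delay.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /forum/timesharing-gt-schedfifo-vt9309.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

url = "http://www.linux360.ro/forum/timesharing-gt-schedfifo-vt9309.html"
print(parser.can_fetch("Googlebot", url))                              # False
print(parser.can_fetch("Googlebot", "http://www.linux360.ro/forum/"))  # True
print(parser.crawl_delay("Googlebot"))                                 # 10
```

    Because there is no Googlebot-specific group, the `User-agent: *` rules apply to it; adding a `User-agent: Googlebot` group would take precedence for that bot.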

    Hope that's some decent info for you.
     
    nddb, Feb 3, 2006
  7. Interlogic

    #7
    D'oh! Usually RIPE gives me the Google details. Whoops!
     
    Interlogic, Feb 3, 2006
  8. nddb

    #8
    Yeah, RIPE should at least refer you to ARIN, just as ARIN refers you to RIPE or APNIC. Very odd!
     
    nddb, Feb 3, 2006
  9. Bliss

    #9
    Thanks for the replies. The specific URL is http://www.linux360.ro/forum/timesharing-gt-schedfifo-vt9309.html. In the last week alone I had 5444 hits from the bot on that URL, according to Webalizer (and it's an old thread, nothing that should be that popular with users).
    It's not a PHP session ID issue: I checked the raw Apache access logs on the server, and it requests the URL exactly as it is, with no parameters or anything appended. So it's something about that URL, or the content of that specific page, that drives it crazy.
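    Webalizer's per-URL totals can be reproduced straight from the raw access log, which also keeps the exact request path, query string included, so session-ID variants would show up as separate entries. A minimal sketch, assuming Apache's "combined" log format and counting any user agent that contains "Googlebot" (user-agent strings can be faked, so this counts claimed Googlebot hits only); `googlebot_hits_per_url` is a hypothetical name and the sample lines are made up:

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_url(lines):
    """Count requests per path whose user agent claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m and "Googlebot" in m.group("agent"):
            counts[m.group("path")] += 1  # path is kept exactly as requested
    return counts


if __name__ == "__main__":
    sample = [
        '64.233.178.136 - - [03/Feb/2006:10:15:32 +0200] '
        '"GET /forum/timesharing-gt-schedfifo-vt9309.html HTTP/1.1" 200 5120 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '10.0.0.1 - - [03/Feb/2006:10:15:33 +0200] "GET /forum/ HTTP/1.1" '
        '200 900 "-" "Mozilla/4.0"',
    ]
    for path, n in googlebot_hits_per_url(sample).most_common(10):
        print(n, path)
```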
     
    Bliss, Feb 3, 2006
  10. Ankit

    #10
    :)

    I have created a new website, sphereinfo.com.

    Could anybody tell me how long Google takes to crawl a new website? Googlebot keeps visiting, but no pages show up.

    Please also give me some tips on promoting it: how can I get visitors and PageRank as soon as possible?

    Thanks in advance

     
    Ankit, Feb 3, 2006
  11. mad4

    #11
    You will get the best response if you post this in the website appraisals section. (I posted a more helpful response in your other thread. :) )
     
    mad4, Feb 3, 2006
  12. NetMidWest

    #12
    @Bliss -
    What about the SERPs? Are you getting any click-throughs to this page from Google.com? Are there specific search phrases this page ranks well for, outside of the code sections?
    The post is from October, yet I see no cache, no links, and no indexing of the page at the URL you posted.
     
    NetMidWest, Feb 3, 2006