For the last two weeks I've been having a very hard time with Googlebot. Every couple of hours, the bot from 64.233.178.136 starts accessing one single URL on one of my sites with anywhere from 60 to 150 threads at once. It literally brings down my server for 5 to 10 minutes. I've tried denying access to that single URL via both robots.txt and .htaccess, and redirecting it; nothing works. Today I'll be adding its IP to the firewall so it can't even reach the sites any more. Does anybody know how this will impact the overall Google indexing process? Will it have a chain reaction on the other Googlebots as well? Any help or suggestion will be greatly appreciated. TIA
I assume you are speaking of one specific url, and that google keeps coming back to that one url... Interesting. I don't suppose you would post it? Unlinked, perhaps?
It looks like it isn't google bot at all: http://www.ripe.net/fcgi-bin/whois?form_type=simple&full_query_string=&searchtext=64.233.178.136
Here's the last few steps of the tracert too:
 7   10 ms    9 ms   10 ms  213.242.106.37
 8   20 ms   21 ms   21 ms  so-4-1-0.bbr1.London1.Level3.net [4.68.128.113]
 9   91 ms   91 ms   92 ms  as-3-0.bbr1.Washington1.Level3.net [64.159.3.254]
10  203 ms  214 ms  204 ms  ae-22-56.car2.Washington1.Level3.net [4.68.121.179]
11   93 ms   92 ms   93 ms  4.79.228.26
12   93 ms   92 ms   91 ms  66.249.95.123
13  107 ms  105 ms  105 ms  66.249.95.149
14  105 ms  105 ms  106 ms  72.14.238.153
15  108 ms  108 ms  109 ms  72.14.238.178
16  106 ms  105 ms  106 ms  64.233.178.136
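A whois query or a traceroute only hints at who owns an address. A commonly used check for a crawler claiming to be Googlebot is a reverse DNS lookup followed by a forward lookup of the returned hostname. The sketch below is only an illustration: the function name `is_googlebot`, the injectable lookup parameters, and the suffix list are assumptions made for this example, not something taken from the thread.

```python
import socket

# Hostname suffixes assumed here to identify Google crawler hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                 forward_lookup=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google hostname
    that in turn resolves back to the same ip."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure

    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The forward lookup must confirm the original address,
    # otherwise the PTR record could simply be forged.
    return ip in addresses
```

Calling `is_googlebot("64.233.178.136")` performs live DNS queries, so the result depends on what the resolver returns at the time. The lookup functions are parameters only so the logic can be exercised without network access.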
No, it is a Google IP. RIPE doesn't show anything because RIPE handles European IPs, not American ones; this one is registered with ARIN. Your traceroute looks weird because it's apparently hopping from London to Washington. Is this a phpBB forum? Is it getting a URL like this every time it hits: www.url.com/forum/index.php?SID=<stuff here> ? The SID is always unique: a new one is generated every time Googlebot hits the page, and the SID is also in all the URLs on the page. This makes Googlebot think each URL is unique enough to hit over and over. I have had gbot hit a forum index thousands of times in a row. There's a patch for phpBB for this, if that's the case. Also, it may take Googlebot a while to re-read robots.txt. It should stop crawling the URL after that, but it may take a few days; I would search the logs for robots.txt and see whether it has fetched it yet. If gbot can bring down your server, you may want to set crawl-delay in robots.txt as well. The other day, Googlebot hit a page on average every 0.8 seconds over a 24-hour period, something like 109k pages, so it can be quick. From what I understand it also obeys crawl-delay, but crawling slower may mean it doesn't hit pages, or index them, as fast as it normally would. Up to you. Hope that's some decent info for you.
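Before waiting days for a crawler to re-read robots.txt, it can help to confirm that the rules say what was intended. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `check_rules` helper, and the example.com URLs are hypothetical. Note that Crawl-delay is a non-standard directive: the parser reading it back says nothing about whether a particular crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one thread for Googlebot and ask for a delay.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /forum/some-thread.html
Crawl-delay: 10
"""


def check_rules(robots_text, user_agent, url):
    """Return (allowed, crawl_delay) for user_agent and url
    according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


blocked = check_rules(ROBOTS_TXT, "Googlebot",
                      "http://www.example.com/forum/some-thread.html")
other = check_rules(ROBOTS_TXT, "Googlebot",
                    "http://www.example.com/forum/index.php")
print(blocked, other)
```

If the first tuple does not report the URL as disallowed, the rule itself is the problem rather than the crawler's caching of robots.txt.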
Thanks for the replies. The specific URL is http://www.linux360.ro/forum/timesharing-gt-schedfifo-vt9309.html. In the last week alone I had 5444 hits from the bot on that URL, according to Webalizer (it's even an old thread, nothing that could be so popular among the users). It's not a PHP session ID issue, I checked the raw Apache access logs on the server and it accesses the URL exactly as it is, no parameters or appendices. So it's just something about that URL or the content on that specific page that drives it crazy.
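Webalizer gives only a weekly total, so a per-minute breakdown from the raw access log shows how concentrated the bursts are. The following is a rough sketch that assumes Apache's common/combined log format. The regular expression, the `hits_per_minute` helper, and the sample lines (including their timestamps and sizes) are invented for illustration and are not taken from the real logs.

```python
import re
from collections import Counter

# Prefix of the common/combined log format:
# host ident user [time] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def hits_per_minute(lines, ip, path):
    """Count requests from ip for path, grouped by minute."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["ip"] == ip and m["path"] == path:
            # "10/Jan/2006:13:55:36 +0200" -> "10/Jan/2006:13:55"
            counts[m["time"][:17]] += 1
    return counts


# Invented sample data, for illustration only.
SAMPLE = [
    '64.233.178.136 - - [10/Jan/2006:13:55:36 +0200] "GET /forum/timesharing-gt-schedfifo-vt9309.html HTTP/1.1" 200 5120',
    '64.233.178.136 - - [10/Jan/2006:13:55:37 +0200] "GET /forum/timesharing-gt-schedfifo-vt9309.html HTTP/1.1" 200 5120',
    '64.233.178.136 - - [10/Jan/2006:13:56:02 +0200] "GET /forum/timesharing-gt-schedfifo-vt9309.html HTTP/1.1" 200 5120',
    '192.0.2.7 - - [10/Jan/2006:13:56:05 +0200] "GET /forum/index.php HTTP/1.1" 200 2048',
]

counts = hits_per_minute(
    SAMPLE, "64.233.178.136", "/forum/timesharing-gt-schedfifo-vt9309.html"
)
```

On a real server the lines would come from iterating over the open access log file instead of `SAMPLE`. Minutes with dozens of hits would match the 60 to 150 simultaneous requests described at the start of the thread.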
I have created a new website, sphereinfo.com. Could anybody tell me how long Google takes to crawl a new website? Googlebot keeps coming, but no pages show up. Please also give me some tips for promoting it: how can I get visitors and PageRank as soon as possible? Thanks in advance
You will get the best response to this if you post in the websites appraisals section. (I posted a more helpful response in your other thread.)
@Bliss - What about SERPs? Are you getting any click-throughs to this page from Google.com? Are there specific search phrases that this page ranks well for, outside of the code sections? The post is from October, and I see no cache, links, or indexing of the page as you have posted it.