I'm happy with the SEO on my main site, but the bandwidth Googlebot is using seems excessive. For December it was approx 45% of my total bandwidth. I don't want to lose SE position, as nearly 60% of my traffic comes from the Google search engine, but for December the stats were: Googlebot - Hits: 665,816+21 - Bandwidth: 31.66 GB. Should I take this as a vote of confidence from the G-man, or just ban the Google image bot and slow the crawl rate down in Webmaster Tools? Is this just the cost of being regularly crawled? Cheers BP
Yes, don't limit the Googlebot. Hopefully it's a good sign of things to come for your site and non-Googlebot traffic will pick up soon. The only time I wouldn't agree with my own suggestion above is if there are virtually no referrals coming from Google, just the bots - e.g. 99% of your bandwidth is from Googlebot, 1% is direct traffic (e.g. yourself), and no visitors are arriving from Google search. Then it's probably time to shut down the site.
It's normal to have a high Google crawl rate before a PR update, but 31.66GB of data is quite unbelievable. Do you have some huge files on your server?
I have a couple of thousand PDFs, but they aren't accessible to search engines as they are protected by a download script. The site does have around 20K pages indexed, but still... Could this be due to my changeover to Drupal in October? Does Google suddenly think I'm updating the pages more often than I am? Cheers BP
Hi, The crawl rate is set to normal, but there does seem to have been a steady increase in the number of pages crawled since my redesign. There are 20,400 pages indexed, but I've been careful to exclude registration pages etc. in robots.txt. Cheers BP
I'd rather not. It's not so much the bandwidth as the load on the server I'm concerned about. All my sites are PHP-based, and while I'm signed up for five times the bandwidth I need, I can see myself needing more RAM and a faster CPU pretty soon. Obviously I don't mind if it's the price you pay for entry into Google Land; I was just shocked at the bandwidth they used on just one of my sites, albeit the most established one. Cheers BP
That's pretty excessive for the number of pages. I've got sites with five times the volume of pages and most are around the 2GB mark so far for this month, so 17GB is pretty heavy. Try this query in Google: allinurl:www.yoursite.com + .pdf and see if your download script is working correctly. Also, if your PDFs are in a specific folder, add an exclusion for that directory in robots.txt.
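For example, if they all sit under a single folder, a couple of lines like this in robots.txt will keep compliant crawlers out of it (the directory name below is just a placeholder - swap in your real path):

    User-agent: *
    Disallow: /downloads/

You can then use the robots.txt analysis tool in Webmaster Tools to confirm the path is actually blocked for Googlebot.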
Thanks for the tip. Unfortunately it only came up with the one small PDF that isn't in the protected folder. I already went through the last three months of download logs looking for rogue bots, but Google hasn't been in there. I've set GSiteCrawler loose on my site to see what URLs I can afford to add to robots.txt (downloads is already in there). Googlebot took another 2GB last night! Cheers BP
Yeah, something's not right - 17GB (2GB in 24 hours) for only 20K pages is very excessive. OK, I just logged in and had a look at a few of my larger sites:

190,000 page site - 3.4GB with 326,529 hits
500,000 page site - 8.2GB with 844,287 hits

As a comparison, yours is consuming much more bandwidth relative to the number of hits: 665,816 hits yet 31GB works out to roughly 50KB per request, against about 10KB per request on mine. As you can see, my 500K page site gets more hits but only a quarter of the bandwidth. To me it looks like Google is getting caught up on something on your site and looping, or trying to download video content or something, but it's hard to tell without viewing the site. Is your code fairly valid? I did have something like this a few years ago, and the culprit was some badly formatted code.
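If you want to see exactly what it's chewing on, a rough PHP sketch like this, run against your Apache access log, will total up the bytes served to Googlebot per URL. The log path and the regex are assumptions on my part, so adjust them to your own log format:

    <?php
    // Rough sketch: total bytes served to Googlebot per URL from an
    // Apache combined-format access log. Adjust the path and the
    // regex to match your own setup.
    $totals = array();
    $log = fopen('/var/log/apache2/access.log', 'r'); // assumed path
    while (($line = fgets($log)) !== false) {
        if (stripos($line, 'Googlebot') === false) {
            continue; // only interested in Googlebot requests
        }
        // matches e.g.: "GET /some/path HTTP/1.1" 200 12345
        if (preg_match('/"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} (\d+)/', $line, $m)) {
            $url = $m[1];
            if (!isset($totals[$url])) {
                $totals[$url] = 0;
            }
            $totals[$url] += (int) $m[2];
        }
    }
    fclose($log);

    arsort($totals); // heaviest URLs first
    foreach (array_slice($totals, 0, 20, true) as $url => $bytes) {
        printf("%10.1f MB  %s\n", $bytes / 1048576, $url);
    }

If a handful of URLs (or one endless family of query-string variations) accounts for most of the total, that's your culprit.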
Hi, It seems to check out with the HTML and CSS validators, except for the odd display fix for IE. I even have alt text for the thousands of images. I think I'm going to have a long, hard look at the rewrite rules once GSiteCrawler has finished (it's up to about 38,000 pages so far). Cheers BP
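The main thing I'll be checking is that the old static URLs are 301'd to a single new Drupal path rather than answering on both, and that nothing acts as a catch-all that returns a page for any arbitrary path. A made-up sketch of the sort of rule I mean (the paths below aren't my real ones):

    # Hypothetical example only - substitute your own legacy and new paths,
    # and place it above Drupal's own front-controller rewrite rule.
    # Send old static pages to one canonical Drupal URL so Googlebot
    # stops fetching two copies of every page.
    RewriteEngine On
    RewriteRule ^articles/([a-z0-9-]+)\.html$ /content/$1 [R=301,L]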
No problems, feel free to PM me the URL if you can't work it out and want me to have a quick look over it.
Thanks for the help. I think I may have got to the bottom of it. I spent ages getting my robots.txt file just right last year, but since then I've migrated to Drupal (from a mix of static files and my own PHP-generated pages), and the Drupal-generated robots.txt is woefully inadequate. GSiteCrawler has shown up thousands of pages that are being crawled pointlessly, and I'm pretty sure I have things tuned in now. I just need to wait for Google to grab my new file to see if it's done the trick. Another strange thing is that my Google search engine traffic and AdSense income have increased by 25% since the redesign, but my PR has dropped by 2. I'm not too worried though, it's just weird! Cheers BP
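For anyone hitting the same thing, the additions were along these lines - these are typical Drupal crawl traps rather than a copy of my exact file, so check which ones your own site actually generates before blocking them:

    User-agent: *
    Disallow: /search/
    Disallow: /print/
    Disallow: /tracker
    Disallow: /comment/reply/
    Disallow: /node/add/
    Disallow: /aggregator/
    # and the same paths again in their ?q= form, e.g.
    Disallow: /?q=search/
    Disallow: /?q=comment/reply/

One thing to watch: robots.txt only stops the crawling, it doesn't pull already-indexed URLs out of the index, so it can take a while for the effect to show up in the bandwidth figures.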