
Googlebot Bandwidth

Discussion in 'Search Engine Optimization' started by blackpudding, Jan 14, 2008.

  1. #1
I'm happy with the SEO on my main site, but the bandwidth the Googlebot is using seems excessive. For December it was approx. 45% of my total bandwidth. I don't want to lose SE position as nearly 60% of my traffic comes from the Google search engine, but :eek:

    December - Googlebot - Hits: 665816+21 - Bandwidth: 31.66 GB

Should I take this as a vote of confidence from the G-man, or just ban the Google image bot and slow down the crawl with Webmaster Tools? Is this just the cost of being regularly crawled?

    Cheers
    BP
     
    blackpudding, Jan 14, 2008 IP
  2. LinkBliss

    LinkBliss Peon

    Messages:
    697
    Likes Received:
    15
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Yes, don't limit the Google bot. Hopefully it is a good sign of things to come for your site and non-Googlebot traffic will pick up soon.

    The only time I wouldn't agree with my own suggestion above is if there are virtually no referrals coming from Google, just the bots.

e.g. if 99% of your bandwidth is from the Googlebot, 1% is direct traffic (i.e. yourself) and there is no traffic coming from Google, then it's probably time to shut down the site.
     
    LinkBliss, Jan 14, 2008 IP
  3. mymaldives

    mymaldives Well-Known Member

    Messages:
    153
    Likes Received:
    10
    Best Answers:
    0
    Trophy Points:
    108
    #3
It's normal to have a high Google crawl rate before a PR update, but 31.66GB of data is quite unbelievable. Do you have some huge files on your server?
     
    mymaldives, Jan 14, 2008 IP
  4. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #4
    I have a couple of thousand pdfs but they aren't accessible by search engines as they are protected by a download script. The site does have around 20K pages indexed but still...

Could this be due to my changeover to Drupal in October? Does Google suddenly think I'm updating the pages more often than I am?

    Cheers
    BP
     
    blackpudding, Jan 14, 2008 IP
  5. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #5
    FWIW it's up to 17 GB already this month!
     

    Attached Files:

    blackpudding, Jan 14, 2008 IP
  6. mymaldives

    mymaldives Well-Known Member

    Messages:
    153
    Likes Received:
    10
    Best Answers:
    0
    Trophy Points:
    108
    #6
Use Google Webmaster Tools to check your crawl stats and the number of indexed pages for your site.
     
    mymaldives, Jan 14, 2008 IP
    blackpudding likes this.
  7. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Hi,
Crawl rate is set to normal, but there does seem to have been a steady increase in the number of pages crawled since my redesign. There are 20,400 pages indexed, but I've been careful to exclude registration pages etc. in robots.txt.

    Cheers
    BP
     

    Attached Files:

    blackpudding, Jan 14, 2008 IP
  8. mymaldives

    mymaldives Well-Known Member

    Messages:
    153
    Likes Received:
    10
    Best Answers:
    0
    Trophy Points:
    108
    #8
    could you post the site?
     
    mymaldives, Jan 14, 2008 IP
  9. Valley

    Valley Peon

    Messages:
    1,820
    Likes Received:
    47
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Is the bandwidth expensive then?
     
    Valley, Jan 14, 2008 IP
  10. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #10
    I'd rather not ;)

    It's not so much the bandwidth as the load on the server I'm concerned about. All my sites are php based and while I'm signed up for 5 times the bandwidth I need, I can see me needing more RAM and a faster CPU pretty soon.

    Obviously I don't mind if it's the price you pay for entry into Google Land, I was just shocked at the bandwidth they used on just one of my sites, albeit the most established one.

    Cheers
    BP
     
    blackpudding, Jan 14, 2008 IP
  11. Valley

    Valley Peon

    Messages:
    1,820
    Likes Received:
    47
    Best Answers:
    0
    Trophy Points:
    0
    #11
    bots's got a lot to answer 4
     
    Valley, Jan 15, 2008 IP
    sweetfunny likes this.
  12. little_angel

    little_angel Peon

    Messages:
    10
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #12
Well, if you use Google Webmaster Tools, you can check the Googlebot's progress.
     
    little_angel, Jan 15, 2008 IP
  13. sweetfunny

    sweetfunny Banned

    Messages:
    5,743
    Likes Received:
    467
    Best Answers:
    0
    Trophy Points:
    0
    #13
That's pretty excessive for the number of pages. I've got sites with 5 times the volume of pages and most are around the 2GB mark so far this month, so 17GB is pretty heavy.

    Do this query in Google

    allinurl:www.yoursite.com + .pdf

And see if your download script is working correctly. Also, if your PDFs are in a specific folder, add an exclusion for that directory in robots.txt.
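For example, assuming the PDFs sit in a single (hypothetical) /downloads/ folder, the robots.txt exclusion is just two lines:

```
User-agent: *
Disallow: /downloads/
```

Any well-behaved bot, including Googlebot, will then skip everything under that directory, so PDF transfers should disappear from the crawl bandwidth.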
     
    sweetfunny, Jan 15, 2008 IP
  14. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Thanks for the tip. Unfortunately it only came up with the one small pdf that isn't in the protected folder.

I already went through the last 3 months' download logs looking for rogue bots, but Google hasn't been in there.

I've set GSiteCrawler on my site to see what URLs I can afford to add to robots.txt (downloads is already in there). Googlebot took another 2GB last night!

    Cheers
    BP
     
    blackpudding, Jan 15, 2008 IP
  15. sweetfunny

    sweetfunny Banned

    Messages:
    5,743
    Likes Received:
    467
    Best Answers:
    0
    Trophy Points:
    0
    #15
Yeah, something's not right; 17GB (2GB in 24 hours) for only 20k pages is very excessive.

OK, I just logged in and had a look at a few of my larger sites:

    190,000 page site - 3.4GB with 326,529 Hits
    500,000 page site - 8.2GB with 844,287 Hits

So, as a comparison, yours is consuming much more bandwidth relative to its hits: 665,816 hits yet 31GB.

As you can see, my 500k-page site gets more page hits but only 1/4 of the bandwidth. To me it appears Google is getting caught up on something with your site, looping or trying to download video content or something, but it's hard to tell without viewing the site.
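The comparison can be made concrete by working out the average payload per Googlebot hit from the figures quoted in this thread (taking 1 GB as 1024^3 bytes):

```python
# Average KB transferred per Googlebot hit, from the AWStats
# numbers quoted in this thread (1 GB = 1024**3 bytes).
def kb_per_hit(gigabytes, hits):
    return gigabytes * 1024**3 / hits / 1024

print(round(kb_per_hit(31.66, 665816)))  # blackpudding's site: ~50 KB/hit
print(round(kb_per_hit(8.2, 844287)))    # 500k-page site:      ~10 KB/hit
```

Roughly 50 KB per hit versus 10 KB per hit: each crawled URL on the problem site is about five times heavier than on the comparison site, which points at bloated pages or large files rather than just more pages being crawled.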

Is your code fairly valid? I did have something like this a few years ago, and the culprit was some badly formatted code.
     
    sweetfunny, Jan 15, 2008 IP
  16. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #16
    Hi,
It seems to check out with the HTML and CSS validators, except for the odd display fix for IE. I even have alt text for the thousands of images.

I think I'm going to have a long, hard look at the rewrite rules once GSiteCrawler has finished (it's up to about 38,000 pages so far).

    Cheers
    BP
     
    blackpudding, Jan 15, 2008 IP
  17. sweetfunny

    sweetfunny Banned

    Messages:
    5,743
    Likes Received:
    467
    Best Answers:
    0
    Trophy Points:
    0
    #17
No problem, feel free to PM me the URL if you can't work it out and want me to have a quick look over it.
     
    sweetfunny, Jan 15, 2008 IP
    Valley and blackpudding like this.
  18. blackpudding

    blackpudding Peon

    Messages:
    206
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #18
    Thanks for the help. I think I may have got to the bottom of it :rolleyes:

I spent ages getting my robots.txt file just right last year, but since then I've migrated to Drupal (from a mix of static files and my own PHP-generated pages) and the Drupal-generated robots file is woefully inadequate. GSiteCrawler has turned up thousands of pages that are being crawled pointlessly, and I'm pretty sure I have things tuned now. I just need to wait for Google to grab my new file to see if it's done the trick.
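For anyone in the same boat: Drupal tends to expose the same content under several URLs (unaliased node paths, comment-reply pages, taxonomy listings), so a tightened robots.txt adds exclusions along these lines (paths illustrative, not from the actual site):

```
User-agent: *
Disallow: /node/
Disallow: /taxonomy/
Disallow: /comment/reply/
Disallow: /search/
```

Note that blocking /node/ only makes sense if every node has a path alias; otherwise you'd hide real content from the crawler, so check with a site crawl before copying rules like these.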

Another strange thing is that my Google search engine traffic and AdSense income have increased by 25% since the redesign, but my PR has dropped by 2. I'm not too worried though, it's just weird!

    Cheers
    BP
     
    blackpudding, Jan 15, 2008 IP
  19. Valley

    Valley Peon

    Messages:
    1,820
    Likes Received:
    47
    Best Answers:
    0
    Trophy Points:
    0