Hello - a short story, possibly useful for some...

One of my clients' sites (a retail website) receives about 500 visitors per day and has approximately 100,000 products available. I recently moved hosts: my previous host had no real bandwidth monitoring and high bandwidth limits, but my new host has higher bandwidth costs and lower limits. After a few days I saw very high bandwidth usage, about 5 GB per day, much higher than I was expecting for a site that is mostly text pages with hardly any graphics, so I started investigating...

I enabled sc-bytes in my IIS log files and used Microsoft's Log Parser to analyse the bandwidth usage over several days. To cut a long story short, Yahoo was the culprit, requesting over 3 GB PER DAY from the server, whilst Google was requesting a mere 200 MB.

The cause was duplicate pages in the index, with query-string parameters creating thousands of duplicate URLs. Google seems clever enough to know not to crawl the duplicate pages, but Yahoo does not. For example, Page.aspx?letter=A is the main page and should be crawled, but Yahoo was also crawling Page.aspx?letter=A&page=2, Page.aspx?letter=A&page=3, Page.aspx?letter=A&page=4, and so on.

A simple change to the robots.txt file stopped this, and the bandwidth use is now much more sensible.

So a word of warning - check your logs and make sure Yahoo is not hammering your site! Sketches of the Log Parser query and the robots.txt rule are below.

cDc
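P.S. For anyone who wants to run the same check, here is a rough sketch of the sort of Log Parser query I mean - it totals sc-bytes per user agent so the heavy crawlers stand out. The ex*.log pattern assumes default W3C-format IIS log files in the current directory, so adjust the path and field names for your own setup:

    logparser -i:IISW3C "SELECT cs(User-Agent) AS Agent, SUM(sc-bytes) AS TotalBytes FROM ex*.log GROUP BY cs(User-Agent) ORDER BY TotalBytes DESC"

Remember that sc-bytes is not logged by default - you have to enable it in the IIS logging properties first, as mentioned above.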
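And a sketch of the kind of robots.txt rule that did the trick, using my Page.aspx example from above (Slurp is Yahoo's crawler). Note the wildcard: it is not part of the original robots.txt standard, but both Yahoo and Google honour it, so check it suits whichever bots you are targeting:

    # Stop Yahoo's crawler fetching the duplicate paginated URLs,
    # e.g. /Page.aspx?letter=A&page=2, while /Page.aspx?letter=A stays crawlable
    User-agent: Slurp
    Disallow: /*&page=

You could use User-agent: * instead if you want to keep every crawler out of the paginated duplicates.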
We all like it when they crawl our sites, but not so much when they chew up more bandwidth than necessary. Makes me wonder how much bandwidth is consumed every day just by bots, crawlers, and the like.