Excessive Bandwidth Usage From Slurp: Check your logs

Discussion in 'Yahoo' started by cDc, Jul 15, 2010.

  1. #1
    Hello - a short story, possibly useful for some...

    One of my client's sites (a retail website) receives about 500 visitors per day and has approximately 100,000 products available. I recently moved hosts: my previous host had no real bandwidth monitoring and high bandwidth limits, but my new host has higher bandwidth costs and lower limits. After a few days I had seen very high bandwidth usage, about 5 GB per day, much higher than I was expecting for the site (mostly text pages, hardly any graphics), so I started investigating...

    I enabled the sc-bytes field in my IIS log files and used logparser from Microsoft to analyse the bandwidth usage over several days.
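
    For anyone without logparser to hand, here is a rough Python sketch of the same kind of tally. It assumes the logs are in W3C extended format with a `#Fields:` header line and that both `cs(User-Agent)` and `sc-bytes` are enabled. The function names and the crawler markers are made up for illustration, and this is not the logparser query used for the figures in this post.

```python
from collections import defaultdict


def bytes_by_user_agent(lines):
    """Sum sc-bytes per user agent from W3C extended log lines."""
    fields = []
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        sent = row.get("sc-bytes", "")
        if sent.isdigit():  # skip rows where the value is missing or "-"
            totals[row.get("cs(User-Agent)", "-")] += int(sent)
    return dict(totals)


def bytes_by_crawler(agent_totals, markers=("Slurp", "Googlebot")):
    """Group per-agent totals by a substring found in the user agent."""
    grouped = {marker: 0 for marker in markers}
    for agent, sent in agent_totals.items():
        for marker in markers:
            if marker.lower() in agent.lower():
                grouped[marker] += sent
                break
    return grouped
```

    Feed it an open log file, something like `bytes_by_crawler(bytes_by_user_agent(open(path)))`, where `path` is whatever your own log file is called.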

    To cut a long story short, Yahoo was the culprit, requesting over 3 GB PER DAY from the server whilst Google requested a mere 200 MB. The cause was duplicate pages in the index, with parameters causing thousands of duplicate URLs. Google seems clever enough to know not to crawl the duplicate pages, but not Yahoo.

    For example:

    Page.aspx?letter=A is the main page, and should be crawled

    But Yahoo was also crawling Page.aspx?letter=A&page=2, Page.aspx?letter=A&page=3, Page.aspx?letter=A&page=4...
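
    To get a feel for how many of these variants a crawler is fetching, a small Python sketch like the one below can group requested URLs by their main page. It assumes `page` is the only parameter that creates duplicates, as in the example above; the helper names are hypothetical, and other sites will need their own list of parameters to ignore.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical(url, ignored=("page",)):
    """Return the URL with the ignored query parameters dropped."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


def variants_per_page(urls):
    """Count how many requested URLs collapse onto each main page."""
    return Counter(canonical(url) for url in urls)
```

    A main page with a large count is one where the crawler is spending most of its requests on paginated copies.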

    A simple change to the robots.txt file stopped this, and the bandwidth use is now much more sensible.
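
    The exact rule is not shown in this post, so treat the following as a guess at what such a change could look like: a wildcard rule such as `Disallow: /*&page=` under `User-agent: Slurp`. The Python sketch below is a hand-rolled checker for that style of pattern (`*` for any run of characters, a trailing `$` for end of URL), handy for testing a rule against sample URLs before deploying it. The function names are invented, and it does not reproduce any crawler's real matching code.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow pattern matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Assumed rule, not the one actually deployed on the site in this post.
ASSUMED_RULES = ["/*&page="]
```

    With that assumed rule, `is_blocked("/Page.aspx?letter=A&page=2", ASSUMED_RULES)` is true while the main page `/Page.aspx?letter=A` stays crawlable.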

    So a word of warning - check your logs and make sure Yahoo is not hammering your site!

    cDc
     
    cDc, Jul 15, 2010 IP
  2. bama boy

    #2
    Nice info, thanks for sharing with us.
     
    bama boy, Jul 15, 2010 IP
  3. sivaganesh

    #3
    Use the Chennai Central plugin if you have the WordPress CMS.
     
    sivaganesh, Jul 15, 2010 IP
  4. gameutopia

    #4
    We all like it when they crawl our sites, but not so much if they are chewing up more bandwidth than necessary. Makes me wonder how much bandwidth is consumed every day just by bots, crawlers, or whatever.
     
    gameutopia, Jul 15, 2010 IP