Slurp is eating my bandwidth

Discussion in 'Yahoo' started by cenkbut, Nov 28, 2006.

  1. #1
    Do you have a similar experience with the Yahoo bot (Slurp)? I get 500 hits a day from the Yahoo bot and expected all of my pages (around 23,000) to be indexed, but only 1,800 pages have been indexed. 1.2 GB of bandwidth a day is outrageous. So what is this all about?

    I'd appreciate it if you would share your recent experiences with Slurp. Should I change the Slurp crawl settings in robots.txt? What would you advise?

    thanks a lot
     
    cenkbut, Nov 28, 2006 IP
  2. CanadianEh

    CanadianEh Notable Member

    #2
    Yes, 1.2 GB of bandwidth a day is outrageous.

    I recently ran out of bandwidth on a site, and I believe Slurp was part of the problem. Slurp has been using as much bandwidth as all the other search engines combined.

    Have you given Yahoo a sitemap? You should be able to solve your problem with one.
     
    CanadianEh, Nov 28, 2006 IP
  3. T0PS3O

    T0PS3O Feel Good PLC

    #3
    Same here. One of our smaller sites all of a sudden had 20K 'uniques' according to Urchin. It turned out to be 90% Slurp. The website only has 100 or so pages and maybe totals 10 MB, yet they managed to eat around 400 MB worth of bandwidth. Slurpy must have gotten confused.

     
    T0PS3O, Nov 28, 2006 IP
  4. inspiration100

    inspiration100 Active Member

    #4
    I think robots.txt should work. Slurp goes burp while surfing your turf.
     
    inspiration100, Nov 28, 2006 IP
  5. T0PS3O

    T0PS3O Feel Good PLC

    #5
    Are any of you submitting a sitemap/feed to Yahoo's Site tool? I'm not for this site, but I am for another site that is also hit heavily by Slurpsy.
     
    T0PS3O, Nov 28, 2006 IP
  6. master06

    master06 Peon

    #6
    You can use robots.txt to deny Slurp bots if you want.
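
    For example, a minimal robots.txt sketch that blocks only Slurp (whether you really want to block it entirely is up to you):

        User-agent: Slurp
        Disallow: /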
     
    master06, Nov 28, 2006 IP
  7. dastuff

    dastuff Peon

    #7
    I've also seen ways to slow Slurp down (without denying it) using robots.txt.

    This can be a good way to limit how much it munches...

    If I can find the article I'll post it later (or someone else may have it).
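
    In the meantime, a minimal sketch of the slow-down approach using the Crawl-delay directive that Slurp supports (the 10-second value is just an example to tune):

        User-agent: Slurp
        Crawl-delay: 10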
     
    dastuff, Nov 28, 2006 IP
  8. petertdavis

    petertdavis Notable Member

    #8
    Heh, I wrote in my blog, back in April, about my server getting the Slurp! Hump! and it's still happening.
     
    petertdavis, Nov 28, 2006 IP
  9. vagrant

    vagrant Peon

    #9
    vagrant, Nov 28, 2006 IP
  10. Nithanth

    Nithanth Banned

    #10
    There's nothing much you can do, other than what vagrant has suggested.

    By the way, just 500 hits from Slurp take away 1.2 GB of bandwidth? 500 hits? That's a bit strange.
     
    Nithanth, Nov 28, 2006 IP
  11. Anita

    Anita Peon

    #11
    Do you have lots of pages that link to each other? Slurp is probably following every single link and not recognizing that it has already cached/read the page ... resulting in some pages being read many times. This is what I've seen in my blog's logs.
     
    Anita, Nov 29, 2006 IP
  12. CanadianEh

    CanadianEh Notable Member

    #12
    That's a useful observation.
     
    CanadianEh, Nov 29, 2006 IP
  13. Cryogenius

    Cryogenius Peon

    #13
    A good way to limit search bot traffic (as well as user bandwidth) is to make sure your server headers and timestamps are correct. Specifically, "Last-Modified" should only be updated when the content actually changes, and "Expires" should be set to the time and date when you expect the content to change again (1 day or 1 week after the Last-Modified). Static HTML files are handled properly by the web server, so don't worry about them. Many dynamic systems output a Last-Modified time that is always the current time, regardless of when the content was actually changed.

    Even more advanced is to check the client headers for "If-Modified-Since", and if the modification date of the content is not newer than this date, then return HTTP 304 "Not Modified", and exit. The client (search engine or browser) will then use the copy that it has cached from a previous visit.
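
    As a rough illustration only (using Python's standard-library HTTP server rather than anyone's actual stack, and a hypothetical page.html standing in for a dynamic page), the conditional-GET logic looks something like this:

        import os
        from email.utils import formatdate, parsedate_to_datetime
        from http.server import BaseHTTPRequestHandler, HTTPServer

        DOC = "page.html"  # hypothetical file standing in for a dynamic page

        class ConditionalHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                mtime = int(os.path.getmtime(DOC))  # real change time, not "now"
                ims = self.headers.get("If-Modified-Since")
                if ims:
                    try:
                        if int(parsedate_to_datetime(ims).timestamp()) >= mtime:
                            self.send_response(304)  # client reuses its cached copy
                            self.end_headers()
                            return
                    except (TypeError, ValueError):
                        pass  # malformed date header: just serve the page normally
                body = open(DOC, "rb").read()
                self.send_response(200)
                self.send_header("Last-Modified", formatdate(mtime, usegmt=True))
                self.send_header("Expires", formatdate(mtime + 86400, usegmt=True))  # expect next change in ~1 day
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

        if __name__ == "__main__":
            HTTPServer(("", 8000), ConditionalHandler).serve_forever()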

    If you have a XML, RSS or ROR sitemap, make sure that you have accurate timestamps. You don't want the search engines thinking that your content has changed when it hasn't.

    Finally, session IDs are a real killer for search engines, so make sure you detect their user agents and don't give them one.
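
    A rough sketch of the detection side (the signature list and the start_session() helper are hypothetical; the point is to only hand out session IDs to real visitors):

        # Skip session creation for known crawler user agents.
        BOT_SIGNATURES = ("slurp", "googlebot", "msnbot")

        def is_crawler(user_agent):
            ua = (user_agent or "").lower()
            return any(sig in ua for sig in BOT_SIGNATURES)

        # Inside your request handling code:
        # if not is_crawler(request_headers.get("User-Agent", "")):
        #     session_id = start_session()  # hypothetical session helper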

    Hope that helps,

    Cryo.
     
    Cryogenius, Nov 29, 2006 IP
  14. trichnosis

    trichnosis Prominent Member

    #14
    You can block the Yahoo bot with your robots.txt file, and sending a urllist to Yahoo may also help. ;)
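
    If I remember right, a Yahoo urllist can be as simple as a plain text file with one URL per line, something like this (the URLs are just placeholders):

        http://www.example.com/
        http://www.example.com/page1.html
        http://www.example.com/page2.html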
     
    trichnosis, Nov 29, 2006 IP
  15. cenkbut

    cenkbut Peon

    #15
    Yes, they are linked to each other, around 24,000 dynamic pages. By the way, it is more than 500 hits; my stats show 500 uniques from Yahoo and ten times as many hits. How many bots does Yahoo have, and how can it be 500 uniques?

    Thanks, Cryogenius, that explains a lot about my dynamic pages, but how can I keep the system from changing the Last-Modified date for dynamic pages?
    Thank god I am not using session IDs for search engines.

    By the way, thanks a lot, vagrant. I will try changing the robots.txt file and see what happens; I hope it will help a bit.
     
    cenkbut, Dec 1, 2006 IP
  16. sarathy

    sarathy Peon

    #16
    Sorry, posted in the wrong thread
     
    sarathy, Dec 1, 2006 IP
  17. Not Registered

    Not Registered Well-Known Member

    #17
    You might check out this bandwidth management solution, although it's not a cheap one IMHO.
     
    Not Registered, Dec 1, 2006 IP
  18. RaginBajin

    RaginBajin Peon

    #18
    I created a couple of new sites that have a lot of dynamic pages on them. I've got to start paying attention to this as well, or I could get myself into trouble.

    I just wish Google would hit my site like this the way Yahoo does.
     
    RaginBajin, Dec 5, 2006 IP
  19. Correctus

    Correctus Straight Edge

    #19
    Yep, one of my friends' sites is having the same problem. I wonder if Yahoo is going for a SERP makeover.

    IT
     
    Correctus, Dec 8, 2006 IP
  20. jbladeus

    jbladeus Peon

    #20
    Thank goodness my host doesn't have bandwidth restrictions; otherwise I'd have been fried. Slurp was eating up 3.4 GB of average daily bandwidth on my 90k+ site last week. :)
     
    jbladeus, Dec 8, 2006 IP