Help in encouraging Yahoo and MSN robots to obedience

Discussion in 'robots.txt' started by ThomaSeidler, Jun 14, 2007.

  1. #1
    Yahoo and MSN have gone absolutely nuts on my site: www.thegoodbook.co.uk

    A real pain. They don't seem to obey my robots.txt either...

    Here are my site stats for June:

    1 Yahoo Robot (www.yahoo.com) 18844 56.97%
    2 MSN Robot (search.msn.com) 5458 16.50%
    3 Internet Explorer 6.0 2377 7.19%
    4 Internet Explorer 7.0 2269 6.86%
    5 Google Robot (www.google.com) 1278 3.86%
    6 Mozilla 5 824 2.49%
    7 Firefox 2.0 709 2.14%
    8 Safari 338 1.02%
    9 Ask Jeeves Robot (www.ask.com) 253 0.76%
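    The per-agent figures above come from a stats package. A rough way to get a similar breakdown straight from the server's access log is sketched here. It assumes an Apache-style "combined" log format and a hypothetical table of User-Agent substrings (`BOT_TOKENS`). The sample log lines are made up for illustration, and real stats tools group agents by their own rules, so counts will not match a stats package exactly.

```python
import re
from collections import Counter

# Hypothetical label -> User-Agent substring table (an assumption, not taken
# from any stats package). The first matching token wins.
BOT_TOKENS = {
    "Yahoo Robot": "Slurp",
    "MSN Robot": "msnbot",
    "Google Robot": "Googlebot",
    "Ask Jeeves Robot": "Ask Jeeves",
}

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Made-up sample lines for illustration only (documentation IP range).
SAMPLE_LOG = [
    '192.0.2.1 - - [14/Jun/2007:10:00:00 +0100] "GET / HTTP/1.0" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.2 - - [14/Jun/2007:10:00:05 +0100] "GET /a.html HTTP/1.0" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.3 - - [14/Jun/2007:10:01:00 +0100] "GET / HTTP/1.1" 200 512 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.4 - - [14/Jun/2007:10:02:00 +0100] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]


def tally_agents(lines):
    """Count requests per crawler label; anything unmatched is 'Other'."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        agent = match.group(1)
        label = next(
            (name for name, token in BOT_TOKENS.items() if token in agent), "Other"
        )
        counts[label] += 1
    return counts


if __name__ == "__main__":
    counts = tally_agents(SAMPLE_LOG)
    total = sum(counts.values())
    for rank, (label, n) in enumerate(counts.most_common(), start=1):
        print(f"{rank} {label} {n} {100 * n / total:.2f}%")
```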

    Kind of comic, huh? Except my site is slowing right down, and wretched Yahoo is in good part to blame. Unless it is my bad programming; sadly, it wouldn't be the first time. But I've looked and looked and can see nothing wrong. Maybe I'm just blind, so I thought peer help would be best... Here is my robots.txt in full:

    # Robots.txt


    # slow down the mad MSN bot and the madder Yahoo bot
    User-agent: Slurp
    Crawl-delay: 99 [note, not in the file: I've had this at 5, then 10, then 60, then 120, with apparently NO effect whatsoever. I then looked on the Yahoo site just now, and they document "crawl-delay: xx", so I thought maybe it should just be two digits, hence the current value]


    User-agent: msnbot
    Crawl-delay: 99
    Disallow: /*.jpg$
    Disallow: /*.mp3$

    # Disallow directory /productimages
    User-agent: *
    Disallow: /bookcovers/
    Disallow: /productfiles/
    Disallow: /*.jpg$
    Disallow: /*.mp3$
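    One way to sanity-check a file like this is to run it through a robots.txt parser. This sketch uses Python's standard-library `urllib.robotparser`, which is not what Yahoo or MSN run, so it only approximates their behaviour. Like the original robots exclusion convention, this parser gives each crawler a single group: the first group whose User-agent token matches, falling back to `User-agent: *` only when nothing else matches. If Slurp and msnbot select groups the same way (an assumption; each engine documents its own rules), the `/bookcovers/` and `/productfiles/` disallows in the `*` group do not apply to them at all. The bracketed note is left out because it is not in the real file, and the path `/bookcovers/example.gif` is hypothetical. This parser also treats `*` and `$` in paths literally rather than as wildcards, so the `.jpg`/`.mp3` rules are not exercised here.

```python
from urllib.robotparser import RobotFileParser  # crawl_delay() needs Python 3.6+

# The robots.txt posted above, without the bracketed note (not in the file).
ROBOTS_TXT = """\
# Robots.txt

# slow down the mad MSN bot and the madder Yahoo bot
User-agent: Slurp
Crawl-delay: 99

User-agent: msnbot
Crawl-delay: 99
Disallow: /*.jpg$
Disallow: /*.mp3$

# Disallow directory /productimages
User-agent: *
Disallow: /bookcovers/
Disallow: /productfiles/
Disallow: /*.jpg$
Disallow: /*.mp3$
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() also records a "last read" time
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for agent in ("Slurp", "msnbot", "Googlebot"):
        delay = rp.crawl_delay(agent)  # None when the matching group sets no delay
        allowed = rp.can_fetch(agent, "/bookcovers/example.gif")
        print(f"{agent}: Crawl-delay={delay}, may fetch /bookcovers/: {allowed}")
```

    Under this parser, Slurp and msnbot get a delay of 99 but are still allowed into `/bookcovers/`, while an unlisted crawler such as Googlebot is kept out but gets no delay. If the intent is for the delayed bots to obey the directory rules too, the straightforward fix under this reading is to repeat those `Disallow` lines inside each named group.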

    Any help would be very, very much appreciated.

    Yours in SEO,

    tom
     
    ThomaSeidler, Jun 14, 2007
  2. ThomaSeidler

    ThomaSeidler Peon

    Messages:
    14
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Can anyone help? Am I an idiot? Is there something basic wrong with my robots.txt? I'm asking Yahoo at the moment and they are hopeless... they didn't even read my email at first... Cheers for any, ANY help, even if it's: "No, you are not mad, your robots.txt looks like it SHOULD be working..."
     
    ThomaSeidler, Jun 15, 2007
  3. ThomaSeidler

    #3
    ah ha!

    Now, it could be that the file was saved in Mac OS Roman text encoding, which is the default on my Macs. I recall this being a major problem with .htaccess files, so I reckon 'tis possible it is the cause of this little beast. Anyway, I have changed it to Windows Latin encoding with Unix (LF) line endings, and will let you know if this tames the tiger...
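    Whether a robots.txt contains anything encoding-sensitive can be checked from its raw bytes. Mac OS Roman and Windows Latin are both supersets of ASCII, so for a file containing only ASCII characters the choice between them should make no difference to the bytes on disk. A minimal sketch, assuming the file is read in binary mode; the function name `inspect_robots_bytes` and the script name in the usage comment are made up for illustration. It flags bytes outside plain ASCII, a UTF-8 byte order mark, and classic Mac OS CR-only line endings.

```python
def inspect_robots_bytes(data: bytes) -> dict:
    """Report byte-level traits of a robots.txt file that could trip up a parser."""
    return {
        # Bytes above 0x7F: the only range where Mac OS Roman and Windows Latin differ.
        "non_ascii_bytes": sum(1 for b in data if b > 0x7F),
        # A UTF-8 byte order mark glued to the start of the first line.
        "utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        # CR characters that are not part of a CRLF pair (classic Mac OS line endings).
        "bare_cr": data.replace(b"\r\n", b"\n").count(b"\r"),
        # Windows-style CRLF pairs.
        "crlf": data.count(b"\r\n"),
    }


if __name__ == "__main__":
    import sys

    # Usage: python inspect_robots.py robots.txt  (hypothetical script name)
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as fh:
            print(inspect_robots_bytes(fh.read()))
    else:
        # Made-up sample saved with CR-only line endings.
        print(inspect_robots_bytes(b"User-agent: Slurp\rCrawl-delay: 10\r"))
```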

    Peace,

    t
     
    ThomaSeidler, Jun 15, 2007
  4. madkad

    madkad Active Member

    Messages:
    1,686
    Likes Received:
    83
    Best Answers:
    0
    Trophy Points:
    90
    #4
    I am a bit confused by your post. You seem to want to know about bots, but you have browser info in your stats too, including the browser versions, e.g.:

    3 Internet Explorer 6.0 2377 7.19%
    4 Internet Explorer 7.0 2269 6.86%
    6 Mozilla 5 824 2.49%
    7 Firefox 2.0 709 2.14%

    :-S Is this meant to be in?
     
    madkad, Jun 15, 2007
  5. ThomaSeidler

    #5
    Perhaps you're right; on reflection I didn't need to include them. It was quicker, though, and you can see just how strikingly the robots are dominating my bandwidth usage...
     
    ThomaSeidler, Jun 19, 2007
  6. ThomaSeidler

    #6
    Checked line endings: no joy. The encoding was apparently a red herring.

    Man, Slurp is vexing me most exceedingly. A friend and I read online that Yahoo says it accepts Crawl-delay values up to a MAXIMUM of 10. So I had another eureka moment and thought, oh, MAYBE this is it! I changed the values to 10 just to see if it works. And what happens?

    Over the last few days Yahoo has hit its highest-ever reading on my site. It is absolutely driving me crazy at the minute: people ring up saying they can't access the site, it's slow, blah blah. And it is that DOG of search engine robots, Slurp, eating my bandwidth (perhaps; or at least you can see why it gets the blame).

    Total request stats:
    Friday 21st:
    1 Yahoo Robot 2580 71.53%
    3 MSN Robot 339 9.40%
    5 Google Robot 102 2.83%
    Saturday 22nd:
    1 Yahoo Robot 2901 74.52%
    2 MSN Robot 401 10.30%
    5 Google Robot 108 2.77%
    Sunday 23rd:
    1 Yahoo Robot 2165 69.79%
    2 MSN Robot 352 11.35%
    4 Google Robot 122 3.93%

    Madness! I will get on to Yahoo again directly and try to get them to check their software/robot... My robots.txt is OK and should work, shouldn't it? Someone say "yes", otherwise I might still think I'm insane! ;) lol
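    Some rough arithmetic helps put these numbers in context. This sketch assumes `Crawl-delay: N` means "wait N seconds between successive requests" from a single crawler; that is an assumption, since each engine defines the directive itself, and a crawler that applied the delay per machine rather than per site could exceed the figure. Under that reading, a delay of 10 still allows up to 8,640 requests a day, so a value of 10 alone would not rule out daily Yahoo figures in the low thousands (which, per the correction in post #7, count unique visits rather than requests, so they are not directly comparable). The function name `max_requests_per_day` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_requests_per_day(crawl_delay_seconds: float) -> int:
    """Upper bound on daily requests from ONE crawler that waits
    crawl_delay_seconds between requests (an assumed reading of Crawl-delay)."""
    if crawl_delay_seconds <= 0:
        raise ValueError("crawl delay must be positive")
    return int(SECONDS_PER_DAY // crawl_delay_seconds)


if __name__ == "__main__":
    # The Crawl-delay values tried in this thread.
    for delay in (5, 10, 60, 99, 120):
        print(f"Crawl-delay: {delay:>3} -> at most {max_requests_per_day(delay):>6,} requests/day")
```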
     
    ThomaSeidler, Jun 25, 2007
  7. ThomaSeidler

    #7
    Beg your pardon, that should read "Total unique visit stats", not "Total request stats".
     
    ThomaSeidler, Jun 25, 2007
  8. markowe

    markowe Well-Known Member

    Messages:
    1,136
    Likes Received:
    26
    Best Answers:
    0
    Trophy Points:
    165
    #8
    I was about to whinge about the same thing! Yahoo is also going crazy with my sites. Another really irritating thing is that it constantly changes IP address, like some Chinese scraper bot (excuse my stereotyping), which is really annoying when trying to track certain unique-visitor trends...

    I notice Yahoo bot hits have quadrupled since April!! What gives? Did they up their capacity or something? On the plus side, I also see a huge increase in Yahoo referrals, so I suppose I can't grumble too much...
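    One way to tell whether constantly changing IP addresses really belong to a search engine's crawler, and not to a scraper borrowing its User-Agent, is a reverse DNS lookup on the address followed by a forward lookup on the resulting hostname that must point back to the same address. A minimal sketch using Python's `socket` module; the hostname suffixes in `CRAWLER_SUFFIXES` are assumptions to be checked against each engine's own documentation, and the function names are hypothetical.

```python
import socket

# ASSUMED hostname suffixes for genuine crawlers; verify against each
# search engine's own documentation before relying on them.
CRAWLER_SUFFIXES = {
    "Yahoo": (".crawl.yahoo.net",),
    "MSN": (".search.msn.com",),
}


def hostname_matches(hostname: str, suffixes) -> bool:
    """True if hostname ends with one of the expected suffixes (case-insensitive)."""
    host = hostname.lower().rstrip(".")
    return any(host.endswith(s) for s in suffixes)


def is_genuine_crawler(ip: str, suffixes) -> bool:
    """Reverse-resolve ip, check the suffix, then forward-resolve the hostname
    and confirm it maps back to the same ip (a spoofed PTR record fails here)."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname_matches(hostname, suffixes):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips


if __name__ == "__main__":
    import sys

    # Usage: python verify_crawler.py 192.0.2.10  (needs working DNS)
    for ip in sys.argv[1:]:
        verdicts = {name: is_genuine_crawler(ip, sfx) for name, sfx in CRAWLER_SUFFIXES.items()}
        print(ip, verdicts)
```

    The leading dot in each suffix matters: it stops a hostname such as `fakecrawl.yahoo.net` from passing as `*.crawl.yahoo.net`.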
     
    markowe, Jun 25, 2007
  9. ThomaSeidler

    #9
    Well now. No replies from Yahoo. And they are still going crazy on my site, in spite of a clear robots.txt telling them not to. MSN is also unchanged.

    July 2nd:
    1 Yahoo Robot 1876 62.08%
    3 MSN Robot 378 12.51%
    5 Google Robot 85 2.81%

    3rd:
    1 Yahoo Robot 1919 61.98%
    3 MSN Robot 381 12.31%
    5 Google Robot 99 3.20%

    4th:
    1 Yahoo Robot 2070 65.94%
    3 MSN Robot 339 10.80%
    4 Google Robot 125 3.98%

    I'm now putting rel="nofollow" on loads and loads of my irrelevant links, hoping this may help curtail their wildness... Will post back in time, no doubt... Cheers ;)
     
    ThomaSeidler, Jul 5, 2007
  10. ThomaSeidler

    #10
    Incidentally, the fact that they are still going crazy means they are apparently almost completely unaffected by their Crawl-delay setting: whether it is 0 or 10 appears to make no difference. Just useful stuff, FYI.
     
    ThomaSeidler, Jul 5, 2007
  11. ThomaSeidler

    #11
    OK guys, this is me giving up. Yahoo is unstoppable. I can't ban them totally (cos I want to appear in their results). They won't contact me. Oh, yes, I suppose I'll keep trying: what is the point of a dedicated server if Yahoo robots devour your processing power and bandwidth? Maybe this sluggishness is caused by other things, but it's surely not helped by Yahoo...

    Later on, if anyone has any bright ideas, please please POST...
     
    ThomaSeidler, Jul 10, 2007
  12. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #12
    I did not see any problem in your robots.txt file.
     
    trichnosis, Aug 21, 2007
  13. evera

    evera Peon

    Messages:
    283
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #13
    When I put this in robots.txt:
    User-Agent: msnbot
    Crawl-Delay: 100
    
    User-agent: Slurp
    Crawl-delay: 100
    
    User-agent: slurp
    Crawl-delay: 100
    
    User-agent: msnbot-products
    Crawl-Delay: 100
    
    User-agent: msnbot-news
    Crawl-Delay: 100
    
    User-agent: msnbot-media
    Crawl-Delay: 100
    They stopped being a pain. Why shouldn't that work for you?
     
    evera, Aug 28, 2007