MSNBot being too aggressive?

Discussion in 'Bing' started by digitalpoint, Jul 31, 2009.

  1. #1
    So I was trying to figure out where a huge influx of used bandwidth was coming from for this site, and it *looked* like it was msnbot from a brief look at the logs. I ended up blocking a giant allocation of Microsoft IP addresses (65.52.0.0 - 65.55.255.255, which is 262,142 IP addresses) as a quick test to see if my bandwidth usage went back to normal...

    /sbin/route add -net 65.52.0.0 netmask 255.252.0.0  reject
    Code (markup):
    Sure enough, sustained bandwidth output went down by about 8Mbit/sec... So msnbot alone was eating up 8Mbit 24/7.
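For anyone who wants to double-check what that netmask covers, here is a minimal sketch using Python's standard `ipaddress` module. It only restates the block from the route command above; the variable name `blocked` is just for illustration.

```python
import ipaddress

# Same network and netmask as the route command above.
blocked = ipaddress.ip_network("65.52.0.0/255.252.0.0")

print(blocked)                                                  # 65.52.0.0/14
print(blocked.network_address, "-", blocked.broadcast_address)  # 65.52.0.0 - 65.55.255.255
print(blocked.num_addresses)                                    # 262144 addresses in total
print(blocked.num_addresses - 2)                                # 262142 without network/broadcast
```

The block holds 262,144 addresses in total; the 262,142 figure is what you get once the network and broadcast addresses are left out.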

    I ended up unblocking the IP addresses and adding a 2 second delay for each page msnbot can get via robots.txt (Crawl-delay: 2).

    Anyone else seeing this sort of stupidity from msnbot (149 robot requests every 10 seconds)?

    Check out a 10 second sample of hits from msnbot. Dumb.
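If you want to run the same kind of check on your own logs, something like the following rough sketch would count crawler hits per 10-second window. It assumes the Apache/nginx "combined" log format, which may not match the logs used here, and the sample lines, IP addresses and the `hits_per_window` helper are all made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: "combined" log format, with the timestamp in [...] and the
# user agent as the last quoted field on the line.
LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')

def hits_per_window(lines, agent_substring="msnbot", window_seconds=10):
    """Count requests whose user agent contains agent_substring, per time window."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or agent_substring not in match.group("agent").lower():
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        # Round the timestamp down to the start of its window.
        bucket = int(ts.timestamp()) // window_seconds * window_seconds
        counts[bucket] += 1
    return counts

# Made-up sample lines, not taken from the real logs.
sample_lines = [
    '192.0.2.10 - - [31/Jul/2009:12:00:01 -0700] "GET /a HTTP/1.0" 200 512 "-" "msnbot/2.0b"',
    '192.0.2.11 - - [31/Jul/2009:12:00:05 -0700] "GET /b HTTP/1.0" 200 512 "-" "msnbot/2.0b"',
    '192.0.2.12 - - [31/Jul/2009:12:00:07 -0700] "GET /c HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.10 - - [31/Jul/2009:12:00:12 -0700] "GET /d HTTP/1.0" 200 512 "-" "msnbot/2.0b"',
]

for window_start, hits in sorted(hits_per_window(sample_lines).items()):
    print(window_start, hits)
```

Feeding it a real access log (one line per request) gives a quick picture of how bursty a given crawler is.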
     
    digitalpoint, Jul 31, 2009 IP
  2. DoDo Me

    DoDo Me Peon

    Messages:
    2,257
    Likes Received:
    27
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Bing has much less data compared to Google, so they must work harder. What do you expect them to do?

    Get a better server or just block all robots except Googlebot.

    Who has a better server may not win, but who has the worse server will definitely lose.
     
    DoDo Me, Jul 31, 2009 IP
  3. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #3
    It's not about needing a better server; the servers handle the traffic just fine. I'm just saying that 8Mbit/sec wasted on a bad search engine that sends few users compared to Google isn't worth it, that's all. That's why I limited them via robots.txt. MSNBot was crawling at a rate of 1,287,360 pages per day; now they are allowed to get 43,200 per day. If they can start driving traffic at a pages crawled -> users sent ratio that's anywhere remotely close to Google's, I would let them crawl more.
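Those two per-day figures follow from the numbers earlier in the thread (149 requests per 10 seconds, and Crawl-delay: 2). A quick sketch of the arithmetic, assuming the crawler fetches one page per delay interval around the clock:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Observed rate: 149 requests in a 10 second sample, extrapolated to a full day.
pages_per_day_before = 149 * SECONDS_PER_DAY // 10

# With Crawl-delay: 2, at most one page every 2 seconds (assuming a single,
# well-behaved crawler that honors the directive).
crawl_delay_seconds = 2
pages_per_day_after = SECONDS_PER_DAY // crawl_delay_seconds

print(pages_per_day_before)                        # 1287360
print(pages_per_day_after)                         # 43200
print(pages_per_day_before / pages_per_day_after)  # 29.8
```

So the delay cuts the allowed crawl rate by a factor of roughly 30.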

    Right now MSN/Bing bring 0.2% of the search traffic Google does and waste about 40,000% more server resources than Google (those are real numbers, not made up BTW).
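To put those two percentages on one scale, here is a back-of-the-envelope sketch. It reads "40,000% more" as 401 times Google's resource usage, which is an interpretation of the wording rather than something stated outright.

```python
# Bing search traffic as a fraction of Google's (0.2%).
traffic_ratio = 0.2 / 100

# "40,000% more" server resources, read as Google's usage plus 400x (assumption).
resource_ratio = 1 + 40_000 / 100

# Visitors sent per unit of server resources, relative to Google.
relative_efficiency = traffic_ratio / resource_ratio

print(f"Google sends about {1 / relative_efficiency:,.0f}x more visitors per unit of resources")
```

Under that reading, the gap in visitors per unit of crawl cost is on the order of 200,000 to 1.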

    I think maybe the software engineers that designed Windows might have designed their search engine spider... Both are really bad pieces of software.
     
    digitalpoint, Jul 31, 2009 IP
  4. Slincon

    Slincon Well-Known Member

    Messages:
    1,319
    Likes Received:
    44
    Best Answers:
    0
    Trophy Points:
    180
    #4
    I've heard about this before. msnbot also hammers sites with requests using various queries. There were a lot of complaints about this a year or two ago.
     
    Slincon, Jul 31, 2009 IP
  5. addaminsane

    addaminsane Well-Known Member

    Messages:
    431
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    130
    #5
    Is it a good idea to block the MSN bots when Bing appears to be the up-and-coming search engine?
     
    addaminsane, Jul 31, 2009 IP
  6. darkdrgn2k

    darkdrgn2k Active Member

    Messages:
    159
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    53
    #6
    Always bad to block bots....

    especially if you want them to index your site
     
    darkdrgn2k, Jul 31, 2009 IP
  7. webbynoc.com

    webbynoc.com Peon

    Messages:
    71
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    If they are harming the network I would block them. We do not really care about indexing; we just keep our network safe, and I would do the same to prevent wasted bandwidth.
     
    webbynoc.com, Jul 31, 2009 IP
  8. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #8
    Depends if you need the traffic or not. Search engine traffic ultimately comes at a resource cost (the resources used by the spiders). So if MSN can someday figure out how to drive more users for the amount of resources used, I'd let them spider more than the 40k pages/day I limited them to. Until then, they get throttled/limited here.
     
    digitalpoint, Aug 1, 2009 IP
  9. anthonywebs

    anthonywebs Banned

    Messages:
    657
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Wow, I have never heard about this before... I hate Bing anyway, so... WOW, 40,000%??? It's amazing how hard Microsoft tries to beat Google, but they never will.
     
    anthonywebs, Aug 1, 2009 IP
  10. Articlestopost.com

    Articlestopost.com Member

    Messages:
    138
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    26
    #10
    This is the first time I am hearing about this. I will have to keep a close eye on my stats.
     
    Articlestopost.com, Aug 3, 2009 IP
  11. Ibn Juferi

    Ibn Juferi Prominent Member

    Messages:
    6,221
    Likes Received:
    365
    Best Answers:
    0
    Trophy Points:
    310
    #11
    Wow, that really is a lot of hammering. I'd better keep a lookout on my own forum in case this happens there too.
     
    Ibn Juferi, Aug 3, 2009 IP
  12. rustybrick

    rustybrick User ID 3

    Messages:
    384
    Likes Received:
    41
    Best Answers:
    0
    Trophy Points:
    158
    #12
    http://www.bing.com/community/blogs/webmaster/archive/2009/07/17/new-bot-work-continues-at-bing.aspx

    Hope that helps.
     
    rustybrick, Aug 3, 2009 IP
  13. FifthDimension

    FifthDimension Member

    Messages:
    294
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    43
    #13
    Bing seems to be growing very fast, so I guess it is normal for them to send lots of read requests to get their index size larger than the competition.
     
    FifthDimension, Aug 3, 2009 IP
  14. searchcandy

    searchcandy Peon

    Messages:
    64
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #14
    searchcandy, Aug 3, 2009 IP
  15. searchcandy

    searchcandy Peon

    Messages:
    64
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #15
    Bing for the bin!
     
    searchcandy, Aug 3, 2009 IP
  16. Professional Dude

    Professional Dude Prominent Member

    Messages:
    6,261
    Likes Received:
    430
    Best Answers:
    0
    Trophy Points:
    330
    #16
    I have seen the Yahoo bot doing the same thing, but never the MSN bot.
     
    Professional Dude, Aug 3, 2009 IP
  17. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #17
    It's not even that we can't handle the load. It's more like, "wtf is the point of letting them eat so much resources for so little in return?"

    It's been 72 hours now since I added
    User-agent: msnbot
    Crawl-delay: 2
    Code (markup):
    to our robots.txt file, and they have yet to start adhering to it.
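As a sanity check that the directive itself is well-formed, here is a small sketch that feeds those two lines to Python's standard `urllib.robotparser` and reads the delay back. This only shows how one standards-following parser interprets the file; it says nothing about how msnbot itself parses it.

```python
from urllib.robotparser import RobotFileParser

# The two lines added to robots.txt, as quoted above.
robots_txt = """\
User-agent: msnbot
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.crawl_delay("msnbot"))     # 2
print(parser.crawl_delay("Googlebot"))  # None, since no group matches this agent
```

A crawler that respects the directive would wait that many seconds between successive requests.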

    Honestly, I think their search engine spider is just very badly designed. It seems like MSNBot has caused problems for websites since its inception.

    Even now, in 2009, they can't figure out how to throttle themselves, use HTTP/1.1 consistently or adhere to their own Crawl-delay directives. Thumbs up for Microsoft engineers! {rolls eyes}
     
    digitalpoint, Aug 3, 2009 IP
  18. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,333
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #18
    digitalpoint, Aug 3, 2009 IP
  19. Professional Dude

    Professional Dude Prominent Member

    Messages:
    6,261
    Likes Received:
    430
    Best Answers:
    0
    Trophy Points:
    330
    #19
    Professional Dude, Aug 3, 2009 IP
  20. scylla

    scylla Notable Member

    Messages:
    1,025
    Likes Received:
    33
    Best Answers:
    1
    Trophy Points:
    225
    #20
    820 bots when I checked, most were msn.
     
    scylla, Aug 6, 2009 IP