So I was trying to figure out where a huge influx of bandwidth usage was coming from for this site, and from a brief look at the logs it *looked* like it was msnbot. As a quick test to see if bandwidth usage went back to normal, I ended up blocking a giant allocation of Microsoft IP addresses (65.52.0.0 - 65.55.255.255, which is 262,142 IP addresses):

/sbin/route add -net 65.52.0.0 netmask 255.252.0.0 reject

Sure enough, sustained bandwidth output went down by about 8Mbit/sec... so msnbot alone was eating up 8Mbit 24/7. I ended up unblocking the IP addresses and adding a 2 second delay between pages msnbot can fetch via robots.txt (Crawl-delay: 2). Anyone else seeing this sort of stupidity from msnbot (149 robot requests every 10 seconds)? Check out a 10 second sample of hits from msnbot. Dumb.
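For reference, the size of that block can be sanity-checked from the netmask: 255.252.0.0 is a /14, i.e. 2^(32-14) addresses. A quick shell sketch of the arithmetic only (nothing here touches the routing table):

```shell
# Sanity-check the size of 65.52.0.0/14 (netmask 255.252.0.0).
prefix=14
total=$(( 1 << (32 - prefix) ))   # 2^18 = 262144 addresses in the block
usable=$(( total - 2 ))           # minus network/broadcast = 262142
echo "total=$total usable=$usable"
```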
Bing has much less data compared to Google, so they have to work harder; what do you expect them to do? Get a better server, or just block all robots except googlebot. Who has the better server may not win, but who has the worse server will definitely lose.
It's not about needing a better server; the servers handle the traffic just fine. I'm just saying that 8Mbit/sec wasted on a bad search engine that sends few users compared to Google isn't worth it, that's all. That's why I limited them via robots.txt. MSNBot was crawling at a rate of 1,287,360 pages per day; now they are allowed to get 43,200 per day. If they can start driving traffic at a pages crawled -> users sent ratio that's anywhere remotely close to Google's, I'll let them crawl more. Right now MSN/Bing brings 0.2% of the search traffic Google does and wastes about 40,000% more server resources than Google (those are real numbers, not made up, BTW). I think maybe the software engineers that designed Windows might have designed their search engine spider too... both are really bad pieces of software.
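The throttling math behind that 43,200 figure is simple: a Crawl-delay of 2 seconds caps the bot at one request every 2 seconds, so the daily ceiling is 86400/2. A quick check of the numbers quoted above:

```shell
# Crawl-delay: 2 allows at most one request every 2 seconds.
seconds_per_day=86400
new_cap=$(( seconds_per_day / 2 ))          # 43200 pages/day ceiling
old_rate=$(( 1287360 / seconds_per_day ))   # ~14 requests/sec before throttling
echo "cap=$new_cap old_rate=~${old_rate}/sec"
```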
I've heard about this before. msnbot also hammers sites with requests for various queries. There were a lot of complaints about this a year or two ago.
Is it a good idea to block the MSN bots, given that Bing appears to be an up-and-coming search engine?
If they are harming the network, I would block them. We don't really care about indexing; we just keep our network safe, and I would block them to prevent wasted bandwidth.
Depends on whether you need the traffic or not. Search engine traffic ultimately comes at a resource cost (the resources used by the spiders). So if MSN can someday figure out how to drive more users for the amount of resources used, I'd let them spider more than the 40k pages/day I limited them to. Until then, they get throttled/limited here.
Wow, I have never heard about this before... I hate Bing anyway, so... WOW, 40,000%??? It's amazing how hard Microsoft tries to beat Google, but they never will.
Wow, that is really a lot of hammering involved. I'd better keep a lookout on my own forum in case this happens too.
http://www.bing.com/community/blogs/webmaster/archive/2009/07/17/new-bot-work-continues-at-bing.aspx Hope that helps.
Bing seems to be growing very fast, so I guess it is normal for them to send lots of crawl requests to get their index size larger than the competition's.
It's not even that we can't handle the load. It's more like, "wtf is the point of letting them eat up so many resources for so little in return?" It's been 72 hours now since I added this to our robots.txt file, and they have yet to start adhering to it:

User-agent: msnbot
Crawl-delay: 2

Honestly, I think their search engine spider is just very badly designed. It seems like MSNBot has caused problems for websites since its inception. Even now, in 2009, they can't figure out how to throttle themselves, use HTTP/1.1 consistently, or adhere to their own Crawl-delay directives. Thumbs up for Microsoft engineers! {rolls eyes}
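If you want to check adherence on your own site, counting requests per second in the access log is enough. A hedged sketch (the log path, the combined log format, and the sample entries below are all assumptions made up for illustration; point it at your real log instead):

```shell
# Hypothetical check: count msnbot hits per second in an access log.
# If Crawl-delay: 2 were honored, no single second should show more than one hit.
cat > /tmp/sample_access.log <<'EOF'
157.55.0.1 - - [17/Jul/2009:12:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "msnbot/2.0b"
157.55.0.2 - - [17/Jul/2009:12:00:01 +0000] "GET /b HTTP/1.0" 200 512 "-" "msnbot/2.0b"
157.55.0.3 - - [17/Jul/2009:12:00:02 +0000] "GET /c HTTP/1.1" 200 512 "-" "msnbot/2.0b"
EOF
# Split on [ and ] to pull out the timestamp field, then count hits per second.
grep -i msnbot /tmp/sample_access.log \
  | awk -F'[][]' '{print $2}' \
  | sort | uniq -c | sort -rn
```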
Oh, you can check out how well they are adhering to their own crawl-delay directive... http://forums.digitalpoint.com/online.php?who=spiders&ua=1&order=desc&sort=time&pp=200&page=1