There's some info here: http://www.tenspider.com/business-blog/weblog.php (scroll down to the post from March 16th).
Any bot that's crawling your pages that does not send you traffic in return for the use of the content and your server resources is a bad bot. Block the UA and the IP range(s).
I use archive.org to look at other sites but I block their bot from most of mine. Last year it went on a rampage with one site and indexed about 5k pages in 24 hours, so it graduated with honors and was placed on my bad bot list.
I keep trying to tell people this: If it's truly a bad bot, it isn't going to pay any attention to your little robots.txt file - you have to ban it in .htaccess.
Who said anything about robots.txt? .htaccess is OK after you’ve identified the bot or IP, but the only way to cut them off in real time is via session monitoring, spider traps and captchas. And I vote for spanking the miscreant owners of all the scrapers & harvesters with a 2x4.
My mistake. I linked to the post from the keyword tracker tool and didn't notice which forum it was in. -jay
To block via .htaccess, simply ban the user agent like so:

    SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
    SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
    SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
    SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
    SetEnvIfNoCase User-Agent "^Teleport" bad_bot
    SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
    SetEnvIfNoCase User-Agent "^SickleBot" bad_bot

    <Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>
And if they're truly bad bots, they'll masquerade as some other UA, creating the need for IP banning in your .htaccess.
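For what it's worth, here is a minimal sketch of IP banning in .htaccess, assuming the same Apache 2.2-style access directives as the user-agent example above. The address ranges are placeholders taken from the reserved documentation blocks, not real offenders; swap in whatever ranges your own logs show.

```apache
# Placeholder ranges only (reserved documentation addresses);
# replace them with the ranges you actually want to ban.
<Limit GET POST>
Order Allow,Deny
Allow from all
# Ban a whole range using CIDR notation
Deny from 192.0.2.0/24
# Ban a range using a partial address (matches 203.0.113.*)
Deny from 203.0.113.
# Keep banning anything flagged by the user-agent rules
Deny from env=bad_bot
</Limit>
```

With Order Allow,Deny the Deny lines win over Allow from all, so a request from a listed range is refused even though everyone else is let in. On Apache 2.4 these directives are only available through mod_access_compat, so check which version your host runs before relying on this.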
And the really, really bad bots will rotate their UAs and IP addresses, support JavaScript and browse your site like a user. I block 100 or so IP ranges a day and the volume keeps growing. It's an epidemic caused by AdSense and MFA SE spam sites. This month for my sites the worst offending countries are Peru, China, Romania, and the Netherlands. Last month the list was different and I'm sure next month will be too.
Yes, but I had 40 SickleBots on my site yesterday (I accidentally temporarily allowed them access) and each one had a completely different IP.
As a side note, bots can be a bother but I find they are a very small percentage of my total bandwidth/visitors, so I don't get too uptight about it.