There are loads of them floating around. What is your question exactly? I can tell you that Brett Tabke's solution (ban them all) is a bad idea, as he discovered. I can also tell you that trying to build a huge robots.txt file in an effort to ban all bad bots is a major waste of time. The bad bots don't even read robots.txt, unless they're trying to get a list of directories and files you'd rather not have anyone see, so they can zip in and have a really close look at them.
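To illustrate that last point: a robots.txt like the one below (the paths are invented for the example) doesn't hide anything - it hands a bad bot a map of exactly where to look first.

User-agent: *
Disallow: /admin/
Disallow: /client-files/
Disallow: /staging/

A polite crawler skips those directories; a scraper reads the same three lines and requests them immediately. Anything that genuinely needs to stay private belongs behind server-side authentication, not in robots.txt.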
Awesome resources and information there Minstrel - thanks! Ironic reference source there girbaud - here's a specific exclusion I run on many of my sites:

User-agent: WebmasterWorldForumBot
Disallow: /
Yes, Minstrel always has good answers and resources; he's obviously well organized, and that's the idiosyncratic trait that sets him above others. Now ... all ass-kissing and joking aside, I do use some very specific exclusions for what I consider bad bots. Here's a lengthy example.
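The full list runs a lot longer than this, so here's a representative slice (the exact user-agent entries vary by site; these are typical of what ends up in such lists):

User-agent: Googlebot/2.1
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebZIP
Disallow: /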
These will block some good bots too, including Googlebot/2.1. The Xenu entry blocks a popular dead link checker which I and many other people use to update websites and delete dead links. It generates a report when it's done, and generally, unless it's a very special site, if a link generates an error it is simpler for me to delete the link than to investigate why it's erroring. Thus, if I had a link to you on my site, you'd lose a backlink after I ran Xenu - probably not what you want. Most of the rest are bots that are almost certainly going to ignore the robots.txt file anyway. For that reason, I always advise AGAINST this type of robots.txt file -- you are not going to stop the bad bots, and you may well inadvertently block some good ones.
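To see why a compliant tool gives up, here's a quick sketch using Python's standard urllib.robotparser (the rule set and URL are made up for the demo):

# How a robots.txt-honoring crawler reacts to a blanket ban.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Xenu",  # matches 'Xenu Link Sleuth/1.x' and friends
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)  # parse() takes the robots.txt body as a list of lines

# A compliant checker asks before fetching anything.
print(rp.can_fetch("Xenu Link Sleuth/1.1", "http://example.com/links.html"))
# -> False: the checker never fetches the page, flags the link as blocked,
#    and the webmaster deletes your backlink rather than investigate.

The bad bots simply skip the can_fetch step entirely, which is the whole problem.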
More phenomenal information Minstrel - thanks! The lengthy example was posted just for smart minds like yours to pick apart and advise on. That robots file is actually from a client site that had some specific concerns and issues, so we "appeased" them with an eye candy robots file to subdue some fears. Luckily it hasn't stopped the site from getting good results thus far:

Robots/spiders visitors (hits + hits on robots.txt, bandwidth, last visit):

Googlebot                                            138+37  1.01 MB    19 Jan 2006 - 01:31
MSNBot                                               125+41  1.69 MB    19 Jan 2006 - 02:48
Inktomi Slurp                                         79+72  594.98 KB  19 Jan 2006 - 01:52
Unknown robot (identified by 'crawl')                 32+40  402.34 KB  17 Jan 2006 - 12:58
Unknown robot (identified by hit on 'robots.txt')      0+53  244.14 KB  19 Jan 2006 - 01:28
Unknown robot (identified by 'spider')                 10+5  138.46 KB  19 Jan 2006 - 01:05
Harvest                                                 9+1  102.24 KB  07 Jan 2006 - 20:07
AskJeeves                                               2+2  34.97 KB   14 Jan 2006 - 05:30
Alexa (IA Archiver)                                     2+2  34.97 KB   16 Jan 2006 - 23:00
Walhello appie                                          1+1  14.68 KB   18 Jan 2006 - 15:22
Unknown robot (identified by 'robot')                   1+1  4.61 KB    19 Jan 2006 - 02:41

The real concern was spam and DoS attacks, as it's a political campaign website, but I will take a further look at the information you provided and reinitiate the conversation with the candidate. Whether they're missing potential traffic would be the primary concern, and the point that might make them reconsider. Thanks.
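For the spam and scraper half of the concern, robots.txt is the wrong tool anyway (as noted above, the bad bots ignore it), so the direction I'm inclined to suggest instead is refusing them at the server. A minimal Apache sketch, assuming mod_rewrite is available (the user-agent names are placeholders, not a vetted list):

# .htaccess: answer known-bad user agents with a 403 Forbidden
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC]
RewriteRule .* - [F,L]

Unlike a robots.txt entry this doesn't depend on the bot's good manners, though a bot that forges its user agent still slips through, and a real DoS has to be dealt with further upstream, not in .htaccess.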