Unauthorized bots?
girbaud
Jan 17th 2006, 10:24 am
Do you know of any unauthorized bots?
Try looking at this one: Robots blog (http://www.webmasterworld.com/robots.txt)
minstrel
Jan 17th 2006, 8:45 pm
There are loads of them floating around. What is your question exactly?
I can tell you that Brett Tabke's solution (ban them all) is a bad idea, as he discovered.
I can also tell you that trying to build a huge robots.txt file in an effort to ban all bad bots is a major waste of time. The bad bots don't even read robots.txt, unless they're trying to get a list of directories and files you'd rather not have anyone see so they can zip in and have a really close look at them.
girbaud
Jan 18th 2006, 12:33 pm
Can you give me an example of bad bots?
Smyrl
Jan 18th 2006, 12:36 pm
Those that harvest e-mail addresses are among the collection of bad bots.
Shannon
minstrel
Jan 18th 2006, 5:57 pm
See bad bots (http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLG,GGLG:2005-35,GGLG:en&q=bad+bots) and rogue bots (http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLG,GGLG:2005-35,GGLG:en&q=rogue+bots).
For example:
KLOTH.NET - List of Bad Bots
Some anti bad bot measures, list of bad bots and nasty spiders.
www.kloth.net/internet/badbots.php
KLOTH.NET - Trap bad bots in a bot trap
trap bad bots, anti bad bot measures. ... How to build a Bot Trap and keep bad bots away from a web site. Block spam bots and other bad bots from accessing ...
www.kloth.net/internet/bottrap.php
Bad, Bad Bots - The Community's Center for Security
Bad, Bad Bots, PDF · Print ... Once that access is available, the machines become "bots," controlled remotely by hackers to do their nefarious bidding. ...
www.linuxsecurity.com/content/view/117295/65/
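The bot trap idea in the second result can be sketched roughly as follows. This is not the kloth.net implementation, only a minimal illustration of the principle: list a decoy path under Disallow in robots.txt, then flag any client that requests it anyway. The path /bot-trap/, the function name, and the sample log lines are all made up for the example, and the log format is assumed to be Common Log Format.

```python
import re
from collections import Counter

# Hypothetical decoy path: disallowed in robots.txt and linked nowhere visible,
# so a well-behaved crawler should never request it.
TRAP_PATH = "/bot-trap/"

# Start of a Common Log Format line: client IP, two fields, [timestamp], "METHOD path
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def trap_hits(log_lines):
    """Count, per client IP, the requests that touched the decoy path."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(TRAP_PATH):
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    # Invented sample lines, using documentation-only IP addresses.
    sample = [
        '192.0.2.10 - - [19/Jan/2006:01:31:00 +0000] "GET /index.html HTTP/1.0" 200 512',
        '192.0.2.66 - - [19/Jan/2006:01:32:10 +0000] "GET /robots.txt HTTP/1.0" 200 120',
        '192.0.2.66 - - [19/Jan/2006:01:32:12 +0000] "GET /bot-trap/index.html HTTP/1.0" 200 64',
    ]
    for ip, count in trap_hits(sample).items():
        print(ip, count)
```

What to do with the flagged addresses (block, rate-limit, or just watch) is a separate decision that this sketch does not cover.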
wrmineo
Jan 18th 2006, 6:05 pm
Awesome resources and information there Minstrel - thanks!
Ironic reference source there, girbaud. Here's a specific exclusion I run on many of my sites:
User-agent: WebmasterWorldForumBot
Disallow: /
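For anyone who wants to check what a rule like this does before deploying it, here is a small sketch using Python's standard urllib.robotparser. It only shows how that one parser reads the two lines; other crawlers may interpret rules differently, and a bot that ignores robots.txt is unaffected either way. The helper name is_allowed is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line exclusion quoted above, as it would appear in robots.txt.
RULES = """\
User-agent: WebmasterWorldForumBot
Disallow: /
"""


def is_allowed(rules: str, user_agent: str, path: str = "/") -> bool:
    """Return True if this parser would let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("WebmasterWorldForumBot", "Googlebot"):
        verdict = "allowed" if is_allowed(RULES, agent) else "blocked"
        print(agent, verdict)
```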
minstrel
Jan 18th 2006, 6:13 pm
User-agent: WebmasterWorldForumBot
Disallow: /
That's hilarious!! :D
minstrel
Jan 18th 2006, 7:07 pm
Yes, but keep it in perspective: Most people will never see most of those.
girbaud
Jan 19th 2006, 7:43 am
Thanks a lot for those nice responses.
minstrel always gives good resources :D
Thanks a lot people!
wrmineo
Jan 19th 2006, 8:16 am
Yes, Minstrel always has good answers and resources; he's obviously well organized, and that's the idiosyncratic trait that sets him above others.
Now ... all ass-kissing and joking aside, I do use some very specific exclusions of what I consider bad bots.
Here's a lengthy example (http://www.weaver2006.org/robots.txt).
minstrel
Jan 19th 2006, 8:24 am
Disallow: /
User-agent: mozilla/4
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
Disallow: /
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows ME)
Disallow: /
User-agent: mozilla/5
Disallow: /
These will block some good bots too, including Googlebot/2.1:
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
This blocks a popular dead link checker which I and many other people use to update websites and delete dead links. It generates a report when it's done, and unless it's a very special site, if a link generates an error it is simpler for me to delete the link than to investigate why. Thus, if I had a link to you on my site and ran Xenu's link checker, you'd lose a backlink, which is probably not what you want.
Most of the rest are bots that are almost certainly going to ignore the robots.txt file anyway. For that reason, I always advise AGAINST this type of robots.txt file -- you are not going to stop the bad bots and you may well inadvertently block some good ones.
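The overlap described above can be illustrated with a rough sketch. It assumes a crawler that does a case-insensitive substring match of each User-agent rule against its full user-agent string; that matching rule is an assumption for illustration, since real crawlers vary and many match only on a short product token, so whether a particular bot is actually caught depends on its own parser. The function names are made up; the rule and user-agent strings are the ones quoted in this post.

```python
# Broad User-agent values taken from the robots.txt quoted above.
RULE_AGENTS = ["mozilla/4", "mozilla/5", "Xenu's"]

# Full user-agent strings quoted above.
USER_AGENTS = {
    "Yahoo! Slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp; "
                    "http://help.yahoo.com/help/us/ysearch/slurp)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
    "Xenu": "Xenu's Link Sleuth 1.1c",
}


def naive_match(rule_agent: str, full_ua: str) -> bool:
    """Assumed matcher: case-insensitive substring test on the full UA string."""
    return rule_agent.lower() in full_ua.lower()


def blocked_by(full_ua: str) -> list:
    """List the rules that would catch this user agent under the assumed matcher."""
    return [rule for rule in RULE_AGENTS if naive_match(rule, full_ua)]


if __name__ == "__main__":
    for name, ua in USER_AGENTS.items():
        print(name, "->", blocked_by(ua))
```

Under that assumption, the single "mozilla/5" rule catches both search engine crawlers, which is the kind of collateral damage to watch for with broad prefixes.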
wrmineo
Jan 19th 2006, 8:43 am
More phenomenal information, Minstrel. Thanks!
This is a lengthy example just for smart minds like yours to pick apart and advise on :)
This robot file is actually from a client site that had some specific concerns and issues so we "appeased" them with an eye candy robots file to subdue some fears.
Luckily it hasn't stopped the site from getting good results thus far.
Robot | Hits | Bandwidth | Last visit
Googlebot | 138+37 | 1.01 MB | 19 Jan 2006 - 01:31
MSNBot | 125+41 | 1.69 MB | 19 Jan 2006 - 02:48
Inktomi Slurp | 79+72 | 594.98 KB | 19 Jan 2006 - 01:52
Unknown robot (identified by 'crawl') | 32+40 | 402.34 KB | 17 Jan 2006 - 12:58
Unknown robot (identified by hit on 'robots.txt') | 0+53 | 244.14 KB | 19 Jan 2006 - 01:28
Unknown robot (identified by 'spider') | 10+5 | 138.46 KB | 19 Jan 2006 - 01:05
Harvest | 9+1 | 102.24 KB | 07 Jan 2006 - 20:07
AskJeeves | 2+2 | 34.97 KB | 14 Jan 2006 - 05:30
Alexa (IA Archiver) | 2+2 | 34.97 KB | 16 Jan 2006 - 23:00
Walhello appie | 1+1 | 14.68 KB | 18 Jan 2006 - 15:22
Unknown robot (identified by 'robot') | 1+1 | 4.61 KB | 19 Jan 2006 - 02:41
The real concern was spam and DOS attacks as it's a political campaign website, but I will take a further look at the information you provided and reinitiate the conversation with the candidate.
Whether they are missing potential traffic would be the primary concern, and the point that might make them reconsider.
Thanks.
blue_angel
Jun 8th 2009, 12:59 am
Search for "Unknown robot" in a search engine; there are many, many articles related to that.
linkdealer
Jun 8th 2009, 11:40 pm
You can check a few at seocrazy.blogspot.com/2008/04/spammy-robots-list.html