Does anyone know of a list of bad bots? Many crawlers are overloading our website. I tried installing anticrawl, but it blocks many useful bots, such as Googlebot's image and mobile bots. Any ideas for an up-to-date list of bots that one can put in htaccess? Thanks, Gurpreet
Here is a small list I've put together from regularly going through my access logs:

^$ libwww-perl charlotte Metalogger irlbot lmcrawler java libwww lwp::simple larbin mothra netscan snapbot sna-0 Microsoft URL Control Missigua Locator PEAR HTTP_Request class Wells Search II psycheclone Python-urllib WEP Search FDW

Banning bots by their user-agent strings is generally of limited use, since most bots can set the string to anything (including legitimate ones such as IE or Mozilla). It can still be useful if you block at least the first two entries on my list, because in most cases script kiddies either leave their bot's default user-agent string intact or just delete it.
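If you want to see what a list like this would catch before putting it in htaccess, here is a minimal Python sketch. It treats a handful of the entries above as case-insensitive regular expressions, with ^$ standing for an empty user-agent string. The names BAD_AGENT_PATTERNS and is_bad_agent are made up for illustration, and the sample user-agent strings are only examples, not taken from anyone's logs.

```python
import re

# A few entries from the list above (assumed to be regex fragments);
# "^$" matches an empty user-agent string.
BAD_AGENT_PATTERNS = ["^$", "libwww-perl", "larbin", "psycheclone", "Python-urllib"]

# Join the entries into one case-insensitive alternation.
BAD_AGENT_RE = re.compile("|".join(BAD_AGENT_PATTERNS), re.IGNORECASE)


def is_bad_agent(user_agent: str) -> bool:
    """Return True if the user-agent string matches any listed pattern."""
    return BAD_AGENT_RE.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "",  # user agent deleted entirely
        "libwww-perl/5.805",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for ua in samples:
        print(repr(ua), "->", "block" if is_bad_agent(ua) else "allow")
```

As noted above, this only catches bots that keep their default user-agent string or send none at all; anything that pretends to be a browser passes straight through.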
I have this list which, mixed and matched with the one above, can improve your robots.txt:

User-agent: Googlebot-Image
Disallow: /

User-agent: BotRightHere
User-agent: larbin
User-agent: b2w/0.1
User-agent: Copernic
User-agent: psbot
User-agent: Python-urllib
User-agent: NetMechanic
User-agent: URL_Spider_Pro
User-agent: CherryPicker
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: WebBandit
User-agent: EmailWolf
User-agent: ExtractorPro
User-agent: CopyRightCheck
User-agent: Crescent
User-agent: SiteSnagger
User-agent: ProWebWalker
User-agent: CheeseBot
User-agent: LNSpiderguy
User-agent: Alexibot
User-agent: Teleport
User-agent: TeleportPro
User-agent: MIIxpc
User-agent: Telesoft
User-agent: Website Quester
User-agent: WebZip
User-agent: moget/2.1
User-agent: WebZip/4.0
User-agent: WebStripper
User-agent: WebSauger
User-agent: WebCopier
User-agent: NetAnts
User-agent: Mister PiX
User-agent: WebAuto
User-agent: TheNomad
User-agent: WWW-Collector-E
User-agent: RMA
User-agent: libWeb/clsHTTP
User-agent: asterias
User-agent: httplib
User-agent: turingos
User-agent: spanner
User-agent: InfoNaviRobot
User-agent: Harvest/1.5
User-agent: Bullseye/1.0
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: CherryPickerSE/1.0
User-agent: CherryPickerElite/1.0
User-agent: WebBandit/3.50
User-agent: NICErsPRO
User-agent: DittoSpyder
User-agent: Foobot
User-agent: SpankBot
User-agent: BotALot
User-agent: lwp-trivial/1.34
User-agent: lwp-trivial
User-agent: BunnySlippers
User-agent: URLy Warning
User-agent: Wget/1.6
User-agent: Wget/1.5.3
User-agent: Wget
User-agent: LinkWalker
User-agent: cosmos
User-agent: moget
User-agent: hloader
User-agent: humanlinks
User-agent: LinkextractorPro
User-agent: Offline Explorer
User-agent: Mata Hari
User-agent: LexiBot
User-agent: Web Image Collector
User-agent: The Intraformant
User-agent: True_Robot/1.0
User-agent: True_Robot
User-agent: BlowFish/1.0
User-agent: JennyBot
User-agent: MIIxpc/4.2
User-agent: BuiltBotTough
User-agent: ProPowerBot/2.14
User-agent: BackDoorBot/1.0
User-agent: toCrawl/UrlDispatcher
User-agent: suzuran
User-agent: TightTwatBot
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VCI
User-agent: Szukacz/1.4
User-agent: Openfind data gatherer
User-agent: Openfind
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Xenu's
User-agent: Zeus
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RepoMonkey
User-agent: Openbot
User-agent: URL Control
User-agent: Zeus Link Scout
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Webster Pro
User-agent: EroCrawler
User-agent: LinkScan/8.1a Unix
User-agent: Keyword Density/0.9
User-agent: Kenjin Spider
User-agent: Iron33/1.0.2
User-agent: Bookmark search tool
User-agent: GetRight/4.2
User-agent: FairAd Client
User-agent: Gaisbot
User-agent: Aqua_Products
User-agent: Radiation Retriever 1.1
User-agent: Flaming AttackBot
User-agent: Curl
User-agent: Web Reaper
User-agent: Firefox
User-agent: Opera
User-agent: Netscape
User-agent: WebVulnCrawl
User-agent: WebVulnScan
Disallow: /

User-agent: *
Disallow:

A few of them are web browsers, but according to the site I got this list from, such software contributes to hacking through its plugins and therefore needs to be blocked. I'm not sure that's okay, though.
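To check how a robots.txt laid out like this gets interpreted, here is a rough Python sketch using the standard library's urllib.robotparser. It parses a shortened copy of the rules above (only three of the agents, picked arbitrarily) and asks which agents may fetch a page. The example.com URL and the is_allowed helper are placeholders of mine. Keep in mind this only shows what a crawler that chooses to read robots.txt would conclude; nothing here forces a bot or a browser to obey it.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt above: several User-agent lines
# share a single "Disallow: /" rule, then a catch-all record allows the rest.
ROBOTS_TXT = """\
User-agent: larbin
User-agent: Wget
User-agent: WebZip
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str = "http://example.com/page.html") -> bool:
    """Ask the parsed rules whether this agent may fetch the (placeholder) URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ["larbin", "Wget/1.6", "Googlebot"]:
        print(agent, "->", "allowed" if is_allowed(agent) else "disallowed")
```

The empty "Disallow:" in the last record means "nothing is disallowed", so any agent not named in the first record falls through to it.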
Because they are usually used to harvest email addresses from websites (obviously, to build mailing lists for spamming) or to post spam comments, posts, or edits on sites that have publicly editable sections. In some cases they are also used for hacking attempts.