Is there any bot or spider I should take out of this list? I put this list in my .htaccess file:

# Flag requests whose User-Agent matches a known bad bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase User-Agent "^Custo" bad_bot
SetEnvIfNoCase User-Agent "^DISCo" bad_bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bad_bot
SetEnvIfNoCase User-Agent "^eCatch" bad_bot
SetEnvIfNoCase User-Agent "^EirGrabber" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bad_bot
SetEnvIfNoCase User-Agent "^FlashGet" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^GetWeb!" bad_bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bad_bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase User-Agent "^GrabNet" bad_bot
SetEnvIfNoCase User-Agent "^Grafula" bad_bot
SetEnvIfNoCase User-Agent "^HMView" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bad_bot
SetEnvIfNoCase User-Agent "Indy\ Library" bad_bot
SetEnvIfNoCase User-Agent "^InterGET" bad_bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot
SetEnvIfNoCase User-Agent "^JOC\ Web\ Spider" bad_bot
SetEnvIfNoCase User-Agent "^larbin" bad_bot
SetEnvIfNoCase User-Agent "^LeechFTP" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bad_bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bad_bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bad_bot
SetEnvIfNoCase User-Agent "^Navroad" bad_bot
SetEnvIfNoCase User-Agent "^NearSite" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^NetSpider" bad_bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bad_bot
SetEnvIfNoCase User-Agent "^NetZIP" bad_bot
SetEnvIfNoCase User-Agent "^Octopus" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase User-Agent "^PageGrabber" bad_bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bad_bot
SetEnvIfNoCase User-Agent "^pavuk" bad_bot
SetEnvIfNoCase User-Agent "^pcBrowser" bad_bot
SetEnvIfNoCase User-Agent "^RealDownload" bad_bot
SetEnvIfNoCase User-Agent "^ReGet" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bad_bot
SetEnvIfNoCase User-Agent "^Surfbot" bad_bot
SetEnvIfNoCase User-Agent "^tAkeOut" bad_bot
SetEnvIfNoCase User-Agent "^Teleport\ Pro" bad_bot
SetEnvIfNoCase User-Agent "^Titan" bad_bot
SetEnvIfNoCase User-Agent "^VoidEYE" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase User-Agent "^WebLeacher" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster\ Pro" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "^Widow" bad_bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bad_bot
SetEnvIfNoCase User-Agent "^Xaldon\ WebSpider" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot

# Deny every request that was flagged above
<FilesMatch "(.*)">
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</FilesMatch>
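The Order/Allow/Deny lines in the post above are Apache 2.2 access-control syntax. On Apache 2.4 they only work through mod_access_compat. A minimal sketch of the same deny rule in 2.4's Require syntax, assuming the SetEnvIfNoCase lines stay as they are and the server really is running 2.4 (the thread does not say which version is in use):

```apache
# Apache 2.4 sketch: allow everyone except requests flagged as bad_bot.
# Assumes the SetEnvIfNoCase lines above have already set bad_bot.
<FilesMatch "(.*)">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</FilesMatch>
```

The two syntaxes are best not mixed in the same block; pick whichever matches the server version.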
Here's a list I use: ^$ 8484 Boston Project AA Advanced Email Extractor* agdm79@mail.ru ahrefs AhrefsBot AhrefsBot/1.0 aipbot Alexibot Amiga-AWeb/3.4 Anarchie Anonymizer Art-Online ASPSeek asterias attach Attributor autoemailspider autoemailspider_bot backdoor BackDoorbot BackDoorBot BaiDuSpider Baiduspider Baiduspider-image Baiduspider-video Bandit BatchFTP BecomeBot Bigfoot Black Hole Black.Hole BlackWidow BlowFish Bork-edition bot.* BotALot botALot Bot mailto:craftbot@yahoo.com Brutus/AET BuiltBotTough BuiltbotTough Bullseye BunnySlippers Butch__2.1.1 Cegbfeieh cgichk CheeseBot Cheesebot CherryPicker CherryPicker* CherryPicker/1.0_bot CherryPickerElite/1.0_bot CherryPickerSE/1.0_bot ChinaClaw Cityreview combine compatible ; MSIE concealed defense CopyGuard CopyRightCheck core-project/1.0 cosmos crawl Crescent Crescent Internet ToolPak crescent internet toolpak Crescent Internet ToolPak_bot curl/ Custo DataCha0s DataCha0s/2.0 Deepnet Explorer desktopsmiley DigExt Digimarc WebReader DIIbot DISCo DittoSpyder DOC DoCoMo Dotbot Download\ Download Demon Download Ninja Download Ninja 2.0 DownloadsDemon DTS Agent DynaWeb eCatch ecollector EirGrabber EmailCollector EmailCollector/1.0_bot EmailSiphon EmailSiphon_bot EmailWolf EmailWolf 1.00_bot envolk EroCrawler Exabot Express\ ExpresssWebPictures Express WebPictures ExtractorPro Extractorpro ExtractorPro_bot EyeNetIE .*fantomBrowser .*fantomCrew Browser fast Faxobot feedfinder Fetch Fetch API Request fiddler FlashGet flipboardbrowser FooBar/42 Foobot Franklin Locator FrontPage GameBoy, Powered by Nintendo gamingharbor GetRight GetWeb! Gigabot/... 
Gigabot.* Go-Ahead-Got-It Go!Zilla GrabNet Grafula grub-client grub crawler Harvest heritrix hl_ftien_spider hloader HMView HTMLParser .*HTTP_GET_VARS http_get_vars httplib HTTrack humanlinks ia_archiver iblog ichiro Image\ ImagesStripper ImagesSucker Image Stripper Image Sucker Indy\ indy library Indy Library IndysLibrary InfoNaviRobot InfonaviRobot InterGET Internet\ INTERNET EXPLOITER SUX Internet-exprorer Internet Ninja Internet Ninja x.0 Jakarta Jakarta Commons Java Java/ JBH Agent 2.0 Jennybot JennyBot JetCar JOC\ JOC Web Spider juicyaccess k1b compatible; rss 6.0; windows sot 5.1 security kol k2spider Kenjin Spider Kenjin.Spider Keyword.Density K-Meleon/0.8 larbin Larbin LeechFTP LexiBot Lexibot libcurl libWeb/clsHTTP libwww libwww-perl LinkextractorPro linko LinkScan/8.1a.Unix Linkwalker LinkWalker lwp lwp-request LWP::Simple lwp-trivial Majestic.* Mass\ Mass Downloader Mata.Hari Microsoft Data Access Microsoft Internet Explorer/5.0$ ^Microsoft URL Microsoft.URL Microsoft URL Control Microsoft.URL.Control MIDown\ MIDown tool MIIxpc Missigua Mister\ Mister PiX Mister.PiX MJ12bot moget Morzilla Mosiac 1.* Mozilla/2 Mozilla/3.Mozilla/2.01 Mozilla/3.Mozilla/2.01$ Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )$ Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Maxthon)$ Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1$ Mozilla/(4|5).0$ ^Mozilla/5.0$ Mozilla.*Nessus Mozilla.*NEWT MRSPUTNIK MSIECrawler MS Web Services Client Protocol nameprotect NASA Search NaverBot Navroad NearSite Net\ NetAnts netforex NetMechanic NetSpider Net Vampire NetZIP NeuralBot/0.2 NEWT ActiveX; Win32 NG 1.x (Exalead) NICErsPRO NICErsPRO_bot .*Nikto Nokia-WAPToolkit.* googlebot.*googlebot NPbot NPBot Nutch Octopus Offline Explorer Offline.Explorer Offline Navigator Openbot Openfind Opera/6.01 (Windows ME; U) [en] Opera/9.0 (Windows NT 5.1; U; en) PageGrabber Pagerabber panscient Papa\ Papa Foto pavuk pcBrowser PECL::HTTP picscout plaNETWORK pleasecrawl/1. 
PMAFind POE-Component-Client poe-component-client POE-Component-Client-HTTP POE::Component::Client::HTTP/ Port Huron Labs Program Shareware Program Shareware 1 Program Shareware 1.0.0 ProPowerbot/2.14 ProPowerBot/2.14 ProWebWalker ProWebWalker psbot psbot/0.1 PycURL PycURL/7.15.5$ QihooBot QuepasaCreep QueryN.Metasearch RealDownload ReGet RepoMonkey RMA Rufus Web Miner .*SAFEXPLORER TL safexplorer tl Scooter searchbot admin@google.com searchestate security scan ^Shockwave Flash sitecheck.internetseer.com SiteSnagger SiteSnagger Slurp SlySearch SmartDownload SMBot Snapbot Snoopy Sogou Sogou.* sohu.* Sosospider Spankbot SpankBot spanner Sphider spider S.T.A.L.K.E.R. stress test SuperBot Superbot SuperHTTP Surfbot SurveyBot suzuran Szukacz/1.4 tAkeOut Teleport TeleportPro teleport pro Teleport Pro Telesoft Telesoft* TestBED.6.3 .*T H A T ' S G O T T A H U R T* The.Intraformant TheNomad .*THIS IS AN EXPLOIT* TightTwatbot TightTwatBot TinEye Titan TJvMultiHttpGrabber Component TMCrawler toCrawl/UrlDispatcher TrackBack/ True_Robot turingos TurnitinBot Turnitinbot/1.5 TurnitinBot/1.5 twengabot TwengaBot Twiceler Twitturly UbiCrawler URLy.Warning User-Agent User-Agent: Mozilla/4.0 vadixbot VB Project VCI Viewzi voideye VoidEYE voyager/1.0 WebAuto WebBandit webbandit WebBandit webbandit WebBandit WebBandit/2.1_bot WebBandit/3.50_bot webbandit/4.00.0_bot WebCapture WebCopier Web Downloader WebEMailExtrac.* WebEMailExtrac* WebEMailExtractor WebEMailExtractor/1.0B_bot WebEnhancer WebFetch WebGo\ WebGo IS Web Image Collector Web.Image.Collector WebLeacher WebmasterWorldForumbot WebmasterWorldForumBot WEBMOLE WebReaper .*WebRoot WebSauger Website eXtractor Website Quester Website.Quester Webster Webster.Pro Webstripper WebStripper Web Sucker WebVulnScan WebWhacker WebZIP WebZip West Wind Internet Protocols Wget wget Wget ^Wget Wget/1.8.2 whatweb/ Widow windows-update-agent Windows-Update-Agent WISEbot WordPress/2.0.2 Wordpress Hash Grabber WWW-Collector-E WWW::Mechanize WWWOFFLE 
^www.weblogs.com Xaldon\ Xaldon WebSpider Xenu.* Xenu.*Link.*Sleuth.* xmlrpc exploit* XX Yandex Yandex.* YandexBlogs YandexBot yandexbot YandexMedia YebolBot Yeti Yodao.* Youdao.* YoudaoBot Zao Zealbot Zeus Zeus.*Webster Zeus .*Webster Pro* ZyBORG ZyBorg
OK, is there a way to call another file, or do you put all of these in your .htaccess file? I'm sure you update it from time to time.
I block them server-wide via mod_security using the method outlined here: http://www.puntapirata.com/ModSec-Rules.php You could also add the directives to the httpd.conf file to block them server-wide, but I had problems with that. I'm not fanatical about this list, and only add something if I notice a really active bad bot. mod_security + CSF take care of banning them permanently, should they (or other bots) misbehave. There are several websites that list the latest bad bots, in case you want to check them from time to time and update your list.
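On the earlier question about calling another file: Apache's Include directive is not allowed inside .htaccess, but it can be used in httpd.conf, which fits the server-wide approach mentioned above. A rough sketch follows. The file name conf/bad_bots.conf and the /var/www/html document root are made-up placeholders, and the access lines reuse the 2.2-style syntax from the first post:

```apache
# httpd.conf (main server configuration, not .htaccess)

# Hypothetical file holding only the SetEnvIfNoCase ... bad_bot lines,
# so the list can be updated in one place for every hosted domain.
Include conf/bad_bots.conf

# Hypothetical document root; adjust to the real one.
<Directory "/var/www/html">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>
```

Changes to httpd.conf or the included file only take effect after Apache is reloaded, unlike .htaccess edits.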
Since I put them in my .htaccess file, it probably slows things down a bit, but on the other hand, loads of bots slow your website down too. If you have a dedicated server, I think you can put them in the firewall instead.
Excessively large .htaccess files hurt server performance, because the .htaccess file has to be processed for every file request. As mentioned a few posts above, adding deny rules to your firewall is a better solution performance-wise, plus you don't need to create new .htaccess rules for every new domain you want to host.
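If the list has to stay in .htaccess, one way to keep the file shorter is to fold several patterns into a single alternation. This is only a sketch using a handful of names from the first post; whether it makes a measurable difference on a given server has not been tested here:

```apache
# One directive covering several user agents that all match from the start
# of the string. Extend the alternation with further names as needed.
SetEnvIfNoCase User-Agent "^(BlackWidow|ChinaClaw|EmailSiphon|WebZIP|Wget)" bad_bot

# Patterns without the ^ anchor (matched anywhere in the string) need
# their own directive.
SetEnvIfNoCase User-Agent "(HTTrack|Indy\ Library)" bad_bot
```

The existing Deny from env=bad_bot rule keeps working unchanged, since both directives set the same variable.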