Hi there, please let me know if this is OK as a robots.txt file... Also, someone said not to use the wildcard at all, as it may confuse some bots and cause them not to index the entire site. Is that true? Lastly, can anyone advise on how to create a corresponding .htaccess file to block all of these bots? And may I say: great forums, with a helpful bunch... Cheers

User-agent: Alexibot
User-agent: Aqua_Products
User-agent: BackDoorBot
User-agent: BackDoorBot/1.0
User-agent: Black.Hole
User-agent: BlackWidow
User-agent: BlowFish
User-agent: BlowFish/1.0
User-agent: Bookmark search tool
User-agent: Bot mailto:craftbot@yahoo.com
User-agent: BotALot
User-agent: BotRightHere
User-agent: BuiltBotTough
User-agent: Bullseye
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0

(PS: I have cut off some of the disallow list here, because of the 10000 word limit!)

User-agent: Grafula
User-agent: HMView
User-agent: HTTrack
User-agent: HTTrack 3.0
User-agent: HTTrack [NC,OR]
User-agent: Harvest
User-agent: Harvest/1.5
User-agent: Image Stripper
User-agent: Image Sucker
User-agent: Indy Library
User-agent: Indy Library [NC,OR]
User-agent: InfoNaviRobot
User-agent: InterGET
User-agent: Internet Ninja
User-agent: Internet Ninja 4.0
User-agent: Internet Ninja 5.0
User-agent: Internet Ninja 6.0
User-agent: Iron33/1.0.2
User-agent: JOC Web Spider
User-agent: JennyBot
User-agent: JetCar
User-agent: Kenjin Spider
User-agent: Kenjin.Spider
User-agent: Keyword Density/0.9
User-agent: Keyword.Density
User-agent: LNSpiderguy
User-agent: LeechFTP
User-agent: LexiBot
User-agent: LinkScan/8.1a Unix
User-agent: LinkScan/8.1a.Unix
User-agent: LinkWalker
User-agent: LinkextractorPro
User-agent: MIDown tool
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: MSIECrawler
User-agent: Mass Downloader
User-agent: Mass Downloader/2.2
User-agent: Mata Hari
User-agent: Mata.Hari
User-agent: Microsoft URL Control
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: Microsoft.URL
User-agent: Mister PiX
User-agent: Mister PiX version.dll
User-agent: Mister Pix II 2.01
User-agent: Mister Pix II 2.02a
User-agent: Mister.PiX
User-agent: NICErsPRO
User-agent: NPBot
User-agent: NPbot
User-agent: Navroad
User-agent: NearSite
User-agent: Net Vampire
User-agent: Net Vampire/3.0
User-agent: NetAnts
User-agent: NetAnts/1.10
User-agent: NetAnts/1.23
User-agent: NetAnts/1.24
User-agent: NetAnts/1.25
User-agent: NetMechanic
User-agent: NetSpider
User-agent: NetZIP
User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998)
User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998)
User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp)
User-agent: Octopus
User-agent: Offline Explorer
User-agent: Offline Explorer/1.2
User-agent: Offline Explorer/1.4
User-agent: Offline Explorer/1.6
User-agent: Offline Explorer/1.7
User-agent: Offline Explorer/1.9
User-agent: Offline Explorer/2.0
User-agent: Offline Explorer/2.1
User-agent: Offline Explorer/2.3
User-agent: Offline Explorer/2.4
User-agent: Offline Explorer/2.5
User-agent: Offline Navigator
User-agent: Offline.Explorer
User-agent: Openbot
User-agent: Openfind
User-agent: Openfind data gatherer
User-agent: Oracle Ultra Search
User-agent: PageGrabber
User-agent: Papa Foto
User-agent: PerMan
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: Python-urllib
User-agent: QueryN Metasearch
User-agent: QueryN.Metasearch
User-agent: RMA
User-agent: Radiation Retriever 1.1
User-agent: ReGet
User-agent: RealDownload
User-agent: RealDownload/4.0.0.40
User-agent: RealDownload/4.0.0.41
User-agent: RealDownload/4.0.0.42
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: SiteSnagger
User-agent: SlySearch
User-agent: SmartDownload
User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999)
User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999)
User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000)
User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001)
User-agent: SpankBot
User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux
User-agent: SuperBot
User-agent: SuperBot/3.0 (Win32)
User-agent: SuperBot/3.1 (Win32)
User-agent: SuperHTTP
User-agent: SuperHTTP/1.0
User-agent: Surfbot
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: Teleport Pro
User-agent: Teleport Pro/1.29
User-agent: Teleport Pro/1.29.1590
User-agent: TeleportPro
User-agent: Telesoft
User-agent: The Intraformant
User-agent: The.Intraformant
User-agent: TheNomad
User-agent: TightTwatBot
User-agent: Titan
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: TurnitinBot
User-agent: TurnitinBot/1.5
User-agent: URL Control
User-agent: URL_Spider_Pro
User-agent: URLy Warning
User-agent: URLy.Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VoidEYE
User-agent: WWW-Collector-E
User-agent: WWWOFFLE
User-agent: Web Image Collector
User-agent: WebEMailExtrac.*
User-agent: WebEnhancer
User-agent: WebFetch
User-agent: WebGo IS
User-agent: WebLeacher
User-agent: WebReaper
User-agent: WebReaper [info@webreaper.net]
User-agent: WebReaper [webreaper@otway.com]
User-agent: WebReaper v9.1 - www.otway.com/webreaper
User-agent: WebReaper v9.7 - www.webreaper.net
User-agent: WebReaper v9.8 - www.webreaper.net
User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper
User-agent: WebSauger
User-agent: WebSauger 1.20b
User-agent: WebSauger 1.20j
User-agent: WebSauger 1.20k
User-agent: WebStripper
User-agent: WebStripper/2.03
User-agent: WebStripper/2.10
User-agent: WebStripper/2.12
User-agent: WebStripper/2.13
User-agent: WebStripper/2.15
User-agent: WebStripper/2.16
User-agent: WebStripper/2.19
User-agent: WebWhacker
User-agent: WebZIP
User-agent: WebZIP/2.75 (http://www.spidersoft.com)
User-agent: WebZIP/3.65 (http://www.spidersoft.com)
User-agent: WebZIP/3.80 (http://www.spidersoft.com)
User-agent: WebZIP/4.0 (http://www.spidersoft.com)
User-agent: WebZIP/4.1 (http://www.spidersoft.com)
User-agent: WebZIP/4.21
User-agent: WebZIP/4.21 (http://www.spidersoft.com)
User-agent: WebZIP/5.0
User-agent: WebZIP/5.0 (http://www.spidersoft.com)
User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com)
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: WebmasterWorldForumBot
User-agent: Website Quester
User-agent: Website Quester - www.asona.org
User-agent: Website Quester - www.esalesbiz.com/extra/
User-agent: Website eXtractor
User-agent: Website eXtractor (http://www.asona.org)
User-agent: Website.Quester
User-agent: Webster Pro
User-agent: Webster.Pro
User-agent: Wget
User-agent: Wget/1.5.2
User-agent: Wget/1.5.3
User-agent: Wget/1.6
User-agent: Wget/1.7
User-agent: Wget/1.8
User-agent: Wget/1.8.1
User-agent: Wget/1.8.1+cvs
User-agent: Wget/1.8.2
User-agent: Wget/1.9-beta
User-agent: Widow
User-agent: Xaldon WebSpider
User-agent: Xaldon WebSpider 2.5.b3
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 11389 Webster Pro V2.9 Win32
User-agent: Zeus 11652 Webster Pro V2.9 Win32
User-agent: Zeus 18018 Webster Pro V2.9 Win32
User-agent: Zeus 26378 Webster Pro V2.9 Win32
User-agent: Zeus 30747 Webster Pro V2.9 Win32
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Zeus 39206 Webster Pro V2.9 Win32
User-agent: Zeus 41641 Webster Pro V2.9 Win32
User-agent: Zeus 44238 Webster Pro V2.9 Win32
User-agent: Zeus 51070 Webster Pro V2.9 Win32
User-agent: Zeus 51674 Webster Pro V2.9 Win32
User-agent: Zeus 51837 Webster Pro V2.9 Win32
User-agent: Zeus 63567 Webster Pro V2.9 Win32
User-agent: Zeus 6694 Webster Pro V2.9 Win32
User-agent: Zeus 71129 Webster Pro V2.9 Win32
User-agent: Zeus 82016 Webster Pro V2.9 Win32
User-agent: Zeus 82900 Webster Pro V2.9 Win32
User-agent: Zeus 84842 Webster Pro V2.9 Win32
User-agent: Zeus 90872 Webster Pro V2.9 Win32
User-agent: Zeus 94934 Webster Pro V2.9 Win32
User-agent: Zeus 95245 Webster Pro V2.9 Win32
User-agent: Zeus 95351 Webster Pro V2.9 Win32
User-agent: Zeus 97371 Webster Pro V2.9 Win32
User-agent: Zeus Link Scout
User-agent: asterias
User-agent: b2w/0.1
User-agent: cosmos
User-agent: eCatch
User-agent: eCatch/3.0
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: ia_archiver
User-agent: larbin
User-agent: larbin (samualt9@bigfoot.com)
User-agent: larbin
User-agent: larbin_2.6.2 (kabura@sushi.com)
User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail)
User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu)
User-agent: larbin_2.6.2 (vitalbox1@hotmail.com)
User-agent: larbin_2.6.2
User-agent: larbin_2.6.2
User-agent: larbin_2.6.2
User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu
User-agent: larbin_2.6.2
User-agent: libWeb/clsHTTP
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: moget
User-agent: moget/2.1
User-agent: pavuk
User-agent: pcBrowser
User-agent: psbot
User-agent: searchpreview
User-agent: spanner
User-agent: suzuran
User-agent: tAkeOut
User-agent: toCrawl/UrlDispatcher
User-agent: turingos
User-agent: webfetch/2.1.0
User-agent: wget
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /affiliate/
Disallow: /cgi-bin/
Disallow: /include/
Disallow: /webalizer/
Disallow: /modlogan/
Disallow: /cp/
1. You can do the checking for yourself; here is a tool to do so: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

2. You should really consider removing most of these entries and going the .htaccess route instead, as sketched below. The reason: why would a spambot or a leechbot even read a robots.txt?

3. I found you some forum posts about using .htaccess to block bots, like you asked. I used Google to find these (read them all):
A: http://www.webmasterworld.com/forum13/687.htm
B: http://www.webmasterworld.com/forum92/205.htm
C: http://www.webmasterworld.com/forum92/413.htm
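To give you the flavour, user-agent blocking in .htaccess usually looks something like this. This is only a minimal sketch, assuming Apache with mod_rewrite enabled; the three agent names are examples pulled from your list, and you would add a condition per bot you settle on:

# Block requests whose User-Agent header starts with a known bad name
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
# [NC] = case-insensitive match, [OR] = chain to the next condition;
# any request matching one of the conditions gets a 403 Forbidden
RewriteRule .* - [F]

Incidentally, that is where the stray [NC,OR] tags in your robots.txt come from: they are mod_rewrite flags, not robots.txt syntax, so those entries look like they were pasted in from someone's .htaccess.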
Why don't you just create robot traps? http://www.fleiner.com/bots/

Personally I haven't been hit badly by these bad robots yet, so I don't know whether a robot trap works. You'll have to wait for an expert reply on this one...
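The basic idea, as I understand it: disallow a decoy directory in robots.txt, hide a link to it that no human would ever follow, and ban whatever requests it anyway, because anything fetching that URL must have ignored robots.txt. A rough sketch (the /trap/ path is made up for illustration):

# In robots.txt: well-behaved bots will never enter the decoy
User-agent: *
Disallow: /trap/

<!-- On your pages: an invisible link that only a crawler would follow -->
<a href="/trap/"><img src="/images/blank.gif" width="1" height="1" border="0" alt=""></a>

A script at /trap/ can then log the visitor's IP and append a deny rule for it to your .htaccess.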
This is a very bad idea. Xenu is a popular (and excellent) freeware link checker. I use it on my site to check the validity of links from my pages to pages outside my site. If you block Xenu, it will report the link as an error -- chances are many webmasters using Xenu will then delete the link to your site, and you will have just lost a potentially valuable bit of PR.

Beyond that, I agree with INV: not everything on that list is a bad bot, but most of the really bad ones aren't going to read your robots.txt file anyway, so you're wasting your time (and that of the good bots). Delete everything above this:

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /affiliate/
Disallow: /cgi-bin/
Disallow: /include/
Disallow: /webalizer/
Disallow: /modlogan/
Disallow: /cp/
None of those links work. And http://www.searchengineworld.com/cgi-bin/robotcheck.cgi points to WebmasterWorld. I registered at WebmasterWorld, but it doesn't work either.
searchengineworld.com/cgi-bin/robotcheck.cgi works for me... Try again; perhaps it was only down for a while.