is this robots.txt ok??


saleemster
Sep 9th 2005, 11:04 am
Hi there, please let me know if this is ok as a robots.txt file...

Also, someone said not to use the wildcard at all, as it may confuse some bots and cause them not to index the entire site. Is that true?

Lastly, can anyone advise on how to create a corresponding .htaccess file to block all of these bots?

And may I say, great forums, with a helpful bunch... :)

Cheers


User-agent: Alexibot
User-agent: Aqua_Products
User-agent: BackDoorBot
User-agent: BackDoorBot/1.0
User-agent: Black.Hole
User-agent: BlackWidow
User-agent: BlowFish
User-agent: BlowFish/1.0
User-agent: Bookmark search tool
User-agent: Bot mailto:craftbot@yahoo.com
User-agent: BotALot
User-agent: BotRightHere
User-agent: BuiltBotTough
User-agent: Bullseye
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: Cegbfeieh
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0

PS: I have cut some of the disallow list because of the 10000 word limit!
User-agent: Grafula
User-agent: HMView
User-agent: HTTrack
User-agent: HTTrack 3.0
User-agent: HTTrack [NC,OR]
User-agent: Harvest
User-agent: Harvest/1.5
User-agent: Image Stripper
User-agent: Image Sucker
User-agent: Indy Library
User-agent: Indy Library [NC,OR]
User-agent: InfoNaviRobot
User-agent: InterGET
User-agent: Internet Ninja
User-agent: Internet Ninja 4.0
User-agent: Internet Ninja 5.0
User-agent: Internet Ninja 6.0
User-agent: Iron33/1.0.2
User-agent: JOC Web Spider
User-agent: JennyBot
User-agent: JetCar
User-agent: Kenjin Spider
User-agent: Kenjin.Spider
User-agent: Keyword Density/0.9
User-agent: Keyword.Density
User-agent: LNSpiderguy
User-agent: LeechFTP
User-agent: LexiBot
User-agent: LinkScan/8.1a Unix
User-agent: LinkScan/8.1a.Unix
User-agent: LinkWalker
User-agent: LinkextractorPro
User-agent: MIDown tool
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: MSIECrawler
User-agent: Mass Downloader
User-agent: Mass Downloader/2.2
User-agent: Mata Hari
User-agent: Mata.Hari
User-agent: Microsoft URL Control
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: Microsoft.URL
User-agent: Mister PiX
User-agent: Mister PiX version.dll
User-agent: Mister Pix II 2.01
User-agent: Mister Pix II 2.02a
User-agent: Mister.PiX
User-agent: NICErsPRO
User-agent: NPBot
User-agent: NPbot
User-agent: Navroad
User-agent: NearSite
User-agent: Net Vampire
User-agent: Net Vampire/3.0
User-agent: NetAnts
User-agent: NetAnts/1.10
User-agent: NetAnts/1.23
User-agent: NetAnts/1.24
User-agent: NetAnts/1.25
User-agent: NetMechanic
User-agent: NetSpider
User-agent: NetZIP
User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998)
User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998)
User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp)
User-agent: Octopus
User-agent: Offline Explorer
User-agent: Offline Explorer/1.2
User-agent: Offline Explorer/1.4
User-agent: Offline Explorer/1.6
User-agent: Offline Explorer/1.7
User-agent: Offline Explorer/1.9
User-agent: Offline Explorer/2.0
User-agent: Offline Explorer/2.1
User-agent: Offline Explorer/2.3
User-agent: Offline Explorer/2.4
User-agent: Offline Explorer/2.5
User-agent: Offline Navigator
User-agent: Offline.Explorer
User-agent: Openbot
User-agent: Openfind
User-agent: Openfind data gatherer
User-agent: Oracle Ultra Search
User-agent: PageGrabber
User-agent: Papa Foto
User-agent: PerMan
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: Python-urllib
User-agent: QueryN Metasearch
User-agent: QueryN.Metasearch
User-agent: RMA
User-agent: Radiation Retriever 1.1
User-agent: ReGet
User-agent: RealDownload
User-agent: RealDownload/4.0.0.40
User-agent: RealDownload/4.0.0.41
User-agent: RealDownload/4.0.0.42
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: SiteSnagger
User-agent: SlySearch
User-agent: SmartDownload
User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999)
User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999)
User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000)
User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001)
User-agent: SpankBot
User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux
User-agent: SuperBot
User-agent: SuperBot/3.0 (Win32)
User-agent: SuperBot/3.1 (Win32)
User-agent: SuperHTTP
User-agent: SuperHTTP/1.0
User-agent: Surfbot
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: Teleport Pro
User-agent: Teleport Pro/1.29
User-agent: Teleport Pro/1.29.1590
User-agent: TeleportPro
User-agent: Telesoft
User-agent: The Intraformant
User-agent: The.Intraformant
User-agent: TheNomad
User-agent: TightTwatBot
User-agent: Titan
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: TurnitinBot
User-agent: TurnitinBot/1.5
User-agent: URL Control
User-agent: URL_Spider_Pro
User-agent: URLy Warning
User-agent: URLy.Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VoidEYE
User-agent: WWW-Collector-E
User-agent: WWWOFFLE
User-agent: Web Image Collector
User-agent: WebEMailExtrac.*
User-agent: WebEnhancer
User-agent: WebFetch
User-agent: WebGo IS
User-agent: WebLeacher
User-agent: WebReaper
User-agent: WebReaper [info@webreaper.net]
User-agent: WebReaper [webreaper@otway.com]
User-agent: WebReaper v9.1 - www.otway.com/webreaper
User-agent: WebReaper v9.7 - www.webreaper.net
User-agent: WebReaper v9.8 - www.webreaper.net
User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper
User-agent: WebSauger
User-agent: WebSauger 1.20b
User-agent: WebSauger 1.20j
User-agent: WebSauger 1.20k
User-agent: WebStripper
User-agent: WebStripper/2.03
User-agent: WebStripper/2.10
User-agent: WebStripper/2.12
User-agent: WebStripper/2.13
User-agent: WebStripper/2.15
User-agent: WebStripper/2.16
User-agent: WebStripper/2.19
User-agent: WebWhacker
User-agent: WebZIP
User-agent: WebZIP/2.75 (http://www.spidersoft.com)
User-agent: WebZIP/3.65 (http://www.spidersoft.com)
User-agent: WebZIP/3.80 (http://www.spidersoft.com)
User-agent: WebZIP/4.0 (http://www.spidersoft.com)
User-agent: WebZIP/4.1 (http://www.spidersoft.com)
User-agent: WebZIP/4.21
User-agent: WebZIP/4.21 (http://www.spidersoft.com)
User-agent: WebZIP/5.0
User-agent: WebZIP/5.0 (http://www.spidersoft.com)
User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com)
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: WebmasterWorldForumBot
User-agent: Website Quester
User-agent: Website Quester - www.asona.org
User-agent: Website Quester - www.esalesbiz.com/extra/
User-agent: Website eXtractor
User-agent: Website eXtractor (http://www.asona.org)
User-agent: Website.Quester
User-agent: Webster Pro
User-agent: Webster.Pro
User-agent: Wget
User-agent: Wget/1.5.2
User-agent: Wget/1.5.3
User-agent: Wget/1.6
User-agent: Wget/1.7
User-agent: Wget/1.8
User-agent: Wget/1.8.1
User-agent: Wget/1.8.1+cvs
User-agent: Wget/1.8.2
User-agent: Wget/1.9-beta
User-agent: Widow
User-agent: Xaldon WebSpider
User-agent: Xaldon WebSpider 2.5.b3
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 11389 Webster Pro V2.9 Win32
User-agent: Zeus 11652 Webster Pro V2.9 Win32
User-agent: Zeus 18018 Webster Pro V2.9 Win32
User-agent: Zeus 26378 Webster Pro V2.9 Win32
User-agent: Zeus 30747 Webster Pro V2.9 Win32
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Zeus 39206 Webster Pro V2.9 Win32
User-agent: Zeus 41641 Webster Pro V2.9 Win32
User-agent: Zeus 44238 Webster Pro V2.9 Win32
User-agent: Zeus 51070 Webster Pro V2.9 Win32
User-agent: Zeus 51674 Webster Pro V2.9 Win32
User-agent: Zeus 51837 Webster Pro V2.9 Win32
User-agent: Zeus 63567 Webster Pro V2.9 Win32
User-agent: Zeus 6694 Webster Pro V2.9 Win32
User-agent: Zeus 71129 Webster Pro V2.9 Win32
User-agent: Zeus 82016 Webster Pro V2.9 Win32
User-agent: Zeus 82900 Webster Pro V2.9 Win32
User-agent: Zeus 84842 Webster Pro V2.9 Win32
User-agent: Zeus 90872 Webster Pro V2.9 Win32
User-agent: Zeus 94934 Webster Pro V2.9 Win32
User-agent: Zeus 95245 Webster Pro V2.9 Win32
User-agent: Zeus 95351 Webster Pro V2.9 Win32
User-agent: Zeus 97371 Webster Pro V2.9 Win32
User-agent: Zeus Link Scout
User-agent: asterias
User-agent: b2w/0.1
User-agent: cosmos
User-agent: eCatch
User-agent: eCatch/3.0
User-agent: hloader
User-agent: httplib
User-agent: humanlinks
User-agent: ia_archiver
User-agent: larbin
User-agent: larbin (samualt9@bigfoot.com)
User-agent: larbin samualt9@bigfoot.com
User-agent: larbin_2.6.2 (kabura@sushi.com)
User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail)
User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu)
User-agent: larbin_2.6.2 (vitalbox1@hotmail.com)
User-agent: larbin_2.6.2 kabura@sushi.com
User-agent: larbin_2.6.2 larbin2.6.2@unspecified.mail
User-agent: larbin_2.6.2 larbin@correa.org
User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu
User-agent: larbin_2.6.2 vitalbox1@hotmail.com
User-agent: libWeb/clsHTTP
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: moget
User-agent: moget/2.1
User-agent: pavuk
User-agent: pcBrowser
User-agent: psbot
User-agent: searchpreview
User-agent: spanner
User-agent: suzuran
User-agent: tAkeOut
User-agent: toCrawl/UrlDispatcher
User-agent: turingos
User-agent: webfetch/2.1.0
User-agent: wget
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /affiliate/
Disallow: /cgi-bin/
Disallow: /include/
Disallow: /webalizer/
Disallow: /modlogan/
Disallow: /cp/

INV
Sep 9th 2005, 12:44 pm
1. You can check it yourself; here is a tool to do so: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

2. You should really consider removing most of these entries and going the .htaccess route. The reason: why would a spambot or a leechbot even read a robots.txt? :)

3. I found some forum posts about using .htaccess to block bots, like you asked. I used Google to find these:


(READ ALL)
A: http://www.webmasterworld.com/forum13/687.htm
B: http://www.webmasterworld.com/forum92/205.htm
C: http://www.webmasterworld.com/forum92/413.htm
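
For anyone who wants a starting point for the .htaccess route, here is a rough sketch in Python that turns a list of bot names into Apache mod_rewrite rules. It assumes the server runs Apache with mod_rewrite enabled and that a 403 Forbidden response is what you want; the function names and the three sample bots are only for illustration. The `[NC,OR]` suffixes on a couple of entries in the list above look like flags left over from a file of this kind; they belong in .htaccess, not in robots.txt.

```python
def escape_for_rewrite(name: str) -> str:
    """Backslash-escape every non-alphanumeric character so the
    name is matched literally by the regex in RewriteCond."""
    return "".join(c if c.isalnum() else "\\" + c for c in name)


def build_htaccess(bad_agents: list[str]) -> str:
    """Return mod_rewrite rules that answer 403 Forbidden to any
    client whose User-Agent starts with one of the given names."""
    if not bad_agents:
        return ""
    lines = ["RewriteEngine On"]
    last = len(bad_agents) - 1
    for i, name in enumerate(bad_agents):
        # NC = case-insensitive match; OR chains this condition with
        # the next one. The final condition must not carry OR.
        flags = "[NC]" if i == last else "[NC,OR]"
        lines.append(
            f"RewriteCond %{{HTTP_USER_AGENT}} ^{escape_for_rewrite(name)} {flags}"
        )
    # "-" means no substitution; F sends 403 Forbidden.
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample names taken from the list in the first post.
    print(build_htaccess(["BlackWidow", "WebZIP", "Internet Ninja"]), end="")
```

The printed rules would go in the .htaccess file in the site root. Keep in mind that a user-agent string is easy to fake, so this only stops bots that identify themselves honestly.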

iskandar
Sep 15th 2005, 8:33 am
Why don't you just create robot traps?
http://www.fleiner.com/bots/

Personally, I haven't been hit by these bad robots yet, so I don't know whether the robot trap works. You'll have to wait for an expert reply on this matter.

minstrel
Sep 17th 2005, 11:00 pm
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c

This is a very bad idea.

Xenu is a popular (and excellent) freeware link checker. I use it on my site to check the validity of links from my pages to other pages outside my site. If you block Xenu, it will report the link as an error -- chances are many webmasters using Xenu will then delete the link to your site, and you will have just lost a potentially valuable bit of PR.

Beyond that, I agree with INV: not everything on that list is a bad bot but most of the really bad ones aren't going to even read your robots.txt file so you're wasting your time (and that of the good bots).

Delete everything above this block:

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /affiliate/
Disallow: /cgi-bin/
Disallow: /include/
Disallow: /webalizer/
Disallow: /modlogan/
Disallow: /cp/
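
If the online checker is unavailable, the short block above can also be tested locally. This is a small sketch using Python's standard `urllib.robotparser`; the bot name "ExampleBot" and the sample paths are made up for the test, and it only shows how a well-behaved crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# The short robots.txt from this post, pasted in as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /affiliate/
Disallow: /cgi-bin/
Disallow: /include/
Disallow: /webalizer/
Disallow: /modlogan/
Disallow: /cp/
"""

parser = RobotFileParser()
# parse() takes the file content as a list of lines, so no
# network request is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """True if a crawler obeying robots.txt may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules.
    for path in ("/index.html", "/private/notes.html", "/cgi-bin/form.cgi"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Since the file uses only the `*` record, every compliant crawler gets the same answer regardless of its name.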

Repo
Jun 24th 2006, 6:28 pm
None of those links work.


(READ ALL)
A: http://www.webmasterworld.com/forum13/687.htm
B: http://www.webmasterworld.com/forum92/205.htm
C: http://www.webmasterworld.com/forum92/413.htm

And http://www.searchengineworld.com/cgi-bin/robotcheck.cgi points to WebmasterWorld :confused:

I registered at WebmasterWorld, but it doesn't work either.

ottodo
Aug 22nd 2006, 6:32 pm
This is a really important thread, isn't it?

MLDesigners
Aug 29th 2006, 7:29 am
searchengineworld.com/cgi-bin/robotcheck.cgi works for me...

Try again; perhaps it was only down for a while.