is this robots.txt ok??

Discussion in 'robots.txt' started by SaleemY, Sep 9, 2005.

  1. #1
    Hi there, please let me know if this is ok as a robots.txt file...

    Also, someone said not to use the wildcard at all, as it may confuse some bots and cause them not to index the entire site. Is that true?

    Lastly, can anyone advise on how to create a corresponding .htaccess file to block all of these bots?

    And may I say, great forums, with a helpful bunch... :)

    Cheers


    User-agent: Alexibot
    User-agent: Aqua_Products
    User-agent: BackDoorBot
    User-agent: BackDoorBot/1.0
    User-agent: Black.Hole
    User-agent: BlackWidow
    User-agent: BlowFish
    User-agent: BlowFish/1.0
    User-agent: Bookmark search tool
    User-agent: Bot mailto:craftbot@yahoo.com
    User-agent: BotALot
    User-agent: BotRightHere
    User-agent: BuiltBotTough
    User-agent: Bullseye
    User-agent: Bullseye/1.0
    User-agent: BunnySlippers
    User-agent: Cegbfeieh
    User-agent: CheeseBot
    User-agent: CherryPicker
    User-agent: CherryPickerElite/1.0

    PS: I have cut off some of the disallow list because of the 10,000-word limit!
    User-agent: Grafula
    User-agent: HMView
    User-agent: HTTrack
    User-agent: HTTrack 3.0
    User-agent: HTTrack [NC,OR]
    User-agent: Harvest
    User-agent: Harvest/1.5
    User-agent: Image Stripper
    User-agent: Image Sucker
    User-agent: Indy Library
    User-agent: Indy Library [NC,OR]
    User-agent: InfoNaviRobot
    User-agent: InterGET
    User-agent: Internet Ninja
    User-agent: Internet Ninja 4.0
    User-agent: Internet Ninja 5.0
    User-agent: Internet Ninja 6.0
    User-agent: Iron33/1.0.2
    User-agent: JOC Web Spider
    User-agent: JennyBot
    User-agent: JetCar
    User-agent: Kenjin Spider
    User-agent: Kenjin.Spider
    User-agent: Keyword Density/0.9
    User-agent: Keyword.Density
    User-agent: LNSpiderguy
    User-agent: LeechFTP
    User-agent: LexiBot
    User-agent: LinkScan/8.1a Unix
    User-agent: LinkScan/8.1a.Unix
    User-agent: LinkWalker
    User-agent: LinkextractorPro
    User-agent: MIDown tool
    User-agent: MIIxpc
    User-agent: MIIxpc/4.2
    User-agent: MSIECrawler
    User-agent: Mass Downloader
    User-agent: Mass Downloader/2.2
    User-agent: Mata Hari
    User-agent: Mata.Hari
    User-agent: Microsoft URL Control
    User-agent: Microsoft URL Control - 5.01.4511
    User-agent: Microsoft URL Control - 6.00.8169
    User-agent: Microsoft.URL
    User-agent: Mister PiX
    User-agent: Mister PiX version.dll
    User-agent: Mister Pix II 2.01
    User-agent: Mister Pix II 2.02a
    User-agent: Mister.PiX
    User-agent: NICErsPRO
    User-agent: NPBot
    User-agent: NPbot
    User-agent: Navroad
    User-agent: NearSite
    User-agent: Net Vampire
    User-agent: Net Vampire/3.0
    User-agent: NetAnts
    User-agent: NetAnts/1.10
    User-agent: NetAnts/1.23
    User-agent: NetAnts/1.24
    User-agent: NetAnts/1.25
    User-agent: NetMechanic
    User-agent: NetSpider
    User-agent: NetZIP
    User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998)
    User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998)
    User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp)
    User-agent: Octopus
    User-agent: Offline Explorer
    User-agent: Offline Explorer/1.2
    User-agent: Offline Explorer/1.4
    User-agent: Offline Explorer/1.6
    User-agent: Offline Explorer/1.7
    User-agent: Offline Explorer/1.9
    User-agent: Offline Explorer/2.0
    User-agent: Offline Explorer/2.1
    User-agent: Offline Explorer/2.3
    User-agent: Offline Explorer/2.4
    User-agent: Offline Explorer/2.5
    User-agent: Offline Navigator
    User-agent: Offline.Explorer
    User-agent: Openbot
    User-agent: Openfind
    User-agent: Openfind data gatherer
    User-agent: Oracle Ultra Search
    User-agent: PageGrabber
    User-agent: Papa Foto
    User-agent: PerMan
    User-agent: ProPowerBot/2.14
    User-agent: ProWebWalker
    User-agent: Python-urllib
    User-agent: QueryN Metasearch
    User-agent: QueryN.Metasearch
    User-agent: RMA
    User-agent: Radiation Retriever 1.1
    User-agent: ReGet
    User-agent: RealDownload
    User-agent: RealDownload/4.0.0.40
    User-agent: RealDownload/4.0.0.41
    User-agent: RealDownload/4.0.0.42
    User-agent: RepoMonkey
    User-agent: RepoMonkey Bait & Tackle/v1.01
    User-agent: SiteSnagger
    User-agent: SlySearch
    User-agent: SmartDownload
    User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999)
    User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999)
    User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000)
    User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001)
    User-agent: SpankBot
    User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux
    User-agent: SuperBot
    User-agent: SuperBot/3.0 (Win32)
    User-agent: SuperBot/3.1 (Win32)
    User-agent: SuperHTTP
    User-agent: SuperHTTP/1.0
    User-agent: Surfbot
    User-agent: Szukacz/1.4
    User-agent: Teleport
    User-agent: Teleport Pro
    User-agent: Teleport Pro/1.29
    User-agent: Teleport Pro/1.29.1590
    User-agent: TeleportPro
    User-agent: Telesoft
    User-agent: The Intraformant
    User-agent: The.Intraformant
    User-agent: TheNomad
    User-agent: TightTwatBot
    User-agent: Titan
    User-agent: True_Robot
    User-agent: True_Robot/1.0
    User-agent: TurnitinBot
    User-agent: TurnitinBot/1.5
    User-agent: URL Control
    User-agent: URL_Spider_Pro
    User-agent: URLy Warning
    User-agent: URLy.Warning
    User-agent: VCI
    User-agent: VCI WebViewer VCI WebViewer Win32
    User-agent: VoidEYE
    User-agent: WWW-Collector-E
    User-agent: WWWOFFLE
    User-agent: Web Image Collector
    User-agent: WebEMailExtrac.*
    User-agent: WebEnhancer
    User-agent: WebFetch
    User-agent: WebGo IS
    User-agent: WebLeacher
    User-agent: WebReaper
    User-agent: WebReaper [info@webreaper.net]
    User-agent: WebReaper [webreaper@otway.com]
    User-agent: WebReaper v9.1 - www.otway.com/webreaper
    User-agent: WebReaper v9.7 - www.webreaper.net
    User-agent: WebReaper v9.8 - www.webreaper.net
    User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper
    User-agent: WebSauger
    User-agent: WebSauger 1.20b
    User-agent: WebSauger 1.20j
    User-agent: WebSauger 1.20k
    User-agent: WebStripper
    User-agent: WebStripper/2.03
    User-agent: WebStripper/2.10
    User-agent: WebStripper/2.12
    User-agent: WebStripper/2.13
    User-agent: WebStripper/2.15
    User-agent: WebStripper/2.16
    User-agent: WebStripper/2.19
    User-agent: WebWhacker
    User-agent: WebZIP
    User-agent: WebZIP/2.75 (http://www.spidersoft.com)
    User-agent: WebZIP/3.65 (http://www.spidersoft.com)
    User-agent: WebZIP/3.80 (http://www.spidersoft.com)
    User-agent: WebZIP/4.0 (http://www.spidersoft.com)
    User-agent: WebZIP/4.1 (http://www.spidersoft.com)
    User-agent: WebZIP/4.21
    User-agent: WebZIP/4.21 (http://www.spidersoft.com)
    User-agent: WebZIP/5.0
    User-agent: WebZIP/5.0 (http://www.spidersoft.com)
    User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com)
    User-agent: WebZip
    User-agent: WebZip/4.0
    User-agent: WebmasterWorldForumBot
    User-agent: Website Quester
    User-agent: Website Quester - www.asona.org
    User-agent: Website Quester - www.esalesbiz.com/extra/
    User-agent: Website eXtractor
    User-agent: Website eXtractor (http://www.asona.org)
    User-agent: Website.Quester
    User-agent: Webster Pro
    User-agent: Webster.Pro
    User-agent: Wget
    User-agent: Wget/1.5.2
    User-agent: Wget/1.5.3
    User-agent: Wget/1.6
    User-agent: Wget/1.7
    User-agent: Wget/1.8
    User-agent: Wget/1.8.1
    User-agent: Wget/1.8.1+cvs
    User-agent: Wget/1.8.2
    User-agent: Wget/1.9-beta
    User-agent: Widow
    User-agent: Xaldon WebSpider
    User-agent: Xaldon WebSpider 2.5.b3
    User-agent: Xenu's
    User-agent: Xenu's Link Sleuth 1.1c
    User-agent: Zeus
    User-agent: Zeus 11389 Webster Pro V2.9 Win32
    User-agent: Zeus 11652 Webster Pro V2.9 Win32
    User-agent: Zeus 18018 Webster Pro V2.9 Win32
    User-agent: Zeus 26378 Webster Pro V2.9 Win32
    User-agent: Zeus 30747 Webster Pro V2.9 Win32
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    User-agent: Zeus 39206 Webster Pro V2.9 Win32
    User-agent: Zeus 41641 Webster Pro V2.9 Win32
    User-agent: Zeus 44238 Webster Pro V2.9 Win32
    User-agent: Zeus 51070 Webster Pro V2.9 Win32
    User-agent: Zeus 51674 Webster Pro V2.9 Win32
    User-agent: Zeus 51837 Webster Pro V2.9 Win32
    User-agent: Zeus 63567 Webster Pro V2.9 Win32
    User-agent: Zeus 6694 Webster Pro V2.9 Win32
    User-agent: Zeus 71129 Webster Pro V2.9 Win32
    User-agent: Zeus 82016 Webster Pro V2.9 Win32
    User-agent: Zeus 82900 Webster Pro V2.9 Win32
    User-agent: Zeus 84842 Webster Pro V2.9 Win32
    User-agent: Zeus 90872 Webster Pro V2.9 Win32
    User-agent: Zeus 94934 Webster Pro V2.9 Win32
    User-agent: Zeus 95245 Webster Pro V2.9 Win32
    User-agent: Zeus 95351 Webster Pro V2.9 Win32
    User-agent: Zeus 97371 Webster Pro V2.9 Win32
    User-agent: Zeus Link Scout
    User-agent: asterias
    User-agent: b2w/0.1
    User-agent: cosmos
    User-agent: eCatch
    User-agent: eCatch/3.0
    User-agent: hloader
    User-agent: httplib
    User-agent: humanlinks
    User-agent: ia_archiver
    User-agent: larbin
    User-agent: larbin (samualt9@bigfoot.com)
    User-agent: larbin_2.6.2 (kabura@sushi.com)
    User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail)
    User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu)
    User-agent: larbin_2.6.2 (vitalbox1@hotmail.com)
    User-agent: larbin_2.6.2
    User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu
    User-agent: libWeb/clsHTTP
    User-agent: lwp-trivial
    User-agent: lwp-trivial/1.34
    User-agent: moget
    User-agent: moget/2.1
    User-agent: pavuk
    User-agent: pcBrowser
    User-agent: psbot
    User-agent: searchpreview
    User-agent: spanner
    User-agent: suzuran
    User-agent: tAkeOut
    User-agent: toCrawl/UrlDispatcher
    User-agent: turingos
    User-agent: webfetch/2.1.0
    User-agent: wget
    Disallow: /

    User-agent: *
    Disallow: /private/
    Disallow: /images/
    Disallow: /affiliate/
    Disallow: /cgi-bin/
    Disallow: /include/
    Disallow: /webalizer/
    Disallow: /modlogan/
    Disallow: /cp/
     
    SaleemY, Sep 9, 2005 IP
  2. INV

    INV Peon

    #2
    1. You can check it yourself; here is a tool to do so: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

    2. You should really consider removing most of these entries and going the .htaccess route instead. The reason: why would a spambot or a leech bot even bother to read robots.txt? :)

    3. I found some forum posts on using .htaccess to block bots, as you asked. I used Google to find them:


    (READ ALL)
    A: http://www.webmasterworld.com/forum13/687.htm
    B: http://www.webmasterworld.com/forum92/205.htm
    C: http://www.webmasterworld.com/forum92/413.htm
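    Since the advice in #2 is to go the .htaccess route, here is a minimal sketch of how that approach typically looks with Apache's SetEnvIfNoCase directive. The user-agent substrings below are just examples taken from the list above; tailor them to the bots you actually see in your logs.

    ```apache
    # Tag requests whose User-Agent contains a blocked substring (case-insensitive).
    # These names are examples only - adjust to the bots you actually want to block.
    SetEnvIfNoCase User-Agent "WebZIP"      bad_bot
    SetEnvIfNoCase User-Agent "WebStripper" bad_bot
    SetEnvIfNoCase User-Agent "HTTrack"     bad_bot

    # Deny any request tagged bad_bot (Apache 2.2 syntax).
    <IfModule !mod_authz_core.c>
        Order Allow,Deny
        Allow from all
        Deny from env=bad_bot
    </IfModule>

    # Equivalent for Apache 2.4+.
    <IfModule mod_authz_core.c>
        <RequireAll>
            Require all granted
            Require not env bad_bot
        </RequireAll>
    </IfModule>
    ```

    Unlike robots.txt, this is enforced by the server, so it works even against bots that never read robots.txt (though anything can still spoof its User-Agent).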
     
    INV, Sep 9, 2005 IP
  3. iskandar

    iskandar Well-Known Member

    #3
    Why don't you just create robot traps?
    http://www.fleiner.com/bots/

    Personally, I haven't been hit badly by these bad robots yet, so I don't know whether the robot trap works. You'll have to wait for an expert reply on this matter...
     
    iskandar, Sep 15, 2005 IP
  4. minstrel

    minstrel Illustrious Member

    #4
    This is a very bad idea.

    Xenu is a popular (and excellent) freeware link checker. I use it on my site to check the validity of links from my pages to pages outside my site. If you block Xenu, it will report the link as an error -- chances are many webmasters using Xenu will then delete the link to your site, and you will have just lost a potentially valuable bit of PR.

    Beyond that, I agree with INV: not everything on that list is a bad bot but most of the really bad ones aren't going to even read your robots.txt file so you're wasting your time (and that of the good bots).

    Delete everything above

    User-agent: *
    Disallow: /private/
    Disallow: /images/
    Disallow: /affiliate/
    Disallow: /cgi-bin/
    Disallow: /include/
    Disallow: /webalizer/
    Disallow: /modlogan/
    Disallow: /cp/
    
    Code (markup):
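    If you do still want to block the handful of genuinely bad bots, a mod_rewrite rule in .htaccess is the usual complement to the trimmed robots.txt above. A sketch (the bot names are illustrative); note that `[NC,OR]` are mod_rewrite flags, which is apparently where the stray `[NC,OR]` lines in the original robots.txt came from -- they belong in .htaccess, not robots.txt:

    ```apache
    # Return 403 Forbidden to matching user agents.
    # [NC] = case-insensitive match; [OR] = or the next condition.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} "HTTrack"   [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "WebReaper" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "Teleport"  [NC]
    RewriteRule .* - [F,L]
    ```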
     
    minstrel, Sep 17, 2005 IP
  5. Repo

    Repo Peon

    #5
    None of those links work.

    And http://www.searchengineworld.com/cgi-bin/robotcheck.cgi points to WebmasterWorld :confused:

    I registered at WebmasterWorld, but it doesn't work either.
     
    Repo, Jun 24, 2006 IP
  6. ottodo

    ottodo Guest

    #6
    This is a really important thread, isn't it?
     
    ottodo, Aug 22, 2006 IP
  7. MLDesigners

    MLDesigners Peon

    #7
    searchengineworld.com/cgi-bin/robotcheck.cgi works for me...

    Try again; perhaps it was only down for a while.
     
    MLDesigners, Aug 29, 2006 IP