Ideal robots.txt for all sites

Discussion in 'robots.txt' started by slayer__, Jul 15, 2009.

  1. #1
    User-agent: *
    Disallow: /?ref=
    
    User-agent: HTTrack
    Disallow: /
    
    User-agent: grub-client
    Disallow: /
    
    User-agent: grub
    Disallow: /
    
    User-agent: looksmart
    Disallow: /
    
    User-agent: WebZip
    Disallow: /
    
    User-agent: larbin
    Disallow: /
    
    User-agent: b2w/0.1
    Disallow: /
    
    User-agent: psbot
    Disallow: /
    
    User-agent: Python-urllib
    Disallow: /
    
    User-agent: NetMechanic
    Disallow: /
    
    User-agent: URL_Spider_Pro
    Disallow: /
    
    User-agent: CherryPicker
    Disallow: /
    
    User-agent: EmailCollector
    Disallow: /
    
    User-agent: EmailSiphon
    Disallow: /
    
    User-agent: WebBandit
    Disallow: /
    
    User-agent: EmailWolf
    Disallow: /
    
    User-agent: ExtractorPro
    Disallow: /
    
    User-agent: CopyRightCheck
    Disallow: /
    
    User-agent: Crescent
    Disallow: /
    
    User-agent: SiteSnagger
    Disallow: /
    
    User-agent: ProWebWalker
    Disallow: /
    
    User-agent: CheeseBot
    Disallow: /
    
    User-agent: LNSpiderguy
    Disallow: /
    
    User-agent: Teleport
    Disallow: /
    
    User-agent: TeleportPro
    Disallow: /
    
    User-agent: MIIxpc
    Disallow: /
    
    User-agent: Telesoft
    Disallow: /
    
    User-agent: Website Quester
    Disallow: /
    
    User-agent: moget/2.1
    Disallow: /
    
    User-agent: WebZip/4.0
    Disallow: /
    
    User-agent: WebStripper
    Disallow: /
    
    User-agent: WebSauger
    Disallow: /
    
    User-agent: WebCopier
    Disallow: /
    
    User-agent: NetAnts
    Disallow: /
    
    User-agent: Mister PiX
    Disallow: /
    
    User-agent: WebAuto
    Disallow: /
    
    User-agent: TheNomad
    Disallow: /
    
    User-agent: WWW-Collector-E
    Disallow: /
    
    User-agent: RMA
    Disallow: /
    
    User-agent: libWeb/clsHTTP
    Disallow: /
    
    User-agent: asterias
    Disallow: /
    
    User-agent: httplib
    Disallow: /
    
    User-agent: turingos
    Disallow: /
    
    User-agent: spanner
    Disallow: /
    
    User-agent: InfoNaviRobot
    Disallow: /
    
    User-agent: Harvest/1.5
    Disallow: /
    
    User-agent: Bullseye/1.0
    Disallow: /
    
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /
    
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /
    
    User-agent: CherryPickerSE/1.0
    Disallow: /
    
    User-agent: CherryPickerElite/1.0
    Disallow: /
    
    User-agent: WebBandit/3.50
    Disallow: /
    
    User-agent: NICErsPRO
    Disallow: /
    
    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /
    
    User-agent: DittoSpyder
    Disallow: /
    
    User-agent: Foobot
    Disallow: /
    
    User-agent: WebmasterWorldForumBot
    Disallow: /
    
    User-agent: SpankBot
    Disallow: /
    
    User-agent: BotALot
    Disallow: /
    
    User-agent: lwp-trivial/1.34
    Disallow: /
    
    User-agent: lwp-trivial
    Disallow: /
    
    User-agent: BunnySlippers
    Disallow: /
    
    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /
    
    User-agent: URLy Warning
    Disallow: /
    
    User-agent: Wget/1.6
    Disallow: /
    
    User-agent: Wget/1.5.3
    Disallow: /
    
    User-agent: Wget
    Disallow: /
    
    User-agent: LinkWalker
    Disallow: /
    
    User-agent: cosmos
    Disallow: /
    
    User-agent: moget
    Disallow: /
    
    User-agent: hloader
    Disallow: /
    
    User-agent: humanlinks
    Disallow: /
    
    User-agent: LinkextractorPro
    Disallow: /
    
    User-agent: Offline Explorer
    Disallow: /
    
    User-agent: Mata Hari
    Disallow: /
    
    User-agent: LexiBot
    Disallow: /
    
    User-agent: Web Image Collector
    Disallow: /
    
    User-agent: The Intraformant
    Disallow: /
    
    User-agent: True_Robot/1.0
    Disallow: /
    
    User-agent: True_Robot
    Disallow: /
    
    User-agent: BlowFish/1.0
    Disallow: /
    
    User-agent: JennyBot
    Disallow: /
    
    User-agent: MIIxpc/4.2
    Disallow: /
    
    User-agent: BuiltBotTough
    Disallow: /
    
    User-agent: ProPowerBot/2.14
    Disallow: /
    
    User-agent: BackDoorBot/1.0
    Disallow: /
    
    User-agent: toCrawl/UrlDispatcher
    Disallow: /
    
    User-agent: WebEnhancer
    Disallow: /
    
    User-agent: suzuran
    Disallow: /
    
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /
    
    User-agent: VCI
    Disallow: /
    
    User-agent: Szukacz/1.4 
    Disallow: /
    
    User-agent: QueryN Metasearch
    Disallow: /
    
    User-agent: Openfind data gathere
    Disallow: /
    
    User-agent: Openfind 
    Disallow: /
    
    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /
    
    User-agent: Xenu's
    Disallow: /
    
    User-agent: Zeus
    Disallow: /
    
    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /
    
    User-agent: RepoMonkey
    Disallow: /
    
    User-agent: Microsoft URL Control
    Disallow: /
    
    User-agent: Openbot
    Disallow: /
    
    User-agent: URL Control
    Disallow: /
    
    User-agent: Zeus Link Scout
    Disallow: /
    
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /
    
    User-agent: Webster Pro
    Disallow: /
    
    User-agent: EroCrawler
    Disallow: /
    
    User-agent: LinkScan/8.1a Unix
    Disallow: /
    
    User-agent: Keyword Density/0.9
    Disallow: /
    
    User-agent: Kenjin Spider
    Disallow: /
    
    User-agent: Iron33/1.0.2
    Disallow: /
    
    User-agent: Bookmark search tool
    Disallow: /
    
    User-agent: GetRight/4.2
    Disallow: /
    
    User-agent: FairAd Client
    Disallow: /
    
    User-agent: Gaisbot
    Disallow: /
    
    User-agent: Aqua_Products
    Disallow: /
    
    User-agent: Radiation Retriever 1.1
    Disallow: /
    
    User-agent: Flaming AttackBot
    Disallow: /
    
    User-agent: Oracle Ultra Search
    Disallow: /
    
    User-agent: MSIECrawler
    Disallow: /
    
    User-agent: PerMan
    Disallow: /
    
    User-agent: searchpreview
    Disallow: /
    Code (markup):
    You can use this for all forums, portals, HTML sites, etc. (You should add your admin directory to the disallows.)
     
    slayer__, Jul 15, 2009 IP
  2. IMcruiser

    #2
    Looks good, nice list of malicious bots.
     
    IMcruiser, Jul 15, 2009 IP
  3. pictureboarduk

    #3
    I was wondering what defines a malicious bot, and whether such bots even take notice of robots.txt?

    Nice thread.
     
    pictureboarduk, Jul 18, 2009 IP
  4. neddyy

    #4
    I was thinking that as well. If there is such a thing as a malicious bot, why would it open up robots.txt to see if it's allowed?

    I think they are probably bots that are slightly undesired, not useful or inefficient, and not worth wasting bandwidth on. Maybe?

    Anyway, good list. It's good to share with others :D

    Neddy
     
    neddyy, Jul 31, 2009 IP
  5. IMcruiser

    #5
    Malicious bots take no notice of anything, so disallowing them is rather a pointless exercise. However, some will follow the rules if you lay them down.
     
    IMcruiser, Jul 31, 2009 IP
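
To make the point above concrete: robots.txt is advisory, and the check happens on the crawler's side. Here is a minimal sketch using Python's standard `urllib.robotparser`, assuming a cut-down sample of the list from post #1. The `ROBOTS_TXT` string and the helper name `is_allowed` are only for illustration. A polite bot runs something like this before each request; a malicious one simply skips the step.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of the rules from post #1 (not the full list).
ROBOTS_TXT = """\
User-agent: *
Disallow: /?ref=

User-agent: HTTrack
Disallow: /

User-agent: Wget
Disallow: /
"""

def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for agent in ("Wget", "HTTrack", "Googlebot"):
        print(agent, is_allowed(agent, "/index.html"))
```

Nothing on the server enforces the answer, which is why the list only affects bots that choose to ask.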
  6. psetyo

    #6
    Do not put your admin site into robots.txt.
    If you list your admin site there, hackers will know its URL.
    Better to check that your front end does not contain any link to your admin site (JavaScript, CSS, etc.).
     
    psetyo, Aug 3, 2009 IP
  7. nd09

    #7
    This is a good combination of robots.txt rules :)
     
    nd09, Aug 5, 2009 IP
  8. Master Directory

    #8
    Well, this is very good code; it works on my website.
     
    Master Directory, Aug 5, 2009 IP
  9. premiumscripts

    #9
    As others have said, putting this in robots.txt is rather pointless if the bot just ignores it. A better way would be to convert the list into rules in an .htaccess file, so that requests with those specific user agents are blocked by the server and can't reach your site.
     
    premiumscripts, Aug 6, 2009 IP
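
One way the "conversion" mentioned above could look is sketched here in Python. It turns a list of user-agent names into mod_rewrite directives. This is only a sketch under assumptions: it presumes Apache with mod_rewrite enabled and .htaccess overrides permitted, and the helper name `htaccess_block_rules` is hypothetical, not something from the thread.

```python
import re

def htaccess_block_rules(user_agents):
    """Return mod_rewrite lines that send 403 Forbidden to the listed agents.

    Hypothetical helper: each name is regex-escaped and matched
    case-insensitively anywhere in the User-Agent request header.
    """
    agents = [ua.strip() for ua in user_agents if ua.strip()]
    for ua in agents:
        # Quotes and backslashes would need extra escaping inside the
        # quoted Apache pattern, so this sketch simply refuses them.
        if '"' in ua or "\\" in ua:
            raise ValueError(f"unsupported character in {ua!r}")
    if not agents:
        return ""
    lines = ["RewriteEngine On"]
    for i, ua in enumerate(agents):
        # Every condition except the last is chained with OR.
        flags = "NC" if i == len(agents) - 1 else "NC,OR"
        pattern = re.escape(ua)
        lines.append(f'RewriteCond %{{HTTP_USER_AGENT}} "{pattern}" [{flags}]')
    # "-" means no substitution; F answers with 403 Forbidden.
    lines.append("RewriteRule ^ - [F]")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(htaccess_block_rules(["HTTrack", "WebZip", "Wget"]), end="")
```

The printed lines can be pasted into an .htaccess file. Keep in mind that the User-Agent header is trivially spoofed, so this only stops bots that announce themselves honestly.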
  10. Codex-m

    #10
    Great tip. Anyway, I personally use .htaccess to block bad bots. It is much safer because spammers have no way of seeing which bots I am allowing to crawl the site.

    This is one downside of robots.txt: it is publicly available.
     
    Codex-m, Aug 7, 2009 IP
  11. sachin9sharma

    #11
    I used this code in the robots.txt file in the backup of my website and uploaded it:

    User-agent: *
    Disallow:


    Is this the right way to use a robots.txt file?
     
    sachin9sharma, Aug 25, 2009 IP
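
For reference, an empty `Disallow:` value blocks nothing, so the two lines quoted above let every crawler fetch everything, while `Disallow: /` does the opposite. A small sketch that checks both with Python's standard `urllib.robotparser`; the names `ALLOW_ALL`, `BLOCK_ALL`, `can_crawl` and the sample agent `AnyBot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # the file from the post above
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # the opposite extreme

def can_crawl(robots_txt, user_agent="AnyBot", url="/some/page.html"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print("empty Disallow:", can_crawl(ALLOW_ALL))
    print("Disallow: /   :", can_crawl(BLOCK_ALL))
```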
  12. Active_Hosting

    #12
    Any example of how to block bots via .htaccess?
     
    Active_Hosting, Aug 31, 2009 IP
  13. redtubex

    #13
    Seems like a good robots.txt file.
     
    redtubex, Sep 2, 2009 IP
  14. Codex-m

    #14
    Codex-m, Sep 2, 2009 IP
  15. fr0gvn

    #15
    Can I use this for WordPress? I've just started learning!
     
    fr0gvn, Sep 19, 2009 IP
  16. theapparatus

    #16
    Running out the door but this article may be helpful:

    http://www.twentysteps.com/creating-the-ultimate-wordpress-robotstxt-file/

    Just for reference, I believe you can combine all those User-agent lines and follow them with a single Disallow line. But, as noted, it's all moot since 1) bad robots and scrapers will just ignore what's there and 2) no robots.txt file is perfect for all sites.

    For example, here's mine: http://drmikessteakdinner.com/robots.txt That won't work well on a WordPress site.
     
    theapparatus, Sep 19, 2009 IP
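
On combining the User-agent lines: the robots.txt convention allows one record to carry several `User-agent` lines followed by a shared `Disallow`. A short sketch that checks such a grouped record with Python's standard `urllib.robotparser`, using a handful of names from post #1. The `GROUPED` string and the helper `blocked_agents` are illustrative, and other parsers may be less forgiving than this one.

```python
from urllib.robotparser import RobotFileParser

# Several agents sharing one Disallow line (names taken from post #1).
GROUPED = """\
User-agent: HTTrack
User-agent: WebZip
User-agent: larbin
User-agent: Wget
Disallow: /
"""

def blocked_agents(robots_txt, agents, url="/"):
    """Return the agents from `agents` that robots_txt bars from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]

if __name__ == "__main__":
    sample = ["HTTrack", "WebZip", "larbin", "Wget", "Googlebot"]
    print(blocked_agents(GROUPED, sample))
```

Grouping keeps the file far shorter than one record per agent, although it does nothing about bots that never read the file.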