Is this robots.txt file right or wrong?

Discussion in 'All Other Search Engines' started by QueenEve, Oct 15, 2009.

  1. #1
    Can you check this robots.txt file for me and tell me whether it is right or wrong? If possible, please explain why it is right or why it is wrong.

    http://www.tripontop.com/robots.txt

    thanks in advance
     
    QueenEve, Oct 15, 2009 IP
  2. WebshoppeSolutions

    #2
    A little over the top, I think.

    See if you can keep it fairly simple.

    -----------------------------------------------

    # Begin block Bad-Robots from robots.txt
    User-agent: asterias
    Disallow: /
    User-agent: BotALot
    Disallow: /
    User-agent: BuiltBotTough
    Disallow: /
    User-agent: BunnySlippers
    Disallow: /
    User-agent: Cegbfeieh
    Disallow: /
    User-agent: CheeseBot
    Disallow: /
    User-agent: CherryPicker
    Disallow: /
    User-agent: CopyRightCheck
    Disallow: /
    User-agent: cosmos
    Disallow: /
    User-agent: Crescent
    Disallow: /
    User-agent: DittoSpyder
    Disallow: /
    User-agent: EmailCollector
    Disallow: /
    User-agent: EmailSiphon
    Disallow: /
    User-agent: EmailWolf
    Disallow: /
    User-agent: EroCrawler
    Disallow: /
    User-agent: ExtractorPro
    Disallow: /
    User-agent: Foobot
    Disallow: /
    User-agent: hloader
    Disallow: /
    User-agent: httplib
    Disallow: /
    User-agent: humanlinks
    Disallow: /
    User-agent: InfoNaviRobot
    Disallow: /
    User-agent: JennyBot
    Disallow: /
    User-agent: LexiBot
    Disallow: /
    User-agent: LinkextractorPro
    Disallow: /
    User-agent: LinkWalker
    Disallow: /
    User-agent: LNSpiderguy
    Disallow: /
    User-agent: lwp-trivial
    Disallow: /
    User-agent: MIIxpc
    Disallow: /
    User-agent: moget
    Disallow: /
    User-agent: NetAnts
    Disallow: /
    User-agent: NICErsPRO
    Disallow: /
    User-agent: Openfind
    Disallow: /
    User-agent: ProWebWalker
    Disallow: /
    User-agent: RepoMonkey
    Disallow: /
    User-agent: RMA
    Disallow: /
    User-agent: SiteSnagger
    Disallow: /
    User-agent: SpankBot
    Disallow: /
    User-agent: spanner
    Disallow: /
    User-agent: suzuran
    Disallow: /
    User-agent: Teleport
    Disallow: /
    User-agent: TeleportPro
    Disallow: /
    User-agent: Telesoft
    Disallow: /
    User-agent: TheNomad
    Disallow: /
    User-agent: TightTwatBot
    Disallow: /
    User-agent: Titan
    Disallow: /
    User-agent: True_Robot
    Disallow: /
    User-agent: turingos
    Disallow: /
    User-agent: VCI
    Disallow: /
    User-agent: WebAuto
    Disallow: /
    User-agent: WebBandit
    Disallow: /
    User-agent: WebCopier
    Disallow: /
    User-agent: WebEnhancer
    Disallow: /
    User-agent: WebmasterWorldForumBot
    Disallow: /
    User-agent: WebSauger
    Disallow: /
    User-agent: WebStripper
    Disallow: /
    User-agent: WebZip
    Disallow: /
    User-agent: Wget
    Disallow: /
    User-agent: WWW-Collector-E
    Disallow: /
    User-agent: Xenu's
    Disallow: /
    User-agent: Zeus
    Disallow: /
    # Begin Exclusion From Directories from robots.txt
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-login.php
    Disallow: /wp-register.php

    Sitemap: http://www.tripontop.com/sitemap.xml.gz

    ----------------------------------------
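    If it helps, you can sanity-check rules like the ones above with Python's standard-library robots.txt parser. This is just a sketch: it feeds a few representative lines from the file in as strings (rather than fetching the live URL), and the test paths are made up:

    ```python
    from urllib import robotparser

    # A few representative groups from the robots.txt above
    lines = [
        "User-agent: Wget",
        "Disallow: /",
        "",
        "User-agent: *",
        "Disallow: /wp-admin/",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(lines)  # parse from strings instead of fetching the live file

    print(rp.can_fetch("Wget", "/anything"))           # Wget is blocked site-wide
    print(rp.can_fetch("SomeOtherBot", "/wp-admin/"))  # admin area blocked for everyone else
    print(rp.can_fetch("SomeOtherBot", "/index.html")) # the rest of the site stays open
    ```

    Of course, this only tells you what a *compliant* crawler would do with the file.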

    Why people insist on adding version and build numbers ("4.01", or "/1.0") to bot names is beyond me .. but it has been done for years, and bots will blow on by without giving the version numbers a second glance.

    Fact of the matter is that most of the bots listed above, if they are being manipulated at all by whoever runs them, won't pay any attention to your robots.txt file anyway.

    Google? Well, truth be known .. if you have any Google ads on your site, then blocking Mediapartners-Google or related Google ad bots won't work .. they'll come on in anyway.

    Oh, and unzip your sitemap.xml.gz ... it'll give the search engines one less hoop to jump through while indexing your site.
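    Decompressing is a one-liner; in the sketch below the first line just fabricates a stand-in sitemap.xml.gz so the example runs on its own (the `-k` "keep" flag needs a reasonably recent gzip):

    ```shell
    # Sketch: assumes the gzipped sitemap sits in the current directory.
    # The first line creates a stand-in file purely for the demo.
    printf '<urlset></urlset>' | gzip > sitemap.xml.gz
    gunzip -k sitemap.xml.gz   # -k keeps the .gz; you now have a plain sitemap.xml
    ```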

    If you are really, really serious about blocking site scrapers, spam-bots, and nosey-nates, then I suggest you do all of the blocking through your .htaccess file .. stops them cold .. guaranteed.
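    As a rough sketch of the .htaccess approach (this assumes Apache with mod_rewrite enabled, and the bot names here are purely illustrative): unlike robots.txt, which only asks a bot to behave, this refuses the request outright with a 403.

    ```apache
    # Sketch only .. match a few scraper user-agents and forbid them.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (EmailCollector|WebZip|WebStripper) [NC]
    RewriteRule .* - [F,L]
    ```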

    Things change fast on the net ... and I try to keep my tools as up-to-date as possible.
    In order to get a robots.txt that will validate, you may want to visit our robots.txt tool here;
    http://www.webshoppesolutions.com/bottxt_generator.htm
     
    Last edited: Oct 15, 2009
    WebshoppeSolutions, Oct 15, 2009 IP
  3. sherone

    #3

    I simply write

    User-agent: *
    Sitemap: http://www.myurl/sitemap.xml


    Is that not correct?
    Should I disallow all the bots like above?
     
    sherone, Oct 15, 2009 IP
  4. WebshoppeSolutions

    #4
    Sure .. your way could work, provided you specify Allow or Disallow:

    User-agent: *
    Disallow: /

    or

    User-agent: *
    Allow: /

    Here the "*" refers to all robots and parsing agents.

    If you want only Google to visit you, you can make an exception for Google this way;

    User-agent: Googlebot
    Allow: /
    User-agent: *
    Disallow: /
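    To see that the named group takes priority over the "*" group, here is a quick check using Python's standard-library parser (a sketch; the page path is made up):

    ```python
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: Googlebot",
        "Allow: /",
        "",
        "User-agent: *",
        "Disallow: /",
    ])

    print(rp.can_fetch("Googlebot", "/page.html"))  # named group applies: allowed
    print(rp.can_fetch("RandomBot", "/page.html"))  # falls through to *: blocked
    ```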
     
    WebshoppeSolutions, Oct 15, 2009 IP
  5. Traffic-Bug

    #5
    Traffic-Bug, Oct 15, 2009 IP
  6. QueenEve

    #6
    QueenEve, Oct 16, 2009 IP
  7. slidetheweb

    #7
    Guessing here, but wouldn't it be shorter to allow all the browser user-agents instead of blocking a zillion spiders?
     
    slidetheweb, Oct 17, 2009 IP