My robots.txt file: is it too silly an example?

Discussion in 'robots.txt' started by papek, Jun 7, 2007.

  1. #1
    How much traffic would I lose if I kept this robots.txt file on my site? It's a shame the effect won't show up in my stats right away, so I am trying to figure out how many small crawlers would drop the website from their index. Before, I was only disallowing specific bots, but the list was growing too long.

    User-agent: Mediapartners-Google*
    Disallow:

    User-Agent: ArchitextSpider # Excite
    User-Agent: Ask Jeeves
    User-Agent: FAST-WebCrawler
    User-Agent: Freecrawl # euroseek.net
    User-Agent: Googlebot
    User-Agent: Googlebot-Mobile
    User-Agent: Googlebot-Image
    User-Agent: Adsbot-Google
    User-Agent: Gulliver # Northern Light
    User-Agent: ia_archiver
    User-Agent: InfoSeek
    User-Agent: Lycos
    User-Agent: msnbot
    User-Agent: Scooter
    User-Agent: Slurp
    Disallow:

    User-Agent: *
    Disallow: /
     
    papek, Jun 7, 2007 IP
  2. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #2
    You will not receive traffic to your site if you keep this file :D
     
    trichnosis, Jun 8, 2007 IP
  3. papek

    papek Peon

    Messages:
    92
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Thanks trichnosis, I changed it back to what it was before. BUT I thought all the crawlers listed at the top were allowed, and only the final "User-Agent: *" / "Disallow: /" record blocked everyone else.
     
    papek, Jun 8, 2007 IP
  4. DavidK1

    DavidK1 Peon

    Messages:
    507
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #4
    But the line before that says ALL with the *. So you are basically allowing all the ones you have listed, but then disallowing everything.

    I assume you are trying to keep all other robots from visiting. Remember that only "good" robots will "listen" to a robots.txt file, and those are the ones you have listed.

    If a certain bot is causing you issues, just ban it in your .htaccess.
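    Since robots.txt is only advisory, a .htaccess ban is the enforceable option. A minimal sketch, assuming Apache with mod_setenvif enabled ("BadBot" is a placeholder name, not a bot from this thread):

```apache
# Sketch: deny any request whose User-Agent contains "BadBot"
# ("BadBot" is a placeholder; substitute the real bot's UA string)
SetEnvIfNoCase User-Agent "BadBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```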
     
    DavidK1, Jun 11, 2007 IP
  5. Jean-Luc

    Jean-Luc Peon

    Messages:
    601
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    0
    #5
    You are perfectly right.

    Jean-Luc
     
    Jean-Luc, Jun 11, 2007 IP
  6. DavidK1

    DavidK1 Peon

    #6
    Ummm... no, he isn't.
     
    DavidK1, Jun 11, 2007 IP
  7. Jean-Luc

    Jean-Luc Peon

    #7
    This is not correct.

    "User-Agent: *" means all the other robots, i.e. those not matched by another record. This is explained in the original robots.txt specification.
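    For what it's worth, this record-matching rule can be checked with Python's standard urllib.robotparser. A minimal sketch; "SomeUnknownBot" is an invented agent name, and the file below is a shortened version of the one in this thread:

```python
from urllib import robotparser

# Shortened version of the robots.txt from this thread:
# two named bots are allowed, everyone else is blocked.
ROBOTS_TXT = """\
User-Agent: Googlebot
User-Agent: Slurp
Disallow:

User-Agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own record, whose empty Disallow allows everything.
print(rp.can_fetch("Googlebot", "/page.html"))       # True
# An unlisted bot falls through to the "User-Agent: *" record.
print(rp.can_fetch("SomeUnknownBot", "/page.html"))  # False
```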
    Jean-Luc
     
    Jean-Luc, Jun 11, 2007 IP
  8. DavidK1

    DavidK1 Peon

    #8
    Nope. It does not work that way, and the link you gave does not say that.
     
    DavidK1, Jun 11, 2007 IP
  9. Jean-Luc

    Jean-Luc Peon

    #9
    What does it say, then? :confused:

    Jean-Luc
     
    Jean-Luc, Jun 11, 2007 IP
  10. papek

    papek Peon

    #10
    Thanks guys. I am watching this thread, and for now I have gone back to my previous version of the robots.txt file, which lists only the disallowed robots. I am picking up more knowledge about this!

    The file in this thread was used for only 2 weeks, and I noticed a 5% decrease in organic traffic; it is hard to say the reason, as it could well just be the arriving summer months.
     
    papek, Jun 11, 2007 IP
  11. DavidK1

    DavidK1 Peon

    #11
    It doesn't work that way. Try it for yourself: set up a robots.txt file like the example given, then use one of those spider simulators or page crawlers that let you set the user agent to whatever you wish, and take note of what happens.

    You are also forgetting that only "good" bots like the ones listed will comply with a robots.txt command. The ones that cause trouble have to be blocked via .htaccess.
     
    DavidK1, Jun 12, 2007 IP
  12. Jean-Luc

    Jean-Luc Peon

    #12
    I rely on the spec, not on the (maybe invalid) design of a spider simulator.

    I do not forget that. I fully agree with you on the need to use .htaccess for ill-intentioned bots.

    Jean-Luc
     
    Jean-Luc, Jun 12, 2007 IP
    DavidK1 likes this.
  13. DavidK1

    DavidK1 Peon

    #13
    There are a lot of them out there; are they all invalid? When I disallow all, the tools cannot crawl. Considering they aren't Googlebot, MSN, or Slurp, it shouldn't allow them... but it does.

    The spec you are relying on does not control any of the bots out there. It is a guide to how "well-behaved" bots work.
     
    DavidK1, Jun 12, 2007 IP