The ideal robots.txt: what should it look like?

Discussion in 'robots.txt' started by Edz, Nov 24, 2005.

  1. Will.Spencer

    Will.Spencer NetBuilder

    Messages:
    14,789
    Likes Received:
    1,040
    Best Answers:
    0
    Trophy Points:
    375
    #21
    I've been doing that.

    I'll take a look at the ones you mentioned.

    Heck, I have to enable Xenu every time I use it on myself.
     
    Will.Spencer, Nov 26, 2005 IP
  2. Edz

    Edz Peon

    Messages:
    1,690
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    0
    #22
Thanks for that info, Minstrel :cool:
     
    Edz, Nov 26, 2005 IP
  3. Edz

    Edz Peon

    Messages:
    1,690
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    0
    #23
Ok, I would like to get some more knowledge about this robots.txt, so I'd like to ask a couple of questions.

If I want the regular bots such as Google and Yahoo to visit and index my site, but want the bad bots denied access, I would have to put in this type of syntax, right?

I know the really bad bots will ignore robots.txt, but everything I can manage to block with a simple install of a text file is well worth it. What could it hurt, right? (Or maybe I'm missing something here.)

    Ok here goes:

    User-agent: *
    Disallow:

    Disallow: /
    User-agent: Alexibot

    Disallow: /
    User-agent: Alexibot

    Disallow: /
    User-agent: Alexibot

I used Alexibot as a sample for the example.
But would this mean that all robots are granted access except for the ones listed with Disallow: /, like Alexibot?

I know there are a lot of bots that will ignore the robots.txt file, but every bot on the list that can be stopped is one that bites the dust, saving me bandwidth :)

I will also look into the botsense beta service, which I know is also not fail-proof, but everything helps, and hopefully it will improve even more in time, as they say ;)
     
    Edz, Dec 9, 2005 IP
  4. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #24
    No.

First, you have the order of lines reversed - the user-agent line comes before the disallow line:

    User-agent: Alexibot
    Disallow: /
    
    User-agent: *
    Disallow: 
    Second, repeating the name of the user-agent won't help - once is enough.

    Third, construct your robots.txt file for the GOOD bots. Filling it full of stuff for the bad bots won't help you because the bad bots are not going to be even reading the robots.txt file.
     
    minstrel, Dec 9, 2005 IP
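As an aside (not from the thread), minstrel's corrected file can be checked with Python's standard-library robots.txt parser; the URL below is just a placeholder:

```python
# Sketch: verify how minstrel's corrected robots.txt is interpreted,
# using Python's standard-library parser (example.com is a placeholder).
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Alexibot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Alexibot matches its own group and is blocked from everything;
# any other bot falls through to the catch-all group, which allows all.
print(parser.can_fetch("Alexibot", "http://example.com/page.html"))   # False
print(parser.can_fetch("Googlebot", "http://example.com/page.html"))  # True
```

The empty `Disallow:` in the catch-all group means "nothing is disallowed", which is why every bot other than Alexibot may crawl.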
  5. Edz

    Edz Peon

    Messages:
    1,690
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    0
    #25
Oh man, you're right, I have to list it as you said and not the other way around. Why did I even put it like that anyway? :confused: Copy-and-paste error.


Yeah, I know repeating is not necessary, but I did this as an example to illustrate various bot names :)
Putting a list in wouldn't hurt; whether it would help to deter all the bots is another question, but if you don't shoot you miss for certain ;)


But in this manner:


    User-agent: *
    Disallow:

    User-agent: Alexibot
    Disallow: /

I would allow all bots and make an attempt to block Alexibot, right?
     
    Edz, Dec 9, 2005 IP
  6. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #26
I think the usual order is to disallow bots first and allow all the others at the end, though it may not matter at all.

I don't really disallow bots - only certain directories I don't want indexed. As I said, I want to let the good bots in; the bad bots are going to ignore the robots.txt file anyway, so all that does is clutter up the file for the good bots. Why make it any harder than it has to be for Googlebot and the other Good Witches?
     
    minstrel, Dec 9, 2005 IP
  7. ServerUnion

    ServerUnion Peon

    Messages:
    3,611
    Likes Received:
    296
    Best Answers:
    0
    Trophy Points:
    0
    #27
     
    ServerUnion, Dec 9, 2005 IP
  8. Will.Spencer

    Will.Spencer NetBuilder

    Messages:
    14,789
    Likes Received:
    1,040
    Best Answers:
    0
    Trophy Points:
    375
    #28
I keep a nice big fat robots.txt to help reduce bandwidth utilization by annoying and useless web robots.
     
    Will.Spencer, Dec 9, 2005 IP
  9. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #29
     
    minstrel, Dec 9, 2005 IP
  10. Edz

    Edz Peon

    Messages:
    1,690
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    0
    #30
To be clear on something, guys: when I would use this syntax

Would I also need to specify each bot that I want to grant access, or would the above suffice? I am not sure about this.

Will Spencer, in your robots.txt, for instance, I see no indication of this - only of the bots that are disallowed. So can I presume I only have to put in the file which bots to block, and that the ones that aren't listed, such as Googlebot, will crawl the site since they don't encounter any reference to themselves?
     
    Edz, Dec 10, 2005 IP
  11. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #31
    No. This part:

    User-agent: *
    Disallow: 
    
    applies to ALL bots not specifically mentioned.

Yes. Although it's best to include the two lines above to state that clearly. It may not be necessary for the bots, but if nothing else it will remind you, the human, of what the robots.txt file is doing.
     
    minstrel, Dec 10, 2005 IP
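A side note (not from the thread) confirming minstrel's answer: even if the catch-all `User-agent: *` group is left out entirely, unlisted bots are still allowed, since no group matches them and the default is to allow. A sketch with Python's standard-library parser (the URL is a placeholder):

```python
# Sketch: a robots.txt that only blocks one bad bot, with no catch-all group.
# Bots not mentioned anywhere match no rules, and the default is to allow.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Alexibot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Alexibot", "http://example.com/"))   # False: blocked
print(parser.can_fetch("Googlebot", "http://example.com/"))  # True: no rule matches
```

So the explicit `User-agent: *` / `Disallow:` pair is indeed optional, as minstrel says; it mainly documents the intent for the human reading the file.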
  12. Edz

    Edz Peon

    Messages:
    1,690
    Likes Received:
    72
    Best Answers:
    0
    Trophy Points:
    0
    #32
    Thank you minstrel for clearing that up.

    Much appreciated:cool:
     
    Edz, Dec 10, 2005 IP
  13. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #33
    :) Happy to help, Edz.
     
    minstrel, Dec 10, 2005 IP
  14. mcfox

    mcfox Wind Maker

    Messages:
    7,526
    Likes Received:
    716
    Best Answers:
    0
    Trophy Points:
    360
    #34
Not thread-related, but Will, your site isn't displaying properly in Opera:
     

    Attached Files:

• will.jpg (File size: 75.4 KB, Views: 812)
    mcfox, Dec 10, 2005 IP
  15. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #35
A while back, I installed Opera specifically because these days it seems to be the browser most likely to break pages. I don't use it for general browsing, but it can be instructive to keep a copy around to view your site with... much like in the old days, when I used to keep a copy of Netscape 4.7x around as a worst-case-scenario browser.
     
    minstrel, Dec 10, 2005 IP