Proper robots.txt Disallow format

Discussion in 'Programming' started by postcd, Apr 8, 2014.

  1. #1
    Hello, I found 3 ways to disallow some robots from indexing a file. Please help me point out which are incorrect and which you think is best.

    #1

    #2

    #3

    I like the third one, but I'm not sure if it's valid.

    Thank you
     
    postcd, Apr 8, 2014
  2. RobinInTexas

    RobinInTexas Active Member

    #2
    RobinInTexas, Apr 8, 2014
  3. DrewRamsey

    DrewRamsey Banned

    #3
    I use this code.

    User-Agent: *
    Allow: /
    Disallow: /folder/
    Disallow: /file.html
    Sitemap: http://www.website.com/sitemap.xml
     
    DrewRamsey, Apr 11, 2014
  4. deathshadow

    deathshadow Acclaimed Member

    #4
    @RobinInTexas has it right that only your first one is 'correct': every time you say "User-agent" you are effectively starting a new section. Because of that, in your second example NOTHING is being set for YahooBot or GoogleBot; only MicrosoftBot actually gets that Disallow. To implement #2 properly (assuming the same value should be sent to all three), it should read:

    User-Agent: googlebot
    Disallow: /index-ads.html
    
    User-Agent: yahoobot
    Disallow: /index-ads.html
    
    User-Agent: microsoftbot
    Disallow: /index-ads.html
    
    User-agent: *
    Disallow: /downloads/
    Allow: /
    THOUGH, you really shouldn't have to say "Allow: /", as that should be assumed, and it could override the first three UA settings. In fact, "Allow" is technically a third-party, non-standard directive.
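If you want to sanity-check a file like the one above before deploying it, one option is to feed it to a parser and ask what each bot may fetch. Below is a minimal sketch using Python's standard `urllib.robotparser`. The helper names (`build_parser`, `allowed`) and the "somebot" agent are made up for illustration, and this is only one parser's interpretation; real crawlers may resolve Allow/Disallow conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from this post, one section per user agent.
ROBOTS_TXT = """\
User-Agent: googlebot
Disallow: /index-ads.html

User-Agent: yahoobot
Disallow: /index-ads.html

User-Agent: microsoftbot
Disallow: /index-ads.html

User-agent: *
Disallow: /downloads/
Allow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)


def allowed(agent, path):
    """Return True if this parser thinks `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "somebot" is a hypothetical crawler that only matches the * section.
    checks = [
        ("googlebot", "/index-ads.html"),
        ("googlebot", "/downloads/file.zip"),
        ("somebot", "/index-ads.html"),
        ("somebot", "/downloads/file.zip"),
    ]
    for agent, path in checks:
        print(f"{agent:10} {path:22} allowed={allowed(agent, path)}")
```

With this parser, a bot that matches its own section is judged by that section only, so googlebot is blocked from /index-ads.html but not from /downloads/. If the named bots should also stay out of /downloads/, that line would have to be repeated in each of their sections.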

    Also keep in mind that while "Allow" exists as a third-party extension in robots.txt, there is no such thing as "allow" or "index" in a robots META tag (no matter how many fools say there is or deploy code with them). Those invalid values could in fact have the opposite of the desired effect. Remember, the valid values for a robots META tag are noarchive, nofollow, noindex, nosnippet, and noodp. ANYTHING else is 100% fiction.
     
    Last edited: Apr 11, 2014
    deathshadow, Apr 11, 2014