Hello, I found 3 ways to disallow some robots from indexing a file. Please help me figure out which are incorrect and which you think is best? #1 #2 #3 I like the third one, but I'm not sure if it's valid. Thank you
I would go to http://www.mcanerin.com/en/search-engine/robots-txt.asp and generate a correct one there. Of yours, I believe only #1 is correct.
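For reference, the basic pattern a generator like that typically produces is one record per robot, each opened by a User-agent line (the bot name and path below are placeholders, not taken from your #1-#3):

User-agent: ExampleBot
Disallow: /some-file.html

User-agent: *
Disallow:

An empty Disallow line means "block nothing" for everyone else, which is the standard way to leave other robots unrestricted without any non-standard Allow directive.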
I use this code:

User-Agent: *
Allow: /
Disallow: /folder/
Disallow: /file.html
Sitemap: http://www.website.com/sitemap.xml

(Note the sitemap line needs the "Sitemap:" directive and the full URL, not just the bare address.)
@RobinInTexas has it right that only your first one is 'correct' -- every time you say "User-agent" you are effectively starting a new 'section'. Because of that, in that second example of yours NOTHING is being set for YahooBot or GoogleBot; only Microsoftbot is actually getting that disallow. To implement #2 properly (assuming the same value should be sent to all three) it should read:

User-agent: googlebot
Disallow: /index-ads.html

User-agent: yahoobot
Disallow: /index-ads.html

User-agent: microsoftbot
Disallow: /index-ads.html

User-agent: *
Disallow: /downloads/
Allow: /

THOUGH, you really shouldn't have to say "Allow: /", as that should be assumed... and it could override the first three UA settings. In fact, "Allow" is technically a third-party/non-standard value.

Also keep in mind that while there is a third-party "Allow" in robots.txt, there is no such thing as "allow" or "index" in a robots META tag (no matter how many fools try to say there is or deploy code with them). Those invalid values could in fact have the opposite of the desired effect. Remember, the valid values for a robots META tag are noarchive, nofollow, noindex, nosnippet, and noodp. ANYTHING else is 100% fiction.
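For anyone who wants a concrete example, here is what a robots META tag built only from the valid values listed above looks like (it goes in the page's <head>; the specific combination shown is just an illustration):

<!-- Tells compliant robots not to index this page, not to follow its
     links, not to keep a cached copy, and not to show a snippet -->
<meta name="robots" content="noindex, nofollow, noarchive, nosnippet">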