Hello, I found 3 ways to disallow some robots from indexing a file. Please help me figure out which are incorrect and which you think is best? #1 #2 #3 I like the third one, but I'm not sure if it's valid. Thank you
I would go to http://www.mcanerin.com/en/search-engine/robots-txt.asp and generate a correct one there. Of yours, I believe only #1 is correct.
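For reference, the basic pattern a generator like that typically produces is one record per robot, each opened by a User-agent line (the bot name and path below are placeholders, not taken from your #1-#3):

User-agent: ExampleBot
Disallow: /some-file.html

User-agent: *
Disallow:

An empty Disallow line means "block nothing" for everyone else, which is the standard way to leave other robots unrestricted without any non-standard Allow directive.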
I use this code:

User-Agent: *
Allow: /
Disallow: /folder/
Disallow: /file.html
Sitemap: http://www.website.com/sitemap.xml

(Note the sitemap line needs the "Sitemap:" directive and the full URL, not just the bare address.)
@RobinInTexas has it right that only your first one is 'correct' -- every time you say "User-agent" you are effectively starting a new 'section'. Because of that, in that second example of yours NOTHING is being set for YahooBot or GoogleBot; only Microsoftbot is actually getting that disallow. To implement #2 properly (assuming the same value should be sent to all three) it should read:

User-agent: googlebot
Disallow: /index-ads.html

User-agent: yahoobot
Disallow: /index-ads.html

User-agent: microsoftbot
Disallow: /index-ads.html

User-agent: *
Disallow: /downloads/
Allow: /

THOUGH, you really shouldn't have to say "Allow: /", as that should be assumed... and it could override the first three UA settings. In fact, "Allow" is technically a third-party/non-standard value.

Also keep in mind that while there is a third-party "Allow" in robots.txt, there is no such thing as "allow" or "index" in a robots META tag (no matter how many fools try to say there is or deploy code with them). Those invalid values could in fact have the opposite of the desired effect. Remember, the valid values for a robots META tag are noarchive, nofollow, noindex, nosnippet, and noodp. ANYTHING else is 100% fiction.
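For anyone who wants a concrete example, here is what a robots META tag built only from the valid values listed above looks like (it goes in the page's <head>; the specific combination shown is just an illustration):

<!-- Tells compliant robots not to index this page, not to follow its
     links, not to keep a cached copy, and not to show a snippet -->
<meta name="robots" content="noindex, nofollow, noarchive, nosnippet">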