Is there any difference between (1) and (2) if, in my robots.txt, I put either

1)
User-agent: Mediapartners-Google
Disallow:

or

2)
User-agent: *
Disallow:
wealthmastery, why do you want to ban the Mediapartners-Google bot from your pages? It helps you by crawling all of your pages that have AdSense implemented.
Ooopsss... My initial intention is that I WANT them to crawl my pages. So what should be the syntax? As below?

1)
User-agent: Mediapartners-Google
Allow:

2)
User-agent: *
Allow:

And what if it is

User-agent: *
Disallow: /abc/

?? I am really confused about the syntax. When is it 'Allow'? When is it 'Disallow'?
He knows much more than me; I just have empty robots.txt files on every domain. http://www.robotstxt.org/wc/robots.html
That is where I got it from. His one is

# Allow all
User-agent: *
Disallow:

Which means it bans all the bots? That is exactly the same as my initial post, option (2).
That syntax is for disallowing them all; otherwise they'll crawl your site. If you just want to allow them all to come, leave the whole robots.txt file empty (but create one).
Googlebot: http://www.google.com/support/webmasters/bin/answer.py?answer=40364&topic=8846
MSNBot: http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm
Yahoo: http://help.yahoo.com/help/us/ysearch/slurp/
Memo: http://3w.ezer.com/robots/robots.txt/disallow.asp
Thanks guys.

(1) Let me get this right. WITHOUT a "/" after "Disallow:", it means ALLOWING the bot to crawl? From the Google page http://www.google.com/support/webmasters/bin/answer.py?answer=40364&topic=8846:

"Allowing Googlebot
If you want to block access to all bots other than the Googlebot, you can use the following syntax:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:"

(2) So back to my original post... I actually ALLOW them to crawl, as there is NO "/" after "Disallow:"? Am I correct?

(3) If we specify just one type of bot without specifying the others, BY DEFAULT it means ALLOWING the OTHER bots to crawl, right? For example, if my robots.txt is only:

User-agent: Googlebot-Image
Disallow: /

Does that mean I BAN only Google from crawling my images BUT allow other bots?
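In other words, assuming I have understood correctly, my question (3) file should behave exactly like this more explicit hypothetical version:

# Block only Google's image crawler from the whole site
User-agent: Googlebot-Image
Disallow: /

# Every other robot: an empty Disallow means "disallow nothing",
# i.e. allow everything (which is also the default when a robot
# matches no record at all)
User-agent: *
Disallow: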
There is "error" on the error log, if I do not create robots.txt. Ok I can create an empty one. But that is not the point. The point here is I am confused about the syntax. That is why I asked Thanks and I hope I get some straight answers to my post. This is my robots.txt ---- User-agent: Mediapartners-Google Disallow: User-agent: OmniExplorer_Bot Disallow: / User-agent: FreeFind Disallow: / ----
Yeah, it looks like there are different schools of thought about robots.txt:

1)
User-agent: *
Disallow:
may be seen as an error and be interpreted as
User-agent: *
Disallow: /

2)
User-agent: *
Allow:
is only used by Googlebot, and the rest of the bots do not recognise it.

3) It is better not to have a robots.txt at all (but then how do you deal with error notifications like "robots.txt is not found" in the error log?).

Gosh... this thing can sometimes drive one crazy...
Hi wealthmastery,

You got a lot of confusing answers here!

No, all robots understand the syntax you are using. It is correct.

I never use "Allow". Most robots do not support the "Allow" directive, and the ones that support it do not agree on its exact meaning.

A site without robots.txt is fine for search engines, but it fills your error log. A completely empty robots.txt is a good solution. It has exactly the same meaning as this:

User-agent: *
Disallow:

In case of doubt, refer to the "official" standard used by all serious robot designers: the Robots Exclusion Standard (1994 edition). It is not a nice web page, but it contains everything you might want to know about the standard.

Jean-Luc
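To illustrate what that standard describes (my own sketch, not taken from the standard's text): a robots.txt file is a set of records separated by blank lines, each record made of one or more User-agent lines followed by one or more Disallow lines, and a robot that matches no record is allowed everything. For example:

# One record can name several robots: keep both of these out completely
User-agent: OmniExplorer_Bot
User-agent: FreeFind
Disallow: /

# Everyone else may crawl everything
User-agent: *
Disallow: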
If you do not use

User-agent: Googlebot
Disallow: /

then it is not necessary to list

User-agent: Mediapartners-Google
Disallow:
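For example, the explicit Mediapartners-Google record only earns its keep in a setup like this hypothetical one, where a robot obeys its own record instead of the "*" record, so the AdSense crawler is still let in while everything else is blocked:

# Block all robots by default
User-agent: *
Disallow: /

# The AdSense crawler matches this more specific record,
# so it may still crawl despite the blanket block above
User-agent: Mediapartners-Google
Disallow: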