robots.txt

Discussion in 'Google' started by waelthmastery, Oct 2, 2006.

  1. #1
    Is there any difference between (1) and (2) if I put either of these in my robots.txt?

    1) User-agent: Mediapartners-Google
    Disallow:

    OR

    2) User-agent: *
    Disallow:
     
    waelthmastery, Oct 2, 2006 IP
  2. ketan9

    ketan9 Active Member

    #2
    (1) Bans Google AdSense bots from crawling the pages.


    (2) Bans all the bots from crawling the pages!!
     
    ketan9, Oct 2, 2006 IP
  3. Kaudo

    Kaudo Peon

    #3
    waelthmastery, why do you want to ban the Mediapartners-Google bot from your pages? It helps you by crawling all of your pages that have AdSense implemented.
     
    Kaudo, Oct 2, 2006 IP
  4. waelthmastery

    waelthmastery Peon

    #4
    Oops... My intention is actually that I WANT them to crawl my pages.

    So what should be the syntax? As below?

    User-agent: Mediapartners-Google
    Allow:

    2) User-agent: *
    Allow:

    So what if it is

    User-agent: *
    Disallow: /abc/ ??

    I am really confused about the syntax.

    When do I use 'Allow'?
    When do I use 'Disallow'?
     
    waelthmastery, Oct 2, 2006 IP
  5. Kaudo

    Kaudo Peon

    #5
    Kaudo, Oct 2, 2006 IP
  6. waelthmastery

    waelthmastery Peon

    #6
    That is where I got it from.

    His one is

    # Allow all
    User-agent: *
    Disallow:

    Which means it bans all the bots? That is exactly the same as option (2)
    in my initial post.
     
    waelthmastery, Oct 2, 2006 IP
  7. Kaudo

    Kaudo Peon

    #7
    for disallowing them all, otherwise they'll crawl your site.
    If you just want to allow them to come, leave the whole robots.txt file empty (but create one).
     
    Kaudo, Oct 2, 2006 IP
  8. ablaye

    ablaye Well-Known Member

    #8
    You don't even need to create one.
     
    ablaye, Oct 2, 2006 IP
  9. Link.ezer.com

    Link.ezer.com Peon

  10. waelthmastery

    waelthmastery Peon

    #10
    Thanks guys.

    (1) Let me get this right.

    Does "Disallow" WITHOUT a "/" after it mean ALLOWING the bot to crawl?

    From the Google page:
    http://www.google.com/support/webmasters/bin/answer.py?answer=40364&topic=8846

    "Allowing Googlebot
    If you want to block access to all bots other than the Googlebot, you can use the following syntax:

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Disallow:"
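
One way to see how a parser reads the quoted example is to feed it to Python's standard `urllib.robotparser`. This is only a sketch: the helper name `can_crawl`, the agent name `SomeOtherBot`, and the `example.com` URL are made up for illustration, and Python's parser is one implementation among many, so a given crawler may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted from the Google help page above.
GOOGLE_EXAMPLE = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://example.com/page.html"  # hypothetical URL
print("Googlebot:", can_crawl(GOOGLE_EXAMPLE, "Googlebot", url))        # True
print("SomeOtherBot:", can_crawl(GOOGLE_EXAMPLE, "SomeOtherBot", url))  # False
```

The group with the empty `Disallow:` is the one that lets Googlebot in, while every other bot falls back to the `*` group and is blocked by `Disallow: /`.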

    (2) So back to my original post ...


    So I actually ALLOW them to crawl, as there is NO "/" after "Disallow"?
    Am I correct?

    (3) If we specify just one type of bot without specifying others, BY DEFAULT it means ALLOWING OTHER bots to crawl, right?

    For example, if my robots.txt is only:

    User-agent: Googlebot-Image
    Disallow: /

    Does that mean I BAN only Google from crawling my images
    BUT allow other bots?
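
Question (3) can be checked the same way. The sketch below assumes Python's `urllib.robotparser` as a stand-in for a crawler. The helper `check`, the agent name `SomeOtherBot`, and the image URL are hypothetical. Note that this parser matches agent names by substring, which is its own heuristic rather than something every bot does.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that names only one bot.
IMAGE_ONLY = """\
User-agent: Googlebot-Image
Disallow: /
"""


def check(robots_txt, agent, url="http://example.com/images/photo.jpg"):
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(check(IMAGE_ONLY, "Googlebot-Image"))  # False: the named bot is blocked
print(check(IMAGE_ONLY, "SomeOtherBot"))     # True: no group applies, so it is allowed
```

A bot that matches no group, with no `User-agent: *` group to fall back on, is treated as unrestricted.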
     
    waelthmastery, Oct 3, 2006 IP
  11. rehash

    rehash Well-Known Member

    #11
    Don't create a robots.txt file unless you want to ban some bot.
    By default they are all allowed.
     
    rehash, Oct 3, 2006 IP
  12. waelthmastery

    waelthmastery Peon

    #12
    There is an error in the error log if I do not create robots.txt. OK, I can create an empty one. But that is not the point. The point here is that I am confused about the syntax.

    That is why I asked.

    Thanks and I hope I get some straight answers to my post.

    This is my robots.txt

    ----
    User-agent: Mediapartners-Google
    Disallow:

    User-agent: OmniExplorer_Bot
    Disallow: /

    User-agent: FreeFind
    Disallow: /
    ----
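
As a rough check of the file above, the sketch below runs it through Python's `urllib.robotparser` and reports each bot's status. The function name `crawl_report` and the `example.com` URL are assumptions for illustration, and `Googlebot` is included only as an example of a bot the file does not mention.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt posted above, without the "----" separators.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: OmniExplorer_Bot
Disallow: /

User-agent: FreeFind
Disallow: /
"""


def crawl_report(robots_txt, agents, url="http://example.com/"):
    """Map each agent name to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = crawl_report(
    ROBOTS_TXT,
    ["Mediapartners-Google", "OmniExplorer_Bot", "FreeFind", "Googlebot"],
)
for agent, ok in report.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Under this parser, Mediapartners-Google and the unlisted Googlebot come out as allowed, and the two bots with `Disallow: /` come out as blocked.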
     
    waelthmastery, Oct 3, 2006 IP
  13. waelthmastery

    waelthmastery Peon

    #13
    Yeah, it looks like there are different schools of thought about robots.txt:

    1) User-agent: *
    Disallow:

    may be seen as an error and be interpreted as

    User-agent: *
    Disallow: /

    2) User-agent: *
    Allow:

    is only used by Googlebot, and the rest of the bots do not recognise it.

    3) It is better not to have robots.txt at all (but then how do I deal with error notifications like "robots.txt is not found" in the error log?)

    Gosh... this thing can sometimes drive one crazy...
     
    waelthmastery, Oct 3, 2006 IP
  14. Jean-Luc

    Jean-Luc Peon

    #14
    Hi waelthmastery,

    You got a lot of confusing answers here!

    On (1): no, all robots understand the syntax you are using.

    On (2): it is correct. I never use it. Most robots do not support the "Allow" directive. The ones that support it do not agree on the exact meaning of it.

    On (3): a site without robots.txt is fine for search engines, but it fills your error log. A completely empty robots.txt is a good solution. It has exactly the same meaning as this:
    User-agent: *
    Disallow:
    In case of doubt, refer to the "official" standard used by all serious robot designers: Robots exclusion standard (1994 edition). It is not a nice web page, but it contains all you might want to know about the standard.
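
The claim that an empty file means the same as `User-agent: *` with an empty `Disallow:` can be spot-checked with Python's `urllib.robotparser`. This is a sketch, not a proof: it compares the two files for two made-up URLs and a made-up agent name (`AnyBot`), and it only shows how this one parser behaves.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


EMPTY_FILE = ""                             # a completely empty robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # the two-line version

for url in ("http://example.com/", "http://example.com/private/page.html"):
    empty_result = allows(EMPTY_FILE, "AnyBot", url)
    explicit_result = allows(ALLOW_ALL, "AnyBot", url)
    print(url, empty_result, explicit_result)  # both True for each URL
```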

    Jean-Luc
     
    Jean-Luc, Oct 3, 2006 IP
  15. Link.ezer.com

    Link.ezer.com Peon

    #15
    if you do not use

    User-agent: Googlebot
    Disallow: /

    then it is not necessary to list

    User-agent: Mediapartners-Google
    Disallow:
     
    Link.ezer.com, Oct 3, 2006 IP
  16. waelthmastery

    waelthmastery Peon

    #16
    Thanks A Lot Guys!
     
    waelthmastery, Oct 3, 2006 IP
  17. minstrel

    minstrel Illustrious Member

    #17
    No.

    "Disallow: " means "disallow nothing - spider everything".
     
    minstrel, Oct 3, 2006 IP
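
The rule behind "disallow nothing" is plain prefix matching: in the 1994 standard, a `Disallow` value is a partial URL, any path starting with it is off limits, and an empty value matches nothing. The function below is a minimal sketch of that rule for a single `Disallow` value. The name `is_blocked` and the sample paths are hypothetical, and real crawlers add their own extensions on top of this.

```python
def is_blocked(disallow_value: str, path: str) -> bool:
    """Return True if `path` is blocked by one Disallow value (prefix match)."""
    if disallow_value == "":
        # An empty value disallows nothing, so everything may be crawled.
        return False
    return path.startswith(disallow_value)


print(is_blocked("", "/abc/page.html"))       # False: "Disallow:" blocks nothing
print(is_blocked("/", "/abc/page.html"))      # True: "Disallow: /" blocks everything
print(is_blocked("/abc/", "/abc/page.html"))  # True: inside /abc/
print(is_blocked("/abc/", "/xyz/page.html"))  # False: outside /abc/
```

This also covers the earlier `Disallow: /abc/` question: only paths under `/abc/` are kept away from the bot, and the rest of the site stays open.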