robots.txt help... trying to disallow a file only from Google.

Discussion in 'robots.txt' started by vprp, Jul 16, 2005.

  1. #1
    There are a lot of files on my forum that I don't want any search engine spiders to visit, so I have them listed as:

    User-agent: *
    Disallow: /admincp/
    Disallow: /attachments/
    Disallow: /clientscript/

    Additionally, there is one file that I want spidered by every spider except Googlebot. So I have added something like:

    User-agent: googlebot
    Disallow: /arcade.php

    So my robots.txt file may look something like this:

    User-agent: *
    Disallow: /admincp/
    Disallow: /attachments/
    Disallow: /clientscript/
    
    User-agent: googlebot
    Disallow: /arcade.php
    Does that mean that Googlebot will not spider anything in /admincp/, /attachments/, /clientscript/, or arcade.php, or will it only listen to what is specified directly for Googlebot? That is, will it only skip arcade.php?
     
    vprp, Jul 16, 2005 IP
  2. ResaleBroker

    ResaleBroker Active Member

    #2
    That is my understanding.
     
    ResaleBroker, Jul 16, 2005 IP
  3. minstrel

    minstrel Illustrious Member

    #3
    Yes, that's correct (theoretically). All spiders that obey robots.txt directives (the reputable ones do, but not all of them) will obey the first set of directives, including Googlebot, and Googlebot will additionally obey the second, Googlebot-specific directive.
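
    A hedged aside on that: robots.txt implementations don't all merge the wildcard group with a named group (many crawlers read only the most specific group that matches them), so the safe way to get the behaviour described above is to repeat the shared rules inside the Googlebot group. A sketch, using the paths from the original post:

    # Default rules; a crawler with its own group below may read only that group
    User-agent: *
    Disallow: /admincp/
    Disallow: /attachments/
    Disallow: /clientscript/

    # Googlebot group: repeat the shared rules, then add the Google-only one
    User-agent: Googlebot
    Disallow: /admincp/
    Disallow: /attachments/
    Disallow: /clientscript/
    Disallow: /arcade.php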
     
    minstrel, Jul 18, 2005 IP
  4. gatordun

    gatordun Guest

    #4
    User-agent: *
    User-agent: Googlebot
    User-agent: Googlebot-Image
    Disallow: /
     
    gatordun, Jul 28, 2005 IP
  5. minstrel

    minstrel Illustrious Member

    #5
    No, that's incorrect. What that does is tell ALL spiders not to index anything (disallow everything). That wasn't what was requested, and frankly I can't imagine anyone wanting a robots.txt file like that except perhaps for a private, members-only site.
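
    For reference, a sketch of the grouping rule being described: consecutive User-agent lines share whatever Disallow lines follow them, so the single Disallow: / above applies to every agent stacked over it. If the aim was only to keep the two Google crawlers away from arcade.php (the filename from the first post; adding Googlebot-Image is gatordun's suggestion, not something the thread establishes is needed), the group would instead look like:

    # Both Google crawlers listed here share the single rule that follows
    User-agent: Googlebot
    User-agent: Googlebot-Image
    Disallow: /arcade.php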
     
    minstrel, Jul 28, 2005 IP
  6. gatordun

    gatordun Guest

    #6
    I thought he was asking about Google.
    So I added the Google image bot too, so he could add it to his list.
    The best thing to do is to list, by name, all the spiders you don't want (a sketch of that pattern follows below).
    Some are good, some are not.
    I posted a full list of bots on one of these forums.
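
    A hedged sketch of that by-name approach, with placeholder bot names (ExampleBadBot and AnotherUnwantedBot are hypothetical; substitute the real user-agent strings from whatever bot list you use) and the directories from the first post:

    # Unwanted crawlers, blocked from the whole site (names are placeholders)
    User-agent: ExampleBadBot
    Disallow: /

    User-agent: AnotherUnwantedBot
    Disallow: /

    # Everyone else: keep out of the private directories only
    User-agent: *
    Disallow: /admincp/
    Disallow: /attachments/
    Disallow: /clientscript/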
     
    gatordun, Jul 29, 2005 IP
  7. minstrel

    minstrel Illustrious Member

    #7
    He was asking about Google.

    Look at the first line of the robots.txt file you posted:

    User-agent: *

    followed by

    Disallow: /

    That instructs all spiders to disallow everything.
     
    minstrel, Jul 29, 2005 IP