robots.txt

Discussion in 'Google' started by gayc, Jan 27, 2007.

  1. #1
    Hi

    Please excuse what is probably going to be a naive question but I have never used robots.txt before.

    A few times I noticed Google requesting a robots.txt (which I don't have) and then leaving, i.e. it never looked at any other pages.

    So I decided to have a robots.txt (see below).

    One thing I would like to do is exclude the directories called 'data', of which there are several, e.g.

    holidays/data/
    uk/data/
    swimming/data/

    Can I exclude them in one line, or do I risk other directories inside the holidays, swimming or uk directories being excluded?

    e.g. Disallow: /*/data/

    Any advice is very welcome.

    Thanks, Gay

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /_borders/
    Disallow: /_derived/
    Disallow: /_fpclass/
    Disallow: /_overlay/
    Disallow: /_private/
    Disallow: /_themes/
    Disallow: /_vti_bin/
    Disallow: /_vti_cnf/
    Disallow: /_vti_log/
    Disallow: /_vti_map/
    Disallow: /_vti_pvt/
    Disallow: /_vti_txt/
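
    A file like the one above can be sanity-checked locally before it is uploaded. The following is a minimal sketch using Python's standard-library urllib.robotparser; the host example.com, the helper name is_allowed and the test paths are assumptions made up for illustration, not taken from the actual site. Note that this parser does plain prefix matching on Disallow paths, so it only shows how the literal rules behave.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /_borders/
Disallow: /_derived/
Disallow: /_fpclass/
Disallow: /_overlay/
Disallow: /_private/
Disallow: /_themes/
Disallow: /_vti_bin/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_map/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` (hypothetical helper).

    example.com is a placeholder host; only the path matters here.
    """
    return parser.can_fetch(agent, "http://example.com" + path)


# Hypothetical paths, chosen only to exercise the rules.
for path in ["/cgi-bin/form.pl", "/_private/notes.htm", "/holidays/index.html"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

    Anything under a listed directory should come back as blocked, and everything else as allowed.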
     
    gayc, Jan 27, 2007
  2. kh7

    #2
    Hi - I have not used robots.txt yet either, so I can't help you with that.

    I do know that Google will index your site in its own time (unfortunately). It will not index your site any faster because you have a robots.txt file. It already knows your site exists, or it would not have asked for your robots.txt file. That's a start. You probably know the refrain: get links, get visitors, get links.
     
    kh7, Jan 27, 2007
  3. Diether

    #3
    Google understands wildcards in robots.txt, so I think you can do it in one line.
    But you can also upload an empty index.html file into those directories to stop Google and the other search engines from seeing a listing of the files in them.
    As far as I know, Google is the only search engine that supports those wildcards (I'm not 100% sure of this, though).
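
    To see what a wildcard rule like /*/data/ would and would not catch, here is a rough sketch of the matching as Google describes it: '*' stands for any run of characters, a trailing '$' anchors the end of the URL, and otherwise the rule is a prefix match. This is not Google's actual code; the function names rule_to_regex and is_blocked and the sample paths are made up for illustration.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Turn a Disallow path with '*' and optional trailing '$' into a regex.

    Assumed semantics: '*' matches any sequence of characters (including '/'),
    a trailing '$' means "the URL path must end here", and without '$' the
    rule only has to match the beginning of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """True if `path` is matched by the Disallow `rule`."""
    return rule_to_regex(rule).match(path) is not None


RULE = "/*/data/"

# Hypothetical paths built from the directory names in the question.
for path in [
    "/holidays/data/prices.html",
    "/uk/data/",
    "/swimming/data/times.html",
    "/holidays/index.html",
    "/uk/hotels/list.html",
    "/holidays/archive/data/old.html",
    "/data/top.html",
]:
    print(f"{path:35} {'blocked' if is_blocked(RULE, path) else 'allowed'}")
```

    Under these assumptions the rule only touches paths that contain a /data/ directory below some other directory, so the rest of holidays, uk and swimming stays crawlable. Two side effects are worth knowing: a data directory nested deeper (such as /holidays/archive/data/) is also blocked, while a top-level /data/ is not.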
     
    Diether, Jan 27, 2007
  4. kh7

    #4
    As long as there are no links to those directories, Google isn't likely to index them anyway. Even if they were indexed, they would be unlikely to rank.
     
    kh7, Jan 27, 2007
  5. sqeeze

    #5
    www.robotstxt.org - everything you need to know about robots.txt with a complete list of robots that you may allow or disallow.
     
    sqeeze, Jan 27, 2007
  6. gayc

    #6
    Many thanks everyone.

    I will start with something simple.
     
    gayc, Jan 28, 2007