Hi. Please excuse what is probably a naive question, but I have never used robots.txt before. I noticed a few times that Google came looking for a robots.txt (which I don't have) and then left, i.e. it never looked at any other pages. So I decided to create a robots.txt file (see below).

One thing I would like to do is exclude several directories that are all called 'data', of which there are various, e.g. holidays/data/, uk/data/ and swimming/data/. Can I exclude them all in one line, e.g.

Disallow: /*/data/

or do I risk other directories under holidays, uk or swimming being excluded as well?

Any advice is very welcome. Thanks, Gay

User-agent: *
Disallow: /cgi-bin/
Disallow: /_borders/
Disallow: /_derived/
Disallow: /_fpclass/
Disallow: /_overlay/
Disallow: /_private/
Disallow: /_themes/
Disallow: /_vti_bin/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_map/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/
Hi - I have not used robots.txt yet either, so I can't help you with that part. I do know that Google will index your site in its own time (unfortunately); having a robots.txt file will not make it index your site any faster. It already knows your site exists, or it would not have asked for your robots.txt file - that's a start. You probably know the refrain: get links, get visitors, get links.
Google understands wildcards in robots.txt, so I think you can do it with a single line like that. You can also upload an empty index.html file into each of those directories, so search engines can't browse the directory listing and discover the files inside. As far as I know, Google is the only search engine that supports those wildcards, but I'm not 100% sure of that.
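To make that concrete, here is a sketch of the two options: the single wildcard line for crawlers that understand it (Google does), and explicit per-directory lines as a fallback that any robots.txt-aware crawler will follow. The directory names are the ones from your question; adjust them to your actual paths.

# Wildcard version (supported by Google, not guaranteed elsewhere)
User-agent: Googlebot
Disallow: /*/data/

# Explicit version (works with any crawler that honours robots.txt)
User-agent: *
Disallow: /holidays/data/
Disallow: /uk/data/
Disallow: /swimming/data/

Note that the wildcard line only blocks URLs whose path contains a /data/ segment, so something like /holidays/summer.html would still be crawled.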
As long as there are no links to those directories, Google isn't likely to index them anyway. Even if they were indexed, they would be unlikely to rank for anything.
www.robotstxt.org - everything you need to know about robots.txt with a complete list of robots that you may allow or disallow.
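If you want to sanity-check the rules before uploading the file, here is a minimal sketch using Python's standard-library robots.txt parser. Be aware that urllib.robotparser only does plain prefix matching, so it won't evaluate the '*' wildcard line - it is only useful for testing the explicit per-directory rules. The example.com URLs are just placeholders.

from urllib.robotparser import RobotFileParser

# The explicit per-directory rules from this thread, one directive per line.
rules = """\
User-agent: *
Disallow: /holidays/data/
Disallow: /uk/data/
Disallow: /swimming/data/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# One URL that should be blocked and one that should stay crawlable.
for url in ("http://example.com/holidays/data/list.html",
            "http://example.com/holidays/summer.html"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")

Running this should report the /holidays/data/ URL as blocked and the other page as allowed, which is exactly the behaviour the explicit rules are meant to give.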