The syntax is very limited and easy to understand. The first part specifies the robot we are referring to:

User-agent: BotName

Replace BotName with the robot name in question. To address all of them, simply use an asterisk:

User-agent: *

The second part tells the robot in question not to enter certain parts of your web site:

Disallow: /cgi-bin/

In this example, any path on our site starting with the string /cgi-bin/ is declared off limits. Multiple paths can be excluded per robot by using several Disallow lines:

User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /private

This robots.txt file would apply to all bots and instruct them to stay out of the directories /cgi-bin/ and /temp/. It also tells them that any path/URL on your site starting with /private (files and directories) is off limits.

To declare your entire website off limits to BotName, use the example shown below:

User-agent: BotName
Disallow: /

To have a generic robots.txt file which welcomes every robot and does not restrict them, use this sample:

User-agent: *
Disallow:
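If you want to sanity-check rules like the ones above before relying on them, one option is Python's standard-library urllib.robotparser, which evaluates a robots.txt against a given user agent and path. A minimal sketch, using the example rules from this post (the test paths are made up purely for illustration):

from urllib.robotparser import RobotFileParser

# The example rules from above, pasted in as a string for testing.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /private
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(user_agent, url) returns True if that agent may crawl the URL.
print(rp.can_fetch("*", "/cgi-bin/counter.cgi"))   # False - blocked by /cgi-bin/
print(rp.can_fetch("*", "/private-notes.html"))    # False - matches the /private prefix
print(rp.can_fetch("*", "/index.html"))            # True  - not covered by any rule

Because can_fetch() just returns True or False, it is easy to loop over a list of your own URLs and confirm the file blocks exactly what you intended before you upload it.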
I don't see any mention of wild card entries or regular expressions for dynamic URLs in this post. I blogged on how to take advantage of wild card entries in robots.txt here: bluechipseo.com/2009/01/how-to-use-wildcard-entries-in-your.html
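For anyone reading along: the major crawlers (Googlebot, Bingbot, Slurp) support the * wildcard and the $ end-of-URL anchor as extensions to the original robots exclusion standard, so smaller bots may ignore them. A rough sketch, with made-up paths just to show the pattern:

User-agent: *
# Block any URL that contains a query string (dynamic URLs).
Disallow: /*?
# Block any URL that ends in .pdf
Disallow: /*.pdf$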
Thanks. Just gotta work out how to actually block access to the pages listed in the robots.txt. If the disallowed pages have nothing else stopping people from viewing them, you're better off with no robots.txt at all: the file is publicly readable, so it effectively advertises the paths you'd rather keep hidden, and attackers routinely check it for exactly that reason.
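To see why it cuts both ways: robots.txt sits at a well-known URL with no access control, so anyone can read your list of disallowed paths. A minimal sketch with Python's standard library (example.com is just a placeholder, swap in the site you want to check):

from urllib.request import urlopen

# robots.txt always lives at the site root and is served to anyone who asks,
# so every Disallow line doubles as a public hint about paths the owner
# would rather keep quiet.
with urlopen("https://example.com/robots.txt") as response:
    print(response.read().decode("utf-8", errors="replace"))

Real protection has to come from the server itself (HTTP authentication, access rules, and so on); robots.txt is only a request to well-behaved crawlers.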
Thank you... I think I need one for my forum, right? Is it recommended to have one if you run a vBulletin forum?