I recently needed to exclude dynamic URLs with the robots.txt file (the keyword suggestion tool was spawning hundreds of pages whenever someone linked directly to a results page). So I added this:
User-agent: *
Disallow: /tools/suggestion/?
The interesting thing, though, is that only some spiders seem to understand the exclusion. Googlebot, for example, is smart enough to handle it properly. The new MSN Bot, on the other hand, is not. - Shawn
You don't need the '?'. You need only this:
User-agent: *
Disallow: /tools/suggestion/
I also use this trick on my site to disable lots of dynamic pages.
Except I *do* want /tools/suggestion/ to be indexed. But *not* any page that starts with /tools/suggestion/? - Shawn
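To make the difference between the two rules concrete, here is a minimal sketch of plain prefix matching in Python. It assumes a crawler compares each Disallow value against the requested path plus query string; the function name is_disallowed and the sample query q=widgets are made up for illustration, and real crawlers may normalize URLs differently (for instance by dropping the query string before matching).

```python
from typing import Sequence


def is_disallowed(request_target: str, disallow_rules: Sequence[str]) -> bool:
    """Return True if request_target starts with any non-empty Disallow value.

    Assumption: the crawler matches against path + query string, with no
    wildcard support. An empty Disallow value blocks nothing.
    """
    return any(rule and request_target.startswith(rule) for rule in disallow_rules)


with_question_mark = ["/tools/suggestion/?"]
without_question_mark = ["/tools/suggestion/"]

# The rule ending in '?' leaves the tool's main page crawlable...
print(is_disallowed("/tools/suggestion/", with_question_mark))
# ...but blocks any results page that carries a query string.
print(is_disallowed("/tools/suggestion/?q=widgets", with_question_mark))
# The shorter rule blocks the main page as well.
print(is_disallowed("/tools/suggestion/", without_question_mark))
```

Under that assumption, only the rule ending in '?' keeps /tools/suggestion/ itself available while excluding the results pages.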
That is not in the standard. AFAIK the standard only allows you to disallow files or directories, although Google accepts wildcards (*.cgi, for example).
I know it's not part of the official robots standard, but Google does adhere to it properly. Google uses it in their own robots file: http://www.google.com/robots.txt - Shawn
Building on Shawn's question... I have a Nuke site where the structure for the content is /modules.php?name=ContentType. Using .htaccess and mod_rewrite, all sorts of good stuff gets done to this to make it look search engine friendly. But if I want to exclude some types of content and not others, can I use my new URLs? I'm guessing that because the bots look at robots.txt before getting any content, they will obey the dummy name. Is this right?
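One way to reason about this is that a crawler can only compare robots.txt rules with the URL it actually requests; mod_rewrite runs on the server afterwards, so the crawler never sees the internal modules.php form unless something links to it. The sketch below illustrates that idea in Python. The /content/... paths, the News and Forums module names, and the helper crawler_may_fetch are all hypothetical, not taken from any real Nuke setup.

```python
from typing import Dict, Sequence

# Hypothetical mod_rewrite mapping: public URL -> internal script URL.
REWRITES: Dict[str, str] = {
    "/content/news/": "/modules.php?name=News",
    "/content/forums/": "/modules.php?name=Forums",
}

# Hypothetical robots.txt Disallow values, written against the public URLs.
DISALLOW: Sequence[str] = ("/content/forums/",)


def crawler_may_fetch(requested: str, disallow: Sequence[str] = DISALLOW) -> bool:
    """Prefix-match the URL the crawler asks for; the rewrite target plays no part."""
    return not any(rule and requested.startswith(rule) for rule in disallow)


for public_url, internal_url in REWRITES.items():
    print(public_url, "->", internal_url, "| allowed:", crawler_may_fetch(public_url))

# The internal form is a different URL, so the rule above does not cover it.
print(crawler_may_fetch("/modules.php?name=Forums"))
```

If the old /modules.php?name=... addresses are still linked anywhere, they would need their own Disallow lines, since a rule written for the rewritten URL does not match them.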