robots.txt Exclusion On Dynamic URLs


digitalpoint
Mar 16th 2004, 9:15 am
I recently needed to exclude dynamic URLs using the robots.txt file (the keyword suggestion tool was spawning hundreds of pages whenever someone linked directly to a results page), so I added this:

User-agent: *
Disallow: /tools/suggestion/?

The interesting thing, though, is that only some spiders seem to understand the exclusion. Googlebot, for example, is smart enough to handle it properly. The new MSN Bot, on the other hand, is not.
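
To illustrate what the rule is expected to do, here is a minimal Python sketch of the plain prefix matching described in the original robots.txt standard (any URL that starts with the Disallow value is excluded). The function name `is_disallowed` and the `keyword=cars` query string are made up for illustration, and the sketch assumes the crawler compares the rule against the path together with its query string:

```python
def is_disallowed(path_and_query, disallow_rules):
    """Return True if the URL starts with any non-empty Disallow prefix."""
    return any(
        bool(rule) and path_and_query.startswith(rule)
        for rule in disallow_rules
    )


# The rule from the robots.txt above.
rules = ["/tools/suggestion/?"]

# The tool's landing page has no '?', so it stays crawlable.
print(is_disallowed("/tools/suggestion/", rules))               # False

# A results page (hypothetical query string) starts with the prefix.
print(is_disallowed("/tools/suggestion/?keyword=cars", rules))  # True
```

A crawler that drops the query string before comparing would never see the `?`, which is one possible (unconfirmed) reason spiders could differ here.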

- Shawn

nlopes
Apr 3rd 2004, 7:09 am
You don't need the '?'

You need only this:
User-agent: *
Disallow: /tools/suggestion/

I also use this trick on my site to exclude lots of dynamic pages.

digitalpoint
Apr 3rd 2004, 9:19 am
Except I *do* want /tools/suggestion/ to be indexed, just *not* any page that starts with /tools/suggestion/?

- Shawn

nlopes
Apr 3rd 2004, 10:44 am
That is not in the standard.
AFAIK the standard only allows you to disallow files or directories, although Google accepts wildcards (*.cgi, for example).
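
As a rough sketch of the wildcard extension mentioned here, the Python below treats `*` in a rule as "any run of characters" and everything else as literal text, still anchored at the start of the URL. That reading of `*` is an assumption about how Google interprets it, not something the standard defines, and the function names and sample paths are made up for illustration:

```python
import re


def wildcard_rule_to_regex(rule):
    """Translate a Disallow rule containing '*' wildcards into a regex."""
    # Escape each literal piece, then join the pieces with '.*' per '*'.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern)


def is_disallowed(path_and_query, rule):
    # re.match anchors at the start of the string, like prefix matching.
    return wildcard_rule_to_regex(rule).match(path_and_query) is not None


print(is_disallowed("/cgi-bin/search.cgi", "/*.cgi"))   # True
print(is_disallowed("/index.html", "/*.cgi"))           # False

# Without a '*', the rule degrades to ordinary prefix matching.
print(is_disallowed("/tools/suggestion/?keyword=cars", "/tools/suggestion/?"))  # True
print(is_disallowed("/tools/suggestion/", "/tools/suggestion/?"))               # False
```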

digitalpoint
Apr 3rd 2004, 11:01 am
I know it's not part of the official robots standard, but Google does handle it properly.

Google uses it in their own robots file:

http://www.google.com/robots.txt

- Shawn

sarahk
Apr 27th 2004, 11:16 pm
Building on Shawn's question...

I have a nuke site where the structure for the content is

/modules.php?name=ContentType

Using .htaccess and mod_rewrite, all sorts of good stuff gets done to this to make it look search-engine friendly.

But if I want to exclude some types of content but not others, can I use my new URLs? I'm guessing that, because the bots look at robots.txt before fetching any content, they will obey the dummy name.

Is this right?
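
One way to reason about this: a crawler can only compare robots.txt rules against the URL it is about to request, because the mod_rewrite mapping happens inside the server. The Python sketch below assumes plain prefix matching; the rewritten paths (`/content/private/`, `/content/news/`) and the `Private` module name are hypothetical stand-ins, not the site's real URLs:

```python
def is_disallowed(requested_path, disallow_rules):
    """Prefix-match the requested URL against each Disallow value."""
    return any(
        bool(rule) and requested_path.startswith(rule)
        for rule in disallow_rules
    )


# Hypothetical rule written against the rewritten ("friendly") URL.
rules = ["/content/private/"]

print(is_disallowed("/content/private/page1.html", rules))  # True
print(is_disallowed("/content/news/page1.html", rules))     # False

# The original dynamic URL is not covered by the friendly-URL rule.
print(is_disallowed("/modules.php?name=Private", rules))    # False
```

Under these assumptions, rules written for the friendly URLs apply to those URLs, but the original /modules.php?name=... addresses would need their own Disallow line if they are still linked anywhere.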