robots.txt Exclusion On Dynamic URLs

digitalpoint Overlord of no one Staff

#1

I recently had the need to exclude dynamic URLs with the robots.txt file (the keyword suggestion tool was spawning hundreds of pages when someone would link directly to a results page). So I added this:

User-agent: *
Disallow: /tools/suggestion/?

The interesting thing, though, is that only some spiders seem to understand the exclusion. Googlebot handles it properly, for example. The new MSN Bot, on the other hand, does not.

- Shawn


digitalpoint, Mar 16, 2004
nlopes Guest

#2

You don't need the '?'.

This is all you need:
User-agent: *
Disallow: /tools/suggestion/

I also use this trick on my site to block lots of dynamic pages.

nlopes, Apr 3, 2004
digitalpoint Overlord of no one Staff

#3

Except I *do* want /tools/suggestion/ to be indexed. But *not* any page that starts with /tools/suggestion/?

- Shawn


digitalpoint, Apr 3, 2004
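
To make the difference between the two rules concrete, here is a minimal Python sketch of plain prefix matching, which is how the original robots.txt convention compares a Disallow value against a URL. It assumes the crawler compares the rule against the path together with the query string; the helper name is_disallowed and the keyword=robots URL are made up for illustration, and no real crawler is claimed to work exactly this way.

```python
def is_disallowed(url_path, disallow_rules):
    """Plain prefix matching: a rule blocks every URL that starts with it.

    An empty rule ("Disallow:") blocks nothing, so it is skipped.
    """
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# The two rules discussed in the thread.
with_question_mark = ["/tools/suggestion/?"]
without_question_mark = ["/tools/suggestion/"]

# The tool's landing page and a hypothetical results page.
landing = "/tools/suggestion/"
results = "/tools/suggestion/?keyword=robots"

print(is_disallowed(landing, with_question_mark))     # False: landing page stays crawlable
print(is_disallowed(results, with_question_mark))     # True: results page is blocked
print(is_disallowed(landing, without_question_mark))  # True: landing page is blocked too
print(is_disallowed(results, without_question_mark))  # True
```

Under that assumption, the rule with the trailing '?' blocks only the results pages, while the shorter rule also blocks the landing page.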
nlopes Guest

#4

That is not in the standard.
AFAIK the standard only allows you to disallow files or directories, although Google accepts wildcards (*.cgi, for example).

nlopes, Apr 3, 2004
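
For the wildcard extension mentioned above, a rough Python sketch of how such patterns can be matched: '*' stands for any run of characters and a trailing '$' anchors the end of the URL. The function names are hypothetical and this is only an illustration of the matching idea, not Google's actual implementation.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt rule with '*' and an optional trailing '$'."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where '*' appeared.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule, url):
    """True if the rule matches from the start of the URL path."""
    return rule_to_regex(rule).match(url) is not None


print(rule_matches("/*.cgi", "/cgi-bin/search.cgi"))       # True
print(rule_matches("/*.cgi", "/index.html"))               # False
print(rule_matches("/*.cgi$", "/cgi-bin/search.cgi?x=1"))  # False: '$' requires the URL to end in .cgi
print(rule_matches("/tools/suggestion/?", "/tools/suggestion/?keyword=robots"))  # True
print(rule_matches("/tools/suggestion/?", "/tools/suggestion/"))                 # False
```

Note that a rule without any wildcard behaves exactly like a plain prefix match here.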
digitalpoint Overlord of no one Staff

#5

I know it's not part of the official robots standard, but Google does adhere to it properly.

Google uses it in their own robots file:

http://www.google.com/robots.txt

- Shawn


digitalpoint, Apr 3, 2004
sarahk iTamer Staff

#6

Building on Shawn's question...

I have a nuke site where the structure for the content is

/modules.php?name=ContentType

Using .htaccess and mod_rewrite, these URLs get rewritten to look search engine friendly.

But if I want to exclude some types of content and not others, can I use my new URLs in robots.txt? I'm guessing that because bots read robots.txt before fetching any content, they will obey the rewritten (dummy) name.

Is this right?


sarahk, Apr 27, 2004
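
On the rewrite question: a crawler only sees the URL it requests, so robots.txt rules are compared against the public (rewritten) path, not the internal /modules.php form that mod_rewrite maps it to. A small Python sketch under that assumption; the /content/private/ path and the helper name are hypothetical, since the thread does not show the rewritten URLs.

```python
def blocked(requested_url, disallow_rules):
    """Prefix match against the URL exactly as the crawler requests it."""
    return any(rule and requested_url.startswith(rule) for rule in disallow_rules)


# Hypothetical rule written against the rewritten, public URL.
rules = ["/content/private/"]

print(blocked("/content/private/page1.html", rules))  # True: rewritten URL is covered
print(blocked("/modules.php?name=Private", rules))    # False: the old dynamic URL is not
```

If the original /modules.php?name=... URLs are still reachable and linked anywhere, they would need their own rule.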
Alahad Peon

#7

I need the command to allow sitemap.xml in robots.txt.

Alahad, Jul 31, 2009
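
For reference, sitemaps are usually announced with a Sitemap: line rather than an allow rule. The sketch below uses Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later) to read such a line back; the www.example.com URL is a placeholder, not a real site from this thread.

```python
from urllib.robotparser import RobotFileParser

# A small robots.txt that blocks nothing and announces a sitemap.
robots_lines = [
    "User-agent: *",
    "Disallow:",
    "Sitemap: https://www.example.com/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap: line takes a full URL and is independent of any User-agent group.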
