
How to match patterns in robots.txt

Discussion in 'robots.txt' started by kiransarv, Nov 2, 2008.

kiransarv Peon

#1

Hi all,

I have two dynamic URL patterns:

1. http://mydomain.com/index?id=(.*)&query=(.*)
2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)

I want to allow robots to crawl the first one, but I don't want them to crawl the pages with "&start". How can I do this?

If I use "Disallow: /index?id", it will block both URL patterns. How can I be more specific?

Please help.

regards
kiran

kiransarv, Nov 2, 2008
Aldo Peon

#2
If I am correct, I believe you can use * as a wildcard, so I think this would work:

Disallow: /index?*&start=

However, someone correct me if I am wrong. I do know that if you have your website set up in Google Webmaster Tools, Google will let you enter a URL and tell you whether or not it can crawl it, based on your robots.txt.
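
If you want to sanity-check a rule before uploading it, here is a rough Python sketch of how wildcard matching is usually described: `*` matches any run of characters, a trailing `$` pins the end of the URL, and everything else is a literal prefix match. This is only my approximation, not any crawler's actual code. The helper names `rule_to_regex` and `is_blocked` and the sample `id`/`query` values are made up for illustration.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (hypothetical helper).

    '*' becomes '.*'; a trailing '$' anchors the end of the URL;
    all other characters are matched literally.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the URL's path plus query."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start, which gives the prefix behaviour.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


# Sample URLs following the two patterns in the question (values are made up).
page_one = "http://mydomain.com/index?id=7&query=shoes"
page_two = "http://mydomain.com/index?id=7&query=shoes&start=10&pager.offset=10"

# The plain prefix rule from the question catches both pages.
print(is_blocked(page_one, ["/index?id"]), is_blocked(page_two, ["/index?id"]))

# The wildcard rule only catches the paged URL.
print(is_blocked(page_one, ["/index?*&start="]), is_blocked(page_two, ["/index?*&start="]))
```

Note that the rule has to begin with the same path as the real URLs: under this matching, a rule starting with `/index.php?` would not catch pages served from `/index?`. Also, not every crawler is guaranteed to honour `*`, so it is still worth confirming with the search engine's own testing tool.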
Aldo, Nov 8, 2008
