

Help with using wildcard for dynamic URL


AlfaNet
Apr 29th 2008, 3:25 am
Hi all,
I run a script that generates lots of pages of the form mtdomain.tld/?g_XXXXX, which are duplicate content.

I think these could be blocked with the following lines in the robots.txt file:

User-agent: *
Disallow: /*?


But I need to allow bots to crawl only the pages that start with /?g_page=XX. So I'm thinking of a robots.txt like the one below:

User-agent: *
Disallow: /*?
Allow: /?g_page=*

Is the above order correct, or do I need to put "Allow" first?

Will the above lines tell bots to follow ONLY URLs containing "/?g_page" and not any other URLs with /?g ?

Any suggestions will be appreciated.

mistoovrool
May 13th 2008, 5:37 am
Google Webmaster Tools has a facility called Analyze Robots.txt.
You can paste your robots.txt code there and check whether the crawler will crawl a given page or not.

mamina
Jun 12th 2008, 11:03 am
From what I understand, wildcards cannot be used in robots.txt files. If I am wrong, please let me know, but that is what I have read.

Zelo

manish.chauhan
Jun 13th 2008, 4:49 am
Wildcards can be used in robots.txt: the Google and Yahoo bots support and follow wildcards in robots.txt, but MSN doesn't support them. :)
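
On the original question about whether "Allow" has to come first: for Google, as far as its documentation describes, the order of the lines does not matter. The longest matching pattern wins, and Allow wins a tie. Other crawlers may behave differently (some apply the first matching line), so putting Allow first is the safer choice. Below is a minimal Python sketch of that longest-match logic, not any crawler's actual code. The names pattern_to_regex and is_allowed are made up for illustration, and the rule assumes the parameter is spelled g_page as in the question's description.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Decide whether path may be crawled (hypothetical helper).

    rules is a list of (directive, pattern) pairs, where directive is
    'allow' or 'disallow'. The longest matching pattern wins; on a tie,
    Allow wins. A path matched by no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed

# The rules from the question, in the order the poster wrote them.
rules = [
    ("disallow", "/*?"),
    ("allow", "/?g_page=*"),
]

for path in ["/?g_page=12", "/?g_12345", "/about.html", "/index.php?id=3"]:
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

Under this sketch, /?g_page=12 and /about.html come out allowed, while /?g_12345 and /index.php?id=3 come out blocked, and reversing the two rules gives the same result. The trailing * in the Allow line is harmless but not needed, since patterns already match as prefixes.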