Hi all, I run a script that generates plenty of pages at mtdomain.tld/?g_XXXXX, which are duplicate content. I think these could be blocked with the following lines in the robots.txt file:

User-agent: *
Disallow: /*?

But I need to allow bots to crawl only pages that start with /?g_page=XX. So I'm thinking of a robots.txt like the one below:

User-agent: *
Disallow: /*?
Allow: /?g_pages=*

Is the above order correct, or do I need to put the Allow first? Will the above lines tell bots to follow ONLY URLs having "/?g_page" in them and not any other URLs with /?g ? Any suggestion will be appreciated.
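For anyone trying to reason about which rule wins, here is a minimal Python sketch of the matching that Google documents for its own crawler: the longest (most specific) matching pattern decides, and if an Allow and a Disallow of equal length both match, Allow wins, so the order of the lines doesn't matter to Googlebot. This is only an illustration -- the pattern list mirrors the rules in the question, but it assumes the parameter is g_page= (the Allow line above says g_pages=, which looks like a typo), and the sample URLs are made up.

import re

def rule_matches(pattern: str, path: str) -> bool:
    # Translate a robots.txt pattern (supporting * and a trailing $)
    # into a regex anchored at the start of the path, then test it.
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored_end:
        regex += "$"
    return re.match(regex, path) is not None

# Rules from the question, assuming g_page= is the parameter the script uses.
rules = [
    ("Disallow", "/*?"),
    ("Allow", "/?g_page="),
]

def is_allowed(path: str) -> bool:
    # Googlebot-style evaluation: the longest matching pattern wins;
    # when an Allow and a Disallow of equal length both match, Allow wins.
    matching = [(len(p), kind) for kind, p in rules if rule_matches(p, path)]
    if not matching:
        return True  # no rule matches -> crawling is allowed
    length, kind = max(matching, key=lambda m: (m[0], m[1] == "Allow"))
    return kind == "Allow"

for url in ["/?g_page=12", "/?g_something=5", "/page.html"]:
    print(url, "->", "allowed" if is_allowed(url) else "blocked")

Running this prints that /?g_page=12 stays crawlable while other query-string URLs are blocked, which is the behaviour you describe -- but it's only a sketch of Google's documented semantics, so it's worth confirming in Google's own tester.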
In Google Webmaster Tools there is a feature called Analyze robots.txt. You can paste your robots.txt code there and check whether the crawler would crawl a given page or not.
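If you prefer checking from a script rather than the web UI, Python's standard library ships urllib.robotparser; a minimal sketch is below. One caveat: the standard-library parser follows the original robots.txt convention with plain prefix matching, so it does not interpret the * wildcard the way Googlebot does -- for wildcard rules like Disallow: /*? the Webmaster Tools tester is the more reliable check. The rules and URLs below just mirror the ones discussed in this thread.

from urllib.robotparser import RobotFileParser

# robots.txt content under discussion, parsed from a string for illustration.
robots_txt = """\
User-agent: *
Disallow: /*?
Allow: /?g_page=
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Ask whether a given user agent may fetch a given URL.
for url in ["http://mtdomain.tld/?g_page=2", "http://mtdomain.tld/?g_other=1"]:
    print(url, "->", rp.can_fetch("*", url))

# Note: because this parser does literal prefix matching, it will not treat
# "/*?" as "any URL with a query string" the way Googlebot does, so the
# printed results can differ from what Google's tester reports.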
From what I understand, wildcards cannot be used in robots.txt files. If I am wrong please let me know, but that is what I have read.
Zelo
Wildcards can be used in robots.txt; Google's and Yahoo's bots support and follow wildcards in robots.txt, but MSN's bot doesn't support them.