I have a dynamic site that is producing duplicate content. The CGI program produces both of the following URLs for the same page, and the writers of the program say there is no way to stop them from being produced.

I want to keep this version:

http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

and block this version:

http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

Can I do this with a line in my robots.txt file? Would the following work to block the longer of the two URLs?

User-Agent: *
Disallow: /dir.cgi/
You can avoid the duplicate content with robots.txt, but the rule you suggest is not going to do what you expect. Use this robots.txt instead:

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgi?lv
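If you want to check a rule like this before relying on it, Python's standard-library urllib.robotparser reads robots.txt with the same literal prefix matching that the original robots.txt standard specifies. A quick sketch using the two URLs from your post (example.com is your placeholder domain):

from urllib.robotparser import RobotFileParser

# The suggested robots.txt, one directive per line.
rules = [
    "User-Agent: *",
    "Disallow: /cgi-bin/pseek/dirs.cgi?lv",
]

rp = RobotFileParser()
rp.parse(rules)

# The version you want to keep stays fetchable (prints True)...
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147"))

# ...and the duplicate version is blocked (prints False).
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets"))

Jean-Luc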
Why did you add the "?lv" after the "dirs.cgi"? By the way, I have the same problem and am looking for an answer. I have this:

http://www.project4hire.com/web-development-promotion-projects.php

and this:

http://www.project4hire.com/index.php?a=myareas&area=504&mode=&order=timeleft_ASC&

They are basically the same content. I want to block every URL starting with index.php?a=myareas&... How do I do that?
Hi,

Disallow: /cgi-bin/pseek/dirs.cgi?lv

disallows access to all URLs starting with /cgi-bin/pseek/dirs.cgi?lv, while

Disallow: /cgi-bin/pseek/dirs.cgi

would have disallowed access to all URLs starting with /cgi-bin/pseek/dirs.cgi. So it was not necessary to add the ?lv at the end; the shorter prefix already covers the duplicate URL.

To block all URLs starting with /index.php?a= (this covers a=myareas and every other a= value), you can use:

Disallow: /index.php?a=
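The same kind of check as above works here too. A short sketch with Python's urllib.robotparser, assuming your crawler follows plain prefix matching:

from urllib.robotparser import RobotFileParser

rules = [
    "User-Agent: *",
    "Disallow: /index.php?a=",
]

rp = RobotFileParser()
rp.parse(rules)

# The static page stays fetchable (prints True)...
print(rp.can_fetch("*", "http://www.project4hire.com/web-development-promotion-projects.php"))

# ...while any URL starting with /index.php?a= is blocked (prints False).
print(rp.can_fetch("*", "http://www.project4hire.com/index.php?a=myareas&area=504&mode=&order=timeleft_ASC&"))

Jean-Luc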