I would like to block some duplicate pages that my script is producing.

I want to block this page:
http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets

But I want to keep this page:
http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

Would this block the first URL without hurting the second one?

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgilv

Or would it be better to write out the full URL for each page I want to block, like this?

User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets

I need to be very careful not to block the second URL (dirs2.cgi). Is there any danger of blocking the second URL with either of the above Disallow rules?
My understanding is that you would want to use the full URL to block it. Others may have input on this as well.
I will explain how it works.

Disallow: /blah_blah_blah
Code (markup):

This line blocks every URL starting with /blah_blah_blah and does not block any other URL. It disallows access to all of these URLs:
- /blah_blah_blah
- /blah_blah_blah/
- /blah_blah_blah123
- /blah_blah_blah?who=you&where=here
- /blah_blah_blah/subdir/my_file.html

Jean-Luc
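If you want to double-check a rule like this yourself, Python's standard library ships urllib.robotparser, which implements this plain prefix matching. The sketch below feeds it a hypothetical robots.txt containing the rule above and tests the example paths from this post, plus one path that should stay crawlable:

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the prefix rule discussed above.
robots_txt = """\
User-Agent: *
Disallow: /blah_blah_blah
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# can_fetch() returns False for every URL whose path starts with the
# disallowed prefix, and True for everything else.
for path in [
    "/blah_blah_blah",
    "/blah_blah_blah/",
    "/blah_blah_blah123",
    "/blah_blah_blah?who=you&where=here",
    "/blah_blah_blah/subdir/my_file.html",
    "/other_page.html",  # does not start with the prefix, so it stays allowed
]:
    print(path, "->", "blocked" if not rp.can_fetch("*", path) else "allowed")
Code (python):

Note that major crawlers such as Googlebot additionally understand wildcard patterns (* and $) in robots.txt, but the plain prefix rule shown here is what decides the cases discussed in this thread.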
But if I use this:

Disallow: /cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets

it wouldn't inadvertently block other URLs that contain /cgi-bin/pseek/, would it?
You would not block a URL like http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147, but you would block every URL starting with /cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets, including http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets.

Jean-Luc
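To make that concrete, here is a small, hypothetical helper (the function name is_blocked is made up for illustration) that mimics the classic matching rule, comparing the URL's path plus its query string, when present, against the Disallow value, and checks the two URLs from this thread:

from urllib.parse import urlsplit

def is_blocked(url: str, disallow: str) -> bool:
    # Classic robots.txt matching: the URL is blocked if its path
    # (plus "?query" when present) starts with the Disallow value.
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return target.startswith(disallow)

rule = "/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets"

# The duplicate page starts with the rule, so it is blocked.
print(is_blocked("http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets", rule))  # True

# The page to keep starts with /cgi-bin/pseek/dirs2.cgi instead, so it is not blocked.
print(is_blocked("http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147", rule))  # False
Code (python):

Real crawlers also normalize URL encoding and may support wildcards, so treat this as an illustration of the prefix test, not a full implementation.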
One more question. Would this:

Disallow: /cgi-bin/pseek/dirs.cgi?lv=2

also block "/cgi-bin/pseek/dirs.cgi?st", or would it allow it?
"/cgi-bin/pseek/dirs.cgi?st" would not be blocked as it does not start with "/cgi-bin/pseek/dirs.cgi?lv=2". Jean-Luc