Help with blocking duplicate content

northstar Peon

#1

I have a dynamic site that is producing duplicate content. The CGI program generates both of the following URLs for the same page, and its authors say there is no way to stop it from producing them.

I want to keep this version: http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147
and block this version: http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets

Can I do this with a line in my robots.txt file? Would the following block the longer of the two URLs?

User-Agent: *
Disallow: /dir.cgi/

northstar, Sep 7, 2006 IP

Jean-Luc Peon

#2

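You can avoid the duplicate content with robots.txt, but the rule you suggest will not do what you expect. A Disallow value is matched against the beginning of the URL path, and neither of your URLs starts with /dir.cgi/.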

Use this robots.txt:
User-Agent: *
Disallow: /cgi-bin/pseek/dirs.cgilv
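
For anyone who wants to check a rule before uploading the file, here is a minimal sketch using Python's standard urllib.robotparser module. It compares the rule from the first post with the one above against the two URLs from this thread. The helper name allowed and the constants KEEP and BLOCK are made up for this example, and real crawlers may interpret robots.txt slightly differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The two URLs from the first post.
KEEP = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147"
BLOCK = "http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets"


def allowed(disallow_path, url, agent="*"):
    """Return True if a crawler obeying the single Disallow rule may fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(["User-Agent: *", "Disallow: " + disallow_path])
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for rule in ("/dir.cgi/", "/cgi-bin/pseek/dirs.cgilv"):
        print(rule,
              "| keep allowed:", allowed(rule, KEEP),
              "| duplicate allowed:", allowed(rule, BLOCK))
```

With the /dir.cgi/ rule both URLs should still come back as allowed, because neither path begins with /dir.cgi/. With the longer rule only the duplicate URL should be refused.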
Jean-Luc

Jean-Luc, Sep 8, 2006 IP

northstar Peon

#3

Thank you for the help. I will give it a try.

northstar, Sep 8, 2006 IP

ablaye Well-Known Member

#4

Why did you add the "lv" after the "dirs.cgi"?
Btw, I have the same problem and I am looking for an answer.

Basically, I have this:
http://www.project4hire.com/web-development-promotion-projects.php
and this:
http://www.project4hire.com/index.php?a=myareas&area=504&mode=&order=timeleft_ASC&

They are basically the same content.

I want to block all URLs of the form index.php?a=myareas&......

How do I do that?

ablaye, Sep 16, 2006 IP

Jean-Luc Peon

#5

Hi,

Disallow: /cgi-bin/pseek/dirs.cgilv disallows access to all URLs starting with /cgi-bin/pseek/dirs.cgilv.

Disallow: /cgi-bin/pseek/dirs.cgi would have disallowed access to all URLs starting with /cgi-bin/pseek/dirs.cgi. So it was not necessary to add the lv at the end.

To block all URLs starting with /index.php?a=, you can use:
Disallow: /index.php?a=
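
The "starting with" behaviour described above can be sketched in a few lines of Python. This is a simplified model of plain prefix matching only: it ignores Allow lines, wildcard extensions and percent-encoding, and the function names path_and_query and is_disallowed are invented for the example.

```python
from urllib.parse import urlsplit


def path_and_query(url):
    """Return the path plus query string, the part a Disallow value is compared to."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def is_disallowed(url, disallow_prefixes):
    """True if the URL's path (plus query) starts with any of the Disallow values."""
    target = path_and_query(url)
    return any(target.startswith(prefix) for prefix in disallow_prefixes)


if __name__ == "__main__":
    rules = ["/cgi-bin/pseek/dirs.cgi", "/index.php?a="]
    urls = [
        "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147",
        "http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets",
        "http://www.project4hire.com/web-development-promotion-projects.php",
        "http://www.project4hire.com/index.php?a=myareas&area=504&mode=&order=timeleft_ASC&",
    ]
    for url in urls:
        print("blocked" if is_disallowed(url, rules) else "allowed", url)
```

Under this model the shorter /cgi-bin/pseek/dirs.cgi rule still leaves dirs2.cgi alone, since "dirs2.cgi" and "dirs.cgi" differ right after "dirs".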

Jean-Luc

Jean-Luc, Sep 17, 2006 IP
