Help with blocking duplicate content

Discussion in 'robots.txt' started by northstar, Sep 7, 2006.

  1. #1
    I have a dynamic site that is producing duplicate content. The CGI program generates both of the following URLs for the same page, and its authors say there is no way to stop it from doing so.

    I want to keep this version: http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147
    and block this version: http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets

    Can I do this with a line in my robots.txt file? Would the following work to block the longer of the two URLs?

    User-Agent: *
    Disallow: /dir.cgi/
     
    northstar, Sep 7, 2006
  2. Jean-Luc

    #2
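    You can avoid the duplicate content with robots.txt, but the rule you suggest will not do what you expect: Disallow: /dir.cgi/ only matches URLs whose path starts with /dir.cgi/, and neither of your two URLs does.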

    Use this robots.txt:
    User-Agent: *
    Disallow: /cgi-bin/pseek/dirs.cgilv
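
    One way to check how the two rules behave before deploying them is Python's standard-library urllib.robotparser. The sketch below is only an illustration: the helper name allowed is made up for this example, and real crawlers may interpret robots.txt somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two URLs from the original question.
KEEP = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147"
BLOCK = "http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets"


def allowed(disallow_path, url, agent="*"):
    """Return True if `url` may be fetched under a one-rule robots.txt.

    `allowed` is a hypothetical helper for this example, not a library function.
    """
    parser = RobotFileParser()
    # Feed the parser an in-memory robots.txt instead of fetching one.
    parser.parse(["User-Agent: *", f"Disallow: {disallow_path}"])
    return parser.can_fetch(agent, url)


# Rule from the question versus the rule suggested in the reply.
for rule in ("/dir.cgi/", "/cgi-bin/pseek/dirs.cgilv"):
    print(rule)
    print("  wanted URL allowed:   ", allowed(rule, KEEP))
    print("  duplicate URL allowed:", allowed(rule, BLOCK))
```

    With this parser, the rule from the question leaves both URLs crawlable, while the suggested rule blocks only the duplicate.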
    Jean-Luc
     
    Jean-Luc, Sep 8, 2006
  3. northstar

    #3
    Thank you for the help. I will give it a try.
     
    northstar, Sep 8, 2006
  4. ablaye

    #4
    ablaye, Sep 16, 2006
  5. Jean-Luc

    #5
    Hi,

    Disallow: /cgi-bin/pseek/dirs.cgilv disallows access to all URLs starting with /cgi-bin/pseek/dirs.cgilv.

    Disallow: /cgi-bin/pseek/dirs.cgi would have disallowed access to all URLs starting with /cgi-bin/pseek/dirs.cgi, so it was not necessary to add the lv at the end. Neither rule affects /cgi-bin/pseek/dirs2.cgi, because that path does not start with either prefix.

    To block all URLs starting with /index.php?a=, you can use:
    Disallow: /index.php?a=
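
    The prefix matching described above can be sketched in a few lines of Python. This is a simplified model that assumes plain prefix matching with no wildcards and no Allow lines; is_disallowed and the sample /index.php URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_prefixes):
    """Simplified model: blocked if path-plus-query starts with any Disallow prefix."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # An empty Disallow value blocks nothing, so empty prefixes are skipped.
    return any(bool(prefix) and target.startswith(prefix) for prefix in disallow_prefixes)


rules = ["/cgi-bin/pseek/dirs.cgi", "/index.php?a="]

samples = [
    "http://www.example.com/cgi-bin/pseek/dirs.cgilv=2&ct=category_widgets",
    "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147",
    "http://www.example.com/index.php?a=5",  # hypothetical URL
    "http://www.example.com/index.php?b=5",  # hypothetical URL
]

for url in samples:
    print(is_disallowed(url, rules), url)
```

    Under this model the first and third sample URLs are blocked, while dirs2.cgi and /index.php?b=5 stay crawlable because neither starts with a listed prefix.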

    Jean-Luc
     
    Jean-Luc, Sep 17, 2006