Robots.txt help

Discussion in 'robots.txt' started by northstar, Sep 12, 2006.

  1. northstar

    #1
    I would like to block some duplicate pages that my script is producing.

    I want to block this page: http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

    But want to keep this page: http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

    Would this work to block the first URL without hurting the second one?

    User-Agent: *
    Disallow: /cgi-bin/pseek/dirs.cgi?lv

    Or would it be better to write out the full URL for each page I want to block, like this:

    User-Agent: *
    Disallow: /cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

    I need to be very careful not to block the second URL (dirs2.cgi). Would there be any danger of blocking the second URL with either of the above robots.txt Disallow lines?
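
    In case it helps to test this, here is a quick local check with Python's urllib.robotparser (just a minimal sketch; it implements the classic starts-with matching, and the URLs are the examples from above):

    from urllib.robotparser import RobotFileParser

    # Candidate rule: block dirs.cgi?lv... while leaving dirs2.cgi alone.
    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /cgi-bin/pseek/dirs.cgi?lv",
    ])

    blocked = "http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets"
    kept = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147"

    print(rp.can_fetch("*", blocked))  # False -> crawlers are told to stay away
    print(rp.can_fetch("*", kept))     # True  -> dirs2.cgi stays crawlable
    Code (markup):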
     
    northstar, Sep 12, 2006 IP
  2. noppid

    #2
    My understanding is that you should block using the full URL. Others may have input on this as well.
     
    noppid, Sep 12, 2006 IP
  3. Jean-Luc

    #3
    I will explain how it works.
    Disallow: /blah_blah_blah
    Code (markup):
    This line blocks every URL starting with /blah_blah_blah. It does not block any other URL.

    It means that it disallows access to all of these URLs:
    - /blah_blah_blah
    - /blah_blah_blah/
    - /blah_blah_blah123
    - /blah_blah_blah?who=you&where=here
    - /blah_blah_blah/subdir/my_file.html
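
    If you want to verify this behaviour yourself, Python's urllib.robotparser applies the same starts-with rule. A minimal sketch, reusing the made-up paths above:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /blah_blah_blah",
    ])

    # Every URL whose path starts with /blah_blah_blah is disallowed...
    for path in ("/blah_blah_blah",
                 "/blah_blah_blah/",
                 "/blah_blah_blah123",
                 "/blah_blah_blah?who=you&where=here",
                 "/blah_blah_blah/subdir/my_file.html"):
        assert not rp.can_fetch("*", "http://www.example.com" + path)

    # ...while any other URL is unaffected.
    assert rp.can_fetch("*", "http://www.example.com/blah.html")
    Code (markup):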

    Jean-Luc
     
    Jean-Luc, Sep 12, 2006 IP
  4. northstar

    #4
    But if I use:
    Disallow: /cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

    It wouldn't inadvertently block other URLs that contain /cgi-bin/pseek/, would it?
     
    northstar, Sep 12, 2006 IP
  5. mad4

    #5
    Google Sitemaps has a robots.txt checker that works very well.
     
    mad4, Sep 12, 2006 IP
  6. Jean-Luc

    #6
    Jean-Luc, Sep 12, 2006 IP
  7. northstar

    #7
    Thanks for all your help. That answered all my questions.
     
    northstar, Sep 12, 2006 IP
  8. northstar

    #8
    One more question.
    Would this
    Disallow: /cgi-bin/pseek/dirs.cgi?lv=2

    also block this "/cgi-bin/pseek/dirs.cgi?st", or would it allow it?
     
    northstar, Sep 12, 2006 IP
  9. Jean-Luc

    #9
    "/cgi-bin/pseek/dirs.cgi?st" would not be blocked as it does not start with "/cgi-bin/pseek/dirs.cgi?lv=2".

    Jean-Luc
     
    Jean-Luc, Sep 12, 2006 IP