Robots.txt help

Discussion in 'robots.txt' started by northstar, Sep 12, 2006.

  1. northstar

    #1
    I would like to block some duplicate pages that my script is producing.

    I want to block this page: http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

    But want to keep this page: http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147

    Would this work to block the first URL without hurting the second one?

    User-Agent: *
    Disallow: /cgi-bin/pseek/dirs.cgi?lv

    Or would it be better to write out the full URL for each page I want to block, like this:

    User-Agent: *
    Disallow: /cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

    I need to be very careful not to block the second URL (dirs2.cgi). Would there be any danger of blocking the second URL with either of the above robots.txt Disallow lines?
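
    In case it helps to test this, here is a quick local check with Python's urllib.robotparser (just a minimal sketch; it implements the classic starts-with matching, and the URLs are the examples from above):

    from urllib.robotparser import RobotFileParser

    # Candidate rule: block dirs.cgi?lv... while leaving dirs2.cgi alone.
    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /cgi-bin/pseek/dirs.cgi?lv",
    ])

    blocked = "http://www.example.com/cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets"
    kept = "http://www.example.com/cgi-bin/pseek/dirs2.cgi?cid=147"

    print(rp.can_fetch("*", blocked))  # False -> crawlers are told to stay away
    print(rp.can_fetch("*", kept))     # True  -> dirs2.cgi stays crawlable
    Code (markup):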
     
    northstar, Sep 12, 2006 IP
  2. noppid

    #2
    My understanding is that you should block using the full URL. Others may have input on this as well.
     
    noppid, Sep 12, 2006 IP
  3. Jean-Luc

    #3
    I will explain how it works.
    Disallow: /blah_blah_blah
    Code (markup):
    This line blocks every URL starting with /blah_blah_blah. It does not block any other URL.

    It means that it disallows access to all of these URLs:
    - /blah_blah_blah
    - /blah_blah_blah/
    - /blah_blah_blah123
    - /blah_blah_blah?who=you&where=here
    - /blah_blah_blah/subdir/my_file.html
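
    If you want to verify this behaviour yourself, Python's urllib.robotparser applies the same starts-with rule. A minimal sketch, reusing the made-up paths above:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /blah_blah_blah",
    ])

    # Every URL whose path starts with /blah_blah_blah is disallowed...
    for path in ("/blah_blah_blah",
                 "/blah_blah_blah/",
                 "/blah_blah_blah123",
                 "/blah_blah_blah?who=you&where=here",
                 "/blah_blah_blah/subdir/my_file.html"):
        assert not rp.can_fetch("*", "http://www.example.com" + path)

    # ...while any other URL is unaffected.
    assert rp.can_fetch("*", "http://www.example.com/blah.html")
    Code (markup):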

    Jean-Luc
     
    Jean-Luc, Sep 12, 2006 IP
  4. northstar

    #4
    But if I use:
    Disallow: /cgi-bin/pseek/dirs.cgi?lv=2&ct=category_widgets

    It wouldn't inadvertently block other URLs that contain /cgi-bin/pseek/, would it?
     
    northstar, Sep 12, 2006 IP
  5. mad4

    #5
    Google Sitemaps has a robots.txt checker that works very well.
     
    mad4, Sep 12, 2006 IP
  6. Jean-Luc

    #6
    Jean-Luc, Sep 12, 2006 IP
  7. northstar

    #7
    Thanks for all your help. That answered all my questions.
     
    northstar, Sep 12, 2006 IP
  8. northstar

    #8
    One more question.
    Would this
    Disallow: /cgi-bin/pseek/dirs.cgi?lv=2

    also block this "/cgi-bin/pseek/dirs.cgi?st", or would it allow it?
     
    northstar, Sep 12, 2006 IP
  9. Jean-Luc

    #9
    "/cgi-bin/pseek/dirs.cgi?st" would not be blocked as it does not start with "/cgi-bin/pseek/dirs.cgi?lv=2".

    Jean-Luc
     
    Jean-Luc, Sep 12, 2006 IP