Hi, I'm trying to block some pages from being spidered, but I'm afraid that I'll block too much. I do want to block this part of my site: /index.php?topic=blabla. But I do not want to block this: /index.php?topic=blabla&page=2. So, should this do the trick, or do I block too much this way?

User-agent: *
Disallow: /index.php?topic=blabla
Eh, I believe there is no "Allow" command in the standard robots.txt protocol. So if "Allow" doesn't work, I'll block too much (just a few hundred pages too much...). Guess I'll have to find another way of solving this problem.
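Just to spell out why I'd block too much: as far as I understand, Disallow matches URLs by prefix, so a minimal version of my idea (same placeholder paths as above) would be:

User-agent: *
Disallow: /index.php?topic=blabla

And since /index.php?topic=blabla&page=2 starts with that prefix, it would get blocked as well.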
Yes, there is; see http://www.robotstxt.org/wc/norobots-rfc.html. I also tested it with the robots.txt validation tool in Google's webmaster console.
No, "Allow" isn't a standard robots.txt command. Some bots recognize and respect it, but it's not a standard, only "disallow" is.
Hey, that's great. Indeed, lots of SEO blogs etc. say that "Allow" isn't allowed, but since Google does use it and Google is 95% of my traffic, I'll start adapting my robots.txt file right away. Happy New Year from the Netherlands!
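For anyone landing here later, a rough sketch of what I'm planning, assuming Google's longest-match handling of Allow (paths are still the placeholders from my first post):

User-agent: *
Allow: /index.php?topic=blabla&page=
Disallow: /index.php?topic=blabla

Google should pick the most specific (longest) matching rule, so the paged URLs stay crawlable while the bare topic URL is blocked. Bots that ignore Allow would simply block everything under the prefix.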