Hi, I'm trying to block some pages from being spidered, but I'm afraid that I'll block too much. I do want to block this part of my site: /index.php?topic=blabla. But I do not want to block this: /index.php?topic=blabla&page=2. So, should this do the trick, or do I block too much this way?

User-agent: *
Disallow: /index.php?topic=blabla
Eh, I believe there is no "Allow" command in the standard robots.txt protocol. So if "Allow" doesn't work, I'll block too much (just a few hundred pages too much...). Guess I'll have to find another way of solving this problem.
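Just to spell out why I'd block too much: as far as I understand, Disallow matches URLs by prefix, so a minimal version of my idea (same placeholder paths as above) would be:

User-agent: *
Disallow: /index.php?topic=blabla

And since /index.php?topic=blabla&page=2 starts with that prefix, it would get blocked as well.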
Yes, there is; see http://www.robotstxt.org/wc/norobots-rfc.html. I also tested it with the robots.txt validation tool in Google's webmaster console.
No, "Allow" isn't a standard robots.txt command. Some bots recognize and respect it, but it's not a standard, only "disallow" is.
Hey, that's great. Indeed, lots of SEO blogs etc. say that "Allow" isn't allowed, but since Google does use it and Google is 95% of my traffic, I'll start adapting my robots.txt file right away. Happy New Year from the Netherlands!
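For anyone landing here later, a rough sketch of what I'm planning, assuming Google's longest-match handling of Allow (paths are still the placeholders from my first post):

User-agent: *
Allow: /index.php?topic=blabla&page=
Disallow: /index.php?topic=blabla

Google should pick the most specific (longest) matching rule, so the paged URLs stay crawlable while the bare topic URL is blocked. Bots that ignore Allow would simply block everything under the prefix.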