Hey y'all, I've been looking for this on Google, Yahoo, and in this forum, but I can't find it. I know there are a few lines of text that can go into the robots.txt file to stop bots crawling pages with querystrings; I remember Shawn posting about it quite a while back. Does anyone know how to do it?
What about putting rel="nofollow" on the links to URLs that you don't want the G/Y/M bots to follow? -jay
I found Shawn's post over at http://forums.digitalpoint.com/showthread.php?t=106&highlight=robots.txt From reading that, I reckon this will work:
User-agent: *
Disallow: /?
Does that mean every page on my site with a querystring will not be indexed? That's what I want to happen.
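For anyone wanting to sanity-check which URLs a rule like that would catch, here is a rough Python sketch. It is a simplified matcher written for this thread, not any crawler's real code: it assumes rules are matched as prefixes against the path plus querystring, that `*` matches any run of characters, and that a trailing `$` anchors the end. The function name `rule_matches` and the sample URLs are made up for illustration.

```python
import re

def rule_matches(pattern: str, url: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL (path + querystring).

    Simplified assumptions: the pattern is a prefix match, '*' matches any
    run of characters, and a trailing '$' anchors the match to the URL's end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives prefix semantics.
    return re.match(regex, url) is not None

# Hypothetical URLs on the site being protected.
urls = ["/?id=1", "/page.php?id=1", "/forum/thread.php?t=106", "/about.html"]

for rule in ("/?", "/*?"):
    blocked = [u for u in urls if rule_matches(rule, u)]
    print(rule, "->", blocked)
```

Under those assumptions, the plain `/?` rule only catches querystrings on the site root, while the wildcard form `/*?` catches a querystring on any page. Bots that don't understand wildcards may treat `/*?` differently, so it is worth testing against whatever checker the search engine provides.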
Aye, that should work for bots that honor robots.txt, with one caveat: rules are matched as path prefixes, so Disallow: /? on its own only covers URLs that begin with /?. For crawlers that support wildcards (Googlebot does), Disallow: /*? covers a querystring on any page. You can also use the nofollow attribute, or, since you're programming the page, you can check the user agent and skip the database hit if bots are still requesting your querystring pages. We've had problems with unfriendly robots not following directions. Generally they're from overseas, or they're a directory site here in the States coming back to steal more content. We've had to either block their IP range or serve up bad static data because they don't play nice. One directory didn't have quality control over its web scraping, so we filled in text pointing out that our original website was better than their directory and that visitors should come to the official website. Worked great for about a year till they finally caught on.
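To make the user-agent check and the IP-range block a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the marker substrings, the blocked ranges (reserved documentation networks, not real offenders), and the function names. A real site would wire the same checks into whatever language and framework serves its pages, and a badly behaved bot can always spoof a browser user agent.

```python
import ipaddress

# Hypothetical substrings to look for in the User-Agent header.
BOT_MARKERS = ("googlebot", "slurp", "msnbot", "crawler", "spider")

# Hypothetical ranges to refuse outright (reserved documentation networks).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def looks_like_bot(user_agent: str) -> bool:
    """Case-insensitive substring check against the known bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def is_blocked_ip(remote_addr: str) -> bool:
    """True if the client address falls inside any blocked range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)

def choose_response(user_agent: str, remote_addr: str, query_string: str) -> str:
    """Decide how to answer a request without touching the database."""
    if is_blocked_ip(remote_addr):
        return "blocked"   # e.g. send a 403
    if query_string and looks_like_bot(user_agent):
        return "static"    # serve a cheap static page, skip the database
    return "dynamic"       # normal visitor: build the page as usual

print(choose_response("Mozilla/5.0 (compatible; Googlebot/2.1)", "192.0.2.10", "id=1"))
```

The IP check runs first so that a blocked range never reaches the more expensive code paths, whatever user agent it claims to be.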