I'm looking for a foolproof method to prevent Google from spidering/indexing a portion of my website without affecting its ability to spider/index the remainder of the site. The only way I could think of doing this was with .htaccess, using code I found on the net:

    <Limit GET>
    order allow,deny
    deny from 128.23.45.
    deny from 207.158.255.213
    allow from all
    </Limit>

Would this do the job, or would it negatively affect the ranking of pages in other folders? I know I can use the noindex/nofollow tags, but there are search engines that don't obey them.
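For what it's worth, I'm not sure whether my host runs Apache 2.2 or 2.4, and from what I've read the allow/deny syntax above is the older 2.2 style. A rough 2.4+ equivalent (same example IPs, assuming mod_authz_core is available) would be something like:

    <RequireAll>
        # allow everyone by default...
        Require all granted
        # ...except these example addresses
        Require not ip 128.23.45
        Require not ip 207.158.255.213
    </RequireAll>

Either way, the rules would only apply to the directory the .htaccess file sits in, so the rest of the site shouldn't be affected.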
Why not just use robots.txt and disallow that directory for Googlebot (and Google's partner crawlers)? All of Google's spiders obey robots.txt properly. Also, if you're adding 207.x IPs, those belong to msnbot, not Googlebot.
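For example, a minimal robots.txt at the root of the site (using /private/ as a placeholder for whatever folder you want kept out) would be:

    User-agent: *
    Disallow: /private/

You can swap the * for Googlebot if you only want to address Google, and the rest of the site stays fully crawlable either way.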
It's not so much just Google; it's all spiders I'd like to block. Since not all spiders obey robots.txt, I thought IP blocking would be best.
IP blocking isn't going to work because it's impossible to know the IP address of every spider in the world.
You might be able to use a regex to block user agents based on commonly known bot agents, in addition to the IP blocking and robots.txt. The combination of the three may be effective. But if you really don't want bots getting to it, why not just password protect it? No one will find it except the people you know, because it won't be indexed anywhere.
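For the user-agent part, a rough .htaccess sketch using mod_setenvif might look like this (the bot names are just examples, not a complete list):

    # flag requests whose User-Agent matches known crawler names (examples only)
    SetEnvIfNoCase User-Agent "googlebot" block_bot
    SetEnvIfNoCase User-Agent "bingbot" block_bot
    SetEnvIfNoCase User-Agent "baiduspider" block_bot

    # deny flagged requests, allow everyone else (Apache 2.2 style)
    Order Allow,Deny
    Allow from all
    Deny from env=block_bot

Of course, anything that fakes its User-Agent sails straight past this, which is why the password is the only thing that's actually foolproof.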
Regex matching can add a fair bit of processing load, though. The best option is the one already suggested: robots.txt plus .htaccess. Just be careful with the .htaccess rules and limit them to the particular directory you want blocked (for example by placing the .htaccess file inside that directory), not the whole site.
But ultimately, your best bet is to just password protect the page(s), since that's in essence what you're trying to do against the search engines already.
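If you do go the password route, a minimal HTTP Basic Auth sketch looks like this (the paths and user name are placeholders):

    # run once on the server, keeping the password file outside the web root:
    #   htpasswd -c /home/example/.htpasswd someuser

    # .htaccess inside the directory you want hidden:
    AuthType Basic
    AuthName "Private area"
    AuthUserFile /home/example/.htpasswd
    Require valid-user

Spiders just get a 401 for anything in that directory, so nothing there can be indexed, and the rest of the site isn't affected at all.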