Denying Google

Discussion in 'Site & Server Administration' started by carmen, May 3, 2006.

  1. #1
    I'm looking for a foolproof method to prevent Google from spidering/indexing a portion of my website without affecting its ability to spider/index the remainder of the site.

    The only way I could think of doing this was with htaccess using code I found on the net.

    # Applies to GET requests only; with "order allow,deny", Allow rules are
    # evaluated first, then Deny rules, and a matching Deny wins.
    <Limit GET>
    order allow,deny
    deny from 128.23.45.
    deny from 207.158.255.213
    allow from all
    </Limit>

    Would this do the job, or would it negatively affect the ranking of pages in other folders? I know I can use the noindex/nofollow tags, but there are search engines that don't obey them.
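
    For reference, that tag is the one that goes in the <head> of each page in the folder:

    <meta name="robots" content="noindex, nofollow">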
     
    carmen, May 3, 2006
  2. fsmedia (Prominent Member)

    #2
    Why not just use robots.txt and target all of Google's crawlers? All of Google's spiders obey robots.txt properly. Also, if you're adding 207.x IPs, those are msnbot, not Googlebot.
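
    For example, if the part you want hidden lives in a folder called /private/ (just a placeholder name), a robots.txt in the site root along these lines keeps compliant crawlers out of it; swap User-agent: * for User-agent: Googlebot if you only care about Google:

    # robots.txt in the web root -- /private/ is only an example path
    User-agent: *
    Disallow: /private/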
     
    fsmedia, May 3, 2006
  3. carmen (Peon)

    #3
    It's not so much just Google; it's all spiders I'd like to block. Since not all spiders obey robots.txt, I thought IP blocking would be best.
     
    carmen, May 3, 2006
  4. Jean-Luc (Peon)

    #4
    As long as you know all the IPs of all these spiders you want to block...

    Jean-Luc
     
    Jean-Luc, May 4, 2006
  5. digitalpoint (Overlord of no one, Staff)

    #5
    IP blocking isn't going to work because it's impossible to know the IP address of every spider in the world.
     
    digitalpoint, May 4, 2006
  6. fsmedia (Prominent Member)

    #6
    You might be able to use a regex and block user agents based on commonly known bot agents, in addition to the IP blocking and robots.txt. The combination of the three may be effective (see the sketch below). But if you really don't want bots getting to it, why not just password protect it? No one will find it except people you know, because it won't be indexed anywhere.
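
    As a rough sketch (assuming mod_setenvif is available, and the bot names are just common examples), something like this in that folder's .htaccess refuses requests whose User-Agent matches a known crawler:

    # Flag requests whose User-Agent matches a known crawler name
    SetEnvIfNoCase User-Agent "googlebot" blocked_bot
    SetEnvIfNoCase User-Agent "msnbot" blocked_bot
    SetEnvIfNoCase User-Agent "slurp" blocked_bot
    # Refuse flagged requests, allow everyone else
    Order Allow,Deny
    Allow from all
    Deny from env=blocked_bot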
     
    fsmedia, May 4, 2006
  7. theblight (Peon)

    #7
    But regex may add a lot of processing load. The best option is the one already suggested: robots.txt plus .htaccess. Just be careful with the .htaccess rules and limit them to the particular directory.
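
    For instance, dropping the rules into an .htaccess file inside the folder itself (folder name is just an example) keeps them from touching the rest of the site:

    # /private/.htaccess -- these directives apply only to /private/ and its subfolders
    Order Allow,Deny
    Deny from 207.158.255.213
    Allow from all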
     
    theblight, May 5, 2006
  8. fsmedia (Prominent Member)

    #8
    But ultimately, your best bet is to just password protect the page(s), since that's what you're doing against the search engines already, in essence.
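
    A minimal setup would be something like this in the protected folder's .htaccess (the AuthUserFile path is only an example):

    # Require a login for everything in this folder
    AuthType Basic
    AuthName "Private area"
    AuthUserFile /home/example/.htpasswd
    Require valid-user

    Create the password file once with something like: htpasswd -c /home/example/.htpasswd someuser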
     
    fsmedia, May 5, 2006
  9. theblight (Peon)

    #9
    Yup, I agree: use .htpasswd.
     
    theblight, May 6, 2006