Denying Google

Discussion in 'Site & Server Administration' started by carmen, May 3, 2006.

  1. #1
    I'm looking for a foolproof method to prevent Google from spidering/indexing a portion of my website without affecting its ability to spider/index the remainder of the site.

    The only way I could think of doing this was with htaccess using code I found on the net.

    # Applies to GET requests only; with "order allow,deny", Allow rules are
    # evaluated first, then Deny rules, and a matching Deny wins.
    <Limit GET>
    order allow,deny
    deny from 128.23.45.
    deny from 207.158.255.213
    allow from all
    </Limit>

    Would this do the job, or would it negatively affect the ranking of pages in other folders? I know I can use the noindex/nofollow tags, but there are search engines that don't obey them.
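
    For reference, that tag is the one that goes in the <head> of each page in the folder:

    <meta name="robots" content="noindex, nofollow">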
     
    carmen, May 3, 2006
  2. fsmedia (Prominent Member)

    #2
    Why not just use robots.txt and target all of Google's crawlers? All of Google's spiders obey robots.txt properly. Also, if you're adding 207.x IPs, those are msnbot, not Googlebot.
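
    For example, if the part you want hidden lives in a folder called /private/ (just a placeholder name), a robots.txt in the site root along these lines keeps compliant crawlers out of it; swap User-agent: * for User-agent: Googlebot if you only care about Google:

    # robots.txt in the web root -- /private/ is only an example path
    User-agent: *
    Disallow: /private/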
     
    fsmedia, May 3, 2006
  3. carmen (Peon)

    #3
    It's not so much just Google; it's all spiders I'd like to block. Since not all spiders obey robots.txt, I thought IP blocking would be best.
     
    carmen, May 3, 2006
  4. Jean-Luc (Peon)

    #4
    As long as you know all the IPs of all these spiders you want to block...

    Jean-Luc
     
    Jean-Luc, May 4, 2006
  5. digitalpoint (Overlord of no one, Staff)

    #5
    IP blocking isn't going to work because it's impossible to know the IP address of every spider in the world.
     
    digitalpoint, May 4, 2006
  6. fsmedia (Prominent Member)

    #6
    You might be able to use a regex and block user agents based on commonly known bot agents, in addition to the IP blocking and robots.txt. The combination of the three may be effective (see the sketch below). But if you really don't want bots getting to it, why not just password protect it? No one will find it except people you know, because it won't be indexed anywhere.
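
    As a rough sketch (assuming mod_setenvif is available, and the bot names are just common examples), something like this in that folder's .htaccess refuses requests whose User-Agent matches a known crawler:

    # Flag requests whose User-Agent matches a known crawler name
    SetEnvIfNoCase User-Agent "googlebot" blocked_bot
    SetEnvIfNoCase User-Agent "msnbot" blocked_bot
    SetEnvIfNoCase User-Agent "slurp" blocked_bot
    # Refuse flagged requests, allow everyone else
    Order Allow,Deny
    Allow from all
    Deny from env=blocked_bot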
     
    fsmedia, May 4, 2006
  7. theblight (Peon)

    #7
    But regex may add a lot of processing load. The best option is the one already suggested: robots.txt plus .htaccess. Just be careful with the .htaccess rules and limit them to the particular directory.
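
    For instance, dropping the rules into an .htaccess file inside the folder itself (folder name is just an example) keeps them from touching the rest of the site:

    # /private/.htaccess -- these directives apply only to /private/ and its subfolders
    Order Allow,Deny
    Deny from 207.158.255.213
    Allow from all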
     
    theblight, May 5, 2006
  8. fsmedia (Prominent Member)

    #8
    But ultimately, your best bet is to just password protect the page(s), since that's what you're doing against the search engines already, in essence.
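
    A minimal setup would be something like this in the protected folder's .htaccess (the AuthUserFile path is only an example):

    # Require a login for everything in this folder
    AuthType Basic
    AuthName "Private area"
    AuthUserFile /home/example/.htpasswd
    Require valid-user

    Create the password file once with something like: htpasswd -c /home/example/.htpasswd someuser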
     
    fsmedia, May 5, 2006
  9. theblight (Peon)

    #9
    Yup, I agree: use .htpasswd.
     
    theblight, May 6, 2006