Text that I don't want Google to see

Discussion in 'Site & Server Administration' started by Master_Shake, Oct 29, 2007.

  1. #1
    I have certain text on a site that I don't want Google to see, because I don't want the site to be found via a search engine. How can I block the content on my site from being seen by the Google bots?
     
    Master_Shake, Oct 29, 2007 IP
  2. mooiness

    mooiness Peon

    #2
    Add a "robots.txt" file in the top level of your web folder, with the following in it:
    User-agent: *
    Disallow: /
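One way to sanity-check a rule like this before uploading it is Python's standard `urllib.robotparser` module. The following is a minimal sketch, assuming the plain `Disallow: /` form of the rule; the `example.com` host, the sample paths and the helper name `is_allowed` are placeholders and not part of the thread.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and these paths are hypothetical.
    for path in ("/", "/index.html", "/private/notes.html"):
        allowed = is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com" + path)
        print(path, "allowed" if allowed else "blocked")
```

This only reports what a well-behaved crawler should do with the file. It does not stop a client that ignores robots.txt.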
     
    mooiness, Oct 29, 2007 IP
  3. Shazz

    Shazz Prominent Member

    #3
    That's what I would use to block the Google bots, but you can't hide from Google itself :)
     
    Shazz, Oct 29, 2007 IP
  4. ForgottenCreature

    ForgottenCreature Notable Member

    #4
    But why would someone want to block Google from certain pages?
     
    ForgottenCreature, Oct 29, 2007 IP
  5. ____________

    ____________ Peon

    #5
    Yes you can: redirect Google to a 404.
     
    ____________, Oct 29, 2007 IP
  6. mooiness

    mooiness Peon

    #6
    @ForgottenCreature: the rule I specified is a bit more restrictive than normal, but people do restrict certain pages from being crawled, for various reasons. Most of them have to do with duplicate content and the like.

    As for the OP's intentions, perhaps he/she wants to make the material available to family and friends over the net but doesn't want random visitors to drop by.
     
    mooiness, Oct 29, 2007 IP
  7. Master_Shake

    Master_Shake Member

    #7
    It's a page with some private information that isn't intended for the public, and I don't want it coming up in Google for certain search terms (which it already does, so I am getting it fixed).
     
    Master_Shake, Oct 29, 2007 IP
  8. Master_Shake

    Master_Shake Member

    #8
    I just want to block it from one page, not the whole site. How can I do that?
     
    Master_Shake, Nov 1, 2007 IP
  9. vpguy

    vpguy Guest

    #9
    Use robots.txt and add a disallow statement for each page you don't want to be visible on Google.

    Disallow: /privatepage1.htm
    Disallow: /privatepage2.htm


    etc.

    Since the pages in question are already in Google, you may want to rename them if you can do that without much hassle.

    And while it's slightly redundant, you could also add rel="nofollow" to any links on your site that point to those pages.

    If you want to get a little more advanced, you could return a 403 Forbidden when someone requests the page from a referring site other than your own, by checking the HTTP_REFERER variable.

    Or you could put the pages in a password-protected directory. There are lots of ways to go about it.
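The referrer check could look roughly like the sketch below. It is an illustration only: `ALLOWED_HOST` and `status_for_request` are hypothetical names, and the function just decides which status code to send rather than plugging into any particular server. It answers with 403 Forbidden, the usual status for a refusal that does not involve a login prompt.

```python
from urllib.parse import urlsplit

ALLOWED_HOST = "www.example.com"  # hypothetical: replace with your own host name


def status_for_request(referer, allowed_host=ALLOWED_HOST):
    """Return 200 if the Referer header points at our own site, else 403."""
    if not referer:
        # No Referer at all (typed URL, bookmark, privacy settings): refuse.
        return 403
    host = urlsplit(referer).hostname or ""
    return 200 if host.lower() == allowed_host.lower() else 403


if __name__ == "__main__":
    print(status_for_request("http://www.example.com/index.htm"))
    print(status_for_request("http://www.google.com/search?q=test"))
    print(status_for_request(None))
```

The Referer header is supplied by the client and can be left out or forged, so a check like this discourages casual visitors from search results but is not real access control. The password-protected directory is the stronger option.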
     
    vpguy, Nov 1, 2007 IP
  10. ermac0

    ermac0 Peon

    #10
    Or you can add this tag to the HEAD section of that page:
    <meta name="robots" content="noindex">
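To confirm that the tag actually ended up in the page a crawler receives, the HTML can be checked with Python's standard `html.parser`. This is a small sketch; `RobotsMetaFinder` and `has_noindex` are made-up names, and the sample markup in the usage lines is illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # content may hold several comma-separated directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))
```

A crawler has to be able to fetch the page in order to read this tag, so the same page should not also be disallowed in robots.txt if you rely on noindex.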
     
    ermac0, Nov 2, 2007 IP
  11. kirby009

    kirby009 Peon

    #11
    Whatever is on the web, Google will see.
     
    kirby009, Nov 2, 2007 IP
  12. murku

    murku Peon

    #12
    Not true, Google follows the rules.
     
    murku, Nov 2, 2007 IP
  13. Sythe

    Sythe Peon

    #13
    You can forcefully disallow it with mod_rewrite.

    .htaccess:

    RewriteEngine On
    RewriteBase /
    RewriteCond %{HTTP_USER_AGENT} (crawl|bot|google|yahoo) [NC]
    RewriteRule your_page - [F,L]

    This forbids any HTTP request for the page 'your_page' whose user agent contains 'crawl', 'bot', 'google' or 'yahoo' (case-insensitive). Those requests get a 403 Forbidden response.
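To see which clients a condition like this would catch, the same pattern can be tried against sample User-Agent strings in Python. This is only a sketch of the matching step: `would_be_forbidden` is a hypothetical helper, the sample strings are illustrative, and the real rule additionally applies only to requests for 'your_page'.

```python
import re

# Same pattern as the RewriteCond; re.IGNORECASE plays the role of [NC].
BLOCKED_AGENTS = re.compile(r"(crawl|bot|google|yahoo)", re.IGNORECASE)


def would_be_forbidden(user_agent: str) -> bool:
    """Return True if the User-Agent matches the blocking pattern anywhere."""
    return BLOCKED_AGENTS.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
    ]
    for ua in samples:
        print(would_be_forbidden(ua), ua)
```

Because the match is a plain substring test, any client whose User-Agent happens to contain one of these words is refused, while a crawler that sends a browser-like User-Agent passes through.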
     
    Sythe, Nov 2, 2007 IP