How to block spiders?

Discussion in 'robots.txt' started by mvhs, Jan 20, 2007.

  1. #1
    How do I block spiders like Google from crawling a certain page?
     
    mvhs, Jan 20, 2007
  2. giovanni

    #2
    I believe you have to do something in the .htaccess file. Someone else may know.
     
    giovanni, Jan 20, 2007
  3. mvhs

    #3
    Can anyone shed some light? Also, will this affect the rankings of my main site?
     
    mvhs, Jan 20, 2007
  4. ajsa52

    #4
    ajsa52, Jan 20, 2007
  5. Clive

    #5
    Add the rel="nofollow" attribute to the links on your website that you want robots to ignore and not follow.
     
    Clive, Jan 20, 2007
  6. ajsa52

    #6
    No, because you can't add "nofollow" to links (to that page) placed on other sites.
     
    ajsa52, Jan 20, 2007
  7. Clive

    #7
    Ah, so the problem is of that nature. Well, links on other websites can sometimes be controlled, can't they? If not, then robots.txt is the way to go. Lines like the one below should do the trick:
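
    The snippet from this post did not survive in this copy of the thread. As a rough sketch only, assuming the page to hide is a hypothetical /private-page.html on a hypothetical example.com, rules like the ones in ROBOTS_TXT below would ask compliant crawlers to skip it. Python's standard urllib.robotparser is used here purely to check how such rules are read; is_allowed is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the path and the site are assumptions for illustration,
# not the lines from the original post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("Googlebot", "http://example.com/private-page.html"))  # False
    print(is_allowed("Googlebot", "http://example.com/index.html"))         # True
```

    Keep in mind that robots.txt is only a request: well-behaved spiders honor it, but nothing forces a crawler to obey.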
     
    Clive, Jan 20, 2007
  8. AlienGG

    #8
    AlienGG, Jan 26, 2007
  9. jonbt

    #9
    I think the best way is a combination of robots.txt and meta tags.

    Meta tags like the following in the head section of your pages will work best.

    Code (markup):

    <meta name="googlebot" content="noarchive,noindex,nofollow,nosnippet" />
    <meta name="robots" content="noarchive,noindex,nofollow" />
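
    To double-check that a page really carries these tags, something like the following could be used. This is only a sketch built on Python's standard html.parser; RobotsMetaParser, robots_directives and PAGE are made-up names for illustration, and PAGE simply wraps the two tags from this post in a minimal document.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots-style meta directives, keyed by the meta tag's name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also end up here via handle_startendtag.
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content", "")
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return a dict such as {'robots': {'noindex', ...}} for an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page containing the two meta tags quoted above.
PAGE = """<html><head>
<meta name="googlebot" content="noarchive,noindex,nofollow,nosnippet" />
<meta name="robots" content="noarchive,noindex,nofollow" />
</head><body></body></html>"""

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("noindex" in found["robots"])  # True
```

    One caveat with the combination: a spider can only read these meta tags if it is allowed to fetch the page, so a page that is also disallowed in robots.txt may never have its noindex seen.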
     
    jonbt, Jan 30, 2007