.htaccess help in order to NOT get indexed.

Discussion in 'Search Engine Optimization' started by Uban, Aug 31, 2006.

  1. #1
    Hello,

    I have just developed a highly optimized website which I hope will do well in organic searches.

    I would like to test-run a duplicate of the site in PPC arenas for conversions, and I don't want to get penalized in the search engines for duplicate content.

    Anyone have pointers? I'm sure it's something to do with adding some lines to the .htaccess file in my root directory, but I'm not sure what. Note: the site does have a couple of directories, e.g. mysite.com/directory1 and mysite.com/directory2.

    Thanks!
     
    Uban, Aug 31, 2006 IP
  2. SEO Tutor©

    SEO Tutor© Peon

    Messages:
    370
    Likes Received:
    23
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Just create a robots.txt file with the following in it and upload it to the root...

    User-agent: *
    Disallow: /

    You could also add a robots meta tag inside the <head> of each page:

    <meta name="robots" content="noindex,nofollow">
     
    SEO Tutor©, Aug 31, 2006 IP
  3. rehash

    rehash Well-Known Member

    Messages:
    1,502
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    150
    #3
    Yes, robots.txt is the easiest way to do it.
     
    rehash, Sep 1, 2006 IP
  4. SEO Tutor©

    SEO Tutor© Peon

    Messages:
    370
    Likes Received:
    23
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Access Restriction

    Blocking of Robots

    Description:
    How can we block a really annoying robot from retrieving pages of a specific webarea? A /robots.txt file containing entries of the "Robot Exclusion Protocol" is typically not enough to get rid of such a robot.

    Solution:
    We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/ (perhaps a very deep directory indexed area where the robot traversal would create big server load). We have to make sure that we forbid access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough. This would block users from this host, too. We accomplish this by also matching the User-Agent HTTP header information.

    RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot.*      
    RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
    RewriteRule ^/~quux/foo/arc/.+   -   [F]
    source: http://httpd.apache.org/docs/1.3/misc/rewriteguide.html
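
    For your situation you would flip that example around: instead of singling out one bad robot, match the major search engine crawlers by User-Agent and return 403 for everything on the duplicate copy. A rough sketch for the duplicate site's .htaccess, assuming mod_rewrite is available on your host; the bot names are just the usual crawlers and are only an example:

    # keep search engine crawlers out of the duplicate copy
    RewriteEngine On
    # match the big crawlers by User-Agent, case-insensitive
    RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot|Teoma) [NC]
    # return 403 Forbidden for every URL they request
    RewriteRule .* - [F]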

     
    SEO Tutor©, Sep 1, 2006 IP
    ahkip likes this.
  5. Uban

    Uban Peon

    Messages:
    144
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Awesome. Thanks everyone. :cool:
     
    Uban, Sep 1, 2006 IP