Hello, I have just developed a highly optimized website which I hope will do well in organic searches. I would like to test-run a duplicate of the site in PPC arenas for conversions, and I do not want to get penalized in the search engines for duplicate content. Anyone have pointers? I'm sure it's something to do with adding some lines to the .htaccess file in my root directory, but I'm not sure what. Note: the site does have a couple of directories, i.e. mysite.com/directory1 and mysite.com/directory2. Thanks!
Just create a robots.txt file with the following in it and upload it to the root:

User-agent: *
Disallow: /

You could also add a meta tag inside the head:

<meta name="robots" content="noindex,nofollow">
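Since you asked about .htaccess specifically: if your host has mod_headers enabled, you can also send an X-Robots-Tag header for everything served from the duplicate copy, which covers non-HTML files that a meta tag can't reach. A minimal sketch (assuming Apache with mod_headers available; adjust to your setup):

# In the .htaccess at the root of the duplicate (PPC) site
<IfModule mod_headers.c>
    # Ask crawlers that honor X-Robots-Tag not to index or follow anything served from here
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>

That keeps the duplicate copy out of the index for crawlers that respect the header, while normal visitors and your PPC traffic are unaffected.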
Access Restriction: Blocking of Robots

Description: How can we block a really annoying robot from retrieving pages of a specific webarea? A /robots.txt file containing entries of the "Robot Exclusion Protocol" is typically not enough to get rid of such a robot.

Solution: We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/ (perhaps a very deep directory indexed area where the robot traversal would create big server load). We have to make sure that we forbid access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough. This would block users from this host, too. We accomplish this by also matching the User-Agent HTTP header information.

RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
RewriteRule ^/~quux/foo/arc/.+ - [F]

Source: http://httpd.apache.org/docs/1.3/misc/rewriteguide.html
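Adapting that pattern to the situation in the original post, here is a rough sketch of what could go in the .htaccess at the root of the duplicate site. The bot names are only examples of the major crawlers (not a complete list) and the directory names are just the placeholders from the question, so adjust both:

# Requires mod_rewrite; in .htaccess context the pattern has no leading slash
RewriteEngine On
# Match the big search crawlers by User-Agent, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
# Return 403 Forbidden for anything under the two directories
RewriteRule ^(directory1|directory2)/ - [F]

That blocks those crawlers while leaving regular visitors untouched, but it depends on the bots identifying themselves honestly, so I would still combine it with the robots.txt and noindex suggestions above.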