Okay, I have never written a robots.txt before, because I'm scared of making a mistake. Now I need to do it, and do it right. I discovered that some of my indexing problems are related to my SSL certificate, which generates duplicate copies of my pages that Google is confusing with my main pages. A 301 redirect fixed this right up, but then my SSL certificate, and hence my entire ordering set-up, no longer worked, since it was trying to access SSL URLs but being directed to http instead. Here's what I want to do: exclude ONLY the SSL pages (sslpowered.mydomain.com/mydomain.com) from indexing, while all of my regular pages are still crawled. How can I write this properly, and how can I verify that I did not exclude my http: pages?
Okay, so this has turned out to be more complicated than I originally thought. I found a two-part solution involving mod_rewrite and two robots.txt files here: http://www.seoworkers.com/seo-articles-tutorials/robots-and-https.html

In .htaccess:

RewriteEngine on
Options +FollowSymlinks
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt

In robots.txt:

User-agent: *
Allow: /

In robots_ssl.txt:

User-agent: *
Disallow: /

So this ought to redirect all robots to the disallow file for any SSL/https pages, which is great, since I don't want those indexed, while allowing all robots to crawl the http pages on my five domains. My only problem at this point is how to verify this. The regular robots.txt file is in my root directory, as is the SSL version, so I don't know if that's correct. There isn't really an address for it there; it's just www/robots.txt. Should I have this in the domain folder as well/instead? Anybody familiar with this who can put me on steadier ground?
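One way to sanity-check the two files before worrying about where they live: Python's standard-library robot parser can read each file's rules and tell you whether a given URL would be crawlable under them. A minimal sketch, assuming the file contents shown above (the URLs are placeholders for your own domains):

```python
from urllib.robotparser import RobotFileParser

# Rules served on the plain-http hosts (the "Allow" robots.txt above).
http_rules = ["User-agent: *", "Allow: /"]
# Rules served on the SSL host (robots_ssl.txt above).
ssl_rules = ["User-agent: *", "Disallow: /"]

http_parser = RobotFileParser()
http_parser.parse(http_rules)

ssl_parser = RobotFileParser()
ssl_parser.parse(ssl_rules)

# Any bot may fetch pages governed by the http rules...
print(http_parser.can_fetch("*", "http://www.mydomain.com/some-page.html"))   # True
# ...but nothing governed by the SSL rules.
print(ssl_parser.can_fetch("*", "https://sslpowered.mydomain.com/some-page.html"))  # False
```

The live equivalent of this check is simply requesting robots.txt over http and over https in a browser (or with curl) and confirming you get the Allow and Disallow versions respectively.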
Do search engines request https://www.domain.com/robots.txt before downloading any URLs over https? If yes, it will work; otherwise it will not. Yes, they do. It should be in the root folder of your https: URL.
Well, Webmaster Tools gave me the first answer: it delivered a 404 for the robots.txt that wasn't in my domain directory, so I've added the Allow file to all my directories. I still haven't decided if I need to have the SSL robots.txt in each directory as well, since its location is URL-accessible as is. @jimkarter: As far as I can tell, and according to this article, secure requests use a different port. That's what the mod_rewrite is for. It doesn't call robots.txt first; it goes to the page and then gets directed to the alternate robots.txt for the disallow command. I'll let you know if it works. So far, it seems sound. Anybody friends with a Googlebot?
If you have an https folder you should use:

RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots-https.txt
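For context, a fuller .htaccess fragment built around that condition might look like the sketch below. This assumes mod_rewrite is enabled, and robots-https.txt is whatever name you give the disallow-all file; adjust both to your setup:

```apache
RewriteEngine on
# When the request arrived over SSL, serve the disallow-all
# file in place of the normal robots.txt.
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots-https.txt [L]
```

The %{HTTPS} variable is a bit more portable than checking %{SERVER_PORT} ^443$, since it also covers setups where SSL terminates on a non-standard port.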
If your pages are .php, you may try adding the following in your document headers (note: the condition needs to be "HTTPS is set AND equals 'on'", with the comparison inside the parentheses):

<?php
// Emit a noindex meta tag when the page is served over HTTPS, index otherwise.
if ( isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) === 'on' ) {
    echo '<meta name="robots" content="noindex,nofollow,noarchive" />'."\n";
} else {
    echo '<meta name="robots" content="index,follow" />'."\n";
}
?>
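The subtle part of that snippet is the boolean: the intended condition is "HTTPS is present and equals 'on', compared case-insensitively". For anyone who wants to test the logic outside PHP, here is a small Python sketch of the same decision (the server dict stands in for PHP's $_SERVER, and the function name is made up for illustration):

```python
def robots_meta(server):
    # Mirror of the PHP condition: HTTPS must be present
    # and equal to "on", compared case-insensitively.
    if server.get("HTTPS", "").lower() == "on":
        return '<meta name="robots" content="noindex,nofollow,noarchive" />'
    return '<meta name="robots" content="index,follow" />'

print(robots_meta({"HTTPS": "on"}))  # the noindex tag
print(robots_meta({}))               # the index,follow tag
```

Using .get() with a default empty string plays the same role as PHP's isset() check: a request with no HTTPS entry falls through to the index,follow branch instead of raising an error.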