Seriously, I'm pissed off. I have a domain that's for tests only, and Google is indexing more of its pages every day. The content of its robots.txt is:
-----------------------
User-Agent: *
Disallow: /
-----------------------
Please, somebody tell me how to completely remove my test domain from Google, or it will start hurting my actual domain's rankings (duplicate content). Thanks....
Besides the robots.txt file, you can put a noindex tag in the head of your pages. I don't remember the exact code, so you'll have to look it up, but it's a meta tag in the page header that tells search engines not to index that page.
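For reference, the tag the post above is describing is the robots meta tag; a minimal example, placed inside each page's <head>:

```html
<!-- tells compliant crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

Note that the crawler has to be able to fetch the page to see this tag, so pages blocked in robots.txt may never have the noindex read at all.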
Thanks for the reply! The problem is that I'd have to noindex every page... Doesn't Google provide a form for webmasters to remove their domains? That would be the best option, IMO.
You could:
- use a 301 redirect
- use the Webmaster Tools removal tool
- password protect that directory
That'll stop 'em.
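As a sketch of the 301 option, assuming Apache with mod_alias and a hypothetical live domain example.com, a .htaccess on the test domain could permanently redirect everything to the real site:

```
# .htaccess on the test domain: permanently redirect all requests
# to the live domain so the test URLs drop out of the index
Redirect 301 / http://www.example.com/
```

A 301 also passes any link value from the test domain to the real one, but it only makes sense if the test site mirrors the live site's URL structure.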
You should not rely on the robots.txt file alone. I've had cases where, while Google was visiting the pages, the server didn't serve the robots file, or served it blank. So the spider, seeing no restrictions, indexed everything. If you want to restrict the whole site, then yes, you should put the meta tag in the header of every page. BTW, in Webmaster Tools there is an option to remove pages from the index.
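If editing every page isn't practical, an alternative (assuming Apache with mod_headers enabled) is to send the equivalent noindex signal as an HTTP response header site-wide from .htaccess:

```
# send a noindex header with every response on this (test) domain,
# equivalent to a robots meta tag but without touching the pages
Header set X-Robots-Tag "noindex, nofollow"
```

Google honors the X-Robots-Tag header the same way as the meta tag, and it also covers non-HTML files like PDFs and images.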
Even if you go into your Google Webmaster Tools and request that the pages be removed, it clearly states that unless you take measures to stop the indexing, the pages will get indexed again. So even if you ask Google to remove a page, it's still going to be indexed again. You could use .htaccess and .htpasswd to password protect the directory. Just do some Google searches for htpasswd and you should be able to find something. I'm using this same method to protect a backup script for my sites. Once you have the file in place, when someone goes to view the site, or any directory covered by the .htaccess file, they will be prompted for a username and password.
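A minimal sketch of that setup on Apache, assuming the password file is stored at /home/user/.htpasswd (a hypothetical path, ideally outside the web root):

```
# .htaccess in the directory to protect: require a login for everything under it
AuthType Basic
AuthName "Restricted test site"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

The password file itself would be created with something like `htpasswd -c /home/user/.htpasswd myuser`. Since the crawler can't log in, it gets a 401 instead of your pages, which stops indexing at the source.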