Hello, brilliant pips. I've been scouring the web for the right terms for this question, but after a few hours I decided to post it here. The scenario: we have a website running on two servers, so the files/website are synchronized between them. The second server is for internal purposes. Let's call the first server www and the second ww2. ww2 is automatically updated whenever the files are updated on www. Now Google is indexing ww2, which I want to stop; only www should be crawled and indexed. My questions are:

1. How can I get the pages already crawled on ww2 removed from Google's index?
2. How can I stop Google from indexing ww2?

Thank you in advance.
Well, why don't you use robots.txt? Block Google from indexing anything on your second server. See this for the robots.txt format: http://www.robotstxt.org/robotstxt.html
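For example, a minimal robots.txt that tells all crawlers to stay out of the entire site (this is the standard syntax documented at the link above, nothing specific to your setup):

    User-agent: *
    Disallow: /

Placed at the web root of ww2, it would take effect the next time Googlebot requests http://ww2/robots.txt.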
Hi! Thanks for your reply, but I'm worried because whatever I apply to robots.txt on server 2 will automatically synchronize to robots.txt on server 1, and vice versa.
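(For what it's worth, one common workaround for the sync problem, assuming the servers run Apache with mod_rewrite enabled, is to keep two robots files in the synchronized tree and pick one per hostname at request time. The hostnames and the robots-ww2.txt name below are placeholders, not from this thread. In an .htaccess file at the web root:

    RewriteEngine On
    # When the request arrives on the internal host, answer /robots.txt
    # with the blocking file instead of the regular one
    RewriteCond %{HTTP_HOST} ^ww2\. [NC]
    RewriteRule ^robots\.txt$ /robots-ww2.txt [L]

Here robots-ww2.txt would hold the "User-agent: * / Disallow: /" rules, while the normal robots.txt stays permissive. Both files sync to both machines, but only the Host check decides which one each server returns.)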
nirajkum, your response really rekindled my dimming hope of solving this issue. Anyway, I'll try to coordinate with the person in charge there and see if it's possible with our current setup. Thanks again!
Servers 1 and 2 should respond under the same domain name, even if that means setting up #2 as an alias. Googlebot can only follow links and cannot browse a folder, so it really has no idea which machine it is reaching when it requests a page. It simply requests a URL. If the request is routed to one machine today and to a different machine on the same LAN tomorrow, Googlebot has no way to know it; all it knows is that it requested a page and got back a server response (header information) and page content. You can use something like Lynx Viewer to see what is returned to the bot when a request is made. If both machines return the same info, you're fine.
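A quick way to compare from the command line (assuming curl is installed; the example.com hostnames are placeholders) is to fetch what each machine actually returns:

    # Response headers from the public host
    curl -I http://www.example.com/robots.txt

    # Full robots.txt as the internal host serves it; with a host-aware
    # setup like the one above, only this one should contain "Disallow: /"
    curl http://ww2.example.com/robots.txt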