Hi, I've got a huge number of URLs that are currently being indexed, by Google in particular, that I want to remove from the index. They are a serious concern from a duplicate content perspective, and some are affecting my users' journey via specific keywords. My thought was to implement "noindex" meta tags on all these pages to first remove them from the index, and then to add the directory that the majority of these pages sit in to the robots.txt file. Another course of action would be to simply add the directory to the robots.txt file and hope that the URLs drop out of the index that way. Which is the best way to do this so that I can be sure all these URLs will drop out of the index? Thanks,
The "noindex" meta tags will work regardless of whether you also add the URLs to your robots.txt file. As soon as Google re-crawls your website, it will see the meta tags and take those pages out of the index.
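For reference, the tag being discussed is a robots meta tag with a "noindex" directive in the page's head. Before waiting on a re-crawl, it can help to confirm that the pages really carry it. Below is a minimal sketch in Python using only the standard library; the names RobotsMetaParser and has_noindex and the sample HTML are made up for illustration, and it only looks at the meta tag, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses Google only.
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html_text):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


# Hypothetical page used only to exercise the check.
sample_page = """
<html><head>
<title>Duplicate listing</title>
<meta name="robots" content="noindex, follow">
</head><body>...</body></html>
"""

print(has_noindex(sample_page))
```

Running this over the HTML of each page you want dropped gives a quick list of any pages where the tag is still missing.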
If you don't want the pages to be crawled by Google, then noindex can be implemented, but will Google also remove them from its own index?
I prefer using robots.txt. The shorter the code on your pages, the easier it is for bots to crawl and index them. Compiling the rules in one robots.txt also makes it easier to identify the pages that you don't want indexed.
robots.txt is more beneficial than nofollow meta tags. Your pages will be deindexed by search engines.
I laugh when I read people saying "the shorter the code on your pages, the easier it is for bots to crawl and index". This is no longer true, so stop repeating what you read around. Crawlers are smarter in 2011 than in 2005. Crawlers nowadays respect robots.txt rules more, and they download more bytes in order to understand and categorize a page better. Crawlers are beginning to tell a real website from an AdSense website. You should mark the pages you want to remove with nofollow,noindex, and also add the directory to your robots.txt file to ensure that they won't visit again. But only add the directory once they have removed the pages; otherwise they will keep the pages for a while, since the robots.txt file is not letting the crawlers get into that folder.
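To make the ordering concrete, here is a rough sketch in Python (standard library only) of the two phases described above. The directory /old-listings/ and the example.com URLs are placeholders, not taken from the original question, and urllib.robotparser only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Phase 1: leave the directory crawlable so the noindex tags can be seen.
phase_one = """User-agent: *
Disallow:
"""

# Phase 2: once the pages have dropped out of the index, block the directory.
phase_two = """User-agent: *
Disallow: /old-listings/
"""

# Hypothetical URL inside the directory to be removed.
url = "http://www.example.com/old-listings/page-1.html"

# True in phase 1: the crawler may fetch the page and read its meta tags.
print(build_parser(phase_one).can_fetch("Googlebot", url))

# False in phase 2: the crawler is kept out, so it could no longer see
# a noindex tag that was added after the block went in.
print(build_parser(phase_two).can_fetch("Googlebot", url))
```

The second result is the reason for the ordering: a page the crawler is not allowed to fetch cannot show it a noindex tag.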
To remove the pages from Google's search results, you can try the options suggested by Google: http://www.google.com/support/webmasters/bin/answer.py?answer=164734 Add the directory rather than every individual URL, and do NOT clog your robots.txt.