My site has over 90,000 URLs indexed by Google, and lots of them are useless. Does anyone know a way to get them excluded from Google's index? A month ago I disallowed them in my robots.txt file and added a noindex tag to the pages, but most of them are still indexed. Is there any way to get them out quickly? Someone suggested removing them in Google Webmaster Tools, but GWT says only 404 and 401 pages can be removed with the tool... I appreciate your time!
I am having a similar problem. I recently migrated from WordPress to Blogger, and while my blog was on WordPress a lot of duplicate pages were indexed in Google, e.g. mysite.com/tag/. Now Google Webmaster Tools is detecting those pages slowly but steadily, and they are being dropped from the index. These pages show up on my crawl errors page in Webmaster Tools. @OP, I suggest you use Google Webmaster Tools; it is very useful. But I think there is no way to manually remove these pages from the index, so it will take time. I hope my SERP rankings will improve after all these pages are dropped from the index. Correct me if I am wrong.
A robots.txt file is a good way to guide search engine crawlers. If it does not work, stop linking to those useless pages from the rest of your site, so that robots have no links to follow to reach them.
A robots.txt file will not stop the pages from appearing in Google's index, because they have already been indexed. If you no longer want the pages, remove them from your server and 301 redirect each URL to the homepage or to a page with similar content. It's not an issue having that many pages in the index anyway, because Google will probably consider most of them useless and put them in the supplementary index.
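To make the 301 suggestion concrete, here is a minimal sketch using only Python's standard library. It assumes the unwanted pages live under /tag/ (borrowed from the example earlier in the thread); the REDIRECTS table, the /topics/ target and the port are made up for illustration. On a real site you would normally do this in the web server or CMS configuration rather than in a standalone script.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of removed URLs to pages with similar content.
REDIRECTS = {
    "/tag/old-topic/": "/topics/",
}
DEFAULT_TARGET = "/"  # fall back to the homepage


def redirect_target(path):
    """Return the new location for a removed path, or None if the page is still live."""
    if path in REDIRECTS:
        return REDIRECTS[path]
    if path.startswith("/tag/"):  # assumption: everything under /tag/ was removed
        return DEFAULT_TARGET
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            # Placeholder response for pages that still exist.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"live page\n")
            return
        # 301 marks the move as permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()
```

With this running locally, a request for /tag/anything/ should come back as a 301 pointing at the homepage, while other paths are served normally.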
You can remove those files from the server; otherwise robots.txt is sufficient to exclude URLs from search engines.
If your site is static, like plain HTML pages, you can add a noindex meta tag to the head of each page. If it is dynamic and database driven, you can remove the posts and wait for Google to stop finding them; adding them to robots.txt is borderline not worth it. Once the pages are removed from the site or noindexed, Google will eventually drop them. It takes some time, especially if you have a lot of them. You can request removal in Webmaster Tools, but it is best if you also remove them completely from your hosting.
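One thing worth checking when robots.txt and noindex are used together: a crawler has to be allowed to fetch a page before it can read the noindex tag on it, so a Disallow rule covering the same URL can keep the tag from ever being seen. Below is a rough sketch of that check using Python's standard library. The function names, the example.com URL and the sample robots.txt rules are all made up for illustration, and the standard-library parser only approximates how Googlebot reads robots.txt.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries a robots meta tag containing noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html):
    """True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def crawler_can_see_noindex(robots_txt, url, html, agent="Googlebot"):
    """True only if robots.txt allows fetching the URL AND the page has noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url) and has_noindex(html)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    blocked = "User-agent: *\nDisallow: /tag/\n"
    print(crawler_can_see_noindex(blocked, "http://example.com/tag/foo", page))
```

If this prints False for a page you want dropped, the robots.txt rule is hiding the noindex tag from the crawler, and lifting the Disallow for those URLs until they fall out of the index may help.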