You know, I have a similar sort of problem. I set up robots.txt, Google has a sitemap, and I use nofollow everywhere I can, yet Google still seems to have some pages indexed that I don't want indexed. I think these were picked up before I set up the robots.txt file, but even so, it's been long enough since it changed! It would be great if someone knows a way to force out a page that Google has already indexed.
1) Verify your site in Google Webmaster Tools. 2) There is an option to remove a URL from the Google index. 3) Select the URL and submit it. On the next Google cache update it will be removed from the Google index. 4) If you just want to avoid the Google cache, use robots.txt or meta tags. Use noindex, nofollow, but you should select which search engine you want it to apply to. Thank you.
This method works when the page returns a 404 error. If the page is live, you have to add noindex,nofollow or block it through robots.txt.
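For anyone unsure what a robots.txt block looks like in practice, here is a rough sketch in Python that checks a rule set with the standard library's urllib.robotparser. The /private/ path and the example.com URLs are made up for illustration and are not taken from anyone's site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block every crawler from /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example URLs are assumptions, not real pages
    print(is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/private/page.htm"))
    print(is_blocked(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.htm"))
```

This only tells you whether the rules would stop a crawler from fetching the URL. It says nothing about what is already in the index.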
Is it possible to block with this? http://www.searchenginepromotionhelp.com/m/robots-text-creator/simple-robots-creator.php What do I type there? It seems so difficult.
It still works whether the page returns a 404 error or not. I know, because I did it recently. It will de-index any pages or directories you specify in around 24 hours. Yahoo Site Explorer also offers a similar function, which works just as well.
This is correct. It is one of the very few useful functions of Google Webmaster Tools. The so-called top searches are often incorrect and way off target, and the backlinks listed are often trash links and sometimes even rel="nofollow" links.
Another solution would be to upload a .htaccess file containing a 301 redirect from each old page to its new page, e.g.
Redirect 301 /oldfile1.htm http://www.yourdomain.com/newfile1.htm
Redirect 301 /oldfile2.htm http://www.yourdomain.com/newfile2.htm
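If there are a lot of old pages, typing every redirect line by hand gets tedious. Below is a small sketch in Python that builds the lines from a mapping of old paths to new URLs. The function name redirect_lines is my own invention, and the mapping just reuses the placeholder file names from the post above.

```python
from typing import Dict, List


def redirect_lines(mapping: Dict[str, str]) -> List[str]:
    """Build one 'Redirect 301' line per old path -> new URL pair."""
    lines = []
    for old_path, new_url in mapping.items():
        # The old path in a Redirect directive is a URL path, so it starts with "/"
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines


if __name__ == "__main__":
    # Placeholder names copied from the example in the thread
    pages = {
        "/oldfile1.htm": "http://www.yourdomain.com/newfile1.htm",
        "/oldfile2.htm": "http://www.yourdomain.com/newfile2.htm",
    }
    print("\n".join(redirect_lines(pages)))
```

Paste the printed lines into the .htaccess file. It is worth testing one redirect in a browser before uploading the whole list.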
You mean that Google can delete the old cache of a page even if that page is still live? And the cache doesn't come back?
If you only want to remove the cached pages and keep the site indexed in Google, the best thing is to use this meta tag: <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"> I do it for my site and it works fine. If you use robots.txt, Google will not index those pages.
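To double-check that a page really carries the meta tag, something like the following Python sketch could be used. It relies only on the standard library's html.parser, and the class and function names (RobotsMetaParser, robots_directives) are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in ("robots", "googlebot"):
            parts = attr_map.get("content", "").split(",")
            values = {part.strip().lower() for part in parts if part.strip()}
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html: str) -> dict:
    """Return a dict such as {'googlebot': {'noarchive'}} for the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head></html>'
    print(robots_directives(page))
```

The sketch works on an HTML string you already have. Fetching the live page is left out on purpose to keep it short.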