Could anyone tell me the best way of (a) blocking Google's search spiders from individual pages while (b) still allowing the Google AdSense bot to see the page content and place contextual ads correctly?

Background: Two of my sites deal with red widgets and blue widgets. redwidget.com is loved by all search engines. bluewidget.com is sandboxed by Google but loved by Yahoo and MSN. I am going to transfer some of the sandboxed material from bluewidget.com onto redwidget.com, then block Googlebot from those pages on redwidget.com so there are no duplicate content concerns. Thank you.
The spider for AdSense has a different name than the search spider. So you can use robots.txt to block Googlebot from certain pages and the AdSense spider will still be allowed to go there unless you specifically deny it.

User-agent: Googlebot
Allow: /
Disallow: /nogooglesearch.html
Disallow: /nogooglesearch2.html

If you subscribe to the Google Sitemaps tool (even if you don't have a sitemap to submit), you can test your robots.txt file against different pages on your site and different Google spiders to make sure it is working how you intended.
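For completeness, here is a minimal sketch of a full robots.txt along those lines that also makes the AdSense crawler's access explicit. Mediapartners-Google is the user-agent name the AdSense crawler identifies itself with; the two file names are just the example pages from above:

# Keep Google's search crawler out of the duplicate pages
User-agent: Googlebot
Disallow: /nogooglesearch.html
Disallow: /nogooglesearch2.html

# Explicitly allow the AdSense crawler to read everything
User-agent: Mediapartners-Google
Disallow:

# All other bots may crawl everything
User-agent: *
Disallow:

An empty Disallow line means "nothing is disallowed", so the last two sections permit full access without needing the non-standard Allow directive.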
Thanks, tflight. I've decided not to block Googlebot because of this: http://www.mattcutts.com/blog/crawl-caching-proxy/

Google can take content fetched by their mediabot and use it for the search engine results. Although it may not present a danger right now, it may in future. I see that, as a result of your answer rather than the original question, this thread has been moved from the AdSense forum to the robots.txt forum. Perhaps it's not the best place for it.
Hi,

You can use this in the pages that should not be indexed by the Google search engine:

<meta name="googlebot" content="noindex,nofollow">

Jean-Luc
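To make the placement concrete, a minimal sketch of where that tag sits in a page's head section; the title and body here are just placeholders, and since the tag is addressed specifically to the googlebot user-agent, it tells other crawlers nothing:

<!DOCTYPE html>
<html>
<head>
  <title>Page moved from bluewidget.com</title>
  <!-- Tells Google's search crawler not to index this page or follow its links.
       Because the tag names "googlebot" rather than "robots", it is ignored
       by other crawlers, including the AdSense crawler. -->
  <meta name="googlebot" content="noindex,nofollow">
</head>
<body>
  <!-- page content and AdSense ad units go here -->
</body>
</html>

This per-page approach also sidesteps the duplicate content concern without needing any robots.txt changes.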