Blogspot.com is now using robots.txt to block off parts of blogs for the first time. I wrote a post about this in DP's "Blogging" forum, but since that section isn't read as much as this one, and Blogspot is owned by Google, I wanted to call attention to the change here. I have written almost 2,000 posts on my Blogspot account over the past few years, and until I checked today the robots.txt file on every blogspot.com blog was just a blank txt file. While this move might look innocent, Blogspot has encouraged folks to use labels to organize their content, and those label pages all live under a path called /search/label, which is now banned by robots.txt across the board on all blogspot.com blogs.
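For the curious, the rule at the heart of this looks like the following (yourblog is just a placeholder; every blogspot.com blog serves its own copy at /robots.txt, so you can check yours):

    User-agent: *
    Disallow: /search

One line, but it covers every label page on the blog, since they all live under /search.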
Blog posts will still be indexed, but site topics sorted by label will not. Considering most people use labels to make it easier for search engines to find relevant information, this move by Blogspot will probably sink confidence in their whole operation, because many people will now be wondering what future restrictions will be placed on content. Google, via Matt Cutts, has always encouraged a wide-open Internet when it comes to its competitors like Yahoo, but this move by Blogspot might just cause other search engines to start blocking their own proprietary content from Googlebot, too. My gut feeling is that this move was made for money and not in the best interest of keeping Internet content available to everyone. Indexing Internet content is already a big challenge for all the major search engines, and this decision by Google to wall off part of its own content from itself and others is probably only the first step toward a different Internet in the future.
Did you notice that they have a sitemap in it now? Also, it's proper usage of robots.txt to block "/search", which stops non-relevant pages from showing up in the SERPs. Pierce
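If anyone wants to verify this without squinting at the file, here is a quick check with Python's standard-library robots.txt parser (the blog name, post URL, and label name are all made up):

    # Feed Blogspot's new rules to Python's standard-library parser
    # and test a couple of URLs against them.
    import urllib.robotparser

    rules = [
        "User-agent: *",
        "Disallow: /search",
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules)

    # Ordinary post pages are still fair game for any crawler...
    print(rp.can_fetch("*", "http://yourblog.blogspot.com/2007/05/my-post.html"))  # True
    # ...but label pages under /search are off limits.
    print(rp.can_fetch("*", "http://yourblog.blogspot.com/search/label/News"))  # False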
I think Google has become too large, too rich, and therefore too powerful. Putting your trust in Google's Blogspot gives them too much control over you. I think using WordPress blogs on your own domain is the best way to go, as you can control everything. If you do lots of blogs, using subdomains is an economical way to go. I agree with you, MarkHutch. This is probably just the beginning of more control tactics from Google that will affect the Internet as we know it. A few years ago, I loved Google. Now I am developing a strong dislike of them. I wish Yahoo and MSN would get their acts together and really give Google a run for their money. Just a few thoughts. Chris
It is a good thing. This way, spiders will only index single posts, and you avoid duplicate content. For WordPress users, I recommend you amend your robots.txt the same way; see this article for more info: askapache.com/seo/seo-with-robotstxt.html
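Something along these lines, for example (just a sketch assuming the default WordPress URL structure; the paths you actually need depend on your permalink settings, and the article above goes into much more detail):

    User-agent: *
    # Tag, category, author, and feed pages all repeat post content,
    # so keep the spiders on the posts themselves
    Disallow: /tag/
    Disallow: /category/
    Disallow: /author/
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /wp-admin/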
I agree that it is probably a good thing from a search engine point of view, but for people like me who run AdSense on their blogs, it means even the AdSense crawler is banned from the label pages. So if a visitor follows a label link from an individual post, the ads on that label page will not be relevant, because Google is not allowing the AdSense bot to see what is on it. I just wish there were some way around this, because I have dozens of people per day who land on a post via the search engines and then exit through a label page. Update: Now that I have thought about it some more, it should be possible for Google to place relevant ads on label pages by tracking the visitor back to the page they came from, so maybe this isn't as big a deal as I first thought.
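For what it's worth, robots.txt can already express the fix: the AdSense crawler identifies itself as Mediapartners-Google, so Google could give it its own record while still keeping everyone else out of /search. A sketch of what I mean (this is hypothetical, not what Blogspot actually serves):

    # Hypothetical: let the AdSense crawler see everything...
    User-agent: Mediapartners-Google
    Disallow:

    # ...while all other bots stay out of /search
    User-agent: *
    Disallow: /search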
This is a good move.... they are blocking content that can be reached from multiple URLs.... it is duplicate content that they are blocking.... they recommend that people with blogs do that, or the algo will choose one of the URLs to index and drop the rest.
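To make that concrete, the same post body on a Blogspot blog can show up at all of these (made-up example URLs):

    http://yourblog.blogspot.com/                          <- home page
    http://yourblog.blogspot.com/2007/05/my-post.html      <- the post itself
    http://yourblog.blogspot.com/2007_05_01_archive.html   <- monthly archive
    http://yourblog.blogspot.com/search/label/News         <- label page

Blocking /search trims off one of those duplicates without touching the post page.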