I just read an interesting post on Shoemoney.com and have one question about it. Here's his article: http://www.shoemoney.com/2008/03/03/wordpress-robotstxt-tips-against-duplicate-content/ In his robots.txt file he disallows index.php like this: Disallow: /index.php I'm wondering whether this is a bad idea for WordPress. I don't like to question a man who brings home da bling, but it does leave me scratching my head a bit. Any thoughts about this?
I see the disallow for index.php in his robots.txt. Why is that there, and how is it beneficial at all? I'm confused.
He doesn't say it in the post; it's in his robots.txt file for that website. Taking a guess, I think it's because his index.php simply displays the posts, which can also be found at their permalinks, so there's a duplicate content issue.
I suppose his sitemap will pick up the blog posts somehow? If he has one? I suppose that's the only reasoning behind blocking index.php... to avoid dup content, but I wonder how the posts get indexed if he doesn't have a sitemap?
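For what it's worth, even without a sitemap, crawlers can still reach the posts through category, archive, and internal links. And if he does have one, the robots.txt standard lets you point crawlers at it with a Sitemap directive. A minimal sketch (the domain and sitemap path here are just placeholders, not his actual setup):

```text
# Hypothetical: a Sitemap line in robots.txt tells crawlers where to
# find the post URLs directly, independent of any Disallow rules.
Sitemap: http://www.example.com/sitemap.xml
```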
I think the thought behind disallowing index.php is that index.php and the root domain are the same thing. For example: www.yourdomain.com/ is the same as www.yourdomain.com/index.php, so disallowing index.php gets rid of the duplicate content from your root. If that makes sense.
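If that's the reasoning, a minimal robots.txt sketch of it (using example.com as a placeholder domain) would look like:

```text
User-agent: *
# www.example.com/ and www.example.com/index.php serve the same page,
# so blocking the /index.php variant leaves only the root URL to be indexed.
Disallow: /index.php
```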
Yeah, I think you're right. My wild guess is that the ultimate goal is to have only the blog posts indexed.