In this constant struggle to get at least some results out of Google, Yahoo, and MSN, I've found that duplicate content is a no-no. Here is my problem: I'm running a website based on Mambo CMS with a completely separate phpBB system. I'm running a very promising SEO mod that converts each post URL into something like "this-is-the-post-title-vt1934.html", which I think should do wonders for my SEO results. The only problem is that the mod tends to create a LOT of duplicate content. For example, if you click on the newest-post link it takes you to a page "latestpost-vt1934.html", which is basically a duplicate of the original page. Likewise for a few other features -- 'view next', 'view previous', etc. I've gone and excluded these filenames in my robots.txt file (roughly as in the sketch at the end of this post), and I want to know if this will be enough to keep Google and the others happy as far as original content goes.

Another quick question: inside Mambo, any 404s are automatically redirected to the main index.php page. Will this harm anything? Should I create a "404 - Page Not Found - Click here for the main page" page, or leave it as is?
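For reference, the robots.txt exclusions I added look something like this (the paths below are just examples of the patterns the mod generates, with /forum/ standing in for wherever the board actually lives):

    User-agent: *
    # block the duplicate views the SEO mod creates for each thread
    Disallow: /forum/latestpost-
    Disallow: /forum/viewnext-
    Disallow: /forum/viewprevious-

As far as I understand it, robots.txt matches by URL prefix, so "Disallow: /forum/latestpost-" should cover latestpost-vt1934.html and the equivalent page for every other thread.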
First, you really do NOT need that rewrite "SEO mod". What you need to do for PHP sites is remove session IDs for "Guests", which includes SE spiders. Do that and the whole issue goes away, because you no longer have all those redundant and unnecessary HTML pages. However, if you insist on using that mod, why on earth are you creating HTML pages (presumably because you believe they will help your SEO) only to try to exclude them in robots.txt? That makes no sense to me at all.
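To illustrate the idea -- this is just a rough sketch of the logic in plain PHP, not phpBB's actual code (in phpBB 2 the real change goes into append_sid() and the session handling, or wherever your version builds its links):

    <?php
    // Sketch: append the session ID to a URL only for registered users.
    // Guests -- and search engine spiders always browse as guests -- get the
    // clean URL, so each thread is reachable under exactly one address.
    function append_sid_if_registered($url, $sid, $is_registered)
    {
        if (!$is_registered || $sid === '') {
            return $url;                                  // no ?sid=... for spiders
        }
        $sep = (strpos($url, '?') === false) ? '?' : '&';
        return $url . $sep . 'sid=' . urlencode($sid);
    }

    // Guests get viewtopic.php?t=1934, logged-in members get viewtopic.php?t=1934&sid=abc123
    ?>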
One reason to use an "SEO" or "friendly URLs" mod is to turn this: http://www.movies.zzz/index.php?option=com_content&task=view&id=144&Itemid=1 into this: http://www.movies.zzz/reviews/titanic.html That makes perfect sense from an SEO point of view as well as from a human-readability point of view. The current version of Mambo is famous for its Itemid "feature", which creates tons of duplicate pages even if no SEO mod is installed. The way to get rid of it is to use the OpenSEF (formerly the Xaneon extensions) mod - http://opensef.org/ - which is much easier than doing it manually through robots.txt.

I think creating a dedicated 404 page is a good idea - I've found that Google doesn't handle redirects very well, even when the redirect generates a 404 code. Don't ask me why.
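If the site is running on Apache, the dedicated 404 page can be wired up in .htaccess with one line (404.html is just an example filename):

    # serve a real "not found" page with a proper 404 status
    # instead of bouncing missing URLs to index.php
    ErrorDocument 404 /404.html

The important part is that the missing URL itself answers with a 404 status, not a redirect to the homepage, so the engines know the page is genuinely gone.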
Yeah, I would get rid of the mod-rewrite part of the SEO stuff. I just yanked it from several of my phpBB boards. It doesn't actually help with duplicate content; it actually creates MORE of it. For example, let's say you had 5 different possible URL combinations that reach a particular thread (p=, t=, highlight=, etc.). Once you use the rewrite mod, all of those still show up, and on top of them you get the rewritten versions as well. So now instead of 5 possible variations for a thread you have 10. It actually makes things worse. I don't think the duplicate issue is really a problem in itself, since the search engine handles it on its own, but I do think you potentially reduce the effectiveness of your indexed pages by creating so many more duplicate options. By the way, if you pull the mod-rewrite code, I suggest you just reverse the mod-rewrite rules in .htaccess (they look something like the sketch below) and then yank the rest from pageheader.php and overallfooter.tpl (I think that is where the bulk of it is).
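For anyone else doing the same thing: the .htaccess rules you are reversing look roughly like the lines below. The exact patterns differ between versions of the mod, so treat this as a sketch of what to look for rather than something to copy:

    RewriteEngine On
    # the SEO mod maps this-is-the-post-title-vt1934.html back to viewtopic.php
    RewriteRule ^.*-vt([0-9]+)\.html$ viewtopic.php?t=$1 [QSA,L]
    # (plus similar rules for forums, the latest-post view, etc.)

Deleting or commenting out those rules, together with the URL-generating code in the phpBB files, puts the board back on the plain viewtopic.php?t= style URLs.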