I manage a large retail e-commerce site and I am running into duplicate content issues. The problem is that our CMS automatically creates two (or three or four) different versions of each of our site's pages. One version has a keyword-rich URL and the others are CMS-generated garbage. I just ran a scan of our site, and just about every page has 3 or 4 copies of itself. Given my infinite SEO wisdom, I came to the conclusion that I should create a robots.txt wildcard rule and block all of these garbled URLs. My question is: because some of these garbled URLs are indexed, will the correct keyword-rich URLs replace them in the index? Do you think the robots.txt wildcard is the best solution? Getting rid of the CMS is not an option at this juncture.
It was developed years ago and we don't have the resources to replace it. It's very old, but replacing it is just not an option right now.
The "best" solution is to come up with a .htaccess rewrite rule that 301 redirects all of these alternate copies to the one primary version. You won't need a rule for every URL; by using wildcard (pattern) entries you will be able to match things like whole directories. Using robots.txt is not the best idea: PageRank is still passed to URLs blocked by robots.txt, and people will still link to these garbage pages, which is a waste of links.
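A minimal sketch of what a directory-wide rule could look like, assuming Apache with mod_rewrite enabled and a .htaccess file in the site root. The directory names cms-pages and products are made up for illustration; substitute whatever your CMS actually generates.

```apache
# .htaccess in the site root; assumes mod_rewrite is available.
RewriteEngine On

# Hypothetical layout: every URL under /cms-pages/ has a twin under /products/.
# (.*) captures the rest of the path and $1 re-inserts it into the target,
# so this one rule handles the whole directory with a permanent (301) redirect.
RewriteRule ^cms-pages/(.*)$ /products/$1 [R=301,L]
```

Whatever the pattern captures is carried over into the destination, which is why one line can stand in for thousands of individual redirects.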
Would I have to redirect each of them to its corresponding correct page? That's 2000 lines in the .htaccess.
As I just said: wildcards. You can redirect all bad-product.php?=1234 to good-product.php?=1234 type pages in about 2 lines of .htaccess, and this alone can cover thousands of URLs.
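Roughly, for the bad-product.php example, and assuming Apache with mod_rewrite enabled and both scripts living in the site root (both are assumptions, I haven't seen your URLs):

```apache
# .htaccess in the site root; assumes mod_rewrite is available.
RewriteEngine On

# bad-product.php?=1234 -> good-product.php?=1234
# The target has no "?" of its own, so mod_rewrite passes the original
# query string through unchanged. Each bad URL therefore lands on the
# good URL with the same product ID, without a per-product rule.
RewriteRule ^bad-product\.php$ /good-product.php [R=301,L]
```

This only works when the bad and good URLs share something a pattern can carry across, such as the product ID. If the keyword-rich URLs contain text that cannot be derived from the bad URL, a pattern alone won't be enough.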
I must be having a brain lapse, because I can't make this work out in my head today. Each of the 2000 unique bad pages has one corresponding unique good page. How could I write the incoming wildcard so that each one lands exactly on its proper correct page? Please excuse my ignorance here.
That's like you telling me what color my car is. Also, BTW, 2,000 products is nowhere near a large e-commerce site. Edit: Is it the site in your sig? If so, I have no idea about rewrites on Windows servers; the words Windows and Server should never have been combined.