Duplicate content on a large e-commerce site

Discussion in 'Search Engine Optimization' started by seoindetroit, Jul 10, 2008.

  1. #1
    My issue is that I am managing a large retail e-commerce site and I am running into duplicate content issues. The problem is that we use a CMS that automatically creates two (or three or four) different versions of our site's pages. One version has a keyword-rich URL and the others are CMS-generated garbage. I just ran a scan of our site, and just about every page has 3 or 4 copies of itself. Given my infinite SEO wisdom, I came to the conclusion that I should create a robots.txt wildcard rule and block all of these garbled URLs. My question is: because some of these garbled URLs are already indexed, will the correct keyword-rich URLs replace them in the index? Do you think the robots.txt wildcard is the best solution? Getting rid of the CMS is not an option at this juncture.
     
    seoindetroit, Jul 10, 2008 IP
  2. priyakochin

    priyakochin Banned

    Messages:
    4,740
    Likes Received:
    138
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Why can't you change that CMS?
     
    priyakochin, Jul 10, 2008 IP
  3. seoindetroit

    seoindetroit Banned

    Messages:
    65
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    It was developed years ago and we don't have the resources... it's very old... it's just not an option right now.
     
    seoindetroit, Jul 10, 2008 IP
  4. sweetfunny

    sweetfunny Banned

    Messages:
    5,743
    Likes Received:
    467
    Best Answers:
    0
    Trophy Points:
    0
    #4
    The "best" solution is to come up with a .htaccess rewrite rule to 301 redirect all these alternate copies to the one primary version. You won't need a rule for every URL; by using wildcard entries you will be able to match things like whole directories.

    Using robots.txt is not the best idea: PageRank is still passed to URLs blocked by robots.txt, and people will still link to these garbage pages, which is a waste of links.
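
    To make the "whole directories" idea concrete, here is a minimal Python sketch of the matching logic only, not actual .htaccess syntax. The /cms/catalog/ and /products/ prefixes are hypothetical; the thread never shows the real URL layout, so treat them as placeholders for whatever the CMS actually generates.

```python
import re
from typing import Optional

# Hypothetical layout (an assumption, not from the thread): the CMS serves
# duplicate copies under /cms/catalog/<anything>, while the primary
# keyword-rich copy lives under /products/<anything>.
DUPLICATE_DIRECTORY = re.compile(r"^/cms/catalog/(?P<rest>.*)$")


def redirect_target(path: str) -> Optional[str]:
    """Return the primary URL to 301 to, or None if no redirect applies."""
    match = DUPLICATE_DIRECTORY.match(path)
    if match is None:
        return None
    # Everything after the duplicate prefix is carried over unchanged,
    # so one rule covers every page in the directory.
    return "/products/" + match.group("rest")
```

    One rule like this stands in for every page under the duplicate directory, which is why the number of products does not dictate the number of rules.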
     
    sweetfunny, Jul 10, 2008 IP
  5. seoindetroit

    #5
    Hmmm... with over 2,000 product pages, what would that wildcard look like?
     
    seoindetroit, Jul 10, 2008 IP
  6. seoindetroit

    #6
    Would I have to redirect them all to their corresponding correct pages? That's 2,000 lines in the .htaccess.
     
    seoindetroit, Jul 10, 2008 IP
  7. sweetfunny

    #7
    As I just said: wildcards.

    You can redirect all bad-product.php?=1234 type pages to good-product.php?=1234 in about 2 lines of .htaccess; this alone can cover thousands of URLs.
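
    The trick that lets one rule serve thousands of URLs is capturing the variable part (here the product ID) and reusing it in the destination. Below is a rough Python sketch of that capture-and-reuse logic, using the bad-product.php / good-product.php names from the example above. The function names are made up for illustration, and this is not .htaccess syntax; in Apache mod_rewrite the query string is matched with a RewriteCond on %{QUERY_STRING} rather than in the RewriteRule pattern itself.

```python
import re
from typing import Optional, Tuple

# One pattern covers every numeric ID. The parentheses capture the ID so
# it can be reused when building the destination URL.
BAD_PRODUCT = re.compile(r"^/bad-product\.php\?=(\d+)$")


def primary_url(request_uri: str) -> Optional[str]:
    """Map a duplicate product URL to its primary copy, keeping the same ID."""
    match = BAD_PRODUCT.match(request_uri)
    if match is None:
        return None
    return "/good-product.php?=" + match.group(1)


def respond(request_uri: str) -> Tuple[int, Optional[str]]:
    """Return (status, location): a 301 for duplicates, 200 otherwise."""
    target = primary_url(request_uri)
    if target is None:
        return (200, None)
    return (301, target)
```

    Each bad URL lands on its own good URL because the captured ID travels with it. This only works when the bad and good URLs share some identifier; if they don't, a lookup table is needed instead.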
     
    sweetfunny, Jul 10, 2008 IP
  8. seoindetroit

    #8
    I must be having a brain lapse because I can't make this work out in my head today... Each of the 2,000 unique bad pages has one corresponding unique good page. How could I write the incoming wildcard so that each request falls exactly onto its proper correct page? Please excuse my ignorance here.
     
    seoindetroit, Jul 10, 2008 IP
  9. sweetfunny

    #9
    That's like you telling me what color my car is.

    Also BTW 2,000 products is nowhere near a large E-commerce site. ;)

    Edit: Is it the site in your sig? If so, I have no idea about rewrites on Windows servers; the words Windows and Server should never have been combined. :)
     
    sweetfunny, Jul 10, 2008 IP
  10. seoindetroit

    #10
    forget it...
     
    seoindetroit, Jul 10, 2008 IP