Major duplicate content issue?

Discussion in 'Directories' started by enQuira, Apr 11, 2007.

  1. #1
    This is very important.
    It applies to both esyndicat and phpld.
    Try the following:
    http://www.youdirectory.com/Arts
    http://www.youdirectory.com/arTs
    http://www.youdirectory.com/ARTS
    and so on ....

    1. You will get the same content for all of these pages.
    2. Check the headers of those pages. You will get 200 OK, which means that if Google finds those URLs, they can all be indexed.
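
A minimal sketch of the header check described in the two steps above, assuming plain HTTP and a directory you control. The names `case_variants`, `head_status` and `audit` are made up for this illustration; nothing here comes from esyndicat or phpld.

```python
import http.client
import itertools


def case_variants(segment, limit=8):
    """Yield up to `limit` distinct upper/lower-case spellings of a path segment."""
    pools = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in segment]
    seen = set()
    for combo in itertools.product(*pools):
        variant = "".join(combo)
        if variant not in seen:
            seen.add(variant)
            yield variant
            if len(seen) >= limit:
                return


def head_status(host, path):
    """Send one HEAD request and return the raw status code (redirects are not followed)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def audit(host, segment, fetch=head_status, limit=8):
    """Map each case variant of /segment to the status code the server returns."""
    return {variant: fetch(host, "/" + variant) for variant in case_variants(segment, limit)}
```

Calling `audit("www.youdirectory.com", "Arts")` would query the site for real. If more than one spelling comes back with 200, each of those spellings is a separate indexable copy of the same page.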

    The problem is even more serious for phpld because it uses relative paths, so you will end up with URLs like:
    http://www.youdirectory.com/arTS/Animation/
    http://www.youdirectory.com/arTS/Antiques/
    and so on... You will end up with a huge number of duplicate pages indexed by the search engines, which can easily get you penalized.
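
To see why relative paths make this worse, here is a small sketch using Python's standard `urljoin`. It assumes the category page is served with a trailing slash and that its subcategory links are written as `Animation/` and `Antiques/`; the helper name `resolve_links` is hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href the way a crawler would, relative to the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# Relative links inherit whatever case the crawler used to reach the page.
relative = resolve_links(
    "http://www.youdirectory.com/arTS/", ["Animation/", "Antiques/"]
)

# Root-relative links carry the canonical case, so the wrong spelling stops spreading.
absolute = resolve_links(
    "http://www.youdirectory.com/arTS/", ["/Arts/Animation/", "/Arts/Antiques/"]
)
```

With relative links, one bad inbound link to `/arTS/` hands the crawler a whole tree of wrongly cased URLs. With root-relative links, only the single wrongly cased page is exposed.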

    Am I missing something, or should they really work on this? Anyone can mess with your directory simply by linking to one of those wrong paths.
    I found a solution for esyndicat, but not for phpld yet.
     
    enQuira, Apr 11, 2007 IP
  2. aspidov

    #2
    I don't think lower and upper case matter to a robot. www.realwd.com is still the same as www.REALWD.com, so it shouldn't be considered duplicate.
     
    aspidov, Apr 11, 2007 IP
  3. enQuira

    #3
    enQuira, Apr 11, 2007 IP
  4. nytrokiss

    #4
    Well, sometimes it will be case sensitive and sometimes not. I think it depends on the URL rewrite!
     
    nytrokiss, Apr 11, 2007 IP
  5. enQuira

    #5
    enQuira, Apr 11, 2007 IP
  6. paidhosting

    #6
    Well, the question is: should I be worried? Google has not penalized my site for ages, so will it decide to do so all of a sudden?
     
    paidhosting, Apr 11, 2007 IP
  7. enQuira

    #7
    That's because Google hasn't found those wrong URLs. If somebody links to one of them, Google will start crawling it.
     
    enQuira, Apr 11, 2007 IP
  8. paidhosting

    #8
    So what you mean is that if someone linked to my site using, say, MYSITE.com instead of mysite.com, Google will think there are actually 2 sites?
     
    paidhosting, Apr 11, 2007 IP
  9. enQuira

    #9
    mysite.com/page and mysite.com/Page are two different URLs, you know! So the problem is not the domain but the full URL.
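
The domain versus path distinction can be sketched with Python's standard `urllib.parse`. The `http://` scheme is added here only so the URLs parse, and `normalize` and `same_url` are hypothetical helper names. This is a simplification: it lowercases the whole network location, which is fine for plain host names like the ones in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase the scheme and host, which are case-insensitive; leave the path untouched."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )


def same_url(a, b):
    """True when two URLs differ at most in the case of their scheme or host."""
    return normalize(a) == normalize(b)


domain_case_only = same_url("http://MYSITE.com/page", "http://mysite.com/page")
path_case_differs = same_url("http://mysite.com/page", "http://mysite.com/Page")
```

Here `domain_case_only` is `True` and `path_case_differs` is `False`, which matches the point above: MYSITE.com is not a second site, but /Page is a second URL.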
     
    enQuira, Apr 11, 2007 IP
  10. paidhosting

    #10
    Yikes :eek: , so I need to come up with some rewrite rules to convert all capitals to lowercase letters, like the ones you use to add www to non-www URLs.
     
    paidhosting, Apr 11, 2007 IP
  11. enQuira

    #11
    - Botw is 301-redirecting those paths
    - Dmoz is 302-redirecting them
    - Dir.Google is sending 404 Not Found, which is better than a 302
    - Yahoo dir is not doing anything about it

    I am also returning a 404 Not Found in rakCha, but I use esyndicat; I don't know how to do it for phpld. They should come up with a solution.
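
The four behaviours listed above can be compared side by side in a short sketch. It assumes the script knows its set of canonical category paths; `respond` and its `policy` values are invented for this illustration and are not part of esyndicat, phpld, or any of the directories named.

```python
def respond(path, canonical_paths, policy="301"):
    """Return (status, location) for a requested path under a given policy.

    policy "301" / "302": redirect a wrongly cased path to its canonical spelling
    policy "404": refuse wrongly cased paths outright
    policy "200": serve them anyway (the duplicate content case)
    """
    if path in canonical_paths:
        return 200, None

    # Look the path up ignoring case to find the spelling the site actually uses.
    by_folded = {p.lower(): p for p in canonical_paths}
    target = by_folded.get(path.lower())
    if target is None:
        return 404, None  # unknown in any spelling

    if policy in ("301", "302"):
        return int(policy), target
    if policy == "404":
        return 404, None
    if policy == "200":
        return 200, None
    raise ValueError("unknown policy: " + policy)
```

For example, `respond("/arTs", {"/Arts"}, policy="301")` gives `(301, "/Arts")`, while the `"404"` policy gives `(404, None)` for the same request. Either one leaves a single indexable URL per category; the `"200"` policy is the one that creates duplicates.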
     
    enQuira, Apr 11, 2007 IP
  12. malcolm1

    #12
    Hello..

    It's all the same... just make sure that when you create new categories, the _ <----- isn't added after the word and/or category...

    thx
    malcolm
     
    malcolm1, Apr 11, 2007 IP
  13. rtchar

    #13
    Actually you are talking about two different problems ...

    If your server is Unix/Linux based, it treats file names that differ only in case as distinct files. Windows servers are case insensitive.

    Google will try to get a file using the same case as the link, but if your server does not return a file, the URL is simply marked as a 404 error and dropped, which is the correct thing to do.

    Don't try to fix this behavior :eek:

    You may introduce errors on your site if you don't handle EVERY possible situation.
     
    rtchar, Apr 11, 2007 IP
  14. enQuira

    #14
    That's not true. domain.com/page and domain.com/Page are considered two different URLs by Apache. The problem is with the scripts: you are giving Google different URLs with the same content.
    Nobody seems to be concerned :rolleyes:
     
    enQuira, Apr 11, 2007 IP
  15. Claudek

    #15
    I force all autogenerated URLs for my websites to be lowercase on a Linux-based web server (using an .htaccess file). All manually created pages follow the same standard. It should not be all that difficult to do this for any scripts you use.

    You are correct in saying that domain.com/page and domain.com/Page are considered two different URLs by Apache. Again, this is easy enough to resolve using URL rewrites.
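
The lowercase-everything approach can be sketched as follows. This is not the .htaccess code referred to above, which was not posted; it is the same idea written as a plain Python function with a hypothetical name, and it assumes every canonical URL on the site is already lowercase.

```python
def lowercase_redirect(path, query=""):
    """Return (301, location) if the path contains capitals, or None if it is already canonical."""
    lowered = path.lower()
    if lowered == path:
        return None  # nothing to do, serve the page normally
    # Keep the query string as it was; only the path is folded to lowercase.
    location = lowered + ("?" + query if query else "")
    return 301, location
```

So `lowercase_redirect("/arTS/Animation/")` returns `(301, "/arts/animation/")` and `lowercase_redirect("/arts/")` returns `None`. A directory whose canonical paths contain capitals, like /Arts in the first post, would need a lookup against its real category names instead of a blanket lowercase rule.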
     
    Claudek, Apr 11, 2007 IP
  16. enQuira

    #16
    Yes, it is possible to rewrite the whole thing to lower case, but I am not sure that directory owners will do that, because they will have PR0 pages until the update after the upcoming one.

    Anyway, can you please post the code you are using in .htaccess?
     
    enQuira, Apr 11, 2007 IP