Google's Duplicate Content Filter

Discussion in 'Google' started by Will.Spencer, Aug 8, 2004.

  1. #1
    Has anyone been testing this filter?

    What does the filter use to determine duplicate pages?

    Are there simple ways to beat the filter?
     
    Will.Spencer, Aug 8, 2004
  2. nohaber
    #2
    nohaber, Aug 8, 2004
  3. digitalpoint
    #3
    Don't use duplicate content. :)
     
    digitalpoint, Aug 9, 2004
  4. Will.Spencer
    #4
    Well, that's a wee bit more complex...

    You see... this Internet thingy existed before Google came 'round.

    Actually, this Internet thingy existed before the world wide web.

    Back in those dark ages, before HTML existed, we allowed each other to copy what we wrote and store it on FTP servers. Then we moved on to Gopher. Eventually, we learned HTML and carried that philosophy over to the world wide web.

    The unpleasant(?) side effect is that, after a recent domain name change, one of my mirrors is now knocking me out of the SERPs for quite a few of my (our?) pages.

    It really shouldn't bother me. It really shouldn't. I really shouldn't care whether the users are looking at my content on my server or on one of the mirrors.

    I dunno. It's bugging me. :rolleyes:

    But... not enough to change the way we have been working since before the web was invented. :D
     
    Will.Spencer, Aug 9, 2004
  5. nohaber
    #5
    Spencer,
    Read the patents. When there's duplicate content in the SERPs, Google shows the one page it thinks is best (the one with the highest PageRank). The problem with duplicate documents is that Google may decide to crawl them very infrequently, so your mirrors can outrank your main pages for longer than you would like.
     
    nohaber, Aug 9, 2004
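nohaber's description can be modeled as a toy filter: group pages whose text is identical and keep only the copy with the highest PageRank. This is a sketch under stated assumptions, not Google's implementation; the (url, pagerank, text) tuples, the example URLs and the PageRank values are all hypothetical, and "duplicate" here simply means identical text after whitespace and case normalization.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash page text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def pick_canonical(pages):
    """Keep one URL per group of identical pages: the one with the highest PageRank.

    `pages` is a list of hypothetical (url, pagerank, text) tuples.
    """
    groups = defaultdict(list)
    for url, pagerank, text in pages:
        groups[fingerprint(text)].append((pagerank, url))
    # max() on (pagerank, url) tuples picks the highest PageRank; an exact tie
    # falls back to comparing URLs, which is arbitrary.
    return sorted(max(group)[1] for group in groups.values())


pages = [
    ("http://primary.example/faq.html", 5.2, "Frequently asked questions ..."),
    ("http://mirror.example/faq.html", 5.6, "Frequently  asked questions ..."),
    ("http://primary.example/intro.html", 4.1, "An introduction ..."),
    ("http://mirror.example/intro.html", 3.8, "An introduction ..."),
]
print(pick_canonical(pages))
# ['http://mirror.example/faq.html', 'http://primary.example/intro.html']
```

Because the choice is made separately for each group of duplicates, the primary site can win one page while the mirror wins another, which is consistent with the mixed results reported later in the thread.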
  6. Will.Spencer
    #6
    nohaber, you are correct!

    Googlebot visits every night, but I just checked and found that the set of mirrored pages where I am not winning the (friendly) duplicate content war is not being visited by Googlebot.

    I used to believe that Google chose the duplicate with the higher PR. However, Google seems to have chosen randomly between me and my #1 mirror. I win some pages and he wins others. PR distribution should be a lot more even than that. Right now, I'm not sure what to believe on that point.

    Ah well, all of the (current) mirrors are also mirroring my ads. :D
     
    Will.Spencer, Aug 9, 2004
  7. nacho45
    #7
    Why do you need a mirror? Why not just use a permanent redirect?
     
    nacho45, Aug 9, 2004
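nacho45's suggestion applies where you control the old address, for example the old domain after a domain name change. A minimal sketch using Python's standard http.server, assuming a hypothetical new domain `www.new-domain.example` and that every old path exists unchanged on the new domain: each request gets a 301 (Moved Permanently) whose Location header points at the same path on the new domain.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical destination; replace with the domain you actually moved to.
NEW_ORIGIN = "http://www.new-domain.example"


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a 301 pointing at the same path on NEW_ORIGIN."""

    def do_GET(self):
        # 301 marks the move as permanent; self.path keeps the path and query
        # string, so each old URL maps onto its counterpart on the new domain.
        self.send_response(301)
        self.send_header("Location", NEW_ORIGIN + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # HEAD requests get the same status and headers (there is no body anyway).
    do_HEAD = do_GET


def run(port=8000):
    """Hypothetical helper: serve redirects on the given port until interrupted."""
    HTTPServer(("", port), PermanentRedirect).serve_forever()
```

Calling `run()` and requesting any path on port 8000 returns the redirect. In practice this is more often configured in the web server itself than in application code; the sketch just shows what the response looks like.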
  8. Old Welsh Guy
    #8
    Will, Google looks at pages, not sites, so you're right that G is choosing some pages from your site and some from the mirror. Keep in mind that toolbar PR is not the actual PR; it is PR rounded to a whole number.

    Your pages are likely linked to individually, and those links could give one copy of a page an edge over the other.
     
    Old Welsh Guy, Aug 9, 2004
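Taking Old Welsh Guy's description at face value, two copies of a page can show the same toolbar value while their underlying PageRank differs, so matching toolbar numbers say little about which copy Google will prefer. The URLs and PageRank values below are hypothetical.

```python
# Hypothetical underlying PageRank for the two copies of one page.
actual_pr = {
    "primary.example/faq.html": 4.6,
    "mirror.example/faq.html": 5.4,
}

# Per the post above, the toolbar shows PR rounded to a whole number.
# (Values ending in .5 are avoided because Python's round() rounds half to even.)
toolbar_pr = {url: round(pr) for url, pr in actual_pr.items()}

print(toolbar_pr)
# {'primary.example/faq.html': 5, 'mirror.example/faq.html': 5}
```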
  9. Will.Spencer
    #9
    I'm wandering off-topic, but...

    I don't need a mirror. Mirrors were important in the late '80s and early '90s, but today their function is largely performed by the Google cache and Archive.org's Wayback Machine.

    However, people like to mirror and I agreed to this arrangement years ago. I'm not going to back out now because of some silly search engine algorithm.
     
    Will.Spencer, Aug 9, 2004
  10. Bompa
    #10
    It amazes me how many webmasters believe that Google would reveal portions of its ranking methods by filing patent applications that would never be enforceable, *IF* the patents are even granted.

    Oh well, we believe what we want to believe.

    Bompa
     
    Bompa, Dec 8, 2005
  11. Old Welsh Guy
    #11
    Bompa, care to explain what you're saying in more detail? I am Old, Bald and Stupid, of course.
     
    Old Welsh Guy, Dec 8, 2005
  12. Bompa
    #12

    Sure, what are your questions?

    :)


    Bompa
     
    Bompa, Dec 8, 2005
  13. alext
    #13
    If the content is static, how about placing something dynamic (a few lines of randomly selected text, an RSS feed, etc., or even something you edit manually) on the pages you prefer Google to look at? Perhaps Google would change its mind in favor of the pages it sees as more recently updated?

    Just a thought.
     
    alext, Dec 8, 2005
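alext's suggestion amounts to rendering each page with a small piece of content that changes between fetches. A minimal sketch, assuming a hypothetical in-memory pool of snippets (a real site might pull them from an RSS feed or a hand-edited file); whether this affects which duplicate Google prefers is exactly what the thread is unsure about.

```python
import random

# Hypothetical rotating snippets; a real site might read these from an RSS
# feed or a hand-edited file instead.
SNIPPETS = [
    "Recently updated: the networking FAQ.",
    "New this week: a glossary of common terms.",
    "Tip: the search box covers every archived article.",
]


def render_page(static_body, rng=random):
    """Return the static page body with one randomly chosen snippet appended."""
    return static_body + "\n\n" + rng.choice(SNIPPETS)


# Seeding a private Random instance makes the choice reproducible for testing.
print(render_page("Static article text ...", random.Random(42)))
```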
  14. Will.Spencer
    #14
    Ah yes... I have been doing that.

    I have three different sets of server-side dynamic content on the primary site which do not appear on the mirror sites.

    Unfortunately, this has no effect.

    Well, perhaps unfortunately. Really, only one of these sets of pages should be showing up in the index. :D

    I currently have Googlebot banned from the mirror site. This is unfortunate, because almost every keyword for the primary site dropped several pages in the SERPs with the arrival of Jagger1.

    I allowed Googlebot back onto the mirror site for a while, and it did reasonably well in the SERPs. I've disallowed Googlebot again due to administrative/security issues on the mirrored site.

    So now the mirror gets almost no traffic and the main site gets little more.

    Thankfully, my #2 (unrelated) site has more than doubled in revenue in the last two months. :)
     
    Will.Spencer, Dec 8, 2005
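The post does not say how Googlebot is banned from the mirror; one common mechanism is a robots.txt rule aimed at Googlebot's user-agent. The sketch below assumes that approach and uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt for the mirror: Googlebot is disallowed everywhere, and other crawlers are allowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the mirror: block Googlebot, allow other crawlers.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The URL is hypothetical; only its path is checked against the rules.
url = "http://mirror.example/faq.html"
print(parser.can_fetch("Googlebot", url))     # False
print(parser.can_fetch("SomeOtherBot", url))  # True
```

robots.txt only asks well-behaved crawlers not to fetch pages; it is not an access-control mechanism, so it does not address security issues on the mirror by itself.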
  15. DarrenC
    #15
    I have the same problem, and I have had to "train" clients to write unique text so that their listings aren't picked up as duplicate content. This is a tiresome job, but it has solved what was a major issue on one of my websites.
     
    DarrenC, Dec 8, 2005
  16. zanet
    #16
    Surely RSS feeds make a mockery of the whole thing.
     
    zanet, Dec 9, 2005
  17. Will.Spencer
    #17
    The point being this:

    The duplicate content filter is not per-page; it is per-paragraph or even per-sentence.
     
    Will.Spencer, Dec 9, 2005
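Will's per-paragraph claim can be illustrated with a simple near-duplicate measure: hash each normalized paragraph and compare two pages' hash sets with Jaccard similarity (shared paragraphs divided by distinct paragraphs). This is a common textbook technique assumed here for illustration; the thread does not describe Google's actual filter. The page texts are hypothetical, with the primary copy carrying one extra dynamic paragraph like the ones discussed earlier in the thread.

```python
import hashlib


def paragraph_hashes(text):
    """Return the set of hashes of a page's blank-line-separated paragraphs."""
    hashes = set()
    for para in text.split("\n\n"):
        normalized = " ".join(para.lower().split())
        if normalized:  # skip empty fragments
            hashes.add(hashlib.sha1(normalized.encode("utf-8")).hexdigest())
    return hashes


def overlap(page_a, page_b):
    """Jaccard similarity of two pages' paragraph-hash sets (1.0 = same paragraphs)."""
    a, b = paragraph_hashes(page_a), paragraph_hashes(page_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pages: the primary copy carries one extra dynamic paragraph.
mirror = "Para one.\n\nPara two.\n\nPara three.\n\nPara four."
primary = mirror + "\n\nToday's rotating snippet."
print(overlap(primary, mirror))  # 0.8 (4 shared paragraphs / 5 distinct)
```

By this measure, one added paragraph still leaves the pages 80% identical, which would be consistent with Will's report that extra dynamic content on the primary site made no difference.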
  18. alext
    #18
    I just did an experiment. I went to an article re-use site, sorted the SEO section by oldest, and picked an article from the middle with a unique title. I copied and pasted the title into G, and it came up with 300+ sites. I scanned the results, and for the most part they are links to sites carrying that article.

    Have I missed the point or does my experiment refute your claim? (Honestly I do not know)
     
    alext, Dec 10, 2005
  19. Will.Spencer
    #19
    I do not know.

    I know that sometimes it works that way also!

    It is very frustrating. :confused:
     
    Will.Spencer, Dec 10, 2005
  20. Barre Tire
    #20
    Barre Tire, Dec 10, 2005