Has anyone been testing this filter? What does the filter use to determine duplicate pages? Are there simple ways to beat the filter?
Interesting terminology. Did this filter thing start in the SEO communities? Everything there is to know about duplicate content is in the two patents Google has on dup content: "Detecting duplicate and near-duplicate files" and "Detecting query-specific duplicate documents". If anyone needs help understanding them, I'll be happy to help.
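For a rough idea of what that kind of near-duplicate detection involves, here is a minimal sketch of shingle-based resemblance in Python. It is only my own illustration of the general technique, not the algorithm in those patents; the shingle size and threshold are made up.

import hashlib

def shingles(text, k=8):
    # Hash every overlapping run of k words into a fingerprint set.
    words = text.lower().split()
    return {
        hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(max(len(words) - k + 1, 1))
    }

def resemblance(doc_a, doc_b, k=8):
    # Jaccard similarity of the two fingerprint sets, 0.0 to 1.0.
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two pages could be treated as near-duplicates when resemblance() exceeds
# some threshold (0.9 would be an arbitrary guess, not a known Google value).

Note that exact copies and lightly edited copies both score high under this kind of measure, which is presumably why mirrors trip whatever Google actually uses.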
Well, that's a wee bit more complex... You see... this Internet thingy existed before Google came 'round. Actually, this Internet thingy existed before the world wide web. And, back in those dark ages, before HTML existed, we allowed each other to copy what we wrote and store it on FTP servers. Then we updated to Gopher. Eventually, we learned HTML and carried that philosophy to the world wide web. The unpleasant(?) side effect is that, after a recent domain name change, one of my mirrors is now knocking me out of the SERPS for quite a few of my (our?) pages. It really shouldn't bother me. It really shouldn't. I really shouldn't care whether the users are looking at my content on my server or on one of the mirrors. I dunno. It's bugging me. But... not enough to change the way we have been working since before the web was invented.
Spencer, read the patents. When there's duplicate content in the SERPs, Google shows the one page that it thinks is best (the one with the highest PageRank). The problem with duplicate documents is that Google might decide to crawl them very infrequently, and that way your mirrors will outrank the main pages for longer than you would want.
nohaber, you are correct! Googlebot visits every night, but I just checked and found that the set of mirrored pages where I am not winning the (friendly) duplicate content war are not being visited by Googlebot. I used to believe that Google chose the duplicate with the higher PR. However, Google seems to have chosen randomly between me and my #1 mirror. I win some pages and he wins others. PR distribution should be a lot more even than that. Right now, I'm not sure what to believe on that point. Ah well, all of the (current) mirrors are also mirroring my ads.
Will, Google looks at pages, not sites, so what you're saying (that G is choosing some pages off your site and some off the mirror) is right. Keep in mind that the toolbar PR is not the actual PR; it is PR rounded to a whole number. Your pages are likely to be linked to individually, and this could give one page an edge over its mirror.
I'm wandering off-topic, but... I don't need a mirror. Mirrors were important in the late 80's and early 90's, but today their function is largely performed by Google cache and Archive.org's WayBackMachine. However, people like to mirror and I agreed to this arrangement years ago. I'm not going to back out now because of some silly search engine algorithm.
It amazes me how many webmasters believe that Google would reveal portions of their ranking methods by filing patent applications that would never be enforceable even *IF* the patents were granted. Oh well, we believe what we want to believe. Bompa
If the content is of a static nature, how about placing something dynamic (a few lines of randomly selected text, an RSS feed, etc., or even manually editing something) on the pages you prefer Google to look at? Possibly Google seeing those pages as more recently updated might change its mind? Just a thought.
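Something along these lines on the server side, perhaps; a minimal sketch with made-up snippets, just to show the idea of the primary page differing slightly from the mirror on each crawl:

import random

# Hypothetical list of short snippets; in practice this could be RSS items,
# recent comments, or anything the mirrors do not carry.
SNIPPETS = [
    "Tip: check your server logs for Googlebot visits.",
    "Recently updated: see the changelog for the latest revisions.",
    "Note: this page is also available via our mirrors.",
]

def render_page(static_body):
    # Append one randomly chosen snippet so the primary copy is never
    # byte-identical to the mirrored copy.
    return static_body + '\n<p class="fresh">' + random.choice(SNIPPETS) + "</p>"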
Ah yes... I have been doing that. I have three different sets of server-side dynamic content on the primary site which do not appear on the mirror sites. Unfortunately, this has no effect. Well, perhaps unfortunately. Really, only one of these sets of pages should be showing up in the index. I currently have Googlebot banned from the mirror site. This is unfortunate, because almost every keyword from the primary site dropped several pages in the SERPS with the arrival of Jager1. I allowed Googlebot back to the mirror site for a while, and it did reasonably well in the SERPS. I've disallowed Googlebot again due to administrative/security issues on the mirrored site. So now the mirror gets almost no traffic and the main site gets little more. Thankfully, my #2 (unrelated) site has more than doubled in revenue in the last two months.
I have the same problem - and have had to "train" clients to write unique text to ensure that the listing isn't picked up as duplicate content. This is a tiresome job, but has solved what was a major issue on one of my websites.
The point being this: The duplicate content filter is not per-page; it is per-paragraph or even per-sentence.
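If that is how it works, a per-sentence comparison would look something like the sketch below; again, just an illustration of the granularity argument, not Google's actual filter.

import re

def sentence_hashes(text):
    # Split on sentence punctuation, normalize whitespace and case, and hash.
    sentences = re.split(r"[.!?]+", text.lower())
    return {hash(" ".join(s.split())) for s in sentences if s.strip()}

def shared_fraction(page_a, page_b):
    # Fraction of page_a's sentences that appear verbatim in page_b.
    a, b = sentence_hashes(page_a), sentence_hashes(page_b)
    return len(a & b) / len(a) if a else 0.0

# A high shared_fraction for one paragraph could flag that span as duplicate
# even when the surrounding page is unique.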
I just did an experiment. I went to an article re-use site. In the SEO section I sorted by oldest and picked something in the middle with a unique title. I copied and pasted the title into G. It came up with 300+ sites. I scanned the results, and for the greater part they are links to sites carrying that article. Have I missed the point, or does my experiment refute your claim? (Honestly, I do not know.)
Will, an interesting discussion is going on at WMW about this: http://www.webmasterworld.com/forum30/32129-98-10.htm