Duplicate Content, what is the real deal?

Discussion in 'Search Engine Optimization' started by DesertWarrior, Apr 2, 2009.

  1. #1
    Hello guys,

    We have noticed thru CopyScape that a few websites have literally copied some of our articles. Now, will Google penalize our website for this? Some of our articles are not being indexed by Google, could this be the reason?

    It wouldn't make sense if Google penalizes us. We are the original writers of those articles... right? Does Google have a way of knowing which one was the original publisher and penalize all the others? What's the best way of protecting our content from plagiarism?

    I have read many contradicting opinions on this matter from different webmasters, so I seek some Expert advice here...

    Thanks!:)
     
    DesertWarrior, Apr 2, 2009 IP
  2. contentboss

    contentboss Peon

    #2
    It's unlikely you'll be 'penalized'. However, you may end up not being regarded as the definitive source for your own article. It's not as simple as 'who posts first', because it also depends on when Googlebot first finds a copy of the text. You can read about the dupe content myth here - dupe content myth
     
    contentboss, Apr 2, 2009 IP
  3. Komicwords

    Komicwords Well-Known Member

    #3
    I think this issue is no longer considered important to Google's SERP system, and in the years to come fewer and fewer people will care about it. I also believe that if you publish even 300 words of quality content, within 3 months you will get someone who "stole" your content.
     
    Komicwords, Apr 2, 2009 IP
  4. DesertWarrior

    DesertWarrior Banned

    #4
    Thanks for your replies, but I am not sure that Google doesn't consider this important anymore... because if that were the case, anybody could just start a website, copy valuable articles like Wikipedia's, and make a profit out of it.
     
    DesertWarrior, Apr 2, 2009 IP
  5. OnInternetBusinessGuide

    OnInternetBusinessGuide Well-Known Member

    #5
    It looks like search engines are wise about analyzing page content. It seems they can examine fragments of pages, not just whole pages. A lot of websites contain fragments from other websites - for example, fragments of ads or quotations. Maybe a single fragment is not a problem, but what exactly counts as a fragment? The majority of websites differ because their navigation structures differ: menus, footer links, and so on. In principle, no two pages should have exactly the same content, but someone could copy the whole text from another website (without the navigation structure and tables). To the human eye the difference is pretty obvious, but to a computer it may not be. Does anyone have more information about this aspect?
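    One classic technique from the research literature for comparing page fragments is w-shingling with Jaccard similarity. To be clear, this is only an illustration of how a machine can detect partial overlap - Google's actual algorithms are not public, and the example texts below are made up:

```python
# Sketch of w-shingling, a classic technique for detecting partial
# duplication between documents. Illustration only -- not Google's
# actual (unpublished) algorithm.

def shingles(text, w=4):
    """Return the set of all w-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard_similarity(a, b, w=4):
    """Fraction of shingles the two texts share (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the old barn"
unrelated = "completely different text about search engine optimization topics"

print(jaccard_similarity(original, copied))     # high overlap
print(jaccard_similarity(original, unrelated))  # no overlap
```

    Because the comparison works on overlapping word windows rather than whole pages, shared boilerplate like menus contributes only a few shingles, while a copied article body produces a large overlap score.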
     
  6. Earnest01

    Earnest01 Peon

    #6
    Earnest01, Apr 2, 2009 IP
  7. paganheart

    paganheart Peon

    #7
    You can use duplicate content, if done wisely. If you are running an autoblog of ONLY dupe content, well um.. yeah that's a waste of time. It won't rank and it's not likely to gain PR. Without PR you won't get traffic etc and so on. Sure, you could spin or "wrangle" your articles to seem less duplicate.. but I have yet to find something that works well and is NOT time consuming (yes I've tried content boss too). So I'm back to buying original content or creating it myself. It's the only thing that REALLY works.

    IF you insist that dupe content is beneficial in any way, then be smart (as I suggested above) and add at least 50% original content, or some kind of mix. I use articles that are syndicated, but my sites are not full of them either.
     
    paganheart, Apr 2, 2009 IP
  8. mynetincome.com

    mynetincome.com Peon

    #8
    I'm also skeptical about how, or even whether, a site is penalized for dupe content. A source I will cite, as I'm a NASCAR fan, is jayski.com. The guy who runs that site just posts related news stories from all over the internet (simple copy and paste). He cites his sources well and makes no claims about the uniqueness of his content.

    However, his website has a high PR and is extremely popular in Google - so much so that ESPN offered to buy his site. This leads me to believe that a site's PageRank may have a bit to do with whose content is worth more to Google, rather than who it actually belongs to.
     
    mynetincome.com, Apr 2, 2009 IP
  9. BardAzima

    BardAzima Peon

    #9
    My understanding, after also doing a lot of research on this topic, is that duplicate content is only a bad thing if practically everything on the site is duplicated. As the NASCAR fan said, it is not inappropriate to take anyone else's article and post it on your site as long as it is referenced. The robots can't tell whether it's been referenced or not, and it's obviously not affecting that site's ranking. I think this is because that 'borrowed' article is only one small piece of writing within a site that has a lot of content - I think that's the key. The downside, though, is that if your site does not rank highly and a higher-ranking site publishes your article, then they will outrank you in the search engines - even though you wrote it and it's on your site.
     
    BardAzima, Apr 2, 2009 IP
  10. mynetincome.com

    mynetincome.com Peon

    #10
    Good theory. I hadn't looked at it like that... Kinda makes sense why a lot of these "scraper" blogs/sites do fairly well. Really good insight... Thanks!
     
    mynetincome.com, Apr 2, 2009 IP
  11. catanich

    catanich Peon

    #11
    Reality check, everyone. Look at your Google Toolbar. If the TB is gray ("graybarred"), then somewhere on that page is a sentence or paragraph that has been copied. If you drag and drop each sentence into the search bar, the other site will be displayed.

    Copyscape is good for gross duplications, but the "duplicate content" penalty (graybar) can only be detected by searching on each sentence.
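    The manual check described here - searching each sentence as an exact phrase - can be sketched as a small script. This is just an illustration of the process; the example text and the minimum-length cutoff are arbitrary choices:

```python
# Sketch of the sentence-by-sentence check described above: split a
# page's text into sentences and build quoted exact-match queries.
# Pasting each query into a search engine shows who else has it indexed.
import re

def sentence_queries(text, min_words=6):
    """Yield quoted exact-match queries, skipping sentences too short
    to be distinctive."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for s in sentences:
        s = s.strip()
        if len(s.split()) >= min_words:
            yield f'"{s}"'

page_text = ("Copyscape is good for gross duplications. "
             "Sentence level checks catch partial copies that whole page tools miss.")
for q in sentence_queries(page_text):
    print(q)
```

    Skipping very short sentences matters because common phrases will match thousands of unrelated pages, while a distinctive ten-word sentence rarely matches anything but a copy.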

    Our site is indexed every 4 to 5 days by Google, but if I create a brand-new page with 100% original content, the "scraper" blogs will get there first. Google will give them credit for the content, not me.

    This is one of the major problems at the moment, and Google has not addressed it. What's worse, the scrapers are offshore, so no legal (copyright) action can be taken.
     
    catanich, Apr 2, 2009 IP
  12. DesertWarrior

    DesertWarrior Banned

    #12
    It's really scary that nothing can be done to protect yourself...
     
    DesertWarrior, Apr 2, 2009 IP
  13. bigmny4you

    bigmny4you Peon

    #13
    The best thing you can do to protect yourself is to "brand" all of your posts with some sort of signature containing your name or business name. I do this with all of my posts and have had no problems so far. Also make sure Google can visit often.
     
    bigmny4you, Apr 2, 2009 IP
  14. wizlor

    wizlor Peon

    #14
    I have done a few autoblogs, all using duplicated content. Here are my findings:

    1) Out of 5 new domains, 3 gained PR1 in less than 3 months. I am still waiting for the next PR update to see if it stays.

    2) I have also done autoblogs on 3 aged domains. They were all PR0 when bought. After a month, they are now a PR4, a PR2, and a PR0.

    3) Not much backlinking has been done apart from submissions to social networks, bookmarking, and some link exchanges.

    4) Some of the better ones are earning 10-50 cents per day with AdSense. BTW, they are not targeting any high cost-per-click keywords.

    5) All the blogs are indexed. I never check how often Google indexes all my blogs, but some of the better ones are visited by Googlebot every day.

    I understand a lot of people claim duplicated content doesn't work. I was also "drilled" with this mindset when I first learned what blogging was about. If I had never tried it, I might still believe that duplicated content can never rank, can never gain traffic, etc.

    Personally, I have seen blogs full of aggregated content with high PR and a good number of visitors based on Alexa ranking. Traffic is not all about Google; there are lots of social network sites around. If your blog gives a good user experience and useful content, I don't see why you wouldn't gain traffic.

    Last of all, the internet is full of duplicated content. If Google were to fully eliminate it, less than 20% of the pages currently in its index would remain. By then nobody would be using Google, and it wouldn't be the Google we know now. Think about it! ;)
     
    wizlor, Apr 2, 2009 IP
  15. Canonical

    Canonical Well-Known Member

    #15
    contentboss is correct... There is no penalty for duplicate content. A penalty would prevent duplicate content from ever making it to page 1, by forcing your URL far back in the SERPs or removing it from the index altogether. But duplicate content can make it to page 1.

    Basically, the version of the content that is found first by Googlebot 'typically' gets deemed the original... all subsequent copies discovered by Googlebot 'typically' get deemed duplicates. I say 'typically' because, according to Matt Cutts, there is a way to help Googlebot figure out which really is the originator... more on that later.

    There are 200+ factors about a URL - its contents, its inbound links, the pages that link to it, the relevance of the pages that link to it, the site's trust, etc. - that Google's algorithm looks at each time it ranks URLs for a particular keyword in the SERPs. All of the factors that are based on the content of the page (things like <title>, <h1>...<h6>, keyword density, position on page, etc.) are devalued for the duplicates, while the originator gets full credit...

    Note: This does NOT mean that the duplicates cannot rank well. Duplicate content can actually outrank original content. However, to do so the URL flagged as duplicate will need a higher overall ranking score than the URL flagged as the original. This means it has to score substantially better on the NON-content ranking factors. The most obvious way to do this is for the duplicate URL to have a much stronger backlink profile.

    I disagree with almost everything in Komicwords' post, except for one point: if you write good content, someone is probably going to copy it within 3 months.

    I disagree with the first two sentences though. Google is constantly battling duplicate content. They don't want it showing up in their SERPs unless someone has added value to it...

    Google is constantly changing their algo to deal with duplicate content. They would not have the notion of original and duplicate content baked into their algorithm if they no longer cared about it. Google would not have hired PhDs and likely spent millions coming up with algorithms to detect complete and partial duplication of content. They would not have spent tens or hundreds of thousands of dollars patenting their duplicate-detection algorithms (I believe they have several, but here is one of them for your viewing pleasure: http://patft.uspto.gov/netacgi/nph-...7,158,961.PN.&OS=PN/7,158,961&RS=PN/7,158,961 ).

    There is not a lot you as a webmaster can do to combat duplicate content. As long as there are parasites on the web - spammers, etc. - who try to scam a buck off of other people's hard work because they don't have the skills to come up with an original thought themselves, there will be content thieves.

    According to Matt Cutts, if you embed links in your content pointing back to the original copy of the same content on your site, this will help Google differentiate who the 'real' originator is, and prevent them from having to 'guess' at the originator based on which version is crawled and indexed by Googlebot first. Of course, some content thieves will remove the links... but many are just plain lazy - otherwise they would have written their own content. So some of the copies of your content will likely retain the links. If they do, it's enough for Google to figure out which URL is the 'real' originator of the content.

    Using absolute URLs in all of your links also makes it a bit more tedious for people to copy your content.
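    The absolute-URL tip can be sketched as a small standard-library script: rewrite relative hrefs so that copied markup keeps pointing back to the original site. The regex approach and the example.com domain are illustrative assumptions, not a production HTML parser:

```python
# Sketch of the absolute-URL tip: rewrite relative href values in a
# page's HTML to absolute URLs, so lazy copiers carry your links along.
# Uses only the standard library; example.com is a placeholder domain.
import re
from urllib.parse import urljoin

def absolutize_links(html, base_url):
    """Replace each href="..." value with an absolute URL."""
    def repl(match):
        return 'href="{}"'.format(urljoin(base_url, match.group(1)))
    return re.sub(r'href="([^"]*)"', repl, html)

html = '<a href="/articles/dupe-content.html">Read the original</a>'
print(absolutize_links(html, "http://www.example.com/"))
# -> <a href="http://www.example.com/articles/dupe-content.html">Read the original</a>
```

    For real pages a proper HTML parser would be safer than a regex, but the idea is the same: a scraper who copies the raw markup now republishes links back to your domain.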
     
    Canonical, Apr 2, 2009 IP
  16. Canonical

    Canonical Well-Known Member

    #16
    Your pages going from PR0 to PR4 has absolutely NOTHING to do with whether content is duplicate or original. PR is TOTALLY based on your inbound links. It has nothing to do with content.

    Content affects how you rank for a particular keyword. It's used by the ranking algorithm. It's NOT used by the PR calculation algorithm.

    And since there are so many other factors that Google is looking at (more than 200 signals total of which some subset is based on the content at a given URL) duplicate content can rank well... it just makes it harder to do so than if you are the originator.

    It's virtually impossible for this to occur in the wild, but hypothetically if 2 pages were evaluated by the ranking algorithm and

    1) both had the exact same scores for each of the NON-content based ranking factors and
    2) one was an exact copy of the other's content

    then the originator will win in the SERPs every time.
     
    Canonical, Apr 2, 2009 IP
  17. DesertWarrior

    DesertWarrior Banned

    #17
    controversy... controversy :)

    by the way, to the dude making money with autoblogs... shame on you :)
     
    DesertWarrior, Apr 2, 2009 IP
  18. Canonical

    Canonical Well-Known Member

    #18
    No controversy... Just a lot of people with misconceptions.
     
    Canonical, Apr 2, 2009 IP
  19. wizlor

    wizlor Peon

    #19
    My message was aimed at the guy who says it is impossible to rank with duplicated content.

    BTW, PR has something to do with content, not just backlinks, even though backlinks play the more important part. I only know this from my observations when trying to rank for certain keywords.

    The answer to a successful website is a good visitor experience.
     
    wizlor, Apr 2, 2009 IP
  20. £££

    £££ Peon

    #20
    Google will not penalise you, don't worry about it.

    If your proposed situation were true, the article directories would not rank individually as well as they do, nor would a great number of news sites around the internet.

    If you require more information on this, may I humbly suggest that you click on my name and read my recent posts replying to others who asked a near-identical question - saving me from having to write it all out yet again.
     
    £££, Apr 2, 2009 IP