I am getting more confused the more I read; the more I learn, the less I know. I have been reading about duplicate content for months. At first I thought it meant duplication WITHIN my site, i.e. having multiple pages with the same information posted. Then I read that having the same content on multiple sites is also duplicate content. I am not sure which of these is correct. I can understand the 'spiders' finding duplicate content within a site, but with MILLIONS of active websites, how, even with the great databases available, would Google or anyone else for that matter be able to identify duplicate content across what has to be hundreds of millions of pages on the web? And every page is slightly different to begin with from an HTML standpoint. Can someone please help? Thanks
If you've been reading about duplicate content for "months" and you still haven't figured it out, what makes you think that some replies in this thread are going to enlighten you?
As far as I know, having Google index your site both as example.com and as www.example.com counts as duplicate content too.
Hi guys, let me add my 2p worth. Duplicate content means having largely the same or exactly the same content on one page that is also contained on another, whether that other page is on the same site or anywhere else on the web. In the latter case, it's a matter of "who got there first" wins: pages the search engine deems came second may get penalised. Basically, every web page, wherever it is, should be unique. The point by isolete above is correct, although most search engines will allow for this, choose one version or the other, and not penalise the site.
From what I can understand (even though I am confused by all the contradictory threads), duplicate content can come from other people re-posting your content, or from you re-posting somebody else's content word for word. I believe you can use Copyscape or something similar to check for this. However, you can also have duplicate content issues within your own site; for example, if your site can be accessed from both http://yoursite.com and http://www.yoursite.com, your backlinks will be split between the two as well. There are also known duplicate content issues within WordPress, for example archive pages. From reading Matt Cutts' blog, I think he suggests that one to three instances of duplicate content on your site isn't really something to worry about, if I remember right. I am no expert; this is only my basic knowledge.
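If you want to check whether the www/non-www issue applies to your own site, here is a rough Python sketch (yoursite.com is just a placeholder domain, not a real tool or an official check) that fetches both versions and reports whether one redirects to the other. If both hosts serve the page directly with no redirect, the same content is reachable at two URLs, which is exactly the situation described above.

```python
import urllib.request
import urllib.error

# Placeholder domain -- substitute your own site here.
urls = ["http://yoursite.com/", "http://www.yoursite.com/"]

for url in urls:
    try:
        # urlopen follows redirects by default; geturl() reports where we landed.
        response = urllib.request.urlopen(url, timeout=10)
        final = response.geturl()
        if final.rstrip("/") == url.rstrip("/"):
            print(f"{url} serves the page directly (possible duplicate host)")
        else:
            print(f"{url} redirects to {final} (one canonical host)")
    except urllib.error.URLError as error:
        print(f"{url} could not be fetched: {error}")
```

If both URLs print "serves the page directly", the usual fix people mention is a 301 redirect from one host to the other so only one version gets indexed.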
Duplicate content is checked by comparing pages across the whole of the search engine's database, not just within one site but across all sites.
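To the original poster's question about how that could possibly scale to hundreds of millions of pages: the engines don't compare every page against every other page word for word. A common technique described in the literature (shingling and fingerprinting; I have no idea exactly what Google runs internally) is to boil each page down to a set of short hashed word-chunks and build an index from each hash back to the pages containing it, so duplicate candidates are found by lookup rather than by pairwise comparison. Here is a toy Python sketch of the idea, with made-up page text and URLs:

```python
import hashlib
from collections import defaultdict

def shingles(text, k=5):
    """Break a page's visible text into overlapping k-word chunks ('shingles')."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def shingle_hashes(text, k=5):
    """Hash each shingle down to a short integer so it can be indexed cheaply."""
    return {int(hashlib.md5(s.encode()).hexdigest(), 16) % 10**8 for s in shingles(text, k)}

def similarity(a, b):
    """Jaccard overlap of two hash sets: 1.0 means identical shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Three toy 'pages': two near-duplicates and one unrelated page.
pages = {
    "site-a.example/lightbulb": "how to change a lightbulb first turn off the power at the switch",
    "site-b.example/lightbulb": "how to change a lightbulb first turn off the power at the mains",
    "site-c.example/recipe":    "how to bake a simple loaf of bread using flour water yeast and salt",
}

# Inverted index: each shingle hash points at the pages containing it.
fingerprints = {url: shingle_hashes(text) for url, text in pages.items()}
index = defaultdict(set)
for url, hashes in fingerprints.items():
    for h in hashes:
        index[h].add(url)

# Candidate duplicates are pages sharing at least one shingle hash,
# found by lookup rather than by comparing every possible pair.
candidates = set()
for urls_sharing_hash in index.values():
    for u in urls_sharing_hash:
        for v in urls_sharing_hash:
            if u < v:
                candidates.add((u, v))

for u, v in sorted(candidates):
    print(u, "vs", v, "->", round(similarity(fingerprints[u], fingerprints[v]), 2))
```

Only pages that share at least one hashed chunk ever get compared in detail, which is why the sheer size of the web is not the obstacle it seems.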
Sort of correct: it should be two different URLs having the same, or largely the same, content. Incorrect: who gets the content indexed first has nothing to do with anything. PageRank is generally what determines which URL gets filtered. Also, there is no "penalty" for duplicate content; one URL may be subject to a "filter" to ensure only one version of a document is shown in the results.
Don't get confused about duplicate content. The same content appearing on multiple pages of a single website is duplicate content, and likewise content copied from another website and posted on your own website is duplicate content according to Google. But the website whose copy gets cached first will be treated as the original publisher of the content, even if the second one is the real owner. http://mobilestormer.wordpress.com
Completely wrong. Why do people keep repeating this rubbish about being cached first? It has NOTHING to do with anything: you can have a document online and cached for 10 years, and someone can copy it and outrank you for it. Also, duplicate content on one domain or across two domains is treated the same. The result is that one version gets filtered, and that is largely based on the document's authority, NOT on what was cached first.
I have gone through both of these websites; both have PR 5. Could you please let me know why Google has not penalized them?
I decided to search Google for how they determine and treat duplicate content. This is what I found: http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html
Based on that theory, what would this mean for article directories? Their purpose is to distribute articles to websites that request them, so the article content is going to be the same on every site that posts it. Would the search engines look at all of the websites carrying an article on, say, "How to Change a Lightbulb" and penalize most of the sites that post that article?
I personally think it works like this: if you have the original article, you are less penalized, or perhaps not penalized at all. I don't believe I have ever seen a website that only scrapes content rank very highly in the search engines. I know ezinearticles.com still has its articles weighted heavily by Google. However, I had an article on there that was copied by several blogs, and eventually my article disappeared from the #1 ranking. Ezinearticles is still regarded by Google as having quality content, but perhaps articles that have been copied lose their link juice after a while.
Why do people find this so difficult to understand? There is no penalty, there's a filter; and there is no original article, there's a most authoritative article.