No, it doesn't work that way, and that leads me to today's article on the subject: How does a search engine decide which duplicate to show in search results?
Well, it's more complex than that. Duplicate content can be checked at the time a search is performed, and within a result set, but it's done by comparing shingleprints from the pages. Think of each page you put online as a finger and think of duplicate content checking as fingerprinting. When the fingerprints are a close match across a number of documents, they are considered duplicates. The new article I just put up covers how they may treat these duplicate documents when you actually perform a search.
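To make the fingerprinting analogy concrete, here is a minimal sketch of shingle-based comparison: break each page into overlapping runs of words ("shingles"), hash them, and measure how much the two sets overlap. The function names, the 4-word shingle size, and the sample texts are my own illustration, and real engines use far more scalable variants (MinHash, simhash, etc.), not this literal code.

```python
import hashlib

def shingles(text, k=4):
    """Split text into overlapping k-word shingles and hash each one."""
    words = text.lower().split()
    return {
        hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(max(1, len(words) - k + 1))
    }

def similarity(doc_a, doc_b, k=4):
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two pages that differ by only a word or two share most of their shingles,
# so they score high; unrelated pages score near zero.
original = "search engines compare word shingles from each page to spot duplicate content in a result set."
copy = "search engines compare word shingles from each page to spot duplicate content in a result list."
print(round(similarity(original, copy), 2))  # well above 0.8
```

An engine that finds several documents scoring above some threshold against each other can cluster them as duplicates and pick just one to show in the results, which is the behaviour the article describes.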
Hmm, interesting read, thanks for sharing - although I'm not convinced either way yet. The argument was always partly about whether duplicate content will get indexed (the penalties don't really interest me, if they even exist). And so far I haven't seen this occurring unless the content has been spun sufficiently. If you can show me 10 identical copies of content on different domains which have all been indexed by Google then I might reconsider. Social bookmarking sites mostly only have snippets of the original content together with unique comments/summaries, so I'm not sure whether you can consider that duplicate content.
Duplicate content has no problem getting indexed. Read my new article posted today for an explanation of what happens after indexing.
I agree. Perhaps in general that is the case, because the original post is likely to come from a very established and search engine friendly site (hence all the copies of it), but that comes down to those other factors, not to it being indexed from there first.
I was thinking the same thing. But if people are pulling a news RSS feed into their site via PHP, or any feed for that matter, then isn't that classed as duplicate content?
If this leads you to believe that you can now start copying stuff and duplicating it on your site, give it a rest: dupe content reporting still works. As Blackhat360 put it, it's difficult for the SEs to determine what's dupe and what's not, so it leaves the chore to the average joe webmaster.
Start reconsidering then. These are all articles written many months ago, all still indexed in Google and all duplicate. You can find millions of examples of this if you bother to look.
Sure, reporting works, but that rests on the individual webmaster, which in reality means that 999 times out of 1000 you can get away with it. Information moves far too fast these days.
Hmm... I've always been a follower of standard rules and I know next to nothing about the various shady techniques. I've tried to steer clear of duplicate content as much as I can without actually knowing if the penalty really exists or is even possible. Reading this article made me think more about it.
Hi there, I enjoyed your article. I can only speak from my own experience: I had a site deleted from the Google and Yahoo indexes for three months, and the only reason there could have been for this was duplicate content.
This is exactly what I have been thinking. I have an arcade script that scrapes other sites for games and I've had a few people freaking out over "Google duplicate content", yet look at how many other sites have the games.
I am currently overseeing several sites that (legally) aggregate RSS content with no noticeable "duplicate content" issues. There is a little additional, original, quality niche content on those sites as well, however, which might help. My take on it is that if the site as a whole provides additional value over and above the syndicated (duplicate) content, then the SEs will be happy.
I think Google is working hard now to put an end to duplicate content; sooner or later it will be penalized.