How does a search engine decide which duplicate to show in search results? Following up in my series of articles about duplicate content, it seems fitting that we would next discuss how search engines determine which article to show when they have dozens or even hundreds of duplicates to choose from.

Let's start with a question we have all thought about at one point or another, a question that our past two days' articles have been leading up to: "How does a search engine decide which duplicate to show in search results, and which ones not to show?"

How do they choose? PageRank? The first one published? The shortest URL? The article with the most links? It doesn't seem to be any one signal. It's not PageRank alone, or distance from the root directory. It's probably not the first one published, because many sites are dynamic: the timestamp on the original may be later than the one on the copy, and the first copy spidered might be the one the search engines think is the oldest. It doesn't appear to be perceived authority alone either. It could have something to do with the number and quality of inbound and outbound links on a page. It could be a mix of all of those things and others.

So what is it then? Let's dive into some research papers and find out!
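Purely as an illustration of the "mix of signals" idea above, here is a toy sketch of how duplicate copies might be scored to pick one canonical result. Every signal, weight, and name here is invented for the example; the real signals and weights used by any search engine are undisclosed and certainly far more complex.

```python
# Hypothetical sketch: score duplicate copies of a page on a weighted mix
# of signals and pick the highest-scoring one as the "canonical" result.
# All signals and weights are made up for illustration.
from dataclasses import dataclass

@dataclass
class Copy:
    url: str
    authority: float     # link-based authority score, assumed in 0..1
    inbound_links: int   # number of links pointing at this copy
    crawl_order: int     # lower = spidered earlier

def canonical_score(c: Copy) -> float:
    # Authority dominates, link count helps, and being crawled
    # earlier gives a small tie-breaking bonus.
    link_signal = min(c.inbound_links / 100.0, 1.0)  # cap so huge counts saturate
    crawl_penalty = 0.01 * c.crawl_order
    return 0.6 * c.authority + 0.3 * link_signal - crawl_penalty

def pick_canonical(copies: list[Copy]) -> Copy:
    return max(copies, key=canonical_score)

copies = [
    Copy("http://example.com/original", authority=0.4, inbound_links=80, crawl_order=1),
    Copy("http://scraper.example/copy", authority=0.7, inbound_links=5, crawl_order=3),
]
print(pick_canonical(copies).url)  # the well-linked original wins here
```

Note that in this toy example the original's many inbound links outweigh the copy's higher raw authority, which matches the intuition (discussed in the comments below) that the original version tends to accumulate the majority of references over time.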
PageRank seems to be the first and most important factor. But you're right with your suggestions; there are some more issues taken into consideration.
I doubt very much that search engines would use the timestamps provided on a website, as these are easily faked. Almost certainly a search engine would count the first copy it spiders as the original, although it's difficult for a spider to figure out the original if it finds two copies in a short time frame. Thus, by my reckoning, the only other (almost) reliable indicator a spider can use is which copy has the most links, as the original version is more likely to have gathered the majority of references.
I have some experience with that. Once I wrote a unique article in some unknown article directory and left it there for a while. It got cached within a few weeks. When I searched for it in Google, I saw somebody else had also published it - there were two results total. Then I copy-pasted it into an old Blogspot blog. And as you will probably guess, the Blogspot copy took the lead in just a few weeks. So trust is involved for sure.
Hmm, that's not too conclusive though. Remember, Blogspot is owned by Google, so they might give their own pages a bias. For instance, YouTube videos often rank higher than they should, and it's been reported that Google Knol pages are also doing better than they should.
My view is that there's a lot of stuff that goes into this, although I wouldn't use PR here; I'd use site reputation, which is actually the real PageRank - toolbar PR is just a ruse. In addition, the normal factors that would rank a site higher in a search anyway weigh in pretty heavily. Ken King Cobra Poker
All search engines are striving to create good user experiences for people who search using their services. All of them want to avoid duplicate results filling up the early spots on search result pages.