Hello, I've heard that if you have content that is the same as another site, it affects your PR. What I'm wondering is: is it based on which site was crawled first by Google? i.e., I write content, Google crawls it, saves the content somewhere, and compares it with the billions of other sites it crawls? I read on a site somewhere that Google has "4,285,199,774" sites indexed. Wouldn't that mean that for every site it crawls, it has to compare all the content on the entire site against 4,285,199,774 other sites? That would mean that after a full crawl it would have done this content comparison 18,362,937,103,089,651,076 times. So let's say it takes a millisecond to do each comparison; that would take 581,898,790 years. One could argue that they simply have ultimate computers, so let's say each comparison takes a millionth of a second instead; that would still take 581,899 years.

That can mean one of a few things:
1. It's a rumor.
2. They have a category system that each site is assigned to when crawled, and each site is only compared within its category. Still a bit hard to believe, but more likely.
3. They file each site by date and compare it to other sites crawled within a certain time span.
4. They have super Google bio-technology surpassing all others.

I also wonder: would Google base duplicate content on who wrote it first? What exactly would their method be for deciding that? Would both sites get a PR penalty? Maybe it is based on which site was crawled first? It makes me wonder how much truth there is behind some of the Google talk.
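Just to show the back-of-envelope arithmetic I'm using (the 4,285,199,774 figure is only something I read, not an official number), here is a quick sketch in Python:

```python
# Back-of-envelope check of the numbers in the post above.
# The index size is just the figure I read somewhere, not an official one.
indexed_sites = 4_285_199_774

# Comparing every site against every other site, one ordered pair at a time
comparisons = indexed_sites ** 2              # ~1.84e19 comparisons

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25    # milliseconds in a year

years_at_1ms = comparisons / MS_PER_YEAR      # ~582 million years at 1 ms each
years_at_1us = years_at_1ms / 1000            # ~582 thousand years at 1 microsecond each

print(f"{comparisons:.3e} comparisons")
print(f"~{years_at_1ms:,.0f} years at 1 ms per comparison")
print(f"~{years_at_1us:,.0f} years at 1 microsecond per comparison")
```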
It won't affect PR, since PR is based only on links. Some think the higher-PR page "wins" and the lower ones get put in the supplemental index. Others reckon Google regards the one crawled first as the original, which is probably true in 95% of cases. Only Google knows. You can easily avoid the issue by only having unique content.
Sorry, but I don't think you read my entire thread, because I don't really see how your reply has anything to do with my original post. Thanks for your thoughts nevertheless.
No, it doesn't work like that. A database doesn't have to compare every page against every other page one by one; let's just say a spider can find duplicate content in a split second. If it did have to look through each and every page, our searches would take forever too. I don't know exactly how they do it, it's just the way it is.
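Roughly, the trick is that exact duplicates don't need pairwise comparison at all. Here's a toy sketch of the general idea (an assumption on my part, not Google's actual method, and `check_page` is a made-up helper): fingerprint each page once and look the fingerprint up in a table, so each new page costs a single lookup.

```python
import hashlib

# Toy sketch: fingerprint each page once and store it in a table, so finding
# an exact duplicate is one lookup per page instead of a comparison against
# every page already indexed.
seen = {}  # fingerprint -> URL of the first page crawled with that content

def fingerprint(text):
    # Normalise whitespace so trivially reformatted copies still match.
    return hashlib.md5(" ".join(text.split()).encode("utf-8")).hexdigest()

def check_page(url, text):
    """Return the URL of an earlier identical page, or None if this content is new."""
    fp = fingerprint(text)
    if fp in seen:
        return seen[fp]   # duplicate of something crawled earlier
    seen[fp] = url        # first time we've seen this content
    return None

# Example: the second page comes back flagged as a duplicate of the first.
print(check_page("http://example.com/a", "Some original article text"))   # None
print(check_page("http://example.org/b", "Some  original article text"))  # http://example.com/a
```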
You asked about six questions; I answered the first one. Got to leave some for the rest here to show off their knowledge.
I guess duplicate content is detected when executing a query, not when indexing. Otherwise it would take an eternity to find the duplicate content, which can be as small as a duplicate sentence; imagine how long that would take. It looks like Google is not doing very well at detecting duplicate content between small sites, so don't expect to get credit for content taken from Wikipedia. Duplicate content can't be detected based on PR; most likely it's based on their cache.
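Partial duplication (a copied paragraph or sentence) can also be caught without comparing pages pairwise, for example by indexing overlapping word "shingles". This is only a guess at the general class of technique, not a claim about what Google actually does:

```python
# Toy near-duplicate check using word shingles: index each page's shingles
# once, then a new page only has to look up its own shingles to find the
# pages it overlaps with. Purely illustrative.
from collections import defaultdict

SHINGLE_SIZE = 8                      # words per shingle (arbitrary choice)
shingle_index = defaultdict(set)      # shingle -> set of URLs containing it

def shingles(text):
    words = text.lower().split()
    for i in range(max(len(words) - SHINGLE_SIZE + 1, 1)):
        yield " ".join(words[i:i + SHINGLE_SIZE])

def overlapping_pages(url, text, threshold=0.5):
    """Return URLs that share at least `threshold` of this page's shingles."""
    page_shingles = set(shingles(text))
    counts = defaultdict(int)
    for sh in page_shingles:
        for other in shingle_index[sh]:
            counts[other] += 1        # another page already has this shingle
        shingle_index[sh].add(url)    # register this page for future lookups
    return [u for u, c in counts.items()
            if page_shingles and c / len(page_shingles) >= threshold]
```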