What criteria does Google use to determine whether a webpage has duplicate content from another site? Does Google use something similar to Copyscape?
Well Softstor, as far as I've heard they have their own technology for detecting whether a site is using duplicated content, but it's not clear what percentage counts as duplicate. I personally think that if it's more than 50% you will be penalized for duplicated content. It was somewhere on seomoz.com, but I don't remember the link, sorry.
My own personal opinion is that it has to be downright identical to be penalized; however, the 50% rule is a good one to follow. Dave
I personally do not believe that Google penalizes duplicate content. Speaking from experience as a searcher, though, I've been seeing lots of duplicate content out there.
I have noticed that if you copy even a few sentences from one site, Copyscape will pick it up. How sensitive is Google?
I know that Google does penalize sites for having duplicate content; I have seen it. I can't post any links, but I do know that duplicate content can get a site knocked out of Google.
This might not be a good example, but how about lyrics sites? Don't they all basically have the same content? Again, it might not be the best example, but it illustrates my point.
Although Google can penalize duplicate content, from what I've seen it has to be darn near identical. Look at how many scraper sites contain nothing but duplicate content, identical word for word to the original, yet get and remain indexed. Dave
If anyone can actually figure out, and prove, what the criteria are for duplicate content in Google, I'll send you my next affiliate check.* *Don't expect to retire, but you might be able to go to the movies or something.
I'm curious about this too. I'm opening an article website, and some of my content will be coming from my forum. I wonder how they'd see that...
I think they are looking for phrase similarities. Just search Google for an ordinary phrase of only 4-5 words and you will see how few results you get. For example, search for "I want to leave New York": http://www.google.com/search?source...:2006-11,GGGL:en&q="I+want+to+leave+New+York" See it? I think that by doing something like what programmers call backtracking, building 4-5 word phrases and comparing them, Google can detect any duplicated content on the web. Google is very smart, and it can also pick out your site's "template" from the page itself, so it only reads and counts your original page content, not the site-wide text and menus (like a news box). Has everybody noticed that?

Think about the scale, though. A 300-word page gives you roughly 300 overlapping 4-word phrases (word1-word2-word3-word4, then word2-word3-word4-word5, and so on). If your site has 1,000 pages and there are, say, 300 billion pages in the index, comparing every phrase against every page is an enormous amount of computation. (I think that also explains why some pages end up in Google's cache and others don't.)

So from what I've seen, Google uses a quick shortcut to avoid wasting too much time processing data: it looks at the page title plus the number of outgoing links. If pages have the same title and the same number of outgoing links, they will usually be marked as duplicate content, even if you've written 300 KB of unique text. That's the thing to watch if you don't want Google to think your site has duplicated content.
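What the post above describes is close to what search people call "shingling": slide a short window over the text, collect the phrases, and measure how many of them two pages share. Here is a minimal Python sketch of that idea, just to illustrate it. It's my own guess at the mechanics, not Google's actual algorithm, and the 4-word window size and the example pages are made up.

def shingles(text, size=4):
    # Break the text into every overlapping run of `size` words
    # (an assumed window size, matching the 4-5 word phrases discussed above).
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(max(0, len(words) - size + 1))}

def similarity(text_a, text_b, size=4):
    # Jaccard overlap: shared phrases divided by all distinct phrases.
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

page1 = "I want to leave New York and move somewhere much quieter"
page2 = "She told me I want to leave New York again last night"
print(similarity(page1, page2))  # non-zero because both pages contain "i want to leave new york"

A rule like the 50% mentioned earlier in the thread would just be a threshold applied to a similarity score like this one.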