Hi all! Does anybody know how Google finds duplicate content? For example, I have two identical articles, and Google treats them as duplicate content. What percentage of the article text do I have to change so they are no longer duplicates? I'm interested in Google's algorithm for finding duplicate content. Where can I read about it?
Just like everything Google does, I'm sure there is an algorithm involved. Will having duplicate content hurt your efforts? Probably... but not for just two articles. All you need to do is reword the document. Make it seem like a different writer wrote the article. Make the articles different lengths: if one is 1,000 words, make the other 1,200 words. If you add to it and reword some of it, it should be unique enough that it isn't considered duplicate content. But this is just me speculating.
I don't think it's possible to compare millions of pages word by word every day. I know making a new article is easy: you add some text, use some synonyms, change some words. But does anybody know how many changes you have to make for it to count as unique? I think there's some percentage threshold of changed text. And how can I check for duplicate content? Only by looking at the supplemental index, or is there another way?
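For a rough idea of how a "percentage of duplicate text" can be measured at all, here is a minimal Python sketch of w-shingling with Jaccard similarity, a technique from the near-duplicate detection literature. It is not Google's algorithm (which isn't public); the 4-word shingle size and the helper names `shingles` and `jaccard_percent` are my own assumptions for illustration.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles"), lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_percent(a: str, b: str, k: int = 4) -> float:
    """Shared shingles divided by all distinct shingles, as a percentage."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0  # two empty texts: treat as identical
    return 100.0 * len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog near the river bank"
rewrite = "the quick brown fox leaps over the lazy dog near the river bank"
print(round(jaccard_percent(original, rewrite), 1))  # 42.9
```

Changing one word out of 13 (under 8% of the words) already drops the overlap to about 43%, because every shingle that contains that word changes. That is one reason a fixed "change X% of the words" rule is hard to pin down.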
My take is that as long as it passes a Copyscape check, you've probably rewritten enough to avoid a duplicate hit. Anything less than that... who knows?
So what do you think they do every day, just count their income? I'd say it's possible to have one or two sentences identical to another page, but the more overlap there is, the less likely it is to be coincidence, and then it can start to look suspicious. Second, they probably have quite sophisticated algorithms for filtering and comparing data in multiple passes, etc. But no one outside Google knows the details, so it's not easy to answer. If you'd like to know, start a couple of blogs and test it yourself.
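On the scale question: one published way to avoid comparing millions of pages word by word is MinHash, which boils each page down to a short fixed-length signature, so two pages can be compared by checking a fixed number of integers. Below is a minimal sketch assuming 4-word shingles, 128 seeded BLAKE2b hashes, and hypothetical helper names; it illustrates the general technique and says nothing about what Google actually runs. At real scale, signatures are commonly bucketed further (locality-sensitive hashing) so that only likely candidates get compared at all, which is one form of multi-pass filtering.

```python
from __future__ import annotations

import hashlib
import re

NUM_HASHES = 128  # assumption: signature length; more hashes give a tighter estimate


def shingles(text: str, k: int = 4) -> set[str]:
    """Overlapping k-word sequences, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def seeded_hash(shingle: str, seed: int) -> int:
    """Deterministic 64-bit hash; each seed acts as a different hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text: str, num_hashes: int = NUM_HASHES) -> list[int]:
    """Keep only the smallest hash value per hash function."""
    sh = shingles(text)
    return [min(seeded_hash(s, seed) for s in sh) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of agreeing minimums estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox leaps over the lazy dog near the river bank"
sig_a, sig_b = minhash_signature(page_a), minhash_signature(page_b)
print(len(sig_a))  # 128
print(estimated_similarity(sig_a, sig_b))  # an estimate of the exact Jaccard value 6/14
```

The page text itself is only read once to build the signature; after that, every comparison costs the same 128 integer checks regardless of article length.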
I think 30-40% uniqueness on dupecop.com is enough. It can compare the uniqueness of article A against article B.
Normally Google won't penalize you for duplicate content if you link back to the original (if there is one). But if you are going to change the text, change about 25-30% of it and you should be OK.
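The 30-40% and 25-30% figures in this thread may not measure the same thing. Here is a sketch comparing two such measures on the same edit, under stated assumptions: "words changed" is computed with Python's `difflib.SequenceMatcher` over word lists, and "unique" is the share of the rewrite's 4-word shingles that never appear in the original. Neither is claimed to match how dupecop.com or Google scores text, and the function names are hypothetical.

```python
from __future__ import annotations

import difflib
import re


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def percent_words_changed(a: str, b: str) -> float:
    """100 * (1 - ratio), using difflib's word-level matching ratio."""
    ratio = difflib.SequenceMatcher(None, words(a), words(b)).ratio()
    return 100.0 * (1.0 - ratio)


def percent_unique_shingles(a: str, b: str, k: int = 4) -> float:
    """Share of B's k-word shingles that never appear in A."""
    wa, wb = words(a), words(b)
    sa = {tuple(wa[i:i + k]) for i in range(len(wa) - k + 1)}
    sb = {tuple(wb[i:i + k]) for i in range(len(wb) - k + 1)}
    return 100.0 * len(sb - sa) / len(sb) if sb else 0.0


original = "the quick brown fox jumps over the lazy dog near the river bank"
rewrite = "the quick brown fox leaps over the lazy dog near the river bank"
print(round(percent_words_changed(original, rewrite), 1))  # 7.7
print(percent_unique_shingles(original, rewrite))  # 40.0
```

The same one-word edit scores under 8% by the first measure and 40% by the second, so a threshold quoted for one tool doesn't transfer directly to another.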
Thanks for the link, eyeflare. I'm pretty new to this whole website-promotion thing, but I've learned quite a lot on this site already. Thanks!
Are both sites on the same server? Google might flag them as dupes because they reside on the same IP. I also think whether Google's algorithm calls it duplicate depends on the keyword and the phrases before and after it. So my suggestion would be to analyze the keywords you want to be indexed for and only worry about changing those sentences. Copyscape, although great, will drive you nuts if you try to get a "no results found" pass, because it reports sites that match even one sentence completely. So you would have to change the whole article to get "no results found". Take care!
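To see why one matching sentence is enough to trigger a hit, here is a minimal sketch of an exact sentence-level comparison. This is an assumed, simplified model, not Copyscape's actual method: sentences are split on `.`, `!` and `?`, lowercased and whitespace-normalized, and any shared sentence counts as a match. `sentences` and `shared_sentences` are hypothetical names.

```python
from __future__ import annotations

import re


def sentences(text: str) -> set[str]:
    """Split on ., ! or ?, then normalize case and whitespace."""
    parts = re.split(r"[.!?]+", text)
    return {" ".join(p.lower().split()) for p in parts if p.strip()}


def shared_sentences(article: str, page: str) -> set[str]:
    """Sentences that appear verbatim (after normalization) in both texts."""
    return sentences(article) & sentences(page)


mine = "Duplicate content is a common worry. I rewrote every paragraph myself. Links help too."
other = "Links help too. Nothing else here matches."
print(shared_sentences(mine, other))  # {'links help too'}
```

Under a rule like this, one identical sentence is enough to flag the other page, even if every other sentence in the article was rewritten.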
Google uses a tool for detecting duplicate content. Try this one as an example: duplicate content checker tool. I have used it and found it very useful for finding out whether someone else duped your content or you have duped your own. Reminder: the tool is not always accurate, but it will still give you some results for your query.