I've read what seem like hundreds of forum posts discussing duplicate content, none of which gave the full picture, leaving me with more questions than answers. I decided to spend some time doing research to find out exactly what goes on behind the scenes. Here is what I have discovered.

Most people are under the assumption that duplicate content is evaluated at the page level, when in fact it is far more complex than that. Simply saying that "by changing 25 percent of the text on a page it is no longer duplicate content" is not a true or accurate statement. Let's examine why that is.

To gain some understanding, we need to take a look at the k-shingle algorithm, which may or may not be in use by the major search engines (my money says it is). I've seen the following used as an example, so let's use it here as well...

Read the rest here: Duplicate Content Dissected
No matter what algorithm is used, as we all know, stop words are eliminated and it is the action words that carry the value. Good article, mate.