Does anyone know how much content duplication Google allows, in terms of word frequency? I use Similar Page Checker to establish the percentage of similarity, but there's no information on how it derives that value. I have a page generator and I need to tune it for optimal results.
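Nobody outside those tools publishes the formula, but a common way to turn two pages into a "percent similar" number is word-level shingling with Jaccard overlap. Here's a minimal sketch in Python; the 5-word shingle size is my own assumption, not anything Similar Page Checker documents:

```python
# Hypothetical sketch of how a "% similarity" score could be computed.
# The shingle size (5 words) is an assumption; real tools don't publish theirs.

def shingles(text, size=5):
    """Return the set of overlapping word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity_percent(text_a, text_b, size=5):
    """Jaccard overlap of word shingles, expressed as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)

page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox jumps over the sleepy cat near the river bank"
print(f"{similarity_percent(page_a, page_b):.1f}% similar")
```

Notice how one changed word knocks out every shingle that overlaps it, which is why small edits move these scores more than you'd expect.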
As a rule of thumb: if you have to ask that question, or if you have to measure the similarity of your own content, then most likely you have NO self-created unique content. With genuinely self-created, unique content there is never any need to count keywords or verify similarity; human creativity ensures the result is different every time you publish a new page. Be creative and deploy your God-given creative potential. There are billions of unique pages that could be written each day to meet the information hunger and information NEED of the whole globe.
Nobody can answer your question except the Google guys. Don't think in terms of search engines; just make sure your users don't get that 'duplicate content' feeling when they browse your site.
Does 70% of the content have to be different, or can 70% be the same? I remember reading something over at seochat.com about the threshold for duplicate content being very low, such that you only need about 30% original material. However, I don't think it's as cut and dried as a percentage. I think Google looks at many factors, such as link popularity, grammar, HTML structure, outgoing links, etc.
They most likely do, but no one knows how much weight each of those factors carries. Most people seem to think it's all based on text alone.
Are you implying that images and layout can be classed as duplicate content? I can sort of understand images, since Google doesn't want to index layout-related images such as logos and menus... although could this apply to a gallery, for example, where thumbnails and images are repeated throughout a site?
What does Google compare your content with anyway? There is so much information out there on the web, and Google has probably indexed billions of pages.
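On the scale question: an engine can't diff every page against every other page one by one. The trick described in Google's own published work on near-duplicate detection (Manku et al., WWW 2007) is to collapse each page into a short fingerprint, such as a 64-bit simhash, and compare fingerprints instead of raw text. A rough sketch, using Python's built-in hash() purely for illustration:

```python
# Rough simhash sketch: each page collapses to a 64-bit fingerprint, and two
# pages count as near-duplicates when the fingerprints differ in only a few
# bits. Python's hash() is used purely for illustration; it is randomized per
# process, so a real system would use a stable hash like MurmurHash or MD5.

def simhash(text, bits=64):
    mask = (1 << bits) - 1
    counts = [0] * bits
    for word in text.lower().split():
        h = hash(word) & mask
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

fp1 = simhash("largely the same article text with a few small edits")
fp2 = simhash("largely the same article text with some small edits")
print("bits that differ:", hamming_distance(fp1, fp2))  # low = near-duplicate
```

The point is that comparing two 64-bit numbers is cheap enough to do across billions of pages, which full-text comparison never would be.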
Try copyscape.com on other sites and then on your own site. You will know what to do and what not to do.
Hey troops, thought I'd chime in. To start with, duplicate content issues are a FILTER, per se, not a penalty, so having dupe content is not necessarily going to tank you. You may not rank for the page in question, but that's another story. Now, if your site meets other criteria (a large aggregate of dupe content, spammy techniques, etc.) then it could certainly become a penalty, but without a large proportion of dupes across your site (in aggregate) there is not a lot to worry about, really. The other topic mentioned here, relating to page structure/layouts, is the 'page segmentation' aspect of the algo. They most certainly understand page segmentation, and it can be called into play during the retrieval stages. While not built entirely for dupe detection (can you say editorial links?), it certainly has the potential to be used in those operations.
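To make the page segmentation point concrete: instead of fingerprinting the whole page, a segmenter splits it into blocks (nav, sidebar, body copy) and handles each block separately, so shared site chrome doesn't drag down the uniqueness of the actual article. A toy sketch using only the standard library; the choice of boundary tags is my assumption, not Google's actual algorithm:

```python
# Toy page segmentation: break a page into text blocks at structural tags,
# then fingerprint each block separately. Blocks that repeat across a site
# (menus, footers) can be identified and ignored before any duplicate check.
# The set of boundary tags is an assumption, not Google's real algorithm.

from html.parser import HTMLParser

BOUNDARY_TAGS = {"div", "td", "article", "section", "nav", "footer"}

class BlockSplitter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks, self._buf = [], []

    def _flush(self):
        text = " ".join(self._buf).strip()
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BOUNDARY_TAGS:
            self._flush()

    def handle_data(self, data):
        if data.strip():
            self._buf.append(data.strip())

    def close(self):
        super().close()
        self._flush()

splitter = BlockSplitter()
splitter.feed("<nav>Home | About</nav><div>Actual unique article text.</div>")
splitter.close()
for block in splitter.blocks:
    print(hash(block), "->", block)
```

Blocks whose hashes repeat across many URLs on a site look like boilerplate and can be excluded before any duplicate comparison, which is exactly why a repeated logo or menu shouldn't trip the filter the way repeated body text would.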