Hi everyone, I am in the final stages of a new travel-related website that is about to go live. The site lets visitors rate and review hotels and condominiums in my local city (much the way Amazon lets you review products), and it contains lots of fresh, original guides written for people interested in exploring the city. However, I am concerned about one thing: Content Duplication. I am importing hotel and condo data from hotel affiliate websites that make the content available for people like me. No doubt thousands of other people have been copying and pasting exactly as I am doing. Will Google penalize my site for copying and pasting this content that the hotel affiliates make available? Should I re-write this content in my own words rather than just copying and pasting the info from the affiliates? There are hundreds of hotel and condo descriptions I'd have to re-write. Any thoughts? Tips? Opinions? Thanks.
I actually have sod all duplicated content on my sites, but on a technical level I have wondered about this issue. As I see it, to detect duplicated content Google would have to contextualise each page into content units and generate a checksum for each one to cross-check against other sites. Aside from how big a task this is, there is nothing hard about them doing it. But do they really do it?
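For what it's worth, here is a rough Python sketch of what that per-unit checksumming might look like. The chunking rule (splitting on blank lines) and the normalisation step are my own guesses for illustration, not anything Google has published:

import hashlib

def chunk_checksums(text):
    # Split the page text into paragraph-level "content units" and hash
    # each one, so an identical paragraph on two different pages yields
    # an identical checksum regardless of the surrounding markup.
    checksums = set()
    for para in text.split("\n\n"):
        normalised = " ".join(para.lower().split())  # collapse case/whitespace
        if normalised:
            checksums.add(hashlib.md5(normalised.encode("utf-8")).hexdigest())
    return checksums

page_a = "Welcome to the Grand Hotel.\n\nOcean views from every room."
page_b = "Book online today!\n\nOcean   views from every room."
print(len(chunk_checksums(page_a) & chunk_checksums(page_b)))  # 1 shared paragraph

The hashing itself is trivial; the hard part is deciding what counts as a content unit in the first place, which is exactly the problem raised below.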
It's actually remarkably difficult to do. If you checksum a whole page, every page will produce a unique checksum, because no two pages share exactly the same markup. What the SEs have to do is decide which part of the page is the content and which part is incidental (navigation, adverts etc.). By making sure you mix up your layout (compared to other affiliates), you make it incredibly difficult to tell whether two sites use the same content or not. You can see this with the various article sites around: essentially they (and the sites that use their articles) all carry the same content, but by using different layouts and incorporating extra content it becomes very difficult for the SEs to work out that the content isn't unique.
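That said, there is a published technique aimed at exactly this problem: w-shingling (Broder, 1997), which compares overlapping runs of words rather than whole pages, so surrounding boilerplate only dilutes the score instead of breaking the match. Whether Google actually uses it is anyone's guess; this little Python sketch just shows the idea:

def shingles(text, w=4):
    # Every overlapping run of w consecutive words (a "w-shingle").
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def resemblance(a, b, w=4):
    # Jaccard similarity of the two shingle sets: close to 1.0 means
    # near-duplicate, close to 0.0 means unrelated.
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa | sb)

affiliate = "Spacious rooms with free parking and a heated outdoor pool."
reworked = "Our pick: spacious rooms with free parking and a heated outdoor pool. Book today!"
print(round(resemblance(affiliate, reworked), 2))  # ~0.64 despite the added text

The added sales copy drags the score down, but the copied sentence still lights up, which is why the question of what counts as "the content" matters so much.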
There's no penalty for copying parts of text. However, when someone searches for keywords contained in the syndicated text, Google will filter out all the duplicates except the one page it considers most important. This is not a penalty; Google just does not want to return several pages that give surfers the same info.