I have seen a lot of discussion on this topic... there is a view that Google is currently penalising sites that have duplicate content on their pages. My question is: what constitutes duplicate content? There is a lot of advice about various products out there, and it is a reasonable assumption that a lot of sites carry the same articles. Health-related articles come to mind straight away: several sites could carry the same article from one source about symptoms and strategies for dealing with, say, acne or mumps. Recipes are another area where duplicate content could apply; mom's apple pie recipe could be on several sites. There are millions of examples where this can quite legitimately happen. Is this duplicate content? How does Google decide what is and what isn't duplicate content? Where is the line drawn? Cyclops.
I've been asking this question for months... I hope someone out there can shed some light on this.
This is sometimes a big gray area. I'm sure you will see a lot of opinions about this issue. Here's my viewpoint. Identical pages (code and content) under either the same or different domains are a clear candidate for a duplicate content penalty. Syndicated articles appear to bypass the penalty, which leads me to believe that if you frame the content in a different code structure, you probably won't see the duplicate page penalized. I sometimes post similar articles on multiple Web sites, but I alter the content on each so that they are different. I think this adds a layer of protection.
If you're worried about specific pages, you can use this tool to check how similar they are as a percentage: http://www.webconfs.com/similar-page-checker.php
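If you'd rather script the check yourself, here's a rough sketch of the same idea in PHP using similar_text(). The two URLs are placeholders only, and similar_text() gets slow on very large pages, so treat this as a quick-and-dirty comparison rather than anything definitive:

<?php
// Fetch two pages, strip the markup, and compare the remaining text.
// The URLs below are placeholders for illustration only.
$page1 = strip_tags(file_get_contents('http://www.example.com/page-a.html'));
$page2 = strip_tags(file_get_contents('http://www.example.org/page-b.html'));

// similar_text() fills $percent with a rough similarity percentage.
similar_text($page1, $page2, $percent);
printf("Pages are roughly %.1f%% similar\n", $percent);
?>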
We have one site that appears to have been hit with this filter. Placing &filter=0 at the end of a Google search query reveals the site at position #7 for a certain term; without it, the site is nowhere to be seen anymore (it used to rank for a number of terms, now it ranks for none). I noticed yesterday that Googlebot was grabbing pages without the www. in front of the domain name, as well as with our secure CNAME (secure.domain.com). It was grabbing regular pages, but putting secure. in front or dropping the www and doing the same. I am hoping this is why they have hit us with some kind of penalty. Yesterday we placed a 301 redirect to send non-www requests to www and excluded all bots from grabbing pages under the secure CNAME. Hopefully this will clear things up soon. Last year we were hit with a similar penalty when two domains we had forgotten about were pointing to the same content, and eventually Google penalized it. After 301'ing one to the other, the problem cleared up, but it took a number of months to get back to "normal". Hopefully this time it will not take as long.
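For anyone else who needs to do the same, here's a minimal sketch of the non-www to www redirect as a PHP snippet dropped at the top of each page. The hostname www.example.com is just a placeholder for your own canonical domain, and an .htaccess rewrite rule would do the same job if you have mod_rewrite available:

<?php
// If the request didn't come in on the canonical www. hostname,
// send a permanent (301) redirect to the same path on www.example.com.
if (strtolower($_SERVER['HTTP_HOST']) !== 'www.example.com') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
    exit;
}
?>

As for keeping bots off the secure CNAME, one way to do it is a robots.txt with a blanket Disallow served only on the secure hostname.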
We share a lot of content that we write ourselves between sites that we run ourselves. Is there a way to tell if you have been penalized already?
I was checking a couple of my keywords this morning and noticed this site: www.odlp.com/go.php?id=95 which I guess I was added to at some point during a link building stint. Did a Live HTTP Headers check (Firefox) and sure enough, it's a 302 redirect. Would this cause a duplicate content penalty? The title and everything is identical to my site. What could be done to take care of this? I guess I could simply have them remove me from the directory. Anything else?
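If you don't have the Live HTTP Headers extension handy, a quick way to confirm the status code from a script is PHP's get_headers(); here's a rough sketch using the URL above:

<?php
// get_headers() returns the raw response header lines; the first entry is
// the status line of the initial response, e.g. "HTTP/1.1 302 Found".
$headers = get_headers('http://www.odlp.com/go.php?id=95');
echo $headers[0] . "\n";
?>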
I have done extensive testing with duplicate content and have come up with very little in the way of useful results. What seems to be the case is this:
1. Google can detect duplicate content down to the sentence level and penalize for it.
2. The choice of which site gets penalized appears somewhat random.
3. MSN and Yahoo are significantly less savvy than Google at detecting and suppressing duplicate content.
Right now I have two web sites which swap about 20 SERPs a day. Each day the first site goes from page one to nowhere on about 20 terms while the second site goes from nowhere to page one for those terms. Similarly, the second site goes from page one to nowhere for a different set of terms while the first site goes from nowhere to page one for those terms. This is not the behavior I was seeing a few months ago, when the first site lost all SERPs, the second site won all SERPs, and the results didn't change until I redesigned both sites and started the war all over again. Very occasionally, both sites will show up on page one. It's a mystery.
I had one site that is now a redirect to our current page take the top rankings for a day while the other dropped to outer darkness. It's all back to normal now, but it was kind of weird.
Will.Spencer, I'm just curious about the test you did: how much of a duplicate are your test subjects of each other? In my opinion, the SEs at present are only capable of restricting the same content from appearing over and over again for a certain search. It's not a penalty but more like a filter. They don't de-index a site because it has the same articles as another site. I think SEs understand articles and how they work for a site. They pick the site(s) that they feel are important, and then those are the only ones they list in the SERPs. Sometimes your page will appear, sometimes it won't. It's not a penalty, but it would feel like one. And there's a certain percentage to how similar a page can get to another page (code- and content-wise); if a page is an exact duplicate of another, then that's something else. And I'm guessing it would be in Google's best interest if we all still believed that there is a duplicate content penalty. Of course, this is just my viewpoint; other people's opinions will vary and I could be totally wrong.
Mine are fairly duplicate. The differences are:
1. On one site, 15 forum post subjects are rendered server-side in PHP, so they are seen by Googlebot. On the other site the post subjects are displayed client-side by JavaScript, so they are not seen by Googlebot.
2. One site displays 5 co-op ads. The other site displays none.
3. Both sites run a server-side Perl script which displays one random link on each page, so no two pages will ever be the same.
4. One site is frequently updated. The mirror site is only mirrored occasionally, so the amount of difference varies based upon how long it has been since I've allowed the mirror to update.
From my experience I am led to believe that Google's duplicate content filter is amazingly intelligent. However, I then look at the spammers at Answer.com and their cheesy mirroring of Wikipedia, and I am led to the exact opposite conclusion. In summary: I have no idea.
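For what it's worth, the random-link trick in point 3 is simple enough to sketch. Mine is actually a Perl script, but the same idea in PHP would look something like this (the link list here is made up for illustration):

<?php
// Pick one link at random on every page load so no two
// rendered pages are byte-for-byte identical.
$links = array(
    '<a href="http://www.example.com/tips.html">SEO tips</a>',
    '<a href="http://www.example.com/tools.html">Free tools</a>',
    '<a href="http://www.example.com/articles.html">Articles</a>',
);
echo $links[array_rand($links)];
?>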
You could be sued for copyright infringement -- but copyright laws are effectively impossible to enforce unless the copyrighted material is worth millions of dollars.