Duplicate Content, what is the real deal?

Discussion in 'Search Engine Optimization' started by DesertWarrior, Apr 2, 2009.

  1. #1
    Hello guys,

    We have noticed thru CopyScape that a few websites have literally copied some of our articles. Now, will Google penalize our website for this? Some of our articles are not being indexed by Google, could this be the reason?

    It wouldn't make sense if Google penalizes us. We are the original writers of those articles... right? Does Google have a way of knowing which one was the original publisher and penalize all the others? What's the best way of protecting our content from plagiarism?

    I have read many contradicting opinions on this matter from different webmasters, so I seek some Expert advice here...

    Thanks!:)
     
    DesertWarrior, Apr 2, 2009 IP
  2. contentboss

    contentboss Peon

    #2
    It's unlikely you'll be 'penalized'. However, you may end up not being regarded as the definitive source for your own article. It's not as simple as 'who posts first', because it also depends on when Googlebot first finds a copy of the text. You can read about the dupe content myth here - dupe content myth
     
    contentboss, Apr 2, 2009 IP
  3. Komicwords

    Komicwords Well-Known Member

    #3
    I think this issue is no longer considered important to Google's SERP system, and in the years to come fewer and fewer people will care about it. I also believe that if you publish even 300 words of quality content, within 3 months you will get someone who "stole" your content.
     
    Komicwords, Apr 2, 2009 IP
  4. DesertWarrior

    DesertWarrior Banned

    #4
    Thanks for your replies, but I am not sure that Google doesn't consider this important anymore... because if that were the case, anybody could just start a website, copy valuable articles like Wikipedia's, and make a profit out of it.
     
    DesertWarrior, Apr 2, 2009 IP
  5. OnInternetBusinessGuide

    OnInternetBusinessGuide Well-Known Member

    #5
    It looks like search engines are wise about analyzing page content. It seems they can examine fragments of pages, not just whole pages. A lot of websites contain fragments from other websites - for example, fragments of ads or quotations. Maybe a single fragment is not a problem, but what exactly counts as a fragment? The majority of websites differ because their navigation structures differ: menus, footer links, and so on. In principle, no two pages should have exactly the same content, but someone could copy the whole text from another website (without the navigation structure and tables). To the human eye the difference is pretty obvious, but to a computer it may not be. Does anyone have more information about this aspect?
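    One classic technique from the research literature for comparing page fragments is w-shingling with Jaccard similarity. To be clear, this is only an illustration of how a machine can detect partial overlap - Google's actual algorithms are not public, and the example texts below are made up:

```python
# Sketch of w-shingling, a classic technique for detecting partial
# duplication between documents. Illustration only -- not Google's
# actual (unpublished) algorithm.

def shingles(text, w=4):
    """Return the set of all w-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard_similarity(a, b, w=4):
    """Fraction of shingles the two texts share (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the old barn"
unrelated = "completely different text about search engine optimization topics"

print(jaccard_similarity(original, copied))     # high overlap
print(jaccard_similarity(original, unrelated))  # no overlap
```

    Because the comparison works on overlapping word windows rather than whole pages, shared boilerplate like menus contributes only a few shingles, while a copied article body produces a large overlap score.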
     
  6. Earnest01

    Earnest01 Peon

    #6
    Earnest01, Apr 2, 2009 IP
  7. paganheart

    paganheart Peon

    #7
    You can use duplicate content, if done wisely. If you are running an autoblog of ONLY dupe content, well um.. yeah that's a waste of time. It won't rank and it's not likely to gain PR. Without PR you won't get traffic etc and so on. Sure, you could spin or "wrangle" your articles to seem less duplicate.. but I have yet to find something that works well and is NOT time consuming (yes I've tried content boss too). So I'm back to buying original content or creating it myself. It's the only thing that REALLY works.

    IF you insist that dupe content is beneficial in any way, then be smart (as I suggested above) and add at least 50% original content, or some kind of mix. I use articles that are syndicated, but my sites are not full of them either.
     
    paganheart, Apr 2, 2009 IP
  8. mynetincome.com

    mynetincome.com Peon

    #8
    I'm also skeptical about how, or even whether, a site is penalized for dupe content. A source I will cite, as I'm a NASCAR fan, is jayski.com. The guy who runs that site just posts related news stories from all over the internet (simple copy and paste). He cites his sources well and makes no claims about the uniqueness of his content.

    However, his website has a high PR and is extremely popular in Google - so much so that ESPN offered to buy his site. This leads me to believe that a site's PageRank may have a bit to do with whose content is worth more to Google, rather than who it actually belongs to.
     
    mynetincome.com, Apr 2, 2009 IP
  9. BardAzima

    BardAzima Peon

    #9
    My understanding, after also doing a lot of research on this topic, is that duplicate content is only a bad thing if practically everything on the site is duplicated. As the NASCAR fan said, it is not inappropriate to take anyone else's article and post it on your site as long as it is referenced. The robots can't tell whether it's been referenced or not, and it's obviously not affecting that site's ranking. I think this is because that 'borrowed' article is only one small piece of writing within a site that has a lot of content - I think that's the key. The downside, though, is that if your site does not rank highly and a higher-ranking site publishes your article, then they will outrank you in the search engines - even though you wrote it and it's on your site.
     
    BardAzima, Apr 2, 2009 IP
  10. mynetincome.com

    mynetincome.com Peon

    #10
    Good theory. I hadn't looked at it like that... Kinda makes sense why a lot of these "scraper" blogs/sites do fairly well. Really good insight... Thanks!
     
    mynetincome.com, Apr 2, 2009 IP
  11. catanich

    catanich Peon

    #11
    Reality check, everyone. Look at your Google Toolbar. If the TB is gray ("graybarred"), then somewhere on that page is a sentence or paragraph that has been copied. If you drag and drop each sentence into the search bar, the other site will be displayed.

    Copyscape is good for gross duplications, but the "duplicate content" penalty (graybar) can only be detected by searching on each sentence.
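    The manual check described here - searching each sentence as an exact phrase - can be sketched as a small script. This is just an illustration of the process; the example text and the minimum-length cutoff are arbitrary choices:

```python
# Sketch of the sentence-by-sentence check described above: split a
# page's text into sentences and build quoted exact-match queries.
# Pasting each query into a search engine shows who else has it indexed.
import re

def sentence_queries(text, min_words=6):
    """Yield quoted exact-match queries, skipping sentences too short
    to be distinctive."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for s in sentences:
        s = s.strip()
        if len(s.split()) >= min_words:
            yield f'"{s}"'

page_text = ("Copyscape is good for gross duplications. "
             "Sentence level checks catch partial copies that whole page tools miss.")
for q in sentence_queries(page_text):
    print(q)
```

    Skipping very short sentences matters because common phrases will match thousands of unrelated pages, while a distinctive ten-word sentence rarely matches anything but a copy.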

    Our site is indexed every 4 to 5 days by Google, but if I create a brand-new page with 100% original content, the "scraper" blogs will get there first. Google will give them credit for the content, not me.

    This is one of the major problems at the moment, and Google has not addressed it. What's worse, the scrapers are offshore, so no legal (copyright) action can be taken.
     
    catanich, Apr 2, 2009 IP
  12. DesertWarrior

    DesertWarrior Banned

    #12
    It's really scary that nothing can be done to protect yourself...
     
    DesertWarrior, Apr 2, 2009 IP
  13. bigmny4you

    bigmny4you Peon

    #13
    The best thing you can do to protect yourself is to "brand" all of your posts with some sort of signature containing your name or business name. I do this with all of my posts and have had no problems so far. Also make sure Google can visit often.
     
    bigmny4you, Apr 2, 2009 IP
  14. wizlor

    wizlor Peon

    #14
    I have done a few autoblogs, all using duplicated content. Here are my findings:

    1) Out of 5 new domains, 3 gained PR1 in less than 3 months. I am still waiting for the next PR update to see if it stays.

    2) I have also done autoblogs on 3 aged domains. They were all PR0 when bought. After a month, they are now a PR4, a PR2, and a PR0.

    3) Not much backlinking has been done apart from submissions to social networks, bookmarking, and some link exchanges.

    4) Some of the better ones are earning 10-50 cents per day with AdSense. BTW, they are not targeting any high cost-per-click keywords.

    5) All the blogs are indexed. I never check how often Google indexes all my blogs, but some of the better ones are visited by Googlebot every day.

    I understand a lot of people claim duplicated content doesn't work. I was also "drilled" with this mindset when I first learned what blogging was about. If I had never tried it, I might still believe that duplicated content can never rank, can never gain traffic, etc.

    Personally, I have seen blogs full of aggregated content with high PR and a good number of visitors based on Alexa ranking. Traffic is not all about Google; there are lots of social network sites around. If your blog gives a good user experience and useful content, I don't see why you wouldn't gain traffic.

    Last of all, the internet is full of duplicated content. If Google were to fully eliminate it, less than 20% of the pages currently in its index would remain. By then nobody would be using Google, and it wouldn't be the Google we know now. Think about it! ;)
     
    wizlor, Apr 2, 2009 IP
  15. Canonical

    Canonical Well-Known Member

    #15
    contentboss is correct... There is no penalty for duplicate content. A penalty would prevent duplicate content from ever making it to page 1, by forcing your URL far back in the SERPs or removing it from the index altogether. But duplicate content can make it to page 1.

    Basically, the version of the content that is found first by Googlebot 'typically' gets deemed the original... all subsequent copies discovered by Googlebot 'typically' get deemed duplicates. I say 'typically' because, according to Matt Cutts, there is a way to help Googlebot figure out which really is the originator... more on that later.

    There are 200+ factors about a URL - its contents, its inbound links, the pages that link to it, the relevance of the pages that link to it, the site's trust, etc. - that Google's algorithm looks at each time it ranks URLs for a particular keyword in the SERPs. All of the factors that are based on the content of the page (things like <title>, <h1>...<h6>, keyword density, position on page, etc.) are devalued for the duplicates, while the originator gets full credit...

    Note: This does NOT mean that the duplicates cannot rank well. Duplicate content can actually outrank original content. However, to do so the URL flagged as duplicate will need a higher overall ranking score than the URL flagged as the original. This means it has to score substantially better on the NON-content ranking factors. The most obvious way to do this is for the duplicate URL to have a much stronger backlink profile.

    I disagree with almost everything in Komicwords' post, except for one point: if you write good content, someone is probably going to copy it within 3 months.

    I disagree with the first two sentences though. Google is constantly battling duplicate content. They don't want it showing up in their SERPs unless someone has added value to it...

    Google is constantly changing their algo to deal with duplicate content. They would not have the notion of original and duplicate content baked into their algorithm if they no longer cared about it. Google would not have hired PhDs and likely spent millions coming up with algorithms to detect complete and partial duplication of content. They would not have spent tens or hundreds of thousands of dollars patenting their duplicate-detection algorithms (I believe they have several, but here is one of them for your viewing pleasure: http://patft.uspto.gov/netacgi/nph-...7,158,961.PN.&OS=PN/7,158,961&RS=PN/7,158,961 ).

    There is not a lot you as a webmaster can do to combat duplicate content. As long as there are parasites on the web - spammers, etc. - who try to scam a buck off of other people's hard work because they don't have the skills to come up with an original thought themselves, there will be content thieves.

    According to Matt Cutts, if you embed links in your content pointing back to the original copy of the same content on your site, this will help Google differentiate who the 'real' originator is, and prevent them from having to 'guess' at the originator based on which version is crawled and indexed by Googlebot first. Of course, some content thieves will remove the links... but many are just plain lazy - otherwise they would have written their own content. So some of the copies of your content will likely retain the links. If they do, it's enough for Google to figure out which URL is the 'real' originator of the content.

    Using absolute URLs in all of your links also makes it a bit more tedious for people to copy your content.
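    The absolute-URL tip can be sketched as a small standard-library script: rewrite relative hrefs so that copied markup keeps pointing back to the original site. The regex approach and the example.com domain are illustrative assumptions, not a production HTML parser:

```python
# Sketch of the absolute-URL tip: rewrite relative href values in a
# page's HTML to absolute URLs, so lazy copiers carry your links along.
# Uses only the standard library; example.com is a placeholder domain.
import re
from urllib.parse import urljoin

def absolutize_links(html, base_url):
    """Replace each href="..." value with an absolute URL."""
    def repl(match):
        return 'href="{}"'.format(urljoin(base_url, match.group(1)))
    return re.sub(r'href="([^"]*)"', repl, html)

html = '<a href="/articles/dupe-content.html">Read the original</a>'
print(absolutize_links(html, "http://www.example.com/"))
# -> <a href="http://www.example.com/articles/dupe-content.html">Read the original</a>
```

    For real pages a proper HTML parser would be safer than a regex, but the idea is the same: a scraper who copies the raw markup now republishes links back to your domain.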
     
    Canonical, Apr 2, 2009 IP
  16. Canonical

    Canonical Well-Known Member

    #16
    Your pages going from PR0 to PR4 has absolutely NOTHING to do with whether content is duplicate or original. PR is TOTALLY based on your inbound links. It has nothing to do with content.

    Content affects how you rank for a particular keyword. It's used by the ranking algorithm. It's NOT used by the PR calculation algorithm.

    And since there are so many other factors that Google is looking at (more than 200 signals total of which some subset is based on the content at a given URL) duplicate content can rank well... it just makes it harder to do so than if you are the originator.

    It's virtually impossible for this to occur in the wild, but hypothetically if 2 pages were evaluated by the ranking algorithm and

    1) both had the exact same scores for each of the NON-content based ranking factors and
    2) one was an exact copy of the other's content

    then the originator will win in the SERPs every time.
     
    Canonical, Apr 2, 2009 IP
  17. DesertWarrior

    DesertWarrior Banned

    #17
    controversy... controversy :)

    by the way, to the dude making money with autoblogs... shame on you :)
     
    DesertWarrior, Apr 2, 2009 IP
  18. Canonical

    Canonical Well-Known Member

    #18
    No controversy... Just a lot of people with misconceptions.
     
    Canonical, Apr 2, 2009 IP
  19. wizlor

    wizlor Peon

    #19
    My message was aimed at the guy who says it is impossible to rank with duplicated content.

    BTW, PR has something to do with content, not just backlinks, even though backlinks play the more important part. I only know this from my observations when trying to rank for certain keywords.

    The answer to a successful website is a good visitor experience.
     
    wizlor, Apr 2, 2009 IP
  20. £££

    £££ Peon

    #20
    Google will not penalise you, don't worry about it.

    If your proposed situation were true, the article directories would not rank individually as well as they do, nor would a great number of news sites around the internet.

    If you require more information on this, may I humbly suggest that you click on my name and read my recent posts replying to others who asked a near-identical question - saving me from having to write it all out yet again.
     
    £££, Apr 2, 2009 IP