Google and Duplicate Content

Discussion in 'Google' started by deriklogov, Nov 17, 2007.

  1. #1
    Hello,

    I was wondering whether anybody knows how Google tracks duplicate pages across sites. Any idea how they compare them? Do they compare the number of words on a page, or the whole text?
     
    deriklogov, Nov 17, 2007 IP
  2. deriklogov

    deriklogov Well-Known Member

    Messages:
    1,078
    Likes Received:
    22
    Best Answers:
    0
    Trophy Points:
    130
    #2
    Such a big forum and no ideas?
     
    deriklogov, Nov 18, 2007 IP
  3. gr8liverpoolfan

    gr8liverpoolfan Notable Member

    Messages:
    6,719
    Likes Received:
    538
    Best Answers:
    0
    Trophy Points:
    285
    #3
    They look at the page structure. (That includes everything: the look of the page, the content in it, etc.)

    The supplemental index doesn't exist now, but it used to be one way of checking whether Google considered your page a duplicate.
     
    gr8liverpoolfan, Nov 18, 2007 IP
  4. sweetfunny

    sweetfunny Banned

    Messages:
    5,743
    Likes Received:
    467
    Best Answers:
    0
    Trophy Points:
    0
    #4
    It "probably" stores a page as a hash value that's determined by keyword placement, proximity, page size, words per page, etc. When two pages have a matching or similar hash, they are treated as duplicates.
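
    To illustrate the "similar hash" idea, here is a minimal Python sketch of a SimHash-style fingerprint. It is only a guess at how such a comparison could work, not Google's actual method: it looks at word occurrence alone and ignores placement, proximity and page size. The names `simhash` and `hamming_distance` are made up for this example.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an arbitrary choice for this sketch


def simhash(text: str) -> int:
    """Return a SimHash-style fingerprint: texts sharing most words tend to share most bits."""
    votes = [0] * BITS
    for word in re.findall(r"\w+", text.lower()):
        # Hash each word to a stable 64-bit integer.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Every word votes +1 or -1 on each bit position.
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Set a bit wherever the votes came out positive.
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

    Two pages would then be compared with `hamming_distance(simhash(page_a), simhash(page_b))`; a small distance suggests near-duplicates, though where to draw the line is a tuning choice.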
     
    sweetfunny, Nov 18, 2007 IP
  5. deriklogov

    #5
    deriklogov, Nov 18, 2007 IP
  6. Divisive Cottonwood

    Divisive Cottonwood Peon

    Messages:
    1,674
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    0
    #6
    A couple of months ago I tried a couple of these 'pay x and have your article on 300 different blogs' offers that you see in the Buy, Sell or Trade forum.

    At first, Google Webmaster Tools showed up to 300 different inbound links from the same article to the pages that I wanted to point at.

    Over the couple of months since I had them published, the incoming links shown in Webmaster Tools have been whittled down to dozens, if not single figures.
     
    Divisive Cottonwood, Nov 18, 2007 IP
  7. deriklogov

    #7
    I think it's a bit different; my subject is more about Google and the supplemental index.


     
    deriklogov, Nov 18, 2007 IP
  8. oseymour

    oseymour Well-Known Member

    Messages:
    3,960
    Likes Received:
    92
    Best Answers:
    0
    Trophy Points:
    135
    #8
    That's a really simple way of determining duplicate content. You are not giving Google enough credit; I'm sure they have a complex enough algorithm to determine whether parts of an individual paragraph are duplicates...
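
    For what it's worth, partial duplication inside a paragraph can be sketched with word shingles (overlapping k-word sequences). This is a textbook technique shown under my own assumptions, not a description of Google's algorithm; the helper names `shingles` and `containment` and the default `k = 3` are made up for this example.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences found in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(part: str, whole: str, k: int = 3) -> float:
    """Fraction of the shingles in `part` that also occur in `whole`."""
    a, b = shingles(part, k), shingles(whole, k)
    # A text shorter than k words has no shingles, so report no overlap.
    return len(a & b) / len(a) if a else 0.0
```

    A value near 1.0 from `containment(paragraph, other_page)` would mean most of the paragraph also appears on the other page, even if the rest of that page is different.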

     
    oseymour, Nov 18, 2007 IP
  9. sweetfunny

    #9
    Duplicate documents and Supplementals have very little correlation, except that having duplicates usually results in fewer people linking to your version.
     
    sweetfunny, Nov 18, 2007 IP
  10. deriklogov

    #10
    The script I showed you is not really that simple; it's not counting words. Try playing with it and you will see that it is not that simple.
     
    deriklogov, Nov 18, 2007 IP
  11. deriklogov

    #11
    Well, the main reason for ending up in the supplemental index is duplicate content.

     
    deriklogov, Nov 18, 2007 IP
  12. gr8liverpoolfan

    #12
    Having duplicate content was not the only reason why sites ended up in the supplemental index. Weak pages with no authority or weight used to land there too.
     
    gr8liverpoolfan, Nov 18, 2007 IP
  13. deriklogov

    #13
    Well, I would say it's mostly duplicate content; at least in my case, I know for sure that it is duplicate content.
     
    deriklogov, Nov 18, 2007 IP
  14. deriklogov

    #14
    So, any idea how to find a similar tool that checks for duplicate content the way Google does, or how to compare the tool I showed you at the top with Google, to find something in the middle?
     
    deriklogov, Nov 18, 2007 IP
  15. sweetfunny

    #15
    That's simply not true at all.

     
    sweetfunny, Nov 18, 2007 IP
  16. oseymour

    #16
    No thanks... I don't have the spare time to look at the script. Besides, Google has a more complex algorithm and a lot more processing power than any simple script has or ever will have. I would be surprised if any such tool existed; no one knows how Google checks for dupes. I have seen some people use www.copyscape.com to check for dupes, but it is right only a low percentage of the time.
     
    oseymour, Nov 18, 2007 IP
  17. deriklogov

    #17
    www.copyscape.com: well, that site does not always find duplicate content. I think they are doing something more like counting words.


     
    deriklogov, Nov 18, 2007 IP
  18. sweetfunny

    #18
    The best tool to find duplicate content is Google itself.

    Your sentence is unique
     
    sweetfunny, Nov 18, 2007 IP
  19. deriklogov

    #19
    So basically, there is no way to see which pages Google counts as duplicates and which ones it doesn't?
     
    deriklogov, Nov 18, 2007 IP
  20. sweetfunny

    #20
    As I said, take a unique string of text from the document and Google it in quotes. Google will show all pages containing that exact string.
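
    If you want to do this for many sentences, a small Python helper can build the quoted search URL for you. This is just a convenience sketch; the function name `exact_phrase_search_url` is made up for this example, and it only builds the link for you to open in a browser.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(phrase: str) -> str:
    """Build a Google search URL that wraps the phrase in double quotes."""
    # Drop any quotes already in the phrase so the wrapping pair stays balanced.
    quoted = '"' + phrase.strip().replace('"', "") + '"'
    return "https://www.google.com/search?q=" + quote_plus(quoted)


print(exact_phrase_search_url("Your sentence is unique"))
# https://www.google.com/search?q=%22Your+sentence+is+unique%22
```

    Pick a sentence that is distinctive to your page; a generic phrase will match plenty of unrelated pages.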
     
    sweetfunny, Nov 18, 2007 IP