Google's patent on duplicate content detection

Discussion in 'Google' started by toomm, Feb 6, 2007.

  1. #1
    Recently, Google was awarded a patent that addresses duplicate content issues. The patent is called 'Methods and apparatus for estimating similarity' and the abstract reads as follows:

     
    toomm, Feb 6, 2007 IP
  2. frankcow

    frankcow Well-Known Member

    Messages:
    4,859
    Likes Received:
    265
    Best Answers:
    0
    Trophy Points:
    180
    #2
    so therefore...
     
    frankcow, Feb 6, 2007 IP
  3. Anita

    Anita Peon

    Messages:
    1,142
    Likes Received:
    51
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I wonder if this is a problem for some bloggers ... I guess the question is: what happens if a site is determined to be similar to another? And possibly worse, what happens if yours is similar to someone else's, but because they are indexed by Google more frequently your site is marked as the duplicate?

    Would love to know ...
    Anita :)
     
    Anita, Feb 6, 2007 IP
  4. frankcow

    frankcow Well-Known Member

    #4
    Let's just say that this technology is not rock solid yet.
    I have a purely automated blog, and it has 4300 pages indexed in Google...
     
    frankcow, Feb 6, 2007 IP
  5. amnezia

    amnezia Peon

    Messages:
    990
    Likes Received:
    31
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Just because the pages are indexed doesn't mean they'll rank for anything
     
    amnezia, Feb 6, 2007 IP
  6. thegypsy

    thegypsy Peon

    Messages:
    1,348
    Likes Received:
    109
    Best Answers:
    0
    Trophy Points:
    0
    #6
    thegypsy, Feb 6, 2007 IP
  7. ajsa52

    ajsa52 Well-Known Member

    Messages:
    3,426
    Likes Received:
    125
    Best Answers:
    0
    Trophy Points:
    160
    #7
    What do you think they want to detect PRIMARILY?
    1. Duplicated pages on the same site, or
    2. Duplicated pages on different sites.

    My vote is for option 2.
     
    ajsa52, Feb 6, 2007 IP
  8. lpstong

    lpstong Notable Member

    Messages:
    3,292
    Likes Received:
    216
    Best Answers:
    0
    Trophy Points:
    230
    #8
    Actually, I read about their policy on duplicate content on the Google blog. From what I read, if the same content is written in several languages, then Google does not consider it dup content. Weird, I know. They listed other criteria for dup content on the Google blog as well.
     
    lpstong, Feb 6, 2007 IP
  9. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #9
    1. Corollary: If they aren't indexed, they certainly won't rank for anything.

    2. Duplicate content is about NOT indexing duplicate pages, so if his site is "automated" (presumably meaning that it consists of feeds from other sources) and he has 4300 pages showing in Google, that would seem to imply 4300 duplicate pages missed by Google's duplicate content filters.

    That said, I do believe Google is getting better at finding and eliminating duplicate content.
     
    minstrel, Feb 6, 2007 IP
    jdR!pper likes this.
  10. thegypsy

    thegypsy Peon

    #10
    Here is some stuff on Duplicate Content

    SNIPPET

     
    thegypsy, Feb 6, 2007 IP
  11. toomm

    toomm Guest

    Messages:
    80
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #11
    What about different sites promoting the same product with the same description from the supplier? In such a case you can hardly avoid duplicate content issues.
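To make the supplier-description case concrete, here is a minimal sketch of one generic way to score how similar two pages are: break each text into overlapping word "shingles" and compare the two sets with the Jaccard index. This is only an illustration under stated assumptions. It is not the method claimed in the patent, nothing in this thread says Google works this way, and the function names and the shingle size `k=3` are hypothetical choices made for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of a and b, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical product descriptions, made up for this example.
supplier_text = "Durable steel garden spade with a comfortable ash wood handle"
copied_text = "Durable steel garden spade, with a comfortable ash wood handle!"
rewritten_text = "Our spade digs through clay easily and the grip stays comfortable"

print(jaccard(supplier_text, copied_text))     # same words, so the score is 1.0
print(jaccard(supplier_text, rewritten_text))  # no shared 3-word runs, so 0.0
```

Two shops that paste the supplier's text unchanged score as identical under a measure like this, while a description written from scratch shares almost no shingles with the original.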
     
    toomm, Feb 6, 2007 IP
  12. lpstong

    lpstong Notable Member

    #12
    Well, if you want exact details then read the Official Google Webmaster Central Blog - http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html

    Reading through the blog will answer most of the questions you may have. And at the bottom of the page they have links to more concerns and questions you may have. It is pretty clear on the blog. All their links are Google blogs on just about everything.

    Here is something else on Dup Content written at Alaxandra -

     
    lpstong, Feb 6, 2007 IP
  13. thegypsy

    thegypsy Peon

    #13
    Then generally the authority site wins .... the rest are filtered down the results.

    We ALWAYS encourage clients (where possible) to use their own unique descriptions... if it's a feed.. well that can be a drag...




    Dood.. that useless piece of TRIPE was why I wrote my article.... as with many things Google, it doesn't tell U enough....
    READ the entire piece I posted... it is FAR more detailed
     
    thegypsy, Feb 6, 2007 IP
  14. lpstong

    lpstong Notable Member

    #14
    Ok last time I checked I am not a dude.

    Second - I am considering straight from the source. Official Google Webmaster Central Blog - http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html

    Thirdly - Cheers to you for giving your clients a much more detailed plan. I do realize many may be confused or lost about what Google means by duplicate content.
     
    lpstong, Feb 6, 2007 IP
  15. integrity

    integrity Well-Known Member

    Messages:
    1,999
    Likes Received:
    124
    Best Answers:
    0
    Trophy Points:
    180
    #15
    Scary stuff. It will probably be too inaccurate to work right.
     
    integrity, Feb 6, 2007 IP
  16. thegypsy

    thegypsy Peon

    #16
    Well Matt perfected (and taught Adam) the fine art of Google-Speak Circle Talk. So, unfortunately they are not always the best advice... it's partial pictures....

    I went through Matt's blog and other sources compiling it... G just doesn't fill in all the blanks .... another reason I am a Patent Hound... it's something to hold onto at least... he he

    .... and I use dOOd all the time.. U should put yer Beautiful Mug as an Avatar.. it would brighten up the place

    I'd even sing 4 U - [​IMG]
     
    thegypsy, Feb 6, 2007 IP
  17. lpstong

    lpstong Notable Member

    #17
    Ok well all things considered - as soon as I am out of my Avatar contract, I just might possibly flash you a pic. Because things are getting a little hot and heated.

    You will sing 4 me - [​IMG] :eek: . You will have me in your hands like beautiful putty to work with and mold. I will follow Master of all lol - wink, wink
     
    lpstong, Feb 6, 2007 IP
  18. thegypsy

    thegypsy Peon

    #18
    hee hee

    Today U have touched, more than my SEO passion
    Woeful is me, selling Avatars is the fashion
    Maybe some day our pixels might meet
    Sweeter than chocolate, my what a treat
    Until that day, these memories shall not go
    ..but enough of this now, it’s back to S-E-O

    weeeeeeeeeeeeeeee
     
    thegypsy, Feb 6, 2007 IP
    lpstong likes this.
  19. lkj

    lkj Peon

    Messages:
    729
    Likes Received:
    17
    Best Answers:
    0
    Trophy Points:
    0
    #19
    so think twice about getting content via feeds ;)
     
    lkj, Feb 6, 2007 IP
  20. saneinsight

    saneinsight Guest

    Messages:
    159
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #20
    dude you need a spidey suit!

    As for duplicate content... like product descriptions, I need to provide prescription drug information in the interests of consumer safety. The info is published on an Australian government website (it's copyright free), so how do you think google will react if I use that content on my site?
     
    saneinsight, Feb 6, 2007 IP