Will Google notice automatically rewritten text?

Discussion in 'Google' started by tomten, Sep 10, 2007.

  1. #1
    Hi,

    Let's say there is software for manipulating existing text from the net: a scraper that scrapes 10,000 pages about one topic. Let's say the software has an algorithm that rewrites text automatically to make it unique from its original source. It also groups the text into categories automatically, so you get a finished website with, say, 3,000 pages of content about a specific subject.

    The difference from content generators is that the text produced makes sense to anyone reading it. It might not be perfect grammatically or logically all the time, but it is relatively good.

    So the big question we have wondered about is how Google will react to this. Because the text quality is relatively good to humans, we assume Google won't notice the text is machine written. Google will not see the text as duplicate content, since it is rewritten to a slight degree. Let's say a site grows from 100 pages to 3,000 in 3 months.
     
    tomten, Sep 10, 2007 IP
  2. pets4homes

    pets4homes Peon

    #2
    Why do people want to do stuff like this, all with the aim of making money by copying other people's work?

    Why not just create a decent website yourself and write the articles and pages yourself?

    If your website is just a big mishmash of automatically rewritten articles and pages from the web, it won't make much sense to users and I am sure they won't visit your site again.
     
    pets4homes, Sep 10, 2007 IP
  3. tomten

    tomten Peon

    #3
    I am not saying I would do this. I am merely curious what would happen.

    Let's say there is a system for generating a 3,000-page site every day with good content. Could you destroy Google's technology then? Because how would they know your text is machine made if it is relatively good quality-wise?
     
    tomten, Sep 10, 2007 IP
  4. jerome

    jerome Banned

    #4
    I think yes...
     
    jerome, Sep 10, 2007 IP
  5. dct

    dct Finder of cool gadgets

    #5
    I'd hope that they would, but if not, it will just be a matter of time before they can detect such crap and remove it from the index. I totally agree with pets4homes.
     
    dct, Sep 10, 2007 IP
  6. tomten

    tomten Peon

    #6
    How can they remove it from the index if the formula for generating text is random and one or two new sites pop up and grow naturally all around the world every day?

    Do they really have an office full of guys who check a site and decide if it is man made or machine made? What if the writer on a site is bad at English and writes really bad language? Will they delete that site and claim it is machine made?

    And how can they detect a site that grows naturally, say anywhere from 1 to 100 pages a day, randomly and at a random time within a timeframe (say 8.00 to 18.00)?
     
    tomten, Sep 10, 2007 IP
  7. donttrustthisposter

    donttrustthisposter Peon

    #7
    Where is your traffic going to come from? It certainly won't be organic.
     
    donttrustthisposter, Sep 10, 2007 IP
  8. flash902007

    flash902007 Banned

    #8
    Yes, they will.
     
    flash902007, Sep 10, 2007 IP
  9. rehash

    rehash Well-Known Member

    #9
    There is more than one piece of software that does this, but the generated sites don't live long because:
    1) Google has good text analyzers and they can eventually detect generated or poorly written texts
    2) Google can count clicks from their SERPs and they can also use their toolbar to monitor how useful a site is for a user (let's say you manage to get into the top 10 with your generated sites; after entering, people will close the pages asap because of the nonsense)
    3) some people will report those sites
     
    rehash, Sep 10, 2007 IP
  10. bluegrass special

    bluegrass special Peon

    #10
    I would worry more about the author of the original work. In most countries this would be considered copyright infringement. Simply running text through some conversion is not enough to get around copyright. Since the "program" supposedly would have very good text, that suggests to me that only words are changed. That would make it relatively easy for an author to figure out where it came from if they saw their own work.

    Forget Google deindexing the site, what about the potential hundreds of thousands in fines and judgments (per article)?
     
    bluegrass special, Sep 10, 2007 IP
  11. thegypsy

    thegypsy Peon

    #11
    You're kidding, right? Try doing some research into 'Black Hat' bubba.... the content generators using Markov get the best results for near usable content.... there are many, many applications out there for this M8
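
    For anyone wondering what "using Markov" means here: a Markov-chain generator records which word follows each short run of words in some source text, then walks those records at random to produce new text. The Python below is only a minimal sketch of that idea. The names build_chain and generate, the order of 2 and the sample text are all assumptions made for illustration, not taken from any particular tool.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Random walk over the chain, stopping at `length` words or a dead end."""
    if not chain:
        return ""
    rng = random.Random(seed)
    key = rng.choice(sorted(chain))   # pick a starting run of words
    order = len(key)
    out = list(key)
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:             # no recorded follower: stop early
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Hypothetical sample text, chosen only so the demo has something to chew on.
sample = ("google will notice rewritten text because rewritten text leaves a pattern "
          "and google will look for a pattern in the text it indexes")

print(generate(build_chain(sample), length=15, seed=1))
```

    Because every transition is copied from the source, each run of three words in the output also appears somewhere in the source, which may be one reason this kind of text tends to leave a recognizable pattern.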

    1. Register 1000s of domains
    2. Grab a server
    3. Auto-site / content generation running creating new domains 24/7
    4. Link spamming applications spamming 24/7
    5. Get ranked and monetize
    6. Google bans site(s)
    7. Rinse and repeat

    .. dats de basics dooooood .... folks make a living that way. It is going on all around you as we speak.... yeaaaaagghhhhhh

    Welcome to the world of Web Spam... whatcha think Matt Cutts does all day??? bwaaaa ha h aha haha.... the PR Man for Google search? He is the head of the Search Quality team that deals with Web Spam ......

    Film at 11
     
    thegypsy, Sep 10, 2007 IP
  12. FanAddict

    FanAddict Notable Member

    #12

    But since it isn't the original article, how can you be fined for copyright infringement? It's not the same article anymore....

    Just curious? :)
     
    FanAddict, Sep 10, 2007 IP
  13. stock_post

    stock_post Prominent Member

    #13
    They will find it, because there is going to be some kind of pattern left and Google tries to find those.

    Again, the issue is how long it will take them.

    -- If you are creating sites like that and selling them, you are going to get away with it.
    ------ And if the new owner adds his personal stuff, that site may get out.

    But for sure you will be caught if you only create content off of other sites.
     
    stock_post, Sep 10, 2007 IP
  14. oseymour

    oseymour Well-Known Member

    #14
    You will get banned.....that's how they will react
     
    oseymour, Sep 10, 2007 IP
  15. bluegrass special

    bluegrass special Peon

    #15

    In the US the term would be derivative work. Other countries use terms like "right of modification" and the like, but it is all the same. The copyright holder has the right to create derivatives of the original.

    The programs that do this type of thing usually just use a thesaurus to change as many words as possible. Think of it like this, while you're at work I paint your house a different color and change out the trim. It must be my house now because it is a different building, right? Of course not.

    Some programs will actually change some of the structure of the original document (move paragraphs and sentences around). These are even worse programs. The more you move stuff around, the less sense the article will make. Think of this as a collage. Do you think that if you made a collage from pictures by some other photographer that he wouldn't come after you?
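
    To make the thesaurus point concrete, here is a rough Python sketch. The SYNONYMS table, the function names and the sample sentences are made up for illustration, and the comparison uses difflib from the standard library. It is not a claim about how Google or any rewriting product actually works. It only shows that swapping words one for one leaves the word order, and so most of the original, intact.

```python
from difflib import SequenceMatcher

# Hypothetical, tiny synonym table; a real tool would use a full thesaurus.
SYNONYMS = {"quick": "fast", "paint": "colour", "big": "large", "house": "home"}

def thesaurus_rewrite(text):
    """Swap every word that has an entry in SYNONYMS; keep the rest as is."""
    return " ".join(SYNONYMS.get(word, word) for word in text.lower().split())

def word_similarity(a, b):
    """Ratio in [0, 1] of words that match, in order, between two texts."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

original = "the quick painter will paint the big house while you are at work"
rewritten = thesaurus_rewrite(original)
unrelated = "search engines rank pages by links and many other signals"

print(rewritten)
print(word_similarity(original, rewritten))   # stays high: the order is untouched
print(word_similarity(original, unrelated))   # no shared words at all
```

    Even with four of the thirteen words replaced, the rewritten sentence still lines up with the original far better than an unrelated sentence does, which is why the author of the original would have little trouble recognizing it.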
     
    bluegrass special, Sep 10, 2007 IP
  16. FanAddict

    FanAddict Notable Member

    #16
    I see what you're saying, but I still cannot see how it's copyright infringement. If the articles are indeed "unique", are they not new ones?

    Do you know of any cases where this has actually been taken to court?
     
    FanAddict, Sep 10, 2007 IP
  17. bluegrass special

    bluegrass special Peon

    #17
    From the US copyright office:

    Uniqueness is not a factor of copyright. Two things can be very similar but be found not to be infringing. Two items that are substantially different may have a host of violations. Since the rewritten article was not created independently, it is a derivative work. If I were to write a story that only had Lord of the Rings characters and places, it would be unique, but still a copyright violation, as it would be a derivative work as well.
     
    bluegrass special, Sep 10, 2007 IP
  18. FanAddict

    FanAddict Notable Member

    #18
    Well, from your quote.. all I am seeing is that you cannot claim copyright to a derivative work?

    Anyways, thanks :D It doesn't really matter to me much lol.
     
    FanAddict, Sep 10, 2007 IP
  19. bluegrass special

    bluegrass special Peon

    #19
    The quote explains that you cannot claim copyright because the author of the original work holds the copyright, which includes the right to make derivative works. If you create a derivative work of somebody else's material without their consent, then that is a copyright violation. Read Circular 14 from the Copyright Office.
     
    bluegrass special, Sep 11, 2007 IP
  20. the-kids

    the-kids Active Member

    #20
    No, they won't
    just short answer for the question..........
     
    the-kids, Sep 11, 2007 IP