Hi, let's say there exists software for manipulating existing text from the net: a scraper that scrapes 10,000 pages about one topic. Let's say there is also an algorithm that rewrites the text automatically to make it unique from its original source. The software automatically groups the text and places it in categories, so you get a finished homepage with, say, 3,000 pages of content about a specific subject. The difference from content generators is that the text produced makes sense to anyone reading it. It might not be perfect grammatically or logically all the time, but it is relatively good. So the big question we have wondered about is how Google will react to this. Because the text quality is relatively good to humans, we assume Google won't notice the text is machine written. Google will not see the text as duplicate content since it is rewritten to a slight degree. Let's say a site grows from 100 pages to 3,000 in 3 months.
Why do people want to do stuff like this, all with the aim of making money by copying other people's work? Why not just create a decent website yourself and write the articles and pages yourself? If your website is just a big mishmash of automatically rewritten articles and pages from the web, it won't make much sense to users and I am sure they won't visit your site again.
I am not saying I would do this. I am merely curious what would happen. Let's say there is a system for generating a 3,000-page site every day with good content. Could you destroy Google's technology then? Because how would they know your text is machine made if it is relatively good quality-wise?
I'd hope that they would, but if not, it will just be a matter of time before they can detect such crap and remove it from the index. I totally agree with pets4homes.
How can they remove it from the index if the formula for generating text is random and one or two new sites pop up and grow naturally all around the world every day? Do they really have an office full of guys who check a site and decide if it is man made or machine made? What if the writer on a site is bad at English and writes really bad language? Will they delete that site and claim it is machine made? And how can they detect a site that grows naturally by, say, anywhere from 1 to 100 pages a day, randomly and at a random time within a timeframe (say 8.00 to 18.00)?
There is more than one program that does this, but the generated sites don't live long because:
1) Google has good text analyzers and they can eventually detect generated or poorly written texts
2) Google can count clicks from their SERPs and they can also use their toolbar to monitor how useful a site is for a user (let's say you manage to get into the top 10 with your generated sites; after entering, people will close the pages ASAP because of the nonsense)
3) some people will report those sites
I would worry more about the author of the original work. In most countries this would be considered copyright infringement. Simply running text through some conversion is not enough to get around copyright. Since the "program" supposedly would produce very good text, that suggests to me that only words are changed. That would make it relatively easy for authors to figure out where it came from if they saw their own work. Forget Google deindexing the site, what about the potential hundreds of thousands in fines and judgments (per article)?
You're kidding, right? Try doing some research into 'Black Hat', bubba.... the content generators using Markov get the best results for near-usable content.... there are many, many applications out there for this, M8
1. Register 1000s of domains
2. Grab a server
3. Auto-site / content generation running, creating new domains 24/7
4. Link spamming applications spamming 24/7
5. Get ranked and monetize
6. Google bans site(s)
7. Rinse and repeat
.. dats de basics dooooood .... folks make a living that way. It is going on all around you as we speak.... Welcome to the world of Web Spam... whatcha think Matt Cutts does all day??? The PR man for Google search? He is the head of the Search Quality team that deals with Web Spam ...... Film at 11
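For anyone wondering what "using Markov" means here, this is a minimal sketch of a word-level Markov chain, not any particular tool's implementation. The names build_chain and generate and the sample text are made up for illustration. It also shows why the output only looks locally sensible: every adjacent word pair is lifted straight from the source.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length, seed=0):
    """Walk the chain, picking a random observed follower at each step."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    state = rng.choice(sorted(chain))  # start from a state seen in the source
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:              # dead end: the source never continues from here
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


source = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(source)
print(generate(chain, 8, seed=1))
```

With order=1 the text drifts into nonsense quickly; a higher order reads better but copies longer runs of the source verbatim, which makes it easier to trace.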
But since it isn't the original article, how can you be fined for copyright infringement? It's not the same article anymore.... Just curious.
They will find it, because there is going to be some kind of pattern left and Google tries to find those. Again, the issue is how long it takes them. If you are creating sites like that and selling them, you are going to get away with it. And if the new owner adds his personal stuff, that site may get out. But for sure you will be caught if you only create content off of other sites.
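To make "some kind of pattern left" concrete, here is a rough sketch of one textbook near-duplicate check: word shingles compared with Jaccard similarity. This is only an illustration. Nobody in this thread knows what Google actually runs, and the function names and sample sentences are invented.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "the quick brown fox jumps over the lazy dog near the river bank"
spun = "the quick brown fox leaps over the lazy dog near the river bank"
unrelated = "copyright law protects original works of authorship fixed in a tangible medium"

# A single swapped word only disturbs the shingles that contain it,
# so the rewritten copy still overlaps heavily with its source.
print(jaccard(shingles(original), shingles(spun)))
print(jaccard(shingles(original), shingles(unrelated)))
```

Text that is "rewritten to a slight degree" keeps most of its shingles, so it still scores far closer to its source than an independently written page does.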
In the US the term would be "derivative work". Other countries use terms like "right of modification" and the like, but it is all the same. The copyright holder has the right to create derivatives of the original. The programs that do this type of thing usually just use a thesaurus to change as many words as possible. Think of it like this: while you're at work, I paint your house a different color and change out the trim. It must be my house now because it is a different building, right? Of course not. Some programs will actually change some of the structure of the original document (move paragraphs and sentences around). These are even worse programs. The more you move stuff around, the less sense the article will make. Think of this as a collage. Do you think that if you made a collage from pictures by some other photographer he wouldn't come after you?
I see what you're saying, but I still cannot see how it's copyright infringement. If the articles are indeed "unique", are they not new ones? Do you know of any cases where this has actually been taken to court?
From the US Copyright Office: Uniqueness is not a factor in copyright. Two things can be very similar but be found not to be infringing. Two items that are substantially different may have a host of violations. Since the rewritten article was not created independently, it is a derivative work. If I were to write a story that only had Lord of the Rings characters and places, it would be unique, but still a copyright violation, as it would be a derivative work as well.
Well, from your quote.. all I am seeing is that you cannot claim copyright to a derivative work? Anyway, thanks. It doesn't really matter to me much lol.
The quote explains that you cannot claim copyright because the author of the original work holds the copyright, which includes the right to make derivative works. If you create a derivative work of somebody else's material without their consent, then that is a copyright violation. Read Circular 14 from the Copyright Office.