I did a duplicate content search on a popular article of mine that hit the front page of Digg. As it turns out, a copycat stole it a few months after I had written it, posted it on his blog, and then submitted it to Reddit.com! To my dismay, the story, my story, made the home page! I know it's not up to news sites like Reddit and Digg to be the copyright police, but by linking from their home pages to a blog that steals my articles, they are assisting in copyright infringement. I wrote an article on my blog about it: http://www.pcfastlane.com/rambles-raves/web-20-news-sites-must-do-something-about-plagiarism/ It would be so easy for sites like Reddit and Digg to scan Google for duplicate content and refuse duplicate submissions, it's not even funny. The sites could even have a link to flag stolen stories! This is pure negligence on the sites' part. What do you think?
You're right, there is a lot of copied content on those sites. Anyone who regularly reads the threads on Digg or Reddit has seen it a hundred times: "this story is a dupe and blogspam, here's the original article: www...." The problem is that not enough people check for duplicate content to recognize it right away. Digg and Reddit admins could do that, but it seems like it would take a lot of time each day. There should be a copyright infringement flag for each story, and if something like 3 or more people click it, an admin is notified.
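The flag-with-threshold idea is simple enough to sketch. A minimal version, assuming a hypothetical `notify_admin` callback (how the alert reaches an admin is up to the site), might look like this:

```python
from collections import defaultdict

FLAG_THRESHOLD = 3  # number of user reports before an admin is alerted

flag_counts = defaultdict(int)

def flag_story(story_id, notify_admin):
    # Count a user's "stolen content" report; once the threshold is
    # reached, hand the story off exactly once for human review.
    flag_counts[story_id] += 1
    if flag_counts[story_id] == FLAG_THRESHOLD:
        notify_admin(story_id)
```

Alerting only when the count *equals* the threshold means an admin is pinged once per story, no matter how many further users pile on.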
How are they going to know it violates copyright? Don't some people upload articles to websites HOPING they get republished?
Well, they couldn't do anything about the actual copyright violation, but any duplicate content on those sites is bad and should be eliminated (IMO). The current system basically just allows users to vote down duplicate content. I would like to see that content removed entirely, especially in cases where the copyright violation is obvious, such as when someone copies a list from cracked.com and posts it on their blog.
It would be very easy to implement a system that deters content thieves. These are the two ideas I published in my article: 1. Use a system like Copyscape that searches for duplicates via Google. If a duplicate turned up in the search results, the system could simply notify an admin to review the story and confirm the source was original. It's generally pretty easy to spot content thieves. Most Reddit submissions should be original content, so this wouldn't put a huge strain on the admins' time, and it would be very easy to implement. Even better, the system could take a random sentence from the article and Google it. If the submitted link were not the first result, which is the one Google considers to be the original source, the submission would be rejected. Naturally, there are cases where webmasters have articles republished with permission. That's why an admin would be able to look a submission over before approving it. The system would also cut down on spam drastically, since many spam sites use duplicate content. 2. A simple button for users to report stolen content, as others have suggested.
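The "Google a random sentence" check in idea 1 can be sketched as follows. This is only an illustration under stated assumptions: the ranked result URLs are assumed to come from whatever web-search API is available (the actual query to Google is not shown), and the URL normalization rules here are my own guess at what "same page" should mean:

```python
import random
import re
from urllib.parse import urlparse

def pick_probe_sentence(article_text, min_words=8):
    # Choose a reasonably long sentence to use as the search query;
    # short sentences are too generic to identify a unique source.
    sentences = re.split(r'(?<=[.!?])\s+', article_text)
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    return random.choice(candidates) if candidates else article_text

def normalize(url):
    # Compare hosts and paths loosely so that http vs. https,
    # a leading "www.", or a trailing slash don't cause a mismatch.
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    return host + parsed.path.rstrip("/")

def looks_like_original(submitted_url, search_results):
    # search_results: ranked list of result URLs for the probe sentence,
    # obtained from some search API (hypothetical input here).
    # The submission passes only if it matches the top-ranked result.
    if not search_results:
        return True  # nothing indexed yet, so nothing to compare against
    return normalize(submitted_url) == normalize(search_results[0])
```

A submission that isn't the top result would then be held for admin review rather than rejected outright, which covers the legitimate-republication case mentioned above.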