I run an extreme sports blog and update it daily. Recently it has been doing quite well in the search engines, but with this has come a lot of websites that are stealing my content and using it on their own sites. I have added a notice on my blog which says that all of my work is copyright protected and that anyone who wishes to use my content should ask for permission beforehand. This still hasn't stopped people, and I am finding at least ten sites a day stealing my content and posting it on their sites. Sometimes this happens within a few hours of me posting, and this really worries me, especially because of the Panda update. Some of the sites I have emailed refuse to remove the content, as they say it is covered by a 'creative commons license', but why should I have people stealing my content when I put so much effort into creating it? Any ideas?
As far as I know there is no way to stop people from stealing content from your site, and although it is extremely annoying and sometimes detrimental, you just have to look at it as imitation being the sincerest form of flattery. One thing you can do is make your content unique to you as a person or to your site, which makes it harder to reuse. They would essentially have to rewrite the content completely (boring and time-consuming), and not many people will be up for that. Also try an official complaint to Google.
Thanks Sebastian but most of the sites stealing my content are autoblogs so I don't think it's flattery it's more an easy way for them to get rich quick whilst using my content. Although my content is all unique I am very worried about whether Google will see this in the same way. I have heard (I don't know how true it is) that Google will favour the first site to list the material but if it is stolen within a few hours they are unable to know who published it first. If this is true, they are not only stealing my work but they can potentially harm the site I am working so hard on. So far I have been contacting all the blogs and asking them to remove my content (most ignore me) and I am also emailing the hosts of every blog on a daily basis. This is becoming increasingly time consuming though and I wondered if there was anything else I could do.
Ask the other site to either link to your site or remove the content. Failing that, you can submit a DMCA takedown notice to Google. Edit: If it's autoblogs scraping your content, make sure your RSS feed publishes only excerpts and not the full article. That way you don't have to worry about duplicate content, and any traffic they send is a bonus.
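To illustrate what "excerpts only" means for a feed, here is a minimal Python sketch. It is not how WordPress does it internally; the function name `make_excerpt`, the 55-word limit and the " [...]" marker are all assumptions made for the example, and the tag stripping is deliberately crude.

```python
import html
import re


def make_excerpt(article_html, max_words=55, more=" [...]"):
    """Return a plain-text excerpt of at most max_words words.

    Hypothetical helper for illustration only: it strips tags with a
    simple regex, which is fine for a sketch but not a real HTML parser.
    """
    # Replace every tag with a space so words on either side stay separate.
    text = re.sub(r"<[^>]+>", " ", article_html)
    # Turn entities such as &amp; back into plain characters.
    text = html.unescape(text)
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    # Keep only the first max_words words and mark the cut.
    return " ".join(words[:max_words]) + more
```

A scraper that republishes the feed as-is would then only get the first few sentences, and readers would have to follow the link back to the original post for the rest.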
Some of the sites have been linking back to me but I'm still not happy about it. I have just had an email back about the list of WordPress-hosted blogs that I complained about, though, and this is what they said: "Hi there, Thanks for getting in touch! The above sites are certainly pure, 100% spam, so I have removed them entirely from WordPress.com. Regards, Anthony" So at least that's a result! I'll also do what you mentioned, dcristo, and check out and edit my RSS feed. I didn't know I could do that. Thanks for the help!
Autoblogs are crap and rarely outrank quality sites, so I wouldn't worry about it too much. In the WordPress admin: Settings > Reading > "For each article in a feed, show" (set it to Summary).
There's really no way to stop people from stealing your content, particularly autoblogs, but you can report these sites (as you just did) and they will be taken off. As for actual people posting your work without reference to you and your site, you will just have to leave it be. You can write to the blog/site owner and tell them to take it off or at least reference your site, but if they don't, there's not much else you can do. Search engines will look at which site was indexed first with that content, so if you publish it first and other sites then copy it, you won't get penalized for that. That's what I've read so far.
I hope you're right. It just makes me so angry that people think they are entitled to steal things and portray them as their own. I've noticed a lot of the autoblogs are using hosts which are hard to detect which is making it hard to contact them or their host but I'm not going to let it lie. As for usual sites, in a way I find that even worse. These people have actually taken the time to visit the site, see the clear copyright notices and still steal it. I'm sorry but there's no way I'm just leaving it alone, they have no right to do this and I am taking it further on every occasion.
I understand how frustrating it is, particularly because I know how hard it is to research and write an original article. You can continue to write to these blog owners and tell them to reference your site when they use your articles. The following article on how Google deals with duplicate content can help. There is a link at the bottom where you can file a DMCA request to have the copied pages removed from Google's search results.
Thank you I've been filing DMCA requests but I'll check the article out too. It's still happening regularly and it's driving me nuts. Does anyone actually know how autoblogs make money? These are the main sites stealing my content.
You can check with Copyscape. It helps you find the pages that copy you, so you can email the owners and ask them to stop copying your content. If they don't listen, you can file a DMCA complaint.
Autoblogs make money through ads. Some website owners also make money off autoblogs by setting them up and then funnelling the traffic from the autoblogs to their own sites.
Even with excerpts in the feed, it is still possible for a PHP programmer to retrieve the full article with curl and some regex. There is no way you can stop people from stealing your content. Leave it up to Google to penalize them for stealing.
If they're stealing your content (especially autoblogs) you can use them as a link building tool. In every blog post you write, make sure to link to another post on YOUR website. That way, when they scrape your content, they'll be giving a link to your site too. Most scrapers - auto or manual - don't care enough to remove internal links because they're just going for speed & quantity. Use it against them. Also, you can try the RSS Footer plugin (for WordPress) to insert a keyword-rich backlink into the copies on RSS scraper sites. Change the anchor text every month to diversify your backlink anchor text.
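The footer idea can be sketched in a few lines of Python. This is not the RSS Footer plugin's code, just an illustration of the same approach; the function name `feed_footer`, its parameters and the example.com URLs are made up for the example. The anchor text is picked from a list based on the current month, so it changes once a month without any manual editing.

```python
import html
from datetime import date


def feed_footer(post_title, post_url, site_url, anchors, today=None):
    """Build an HTML footer to append to each feed item.

    Hypothetical helper: 'anchors' is a list of anchor texts to rotate
    through, one per calendar month.
    """
    today = today or date.today()
    # Count months since year 0 so the index advances by one each month.
    index = (today.year * 12 + today.month) % len(anchors)
    anchor = anchors[index]
    return (
        '<p><a href="{post_url}">{title}</a> was originally published on '
        '<a href="{site_url}">{anchor}</a>.</p>'
    ).format(
        post_url=html.escape(post_url, quote=True),
        title=html.escape(post_title),
        site_url=html.escape(site_url, quote=True),
        anchor=html.escape(anchor),
    )


if __name__ == "__main__":
    print(feed_footer(
        "My latest post",
        "https://example.com/my-latest-post",
        "https://example.com/",
        ["extreme sports blog", "extreme sports news"],
    ))
```

Any scraper that republishes the feed item untouched carries both links with it: one to the original post and one to the home page with that month's anchor text.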
There are various things you could do, but the problem with all of them is that you might end up shooting yourself in the foot and blocking genuine users as well. One thing you can do to make it harder, though, is to constantly change your layout and HTML structure. Some scrapers will break even if you only put a space somewhere or add a class attribute, even if it is empty. You can add a few spans here and there (they shouldn't alter the layout but will mess up scrapers), change class names, etc.
You could do a whois on the domain name and then use htaccess to ban the site's IP. http://www.clockwatchers.com/htaccess_block.html I'd also recommend inserting ads into your RSS feeds. Like, right above your content or in the middle of your content.
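For anyone who ends up with a growing list of addresses to ban, here is a small Python sketch that turns a list of IPs into the lines you would paste into .htaccess. The function name `htaccess_block_rules` and the 203.0.113.x addresses (a range reserved for documentation) are placeholders, not anything from the linked page. It can produce either the older `Order`/`Deny` syntax or the `Require` syntax used by Apache 2.4, so check which one your host supports.

```python
import ipaddress


def htaccess_block_rules(ips, apache_24=True):
    """Return .htaccess lines that deny the given IP addresses.

    Hypothetical helper for illustration. Raises ValueError if any
    entry is not a valid IPv4 or IPv6 address.
    """
    # Validate and normalise each address before writing it into a rule.
    addrs = [str(ipaddress.ip_address(ip)) for ip in ips]
    if apache_24:
        # Apache 2.4 style: allow everyone except the listed addresses.
        lines = ["<RequireAll>", "    Require all granted"]
        lines += ["    Require not ip {}".format(a) for a in addrs]
        lines.append("</RequireAll>")
    else:
        # Apache 2.2 style: Allow is evaluated first, then Deny overrides.
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += ["Deny from {}".format(a) for a in addrs]
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_block_rules(["203.0.113.5", "203.0.113.77"]))
```

Bear in mind this only helps if the scraper's requests really come from the address you block, which may not be the same IP the scraper's website is hosted on.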
I have actually created a plugin for WordPress which makes things difficult for scrapers. I will extend and improve it soon. http://forums.digitalpoint.com/showthread.php?t=2345946