I guess this kind of stuff has been going on for a while, but it has just started happening to me. A scraper site looks like it is scraping my blog feed. My posts show up on their site in their entirety not long after I post, and it looks automated. I even put up one post that talks about them being a scraper site, and a short time later it appeared on their site! I got no response when I emailed them, so I filed a DMCA copyright infringement notice with their (U.S.-based) webhost today. No response yet from the webhost, but it's still early. Aside from that, I guess I could have some fun with the photos they are hotlinking, like substituting a goatse for some of them. But I guess I'll wait until the webhost has a chance to do something, assuming they do. I don't want them to puke on their keyboard when they check on the copyrighted material. I have seen this automated scraping in a couple of other places recently, and it has been done very well, with the scraped content integrated very neatly. How do they do that?
Hell, it makes my day when folks scrape my feeds. I usually have 2-3 internal links in each feed post... heh... easy way to get relevant backlinks. If a scraper site outranks you, then you're not ranking anyways.
It happens to the best of us... spend more time on your site and less worrying about those copying... just my 2 cents. As for how they are doing it? It's easy, because you probably supply a feed.
The best thing you can do is try to use it to your advantage. Create deep links to other pages on your site (use absolute URLs). Also, you may wish to include only the first 300 words of your post in the feed to avoid them getting the full content. At the end of the 300-word snippet, have something like "Read the rest of this post at mysite.com".
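For anyone who builds their feed by hand rather than through a blog setting, here is a minimal sketch of the 300-word snippet idea. The function name `make_feed_summary` and its parameters are made up for illustration, and it assumes the post body is plain text; a post stored as HTML would need tag-aware truncation instead.

```python
def make_feed_summary(post_text: str, post_url: str, max_words: int = 300) -> str:
    """Return the first max_words words of a post plus a link back to the full post.

    post_url should be an absolute URL so the link still points at your site
    when the feed item is republished somewhere else.
    """
    words = post_text.split()
    if len(words) <= max_words:
        # Short posts go out unchanged.
        return post_text
    snippet = " ".join(words[:max_words])
    return f"{snippet}... Read the rest of this post at {post_url}"
```

Most blog platforms have a "summary feed" option that does the same job, so this only matters if you generate the feed yourself.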
RSS configuration looks to be the issue here. If you have a feed, control how much of the article gets shown. This way people come to your website to view the entire article. They have a legit defence in that you are providing a feed to which they subscribe. Is there something we are missing in all this?
I'm sorry to hear that. It's disgusting... If you could pop their US hosting account, they will probably go to bullet-proof hosting outside the US, like China or Russia. I suggest that you detect their IPs and ban them. If you hit the right IP, they will disappear. Try checking your server logs. Look for requests that come at fixed intervals. If you can't find their IP, update your blog and wait for their bot to come. Check their site second-by-second. Check the time when their content got updated, and check your server log. (Be smart!) The last thing you could do is report this to their advertisers. Look for Adsense, get the pub-id and email Google immediately. Just email all the advertisers.
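The "requests at fixed intervals" check can be roughed out in a few lines of Python. This is only a sketch under some assumptions: the access log is in the Apache common/combined format, the feed lives under `/feed`, and the names `find_regular_pollers`, `min_hits` and `max_jitter` are invented for this example. Legitimate feed readers also poll on a schedule, so treat the result as a shortlist to compare against the time the scraper site updates, not as a ban list.

```python
import re
import statistics
from collections import defaultdict
from datetime import datetime

# Matches the start of a common/combined log line:
# IP, ident, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+)')


def find_regular_pollers(lines, feed_path="/feed", min_hits=5, max_jitter=0.1):
    """Return {ip: average_gap_in_seconds} for IPs that fetch the feed on a steady schedule.

    An IP is flagged when it has at least min_hits feed requests and the
    spread of the gaps between them (population std dev divided by the mean)
    is no larger than max_jitter.
    """
    hits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, stamp, path = match.groups()
        if not path.startswith(feed_path):
            continue
        hits[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))

    suspects = {}
    for ip, times in hits.items():
        if len(times) < min_hits:
            continue
        times.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        mean_gap = statistics.mean(gaps)
        if mean_gap > 0 and statistics.pstdev(gaps) / mean_gap <= max_jitter:
            suspects[ip] = mean_gap
    return suspects
```

A possible way to run it, with a hypothetical log path:

```python
with open("/var/log/apache2/access.log") as log:
    for ip, gap in find_regular_pollers(log).items():
        print(f"{ip} fetches the feed about every {gap:.0f} seconds")
```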
This does not make sense if the submitter was giving the full article as an RSS feed. Anyone who subscribed to that feed would have had the entire article. If I had a website which subscribed to a few relevant blogs and one of them configured their feed to show the full article, any retaliation like DMCA, contacting Google, IP banning, etc. is ridiculous. There are two things to do. First, if the feed is giving the entire article, change it to give only a bit, enough to interest people to go to the website to read the entire article. Secondly, if this is such a big issue, remove the feed.
Good comments, thanks all. @mad4, my posts do have deep links so that is working for me. But I am a bit concerned about dup content penalties. I took your advice and snipped the feed. @shoemoney, yeah, I don't want to get distracted too much by this. Just don't want to get hurt by it. And what I was asking is what technique they use to take the feed and integrate the posts so seamlessly. They really do a nice job scraping several sites and integrating them into one, all automatically it seems. @claudek, yes, I think you are missing the fact that a published feed is not a license to republish copyrighted content in its entirety without permission. @dcristo, I thought about turning the feed off but I think I have a lot of subscribers (problem is I'm crappy at this and don't really know if I do have many subscribers). @brandnewx, that's an interesting idea. I have been digging around in my logs a lot lately for other reasons (banning some other IPs that were doing some abusive sort of stuff). Seems like it might be difficult (for me) to do this. Also, the scraper site is a porn site, no Adsense on it, a lot of porn advertisers, so I'm guessing I won't get a response from them.
Okay, so that must really suck. But look at the bright side. From your description, it seems pretty clear that they're cronning your feed. Now for the bright side: you could show them who's the boss by playing around a bit. For example, if you can find out what their hosting platform is (Windows, Linux, etc.), you could do some damage to their site, teaching them to take a good look before they steal content. Well, these are my 2 cents. J/K, don't do that, wait for the host's response. I just like to let out some evil ideas from time to time.
I use FeedBurner too. It's great. It has some great stats and it's strangely free. They also have some lovely add-ons. It's a must, IMO, for every blogger. If you wanna take a look at how one would look, take a look at mine: http://feeds.feedburner.com/Damnz.
I agree. It's ridiculous to ban IPs because you give full content via RSS. As RSS stands for Really Simple Syndication and you publish the full content via RSS, everyone has the right to scrape and duplicate. After all, it's like you're saying "hey! come here and syndicate my blogs". geomark, if you do publish the whole content via RSS, configure the script to publish only the headline or a small portion of each post.
Yeah - I am kinda missing something here - half of my links back to my blog are from scraped content! I love it.
It's the potential for dup content penalty (real or imaginary I don't know) that is the concern. But as you all so clearly pointed out it's my bad for not realizing my feed was publishing the full article (never even checked before). I snipped it so now I'm on board and saying come on and scrape (my summary feed with deep links).
Better to remove the RSS feed from your site if you don't want others to scrape it... that is the whole point of RSS feeds: getting more backlinks from other sites that want to post your feed. You can also adjust your config, as others say, so that your entire article will not display in your feeds.