They are slimeballs! Lazy slimeballs. The worst kind of slimeball. Well, almost. Slimeball scammers are THE worst. They are second from the bottom rung of the ladder.
The WordPress option to only show part of an article in the feed does not work, as whatever they are using to steal my articles follows the article title link in the feed and then rips the entire content from the page itself. I've e-mailed them about it, but they stole another article today. Time for a 'cease' letter to the owner from my lawyer friend, Joe King.
...but doesn't Google say to make sites for visitors and not for search engines? Anyway, on a serious note to the OP, your best choice would be to copyright your content. Then if anyone rips it, you can contact the webmaster to have your content taken down. If they don't, report the site to the search engines (Google/Yahoo/MSN) and they will remove the offending site from their index.
Yep, Google does indeed say that. Visitors wouldn't really like a site that takes 2 years to load though, would they? Thanks for your advice though, I think that's what I'll do. I'll report them to Google and wait a couple of weeks, and hopefully I'll be the one laughing.
And waiting forever for a giant image to load isn't annoying? Losing any chance of ever ranking in the SEs isn't psychotic? Trust me, your suggestion was psychotic...
^ Dude, it wasn't me who suggested it. If you run a huge site and publish articles every day, it would be dumb to publish them as images. Rankings are psychotic, and people who obsess over them are psychotic as well. People don't flock to your site because of your ranking, it's because of the content of your site. But that's another topic anyway.
Google may take them out, but it won't matter one bit if that site doesn't have a high listing anyway. You say this site is pretty big? What is the URL? Odds are that practically all sites that steal content don't have enough visitors to fill a walk-in closet. If you have tried everything and he still won't take it down, don't sweat it. The odds are that very few people will be reading it anyway. It's usually the small, struggling sites that resort to that. Who cares? I could have one of my articles stolen by, say, 50 rinky-dink sites over time, and I am not going to let it bother me. Very few people on each site will read them. The Internet is so fucking huge, with so many sites, and many people don't mind reading duplicate content anyway. It wouldn't bother me all that much.

Besides, I post my URL a couple of times, along with a short sales pitch to go to my site to read more, in a lot of my articles. When they copy and paste them, they are giving me free advertising. And for the articles I don't have it in, I can probably get the "scum" to put my URL in them. You can't fight them, so you might as well get your URL in the content. Don't sweat the small stuff, brother. You'll go fucking nuts over time.
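If you publish with WordPress, one low-effort way to bake that "free advertising" into everything that goes out through your feed is a small filter in your theme's functions.php. This is only a rough sketch, assuming a fairly standard WordPress setup; the function name is made up and the wording of the link line is yours to change:

<?php
// Rough sketch: append a link back to the original post to anything
// served through the RSS feed, so a scraper that republishes the feed
// also republishes your backlink and sales pitch.
// (Function name is made up; drop this into your theme's functions.php.)
function myblog_append_source_link( $content ) {
    if ( is_feed() ) {
        $content .= '<p>Read the full article and more at '
                  . '<a href="' . get_permalink() . '">'
                  . get_bloginfo( 'name' ) . '</a>.</p>';
    }
    return $content;
}
add_filter( 'the_content', 'myblog_append_source_link' );

If you also want the line on the page itself, to cover the copy-and-paste crowd, drop the is_feed() check and it will be appended everywhere.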
Try putting this into your .htaccess to block the common sort of downloader/scraper bots:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
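Worth noting: the [F,L] flags simply return a 403 Forbidden to any request whose user agent matches one of those patterns, so this only stops scrapers that identify themselves honestly. A custom script of the kind described below can send a blank or browser-like user agent and walk straight past it, so treat this as a filter for the lazy off-the-shelf tools rather than a real defence.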
It is very easy to steal content via script, especially when you publish with things like WordPress. The content always sits inside the same HTML tags, so people build a parser which looks at your RSS feed, goes to the full article URL, then takes your content and stuffs it into a database. Everything is automated, so it is little effort on the part of the people who steal content, and of course they only need to code one engine to do this and adapt it very slightly for each new site they are stealing from. Anyone with an intermediate knowledge of PHP can make such a thing. Of course it can be coded to strip links and strip keywords such as any reference to your site, again automatically. It is a real problem and set to get worse.

To diversify this topic a little, I think the reason it will continue to get worse is simple economics. As more and more companies advertise on the internet, the cost of doing so will grow as more people bid on the same keywords and search terms. If the advertiser pays more, then the person displaying the advertisement receives more money, since the cut is a percentage of what the original advertiser pays. Therefore the profits to be made from content will continue to rise if you look at it from a profit-per-click point of view. Some people have zero scruples, and if such a person lives in a country where it is hard to enforce any kind of copyright law, then you have a major problem. Blocking IP ranges and subnets is not a solution, simply because the folks doing this can go to a free host with its servers in the USA or Europe and run the scripts from there.

Publishing as an image is certainly a way around it, however you do have some legal implications regarding the Disability Discrimination Act, depending on the nature of your site and the country you live in. Basically, most western countries have laws which state you must make your site accessible to those who need to use text readers, magnifiers and the like. While these laws are not heavily enforced, probably due to the sheer volume of content on the internet, I personally believe in access for all and thus would avoid the image route. This article shows that people can and will take you to court over this.

I think the problem will grow to be so significant that technology will have to be adapted or created to combat it: things such as encrypted source code so bots cannot parse your site, or tools that let you log in to a search engine interface, submit your URL and tell the search engine that you do not allow duplicate copies, or list where duplicate copies are permitted. Of course, people will then fall back on automated rewrites that swap in words from a replacement list, which are in heavy use even today, and those will also become more popular. However, if the ability to automate the initial theft is removed, things will get much better.
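To make the "little effort" point concrete, a feed-following ripper of the sort described above boils down to something like this. A minimal sketch only, assuming PHP with SimpleXML; the feed URL and the div class it hunts for are made-up placeholders that a scraper would simply swap per target site:

<?php
// Minimal sketch of a feed-following content ripper as described above.
// The feed URL and the wrapper div class are made-up placeholders.
$feed = simplexml_load_file('http://victim.example.com/feed/');

foreach ($feed->channel->item as $item) {
    // Follow the article link taken from the feed...
    $html = file_get_contents((string) $item->link);

    // ...and pull out whatever sits inside the theme's standard content div.
    if (preg_match('#<div class="entry-content">(.*?)</div>#s', $html, $m)) {
        $content = $m[1];
        // At this point links and site references get stripped and the
        // text is stuffed into a database, all automatically.
    }
}

That is the whole trick: because the markup around your posts never changes, one dumb pattern match is enough, which is why the only per-site work is changing a URL and a class name.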
I posted this link in reply to another topic here, but it's applicable to this one also. http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html Of most interest is this part: "Don't fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it's highly unlikely that such sites can negatively impact your site's presence in Google. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site."