Hi DPers, today I found that a site is scraping my content and republishing it on their own website. From what I can see, they use file_get_contents to fetch my pages and JavaScript to replace my URLs with their internal URLs. Is there a way, through .htaccess or robots.txt, to stop that domain from pulling my content? Thanks
A robots.txt rule won't help here, since honouring it is voluntary and a scraper using file_get_contents will simply ignore it. Assuming he always scrapes from the same IP, add this to your .htaccess file:

<Limit GET PUT POST>
order allow,deny
deny from 1.2.3.4
allow from all
</Limit>

Change 1.2.3.4 to his IP address. That should block him from your site, at least until he changes the IP he scrapes from.
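One caveat: if your host runs Apache 2.4 or newer, the order/allow/deny directives are deprecated there and only work with mod_access_compat loaded. A minimal sketch of the same block in the newer Require syntax, keeping the same placeholder IP, would be:

# Apache 2.4+ equivalent using mod_authz_core
<RequireAll>
# allow everyone by default
Require all granted
# then deny the scraper's IP (replace 1.2.3.4)
Require not ip 1.2.3.4
</RequireAll>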
I have that code on all my sites, loaded with the IP ranges of all the popular datacenters. This kind of scraping almost always comes from servers, not from a home PC. You can also write to the abuse email address for the domain (usually listed in the WHOIS record) and report the scraping: either the content gets removed or the domain gets banned.
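For what it's worth, deny from also accepts whole ranges in CIDR notation, which is how you cover a datacenter rather than a single server. A sketch of what that looks like (the 192.0.2.0/24 and 203.0.113.0/24 ranges below are just documentation placeholders; substitute the actual ranges of the datacenters you want to block):

<Limit GET PUT POST>
order allow,deny
# placeholder CIDR ranges -- replace with the real datacenter blocks
deny from 192.0.2.0/24
deny from 203.0.113.0/24
allow from all
</Limit>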