Someone auto-scraping my content. Any solution?

Discussion in 'Site & Server Administration' started by Istvan, Nov 12, 2009.

  1. #1
    Hi DPers,

    Today I found that a site is scraping my content and reusing it on its own website. From what I have seen, he uses file_get_contents to fetch the pages and JavaScript to replace my URLs with his internal URLs.

    Is there a way, through .htaccess or robots.txt, to stop that domain from fetching my content?

    Thanks
     
    Istvan, Nov 12, 2009 IP
  2. tolra

    #2
    I assume he's always scraping from the same IP. If so, put this in your .htaccess file:

    <Limit GET PUT POST>
    order allow,deny
    deny from 1.2.3.4
    allow from all
    </Limit>

    Change 1.2.3.4 to his IP address. That should block him from your site, at least until he changes the IP he scrapes from.
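
    The rule above uses the Apache 2.2 access-control directives. On Apache 2.4 the same idea is usually written with Require, so here is a minimal sketch of the equivalent, assuming the server permits authorization directives in .htaccess (AllowOverride AuthConfig). The address 192.0.2.10 is a documentation placeholder, not the scraper's real IP.

    ```apache
    # Apache 2.4 sketch: allow everyone except one address.
    # 192.0.2.10 is a placeholder; substitute the IP seen in your access log.
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.10
    </RequireAll>
    ```

    There is no <Limit> wrapper here, so the block applies to every request method rather than only GET, PUT and POST.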
     
    tolra, Nov 12, 2009 IP
  3. Bohra

    Bohra Prominent Member

    Messages:
    12,573
    Likes Received:
    537
    Best Answers:
    0
    Trophy Points:
    310
    #3
    Just find out the IP of the host he is using and block it. That way his server can't connect to yours.
     
    Bohra, Nov 13, 2009 IP
  4. Istvan

    Istvan Well-Known Member

    Messages:
    1,544
    Likes Received:
    43
    Best Answers:
    0
    Trophy Points:
    175
    #4
    Thanks, I have blocked the server IP in .htaccess and it seems OK now.
     
    Istvan, Nov 14, 2009 IP
  5. ravee1981

    ravee1981 Active Member

    Messages:
    712
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    60
    #5
    I have that code on all my sites, with the IP ranges of all the popular datacenters. This kind of scraping comes only from servers, not from home PCs.

    You can also write to the abuse email for the domain, usually found in the WHOIS record, and report the scraping. Either the content will be removed or the domain will get banned.
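
    Blocking a whole range works the same way as blocking a single address, because Deny accepts CIDR notation. A rough sketch in the same Apache 2.2 syntax used earlier in the thread follows. The two ranges are documentation placeholders, not real datacenter ranges, so replace them with the ranges you actually want to block.

    ```apache
    # Apache 2.2 sketch: deny whole address ranges given in CIDR notation.
    # Both ranges below are placeholders reserved for documentation.
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.0/24
    Deny from 198.51.100.0/24
    ```

    With Order Allow,Deny, a request that matches both an Allow and a Deny line is rejected, so everyone outside the listed ranges still gets through.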
     
    ravee1981, Nov 16, 2009 IP