Scraper sites scraping my blog feed

Discussion in 'Legal Issues' started by geomark, Sep 2, 2006.

  1. #1
    I guess this kind of stuff has been going on for a while, but it has just started happening to me. It looks like a scraper site is scraping my blog feed. My posts show up on their site in their entirety not long after I post, and it looks automated. I even put up one post that talks about them being a scraper site, and a short time later it appeared on their site!

    No response when I emailed them so I filed a DMCA copyright infringement notice with their (U.S. based) webhost today. No response yet from the webhost, still early. Aside from that I guess I could have some fun with the photos they are hotlinking, like substituting a goatse for some of them. But I guess I'll wait until the webhost has a chance to do something, assuming they do. I don't want them to puke on their keyboard when they check on the copyrighted material.

    I have seen this automated scraping a couple other places recently and it has been done very well, with the scraped content integrated very neatly. How do they do that?
     
    geomark, Sep 2, 2006 IP
  2. BRUm

    BRUm Well-Known Member

    Messages:
    3,086
    Likes Received:
    61
    Best Answers:
    1
    Trophy Points:
    100
    #2
    Wish I could help, but I'm new to blogs and everything that goes with them. I hope you get it sorted :)

    Lee.
     
    BRUm, Sep 2, 2006 IP
  3. mdvaldosta

    mdvaldosta Peon

    Messages:
    4,079
    Likes Received:
    362
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Hell, it makes my day when folks scrape my feeds. I usually have 2-3 internal links in each fed post... heh.. easy way to get relevant backlinks. If a scraper site outranks you then you're not ranking anyways.
     
    mdvaldosta, Sep 2, 2006 IP
  4. BlueDevilMedia

    BlueDevilMedia Well-Known Member

    Messages:
    1,917
    Likes Received:
    87
    Best Answers:
    0
    Trophy Points:
    190
    #4
    Possibly through RSS feeds...
     
    BlueDevilMedia, Sep 2, 2006 IP
  5. Shoemoney

    Shoemoney $

    Messages:
    4,474
    Likes Received:
    588
    Best Answers:
    0
    Trophy Points:
    295
    #5
    It happens to the best of us... spend more time on your site and less worrying about those copying it... just my 2 cents.

    As for how they are doing it? It's easy, because you probably supply a feed?
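
    To illustrate why a full-content feed is so easy to republish, here is a minimal sketch of reading one with Python's standard library. It assumes an RSS 2.0 feed; the sample feed, the example.com URLs, and the function name read_items are made up for illustration and are not taken from any site in this thread.

```python
import xml.etree.ElementTree as ET

# Invented sample feed (assumption: RSS 2.0 with full text in <description>).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second-post</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return the title, link and description of every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

    Whatever the feed puts in the description is available to any subscriber, which is why the amount of text published in the feed matters so much.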
     
    Shoemoney, Sep 2, 2006 IP
  6. mad4

    mad4 Peon

    Messages:
    6,986
    Likes Received:
    493
    Best Answers:
    0
    Trophy Points:
    0
    #6
    The best thing you can do is try to use it to your advantage. Create deep links to other pages on your site (use absolute URLs).

    Also you may wish to only include the first 300 words of your post to avoid them getting the full content.

    At the end of the 300 word snippet have something like "Read the rest of this post at mysite.com"
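
    The snippet idea could be sketched roughly like this. It assumes the post body is plain text (HTML would need its tags handled first); the function name make_summary and the permalink argument are hypothetical, and most blog software has a built-in summary-feed setting that does the same job.

```python
def make_summary(text, permalink, max_words=300):
    """Cut a post down to its first max_words words and point readers to the full post."""
    words = text.split()
    if len(words) <= max_words:
        # Short posts go out unchanged.
        return text
    snippet = " ".join(words[:max_words])
    return f"{snippet}... Read the rest of this post at {permalink}"


if __name__ == "__main__":
    post = "word " * 500
    print(make_summary(post, "http://mysite.com/my-post", max_words=10))
```

    Because the link is an absolute URL, it still points back to the original site wherever the snippet is republished.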
     
    mad4, Sep 2, 2006 IP
  7. dcristo

    dcristo Illustrious Member

    Messages:
    19,776
    Likes Received:
    1,200
    Best Answers:
    7
    Trophy Points:
    470
    Articles:
    7
    #7
    Don't supply feeds if you don't wanna be scraped.
     
    dcristo, Sep 2, 2006 IP
  8. Claudek

    Claudek Well-Known Member

    Messages:
    1,379
    Likes Received:
    81
    Best Answers:
    0
    Trophy Points:
    165
    #8
    RSS configuration looks to be the case here. If you have a feed, control how much of the article gets shown. This way people come to your website to view the entire article.

    They have a legit defence in that you are providing a feed to which they subscribe. Is there something we are missing in all this?
     
    Claudek, Sep 2, 2006 IP
  9. brandnewx

    brandnewx Peon

    Messages:
    988
    Likes Received:
    28
    Best Answers:
    0
    Trophy Points:
    0
    #9
    I'm sorry to hear that. It's disgusting...

    If you could get their US hosting account shut down, they will probably move to bulletproof hosting outside the US, in a country like China or Russia.

    I suggest that you detect their IPs and ban them. If you hit the right IP, they will disappear. Try checking your server logs. Look for requests that arrive at fixed intervals.

    If you can't find their IP, update your blog and wait for their bot to come. Check their site second-by-second. Note the time when their content gets updated, and match it against your server log. (Be smart!)
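
    A rough sketch of the log check described above, assuming the server writes Apache-style combined log lines and the feed lives at /feed/ (both are assumptions, as are the function name regular_feed_clients and its thresholds). It groups feed requests by IP and reports the addresses whose requests arrive at nearly constant intervals.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Assumed log layout: IP, two ignored fields, [timestamp], "METHOD path ..."
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+)')


def regular_feed_clients(log_lines, feed_path="/feed/", min_hits=4, max_jitter=0.1):
    """Return {ip: average gap in seconds} for IPs that poll the feed at steady intervals."""
    hits = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        ip, stamp, path = match.groups()
        if path != feed_path:
            continue
        hits[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))

    suspects = {}
    for ip, times in hits.items():
        if len(times) < min_hits:
            continue
        times.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        avg = mean(gaps)
        # A small spread relative to the average gap means a steady schedule.
        if avg > 0 and pstdev(gaps) / avg <= max_jitter:
            suspects[ip] = avg
    return suspects


if __name__ == "__main__":
    with open("access.log") as log:  # hypothetical log file name
        for ip, gap in regular_feed_clients(log).items():
            print(f"{ip} fetches the feed about every {gap:.0f} seconds")
```

    Keep in mind that legitimate feed readers also poll on a schedule, so a hit here is only a candidate; comparing the fetch times with when the scraper site updates is still needed before banning anything.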

    The last thing you could do is report this to their advertisers. Look for Adsense, get the pub-id and email Google immediately. Just email all the advertisers.
     
    brandnewx, Sep 2, 2006 IP
  10. BlueDevilMedia

    BlueDevilMedia Well-Known Member

    #10
    My thoughts exactly...add links in your articles pointing back to your site.
     
    BlueDevilMedia, Sep 2, 2006 IP
  11. Claudek

    Claudek Well-Known Member

    #11
    This does not make sense if the submitter was giving the full article as an RSS feed. Anyone who subscribed to that feed would have had the entire article.
    If I had a website that subscribed to a few relevant blogs and one of them configured its feed to show the full article, any retaliation such as a DMCA notice, contacting Google, or IP banning would be ridiculous.

    There are two things to do. First, if the feed is giving the entire article, change it to give only a bit, enough to interest people in going to the website to read the entire article. Second, if this is such a big issue, remove the feed.

     
    Claudek, Sep 2, 2006 IP
  12. geomark

    geomark Peon

    Messages:
    924
    Likes Received:
    20
    Best Answers:
    0
    Trophy Points:
    0
    #12
    Good comments, thanks all.

    @mad4, my posts do have deep links so that is working for me. But I am a bit concerned about dup content penalties. I took your advice and snipped the feed.

    @shoemoney, yeah, I don't want to get distracted too much by this. Just don't want to get hurt by it. And what I was asking is what technique they use to take the feed and integrate the posts so seamlessly. They really do a nice job scraping several sites and integrating them into one, all automatically it seems.

    @claudek, yes I think you are missing the fact that a published feed is not a license to republish copyrighted content in its entirety without permission.

    @dcristo, I thought about turning the feed off but I think I have a lot of subscribers (problem is I'm crappy at this and don't really know if I do have many subscribers).

    @brandnewx, that's an interesting idea. I have been digging around in my logs a lot lately for other reasons (banning some other IPs that were doing some abusive sort of stuff). Seems like it might be difficult (for me) to do this. Also, the scraper site is a porn site, no Adsense on it, a lot of porn advertisers, so I'm guessing I won't get a response from them.
     
    geomark, Sep 2, 2006 IP
  13. [*-AnOnYmOuS-*]

    [*-AnOnYmOuS-*] Active Member

    Messages:
    253
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    58
    #13
    Okay, so that must really suck. But look at the bright side..

    From your description, it seems to me that they're running a cron job against your feed. Now for the bright side: you could show them who's the boss by playing around a bit. For example, if you can find out what their hosting platform is (Windows, Linux, etc.) you could do some damage to their site, teaching them to take a good look before they steal content :D.

    Well, these are my 2 cents. J/K don't do that, wait for the host's response. I just like to take out some evil ideas from time to time :rolleyes: :D.
     
    [*-AnOnYmOuS-*], Sep 2, 2006 IP
  14. mad4

    mad4 Peon

    #14
    Use feedburner. It's free and tracks subscribers for you.
     
    mad4, Sep 2, 2006 IP
  15. [*-AnOnYmOuS-*]

    [*-AnOnYmOuS-*] Active Member

    #15
    I use feedburner too. It's great. It has some great stats and it's strangely free. They also have some lovely add-ons. It's a must, IMO, for every blogger..:cool: If you wanna take a look at how one would look, take a look at mine http://feeds.feedburner.com/Damnz.
     
    [*-AnOnYmOuS-*], Sep 2, 2006 IP
  16. brandnewx

    brandnewx Peon

    #16
    I agree. It's ridiculous to ban IPs because you give full content via RSS. As RSS stands for Really Simple Syndication and you publish full content via RSS, everyone has the right to scrape and duplicate. After all, it's like you're saying "hey! come here and syndicate my blogs"

    geomark, if you do publish the whole content via RSS, configure the script to publish only the headline or a small portion of each post.
     
    brandnewx, Sep 2, 2006 IP
  17. Phynder

    Phynder Well-Known Member

    Messages:
    2,603
    Likes Received:
    145
    Best Answers:
    0
    Trophy Points:
    178
    #17
    Yeah - I am kinda missing something here - half of my links back to my blog are from scraped content! I love it.
     
    Phynder, Sep 2, 2006 IP
  18. geomark

    geomark Peon

    #18
    It's the potential for dup content penalty (real or imaginary I don't know) that is the concern. But as you all so clearly pointed out it's my bad for not realizing my feed was publishing the full article (never even checked before). I snipped it so now I'm on board and saying come on and scrape (my summary feed with deep links).
     
    geomark, Sep 2, 2006 IP
  19. PinoyIto

    PinoyIto Notable Member

    Messages:
    5,863
    Likes Received:
    170
    Best Answers:
    0
    Trophy Points:
    260
    #19
    Better remove the RSS feed from your site if you don't want others to scrape your site... getting more backlinks from other sites that want to post your feed is the whole point of RSS feeds. You can also adjust your config, as others say, so that your entire article will not display in your feeds.
     
    PinoyIto, Sep 2, 2006 IP
  20. Ballz

    Ballz Well-Known Member

    Messages:
    649
    Likes Received:
    20
    Best Answers:
    0
    Trophy Points:
    125
    #20
    as mentioned earlier.. create a partial feed... and limit it to a few posts
     
    Ballz, Sep 3, 2006 IP