Somebody is scraping my site

Discussion in 'Legal Issues' started by CJnQA, Sep 17, 2009.

  1. rfhm

    rfhm Peon

    #21
    Turning off your RSS is a good way to avoid autoblogs!
     
    rfhm, Sep 22, 2009 IP
  2. adbox

    adbox Well-Known Member

    #22
    This is the best advice.

    It's bad ethics for autobloggers to use others' RSS feeds without crediting the original source.

    And it's sloppy to create non-niched autoblogs too, imo.

    Blogsense will save and host images. There is also an option to credit the original source, and an option not to credit, because sometimes, when you are being fancy with building autoblogs, crediting is not appropriate. For example, if you're sourcing images from Flickr you might not want to.

    But I believe if an ethical autoblogger pulls content from your site and credits, this can only help you in the end.
     
    adbox, Sep 22, 2009 IP
  3. c4gamerz

    c4gamerz Well-Known Member

    #23
    Yes, it should work if you are not using any external services like FeedBurner.
     
    c4gamerz, Sep 23, 2009 IP
  4. mcapodici

    mcapodici Well-Known Member

    #24
    .. is to stuff your post with affiliate links that make you money. Also include an advert for your site linking to your home page in the post! You can grab traffic back from the copycat!
     
    mcapodici, Sep 23, 2009 IP
  5. kevin hemminger

    kevin hemminger Greenhorn

    #25
    Try to isolate him. Check your visitor stats and check his server IP address; his site's IP is 74.52.185.162. Maybe he is scraping from his own server, so when he shows up at your site, that's his IP.

    Once you know you have his IP, cloak your content just for him. Pseudocode would be something like:

    if ($_SERVER['REMOTE_ADDR'] == '74.52.185.162') { show spam content } else { show regular content }

    Then the content that you would be feeding him could be content specifically designed to get him banned in Google. Something like "buy viagra buy viagra buy viagra" over and over again, mixed in with super spammy links to viagra spam farms. Those words show up on his blog, and Google will think he's a no-good lousy spammer and ban him.
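For readers who prefer something runnable, here is a minimal Python sketch of the same per-IP check. The function name `select_content` and its parameters are hypothetical, and the IP is the one quoted in this post; how the two content strings are produced is left out.

```python
SCRAPER_IP = "74.52.185.162"  # address quoted in the post above


def select_content(remote_addr: str, regular: str, decoy: str) -> str:
    """Return the decoy text for the scraper's IP and the regular text otherwise."""
    # Compare with ==, not =, so the check tests the address instead of assigning it.
    return decoy if remote_addr == SCRAPER_IP else regular
```

A real site would call this with the address the web server reports for the request. A scraper that fetches through proxies or a different machine would not match this check.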
     
    kevin hemminger, Sep 23, 2009 IP
  6. MyManMatt

    MyManMatt Member

    #26
    I am the author of a fairly powerful web site system. One of the features of that system is a hard-core content scraping system. It's not intended to be used to rip content for the purpose of "stealing", but I guess it does get used that way by some of the people that download it. I figured I'd post here to let you know what kind of things you COULD be up against if someone got serious.

    The system that scrapes the content doesn't have to be the same system that shows the content in a new site. In fact, why waste server resources scraping a site when they should be used for serving page requests? Banning the IP of the site hosting your content won't do anything in this case. In my system, I can run a web server on a dynamic IP, which then pushes the content to the public web server that actually shows the content.

    My system is highly configurable, and one option is a range of time to randomly pick between for when the next scrape will be. For example, I can say scrape between 5 and 30 minutes and it will randomly choose a time between those two.

    If your site requires login, I can register 10 different user accounts and then randomly pick which user account to use each time the scraping happens. That way, when you look at your access log, you cannot see that a specific user account is always going to the site at specific time intervals.

    If it gets serious, I can configure one or more proxies to go through when scraping. If it goes beyond that, I have a simple Windows console app that can be run in an internet cafe, which collects the content to be scraped and saves it into an XML/ZIP file (images, content, everything).

    I use a combination of regex and document structure to locate the real content. For example, scraping forums, I can find forum posts, threads, groups, users and everything and rebuild it within my system forum software. I can train my system to help build up the regex properly and then turn it loose. Typically, changing a forum's theme/CSS won't break my system's scraping of it.

    If I point my system at a new forum with about 200k posts and 5 to 7k users, I can read the entire forum in just a few hours and have the entire forum recreated on my system. After that, it only looks at pages that have new content. To people coming to my forum, it looks like its own forum completely, with users posting regularly.

    My system can scrape one or more sites at the same time in a massively parallel way. That means I can have one or 50 computers working together on different internet connections hitting one or more sites to scrape their content. Each box can also use multiple threads to read more than one page at a time. This is what makes it very fast at sucking up a forum.

    My base system can scrape most WordPress blogs without any new configuration because they all follow such a common HTML structure. WordPress themes mostly map onto the standard HTML created by WordPress.

    I was able to map my system onto the IMDB.com site to grab all the movie and actor information and rehost it in my own site. It created contacts for the actors and articles for the movies. It also looks for actor info on Wikipedia and grabs that too. I am also able to grab all the news stories from several large news sites like CNN.com.

    In the end, the point isn't to steal content, but to allow someone to move their current system over to my system without extra work. For example, I have a newspaper site using my system now. For about 6 months, their new site using my system was scraping their live system while the theme and other functions were being worked out. The newspaper was able to continuously see their site in both the new and the old system. This is true for forums, blogs and others.

    However, as you can read from above, some of the features added into the scraping system are specifically for stealthy grabbing of content. It is true that these features are for grabbing content when it is not authorized.
     
    MyManMatt, Sep 24, 2009 IP
  7. CJnQA

    CJnQA Peon

    #27
    That was my point. Exactly! That's all I was saying. :cool:

    Yeah, which is why I didn't want my stuff associated with his... :mad:

    EXACTLY!
    Thank you!

    I wanted to quote you completely, because I think what you are telling us is extremely good to know. Thank you. And are you selling an ecourse on this--not HOW to do it, but what the rest of us non-tech-types should know about what you and others like you can do. The Internet is a whole lot more powerful than a lot of people realize.

    Which makes you kind of scary. So, are you a good witch or a bad witch? (That's a line from The Wizard of Oz, in case no one gets that.)

    Instead, I will just recommend that everyone reading this thread read your post. It's just ^^^up-there-aways^^^

    By the way, I don't think vipin is that motivated.

    Holy Crap! And I thought I was the reason there was so much activity with my IP! We're on the same server!


    You know... your suggestion is so devious, it sounds almost like fun! LOL! What a cool idea! Thanks!

    I don't think he's scraping my site any longer, but now I know another trick. Cool!

    You know, just tonight, I read an eBook about Affiliate Marketing that mentions (as an aside) that very strategy! It specifically mentioned being sure they are within the first two or three sentences of a post, that way, even RSS feeds that are only allowing summaries instead of full posts, will still deliver YOUR links to the scraper's site!

    I thought it was a stroke of genius! Thank you!

    Then, what about my loyal readers? Believe it or not, I do seem to have a small group of deeply-disturbed followers. LOL...

    I hear ya, tho. Thanks for the suggestion.

    I can't tell. I know I had it working before Google bought out FeedBurner, but after that, nothing worked. If I'm not mistaken, doesn't WordPress have, like, some kind of built-in feed... ???

    Still learning this stuff...

    Wow! I'm kind of chatty tonight! Bet you are all real glad to see 7 thread subscription notification emails... Sorry!

    :LOL: :D
     
    Last edited: Sep 25, 2009
    CJnQA, Sep 25, 2009 IP
  8. mjesales

    mjesales Peon

    #28
    If you are using full feeds and you have a lot of full feed subscribers, then cutting those out can discourage those feed readers.

    But one of the best things you can do is to make sure to insert links to other related posts and categories in the feed. Then when they syndicate your stuff, there are more links syndicated as well.
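A minimal Python sketch of that idea, assuming each feed item's body is available as an HTML string before the feed is written out. The function name `add_related_links` and the example URL in the usage note are hypothetical; a blog platform or plugin may already offer the same thing.

```python
from html import escape
from typing import List, Tuple


def add_related_links(item_html: str, related: List[Tuple[str, str]]) -> str:
    """Append a 'Related posts' list of (title, url) links to one feed item's HTML."""
    if not related:
        return item_html
    # Escape titles and URLs so odd characters cannot break the markup.
    links = "".join(
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(title))
        for title, url in related
    )
    return item_html + "<p>Related posts:</p><ul>" + links + "</ul>"
```

For example, `add_related_links("<p>Body</p>", [("Other post", "https://example.com/other")])` returns the body followed by a one-item link list, so anyone republishing the item also republishes a link back to the original site.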

    That site in particular won't load, or at least it wouldn't, so perhaps you got them shut down.

    But don't forget that there are a lot of legitimate RSS aggregators out there, and most have a defined bot that they use, which you can block easily enough.
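Blocking a bot that identifies itself usually comes down to checking the User-Agent header. The sketch below assumes the header is available as a string; the bot names in `BLOCKED_AGENT_TOKENS` are made-up placeholders, not real aggregators.

```python
# Hypothetical bot names; replace with the tokens seen in your own access log.
BLOCKED_AGENT_TOKENS = ("ExampleFeedBot", "ExampleAggregator")


def is_blocked_agent(user_agent: str) -> bool:
    """True when the User-Agent header contains one of the blocked bot tokens."""
    ua = user_agent.lower()
    # Case-insensitive substring match against each blocked token.
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)
```

This only stops bots that announce themselves honestly. A scraper that sends a browser-like User-Agent passes straight through.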

    Also, you should be sure to include something in your site's terms and conditions about use of your feed and what is and isn't allowed.
     
    mjesales, Oct 6, 2009 IP
  9. tlcmkt

    tlcmkt Peon

    #29
    I like the idea of using that plugin. That is slick; that way you get some extra backlinks. But I must admit your pig's butt approach was a creative solution as well.

    tlcmkt
     
    tlcmkt, Oct 10, 2009 IP
  10. slidetheweb

    slidetheweb Peon

    #30
    Scrapers can be a killer for your morale, work speed or revenue. Even when you hunt down their IPs you will still have a hard time stopping it all. With proxy IPs involved, it will come down to automated page monitoring (yes, scraping) to spot any test entries that imply that a scrape has completed, and then finding the related IP in your web log. Maybe the footer option has the best effect, hoping that the scraper feels the trust damage he creates on his own site.
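One possible reading of the "test entries" idea, sketched in Python: embed a small marker derived from the visitor's IP in every page served, then search the copied page for markers belonging to IPs in the log. This is an assumption about how it could work, not something the post spells out. `SECRET_KEY`, `marker_for_ip` and `find_scraper_ips` are hypothetical names.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical secret; keep the real one private


def marker_for_ip(ip: str) -> str:
    """Derive a short, hard-to-guess marker to embed in pages served to this IP."""
    digest = hmac.new(SECRET_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ref-" + digest[:12]


def find_scraper_ips(copied_page: str, logged_ips):
    """Return the logged IPs whose marker appears in the copied page text."""
    return [ip for ip in logged_ips if marker_for_ip(ip) in copied_page]
```

If the copy on the other site still carries a marker, the matching log entry shows which address fetched that page, even when the address was a proxy.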
     
    slidetheweb, Oct 16, 2009 IP
  11. CJnQA

    CJnQA Peon

    #31
    Let me get this straight... this is MY thread... where I asked for advice about a problem I was having... and am having a conversation with the folks who respond...

    MY LAST POST, the one above, has been marked as spam and I have received an infraction!!!!!!!!!! IN MY OWN THREAD! WTF?!?!?!?!

    idfbt. Guess I won't be coming back here any time soon.
     
    CJnQA, Oct 19, 2009 IP
  12. gareth_esutera

    gareth_esutera Member

    #32
    This is the second reason why showing a post excerpt in the RSS feed is better than showing the entire content. The first reason is that if your visitors read only through the RSS feed, they won't be able to see your ads, which means less income. This does not apply if you don't have ads.
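A minimal Python sketch of building such an excerpt, assuming the post body is an HTML string. The name `make_excerpt` and the 55-word default are arbitrary choices for illustration; most blog software has its own excerpt setting that does this for you.

```python
import re


def make_excerpt(post_html: str, max_words: int = 55) -> str:
    """Strip tags and keep at most max_words words, adding '...' when truncated."""
    # Crude tag removal; good enough for a sketch, not a full HTML parser.
    text = re.sub(r"<[^>]+>", " ", post_html)
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```

With an excerpt in the feed, anyone syndicating it only gets the opening words, and readers have to visit the site for the rest.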
     
    gareth_esutera, Oct 19, 2009 IP
  13. sweetlouise

    sweetlouise Well-Known Member

    #33

    I would totally trash the hell out of his site: fill it full of links to porn and sick sites. I'm sure you can set your site to not display all this. Then just keep on having fun with this spammer fool. Quality picture, btw.
     
    sweetlouise, Oct 19, 2009 IP
  14. averyz

    averyz Well-Known Member

    #34
    I would turn off the RSS and do a wildcard IP ban on his IP.
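A wildcard ban such as 74.52.185.* amounts to checking whether the visitor falls inside a network range. The Python sketch below assumes the range is the /24 around the IP quoted earlier in the thread; `is_banned` is a hypothetical helper, and in practice the same rule is usually set in the web server or firewall configuration.

```python
import ipaddress

# 74.52.185.* written as a network; the range is an assumption based on the quoted IP.
BANNED_NETWORK = ipaddress.ip_network("74.52.185.0/24")


def is_banned(remote_addr: str) -> bool:
    """True when the address falls inside the banned range; invalid input is not banned."""
    try:
        return ipaddress.ip_address(remote_addr) in BANNED_NETWORK
    except ValueError:
        # Not a parseable IP address, so treat it as not banned.
        return False
```

Note that a range ban also blocks any other sites or visitors on the same hosting range, so it may catch more than the one scraper.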
     
    averyz, Oct 19, 2009 IP