
How to stop someone from stealing my post / database

Discussion in 'Programming' started by TheSyndicate, Apr 21, 2007.

  1. #1
    I am paying someone to enter data into some directories / personal sites, and I do not want it to end up on DP with someone selling it.
    Can someone just spider my site and take the data?
    How do I stop them from doing this? I see many databases for sale here on DP, and there is also something called a spider that takes content from other sites.
     
    TheSyndicate, Apr 21, 2007 IP
  2. MattD

    MattD Peon

    Messages:
    161
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Basically, if you allow people to visit and read a website, someone will be able to steal it! It's harder to take something like the database itself, since access permissions mean they can't just download it, but the data will still be accessible.

    Your best defence is hiring a lawyer and sending out letters telling people to stop using your copyrighted data.
     
    MattD, Apr 21, 2007 IP
  3. Aragorn

    Aragorn Peon

    Messages:
    1,491
    Likes Received:
    72
    Best Answers:
    1
    Trophy Points:
    0
    #3
    It is not that easy to spider your site and rebuild the database. But as MattD said, if someone makes up his mind to do it, there is no way for you to prevent the theft entirely. You can use Copyscape to search for copies of your content on other sites. If you find a site using it, you can file a DMCA notice; with that notice, you can have the site removed from Google and Yahoo. Warn the site owner first, and most probably he will remove it. But of course, you still won't be able to prevent the stealing :)
     
    Aragorn, Apr 21, 2007 IP
  4. krakjoe

    krakjoe Well-Known Member

    Messages:
    1,795
    Likes Received:
    141
    Best Answers:
    0
    Trophy Points:
    135
    #4
    I think the correct term is "anything that isn't nailed down"...

    If you leave something unattended, there's always a chance someone will steal it; there is no foolproof way to stop people from stealing your content. Lawyers cost loads of money, probably more than you'll lose in the long run, and even if not, lawyers are famously arrogant. Nobody wants that.

    I sat for a while with my IDE open (not writing, it helps me think), and I suppose there are some measures you could put in place.

    BEFORE you make this data available to the public at large, research the sort of thing you're trying to stop. Find out if this "spider" sends any identifiable data that can differentiate it from an actual browser or a legitimate robot.

    Set up the sites in such a way that a human presence is needed to retrieve data. I'm certain that can be done, but I have no concrete suggestions.

    Only accept posted data from your own domain name.

    If your site is aimed at a particular country, then deploy GeoIP services on the equipment to block requests from outside that region. It might seem an odd thing to say, but in a way, the fewer people that can get in the better: if there's no point in all of England viewing your data, that's some 60 million potential thieves you have stopped in their tracks.

    Possibly my best suggestion: spiders work by reading particular tags at particular locations and matching predefined patterns to extract whatever they are stealing / storing. This is a massive weakness on their part; you only have to change the source code of your page and the spider is rendered useless.

    If there's a particular section of the site with sensitive data, you could make it so all traffic going there has to have the correct referer to get in.

    None of these things are fantastic, but it's better than doing nothing, or having to put up with arrogance from jumped-up schoolboys in their daddies' law firm.

    Lastly, the person inputting the data has everything they need to steal it. First, make sure you trust that person; second, make sure you pay them enough for it to be worthwhile for them to do a good job and come back.

    That's all I got.
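The user-agent and referer checks suggested above can be sketched roughly like this (Python, purely illustrative: the blocklist entries and the allowed domain are assumptions, and real scrapers can of course forge both headers):

```python
# Sketch: reject requests whose headers don't look like a normal browser.
# The agent blocklist and ALLOWED_REFERER domain are illustrative only.

SUSPICIOUS_AGENTS = ("wget", "curl", "libwww", "python-requests")
ALLOWED_REFERER = "example.com"  # assumed site domain

def looks_like_scraper(headers):
    """Return True if the request headers suggest an automated client."""
    agent = headers.get("User-Agent", "").lower()
    if not agent or any(bot in agent for bot in SUSPICIOUS_AGENTS):
        return True
    # For requests that carry a referer, require it to be our own domain.
    referer = headers.get("Referer", "")
    if referer and ALLOWED_REFERER not in referer:
        return True
    return False
```

Since both headers are client-supplied, this only stops the laziest scripts, which is exactly the point krakjoe makes: it raises the effort bar rather than providing real security.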
     
    krakjoe, Apr 21, 2007 IP
  5. jimrthy

    jimrthy Guest

    Messages:
    283
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    0
    #5
    All excellent suggestions. I just have one minor little thing to add...

    The most common way I'm seeing to do this lately is called CAPTCHAs. They're those funky images with hard-to-read letters and numbers that people are putting on their blogs to make you prove you're a real person before posting a comment.
     
    jimrthy, Apr 21, 2007 IP
  6. MattD

    MattD Peon

    Messages:
    161
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Captchas are not foolproof, and they have the possible side effect of pissing off your users, so think very carefully before you implement them.

    Spiders are not limited at all by the structure of a page or its HTML. Sure, there might be some that operate this way, but it is not "how they work". As for blocking IP addresses by country: it's essentially pointless. If someone wants to steal your data, they can just use one of the millions of proxy sites, or even the Google cache (presuming you haven't already stopped Google from caching your page).
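On that last point, the standard way to keep search engines from serving a cached copy of a page is the robots meta tag; noarchive is the relevant directive:

```html
<!-- Ask search engines not to keep a cached copy of this page -->
<meta name="robots" content="noarchive">
```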
     
    MattD, Apr 21, 2007 IP
  7. krakjoe

    krakjoe Well-Known Member

    Messages:
    1,795
    Likes Received:
    141
    Best Answers:
    0
    Trophy Points:
    135
    #7
    The spiders he is talking about do work that way; there is NO other way for them to work. We're not talking about web bots like Google's here, we're talking about script kiddies writing scripts that nick other people's data by downloading pages.

    Like I said, none of these measures are foolproof, but it's better than doing nothing.
     
    krakjoe, Apr 21, 2007 IP
  8. SeLfkiLL

    SeLfkiLL Active Member

    Messages:
    85
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    50
    #8
    You could limit the number of pages/results served to a single IP and then do open-proxy checking.
     
    SeLfkiLL, Apr 21, 2007 IP
  9. TheSyndicate

    TheSyndicate Prominent Member

    Messages:
    5,410
    Likes Received:
    289
    Best Answers:
    0
    Trophy Points:
    365
    #9
    I understand they can copy page by page; that's easy. But I don't want someone to just steal my whole database outright. Is that really so easy?

    Blocking by IP sounds good: they will not view more than 10 pages for sure. Open-proxy checking is not that great, because Thailand has that for everybody as well.
     
    TheSyndicate, Apr 21, 2007 IP
  10. Aragorn

    Aragorn Peon

    Messages:
    1,491
    Likes Received:
    72
    Best Answers:
    1
    Trophy Points:
    0
    #10
    But the referer header can be changed programmatically, right?
    I think asking for CAPTCHA validation every time a user makes a request is no good, and validating just once can be manipulated. You should limit it to every N requests.
    Of course, it is not easy. But if the database is worth the difficulty, then it can be done.
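The every-N-requests idea can be sketched like this (Python; illustrative only — in a real site the counter would live in the user's server-side session, and N = 20 is an arbitrary choice):

```python
# Sketch: decide whether this request should be challenged with a CAPTCHA.
# The session dict stands in for real server-side session storage.

CAPTCHA_EVERY_N = 20  # arbitrary; tune against user annoyance

def needs_captcha(session):
    """Increment the per-session request count; challenge every Nth request."""
    session["requests"] = session.get("requests", 0) + 1
    return session["requests"] % CAPTCHA_EVERY_N == 0
```

This is the compromise Aragorn describes: a lone CAPTCHA at login can be solved once by a human and then reused by a script, while one on every request drives real users away.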
     
    Aragorn, Apr 22, 2007 IP
  11. MattD

    MattD Peon

    Messages:
    161
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #11
    They won't be able to steal the schema (unless you are using an MS Access database or something similar that they might be able to fetch over HTTP), but they will be able to view the data that subsequently appears on your pages.

    I've not given this idea much thought (ha!), but you might be able to do something sneaky with the table structure that isn't visible to the end user (since your PHP will handle all of the queries/joins automatically) but makes duplicating the database difficult. Obviously this is of limited value, as they will still see some data, but it would make wholesale duplication of your database difficult.
     
    MattD, Apr 22, 2007 IP
  12. TheSyndicate

    TheSyndicate Prominent Member

    Messages:
    5,410
    Likes Received:
    289
    Best Answers:
    0
    Trophy Points:
    365
    #12
    I've not given this idea much thought (ha!), but you might be able to do something sneaky with the table structure that isn't visible to the end user (since your PHP will handle all of the queries/joins automatically) but makes duplicating the database difficult. Obviously this is of limited value, as they will still see some data, but it would make wholesale duplication of your database difficult.

    How do I do this?
     
    TheSyndicate, Apr 22, 2007 IP
  13. MattD

    MattD Peon

    Messages:
    161
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #13
    No idea; it might not even be possible, but it's something to sit down and think about when you are designing your DB schema.
     
    MattD, Apr 28, 2007 IP
  14. TheSyndicate

    TheSyndicate Prominent Member

    Messages:
    5,410
    Likes Received:
    289
    Best Answers:
    0
    Trophy Points:
    365
    #14
    Anybody got a script for blocking an IP after, let's say, 5 pages?
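Nobody posted one in the thread, but a bare-bones version of such a limiter might look like this (Python sketch; in practice the counts would go in a database or shared cache rather than an in-process dict, and both the page limit and the one-hour window are assumptions):

```python
import time

PAGE_LIMIT = 5          # pages allowed per IP per window (assumed)
WINDOW_SECONDS = 3600   # reset the count every hour (assumed)

_hits = {}  # ip -> (window_start_time, page_count)

def allow_request(ip, now=None):
    """Return False once an IP has fetched more than PAGE_LIMIT pages
    inside the current time window."""
    now = time.time() if now is None else now
    start, count = _hits.get(ip, (now, 0))
    if now - start > WINDOW_SECONDS:  # window expired: start fresh
        start, count = now, 0
    count += 1
    _hits[ip] = (start, count)
    return count <= PAGE_LIMIT
```

As MattD pointed out earlier, anyone behind a rotating proxy pool sails straight past this, and shared IPs (offices, whole countries behind NAT) can get blocked unfairly, so the limit needs to be generous.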
     
    TheSyndicate, Apr 29, 2007 IP
  15. datamynur

    datamynur Guest

    Messages:
    39
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #15
    I'd like to see the scripts you guys could come up with for that... MattD seems to have it right, though: captchas will piss off customers, and high-anonymity proxy servers with some fancy coding get right around any IP-blocking mechanism you might come up with... but again, I'd take a look at whatever anyone comes up with and let you know what I think...
     
    datamynur, Dec 21, 2007 IP
  16. datamynur

    datamynur Guest

    Messages:
    39
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #16
    I'll add that I would for sure start with IP blocking... that will weed out many little script kiddies... maybe from there look into membership-based pages where users are required to log in, with a "remember me" cookie or something... if your site is actually useful, people will sign up, and then you can really stop any spiders at that point... as long as the login is checking the user's IP, that is...
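The login-plus-IP check described above can be sketched as follows (Python; purely illustrative, the session layout is an assumption):

```python
# Sketch: tie a logged-in session to the IP it was created from, so a
# stolen "remember me" cookie can't be replayed from another machine.

def log_in(session, request_ip):
    """Mark the session as authenticated and record the client's IP."""
    session["logged_in"] = True
    session["ip"] = request_ip

def session_valid(session, request_ip):
    """A session is only honoured from the IP that logged it in."""
    return session.get("logged_in", False) and session.get("ip") == request_ip
```

One caveat the thread doesn't mention: users on dial-up or behind rotating proxies change IPs mid-session, so a strict IP match will log legitimate people out.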
     
    datamynur, Dec 21, 2007 IP