What would people think of a spider-like tool where you feed it the URL of a forum or message board, and it scrapes all the users, posts, threads, and categories and saves them into a MySQL database? It would basically take everything from a forum and save it in a format you can easily import into your own forum. Would this be beneficial at all? Would it be fair? Could you get in trouble? Just curious, since the idea came to me while reading another thread.
I thought about something like this. It would be great... if it's possible to code it without nuking both your server and the victim's server. Tools already exist that create standard files, but none that automatically create the posts. Are you thinking of making it?
It does seem unethical. Would this be considered blackhat? Anyway, yes, if it were web-based I could code it, but it might take up a lot of bandwidth on both servers =/
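On the bandwidth worry, the usual way to keep a crawler from hammering either server is to enforce a minimum delay between requests. Below is a minimal sketch of that idea only. The `Throttle` class, its `min_interval` parameter, and the injectable `clock` and `sleep` arguments are hypothetical names made up for illustration; they are not from any tool mentioned in this thread, and no actual fetching is shown.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between successive requests.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the behaviour can be checked without really waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until min_interval has passed since the last call.

        Returns the number of seconds slept (0.0 on the first call).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        # Record the moment the caller is released to make its request.
        self._last = self._clock()
        return delay
```

A crawler loop would call `throttle.wait()` right before each page request. With something like `Throttle(2.0)` that caps the load at one request every two seconds, which also means a large site takes a long time to cover.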
It sounds a bit like blog scrapers, which already exist. Blog posts are usually better quality than forum posts, that's all.
It would screw the server and screw you with legal issues. There are already applications like this out there.
There is a tool, in fact I had it on my old site, that builds a search for your site by crawling your server. It is a spider, it can be used to crawl any website you wish, and it was free or linkware when I used it. It indexed the whole site, built a search, and placed everything in a database. It could be run from cron jobs too, or just from the admin panel. It would crawl any server or site you pointed it to, and yes, it did take hours to index a 10,000-page site.
Yes, there is a blog out there whose author created spiders to find vulnerabilities on sites that let you retrieve other people's content and databases. I'm not sure where the blog is located, but I know what the spiders are called, so here they are if someone wants to research this: WebVulnCrawl, WebVulnScan
That would be hacking into the database and stealing information. The method I was suggesting was a crawler that just scans and scrapes info off the pages.
In that case you could try an offline browser. There are lots of them at www.download.com, and they basically download everything that is publicly available on most websites, unless a robots.txt file blocks access. However, putting the content into a database requires manual coding.
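For anyone wondering how a crawler decides whether robots.txt blocks a page, here is a rough sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `forum.example.com` URLs, the `example-crawler` user agent, and the helper names `build_parser` and `is_allowed` are all made up for illustration. A real crawler would download the site's actual robots.txt instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /members/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="example-crawler"):
    """Return True if the rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://forum.example.com/threads/123"))
print(is_allowed(parser, "https://forum.example.com/admin/settings"))
```

Keep in mind robots.txt only tells a crawler what the site owner asks it not to fetch. Passing this check says nothing about whether you have permission to republish what you downloaded.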
People do it to blogs every day. I bet half of your blogs are already copied by RSS readers and then posted elsewhere.
There's a really big difference between RSS feeds, where the copyright holder chooses how much information to make available, and scraping the full text of an entire forum without the consent of the forum owner or the posters, one of whom owns the legal rights to the text, depending on the forum.
This tool would land you directly in court. Imagine piping the data out of some forum: you would also be fetching the copyrighted material there, and that breaks copyright. For example, I write my own articles specially for the DP forums, for Namepros, and for other places. They are unique and they only reside on that particular forum. Consider it my contribution to the community, but I leave a small copyright notice at the end of each post. The same can be true for the forum owner himself. What if he has written a guide or some tips on his forum for the benefit of his users? If that gets stolen, he will surely see red.
I'm sure someone out there can make, or already has made, something like this. Might as well name it "VAMPIRE", as it would be designed to suck the life force out of real humans' efforts so as to feed a lazy, unethical slob who can only live on the talents of others. I've got a "Tool" that's just right for this - A Big Pointy Stake to shove in the heart of any blackguard who would rather steal than create for themselves. I am sure there is legal recourse against these "scrapers" (bottom feeders) when one finds their work posted somewhere they didn't offer it to. So I would hope fellow posters will be forthcoming in letting the rest of us know how to end this type of intellectual abuse.