Hi all, my bandwidth was being guzzled by one scraper, and after checking my logs closely I realized there were a few others. So here is the big idea: why don't all of us here at DP start a list of bad IPs that should be blocked? Here's something to start with: deny this IP belonging to McColo, as it's a scraper... 208.66.195. At the end of every week, I will collate the submissions into one thread and keep building from there. I hope everyone supports this venture.
This isn't really an SE-related question. It's a good idea, but what you consider a bad bot, others may not. I check all suspicious IPs with dnsstuff and a Google search, and then ban them if I feel I'm right in doing so. Ian
"Deny this IP belonging to McColo, as it's a scraper... 208.66.195." Submitted by Fox LORE: 66.77.136.123, 62.194.10.83
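The weekly collation proposed in the first post could be kept as a plain list of submitted entries. The following is a minimal Python sketch, not anyone's actual tool. It assumes a partial address such as 208.66.195 stands for every address sharing those leading octets (208.66.195.0/24), which is how Apache's `Deny from 208.66.195` treats a partial IP, and it assumes the two addresses submitted by Fox LORE are also meant to be blocked. The helper names `parse_entry`, `build_blocklist` and `is_blocked` are hypothetical.

```python
import ipaddress


def parse_entry(entry):
    """Turn a submitted entry into an IPv4 network.

    Assumption: a partial address like "208.66.195" means the whole block
    sharing those leading octets (208.66.195.0/24); a full address is a /32.
    """
    octets = entry.strip().split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"not an IPv4 address or prefix: {entry!r}")
    prefix_len = 8 * len(octets)
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.IPv4Network(f"{'.'.join(padded)}/{prefix_len}")


def build_blocklist(entries):
    # Deduplicate and merge overlapping submissions from different posters.
    networks = {parse_entry(e) for e in entries}
    return list(ipaddress.collapse_addresses(networks))


def is_blocked(ip, blocklist):
    addr = ipaddress.IPv4Address(ip)
    return any(addr in net for net in blocklist)


if __name__ == "__main__":
    # Entries posted in this thread so far (hypothetical weekly collation).
    blocklist = build_blocklist(["208.66.195", "66.77.136.123", "62.194.10.83"])
    print(is_blocked("208.66.195.17", blocklist))
    print(is_blocked("208.66.196.1", blocklist))
```

Collapsing the networks merges redundant submissions, for example a single address that already falls inside a submitted /24, so the weekly list does not grow with duplicates.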
These projects are always interesting. Unfortunately, they hurt the innocent. Most people do not know the difference between a bad visitor and a benign one. You cannot conclude that someone is scraping your site just because they have visited every page. Consider new search engines, people with extremely short attention spans, and those with incredibly fast fingers and minds. The net result is that the list is bound to contain IPs that do not deserve to be blacklisted. A more fundamental problem is that your visitor could have a dynamic IP rather than a static one. The next person who gets that IP is banned from your site for no reason, while the scraper returns from another IP address. If I were doing this, I would use some form of honeypot to distinguish good robots from bad ones and record their IP addresses. I would also put a time limit on each ban to account for dynamic IPs. This should be done in real time by a daemon; banning throwaway IP addresses after the fact is pointless. Another technique is to limit the number of pages a visitor can request per minute and assume anyone exceeding that rate is a robot. If your pages are picture-rich, this will catch robots that ignore pictures, as well as anyone with the temerity to view your site with Lynx. An alternative to banning, of course, is to require users to log in to view your best content.
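The real-time approach suggested above (a honeypot, a per-minute page limit, and bans that expire) might look something like the following minimal Python sketch. It is an illustration under stated assumptions, not a daemon: the class name `RateLimiter`, the honeypot path `/trap/` (imagine a link hidden from humans and disallowed in robots.txt), and the defaults of 30 pages per 60 seconds and a one-hour ban are all hypothetical placeholders to be tuned per site.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Hypothetical per-IP limiter with expiring bans and a honeypot path."""

    def __init__(self, max_pages=30, window=60.0, ban_seconds=3600.0,
                 trap_paths=("/trap/",), clock=time.monotonic):
        self.max_pages = max_pages      # pages allowed per window (assumed value)
        self.window = window            # sliding window length in seconds
        self.ban_seconds = ban_seconds  # how long a ban lasts (assumed value)
        self.trap_paths = set(trap_paths)
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned_until = {}          # ip -> time at which the ban lifts

    def _ban(self, ip, now):
        self.banned_until[ip] = now + self.ban_seconds
        self.hits.pop(ip, None)

    def allow(self, ip, path="/"):
        """Return True if this request should be served."""
        now = self.clock()
        until = self.banned_until.get(ip)
        if until is not None:
            if now < until:
                return False
            # Ban expired: a dynamic IP may now belong to someone else.
            del self.banned_until[ip]
        if path in self.trap_paths:
            # Only a robot ignoring robots.txt should ever request this path.
            self._ban(ip, now)
            return False
        q = self.hits[ip]
        q.append(now)
        # Forget requests older than the sliding window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) > self.max_pages:
            self._ban(ip, now)
            return False
        return True
```

Because bans expire, an innocent visitor who later inherits a dynamic IP is locked out for at most `ban_seconds`. The clock is injectable so the behavior can be tested without actually waiting.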
Did you know that shopping.com does scraping? We are talking about a big company here. They STOLE a client's image and the exact description, word for word. I have already taken screenshots; here's the direct link to the shopping.com page in question: http://www.shopping.com/xPO-Magnetic_Bracelet~r-1~CLT-INTR~RFR-images.google.com I hope they will stop scraping in the future.