I have made an application that scrapes and stores results from the Australian TAB Trackside website, for personal use only. Are there any legal issues involved in doing this? In the process I am viewing thousands of pages from the website in short periods of time, as all results are stored on separate pages. Am I going to get into any trouble if they notice so many hits from my IP?
Good question. Do they have a policy to that effect anywhere on the website? Something that reads like "no content scrapers are allowed regardless of purpose" or "personal-use content scraping is allowed"? In any case, why don't you pace your spider and spread its requests over multiple proxy IPs across a period of time that isn't too tightly bound? You can get hundreds of proxy IPs from fresh proxy lists published across the internet daily. Some of these even have an API for use in client applications (a web service that returns a fresh, tested, working proxy IP and port with every call, etc.).
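If you do go the proxy route, a simple way to spread requests is to rotate through a list round-robin. Here is a minimal sketch in Python; the proxy addresses are placeholders from the TEST-NET range, not real servers, and you would fill the list from whatever proxy source you use.

```python
from itertools import cycle

# Placeholder proxy addresses (TEST-NET range) -- substitute real
# entries from a fresh proxy list or a proxy-list API.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

# cycle() yields the list endlessly, wrapping back to the start.
proxy_pool = cycle(PROXIES)

def next_proxy():
    """Return the next proxy URL in round-robin order."""
    return next(proxy_pool)
```

With a library like `requests` you would then pass `proxies={"http": next_proxy()}` on each call, so successive page fetches go out through different IPs.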
Hi Traffic-Bug, thanks for your reply. No, it doesn't say that anywhere. I probably worded it wrong, too: it scrapes data rather than actual content, though I guess that isn't relevant anyway. I think that at worst, if they catch on, they will limit requests by IP. In that case I will have to use proxies, but I'm not going to put in the extra effort until it's necessary, as this is only for personal use.
Simply don't run a multi-threaded downloading monster. Space the downloads out a bit: 2 requests a second wouldn't be a lot in my opinion, and that's still 120 pages of data a minute. The less load you put on their server, the better. If you wrote a scraper with 100 threads, each running at full speed, you'd be looking at big problems.