Hey guys, I need a little help scraping the No Frills website. The main problem I have is sending the headers or cookies needed to set a store. If you've never been to the site, the first time you visit it asks you to select a Province, City, and Store; only then do I have access to the items and prices for that store. I've tried various approaches with cURL, but I keep getting a "Received HTTP code 403 from proxy after CONNECT" error. Here is the link: http://www.nofrills.ca/LCLOnline/flyers_landing_page.jsp - you can select any province, city and store for testing. Please help me. Thank you in advance, - kidatum
It should be just a matter of setting up the fields that need to be submitted and posting the form. You may need to outline what exactly you have tried.
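Something along these lines (the field names below are just guesses -- check the actual form markup or your browser's network tab for the real ones):

<?php
// Post the store-selection form, then read back the resulting page.
$ch = curl_init('http://www.nofrills.ca/LCLOnline/flyers_landing_page.jsp');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query(array(
        'province' => 'ON',      // placeholder names/values -- inspect the real form
        'city'     => 'Toronto',
        'store'    => '1234',
    )),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
));
$html = curl_exec($ch);
if ($html === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);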
The site accepts POST variables but processes them with JavaScript, which makes this difficult. Either way, I found a solution after 4 hours of research. Thanks for looking at the topic though, - kidatum
If you ever run into these issues again, the key is to mimic, as closely as you can, how a popular browser would access the website. You may have to consider cookies, user agents, POST/GET data, encoding, etc. If you do all of that properly, there's no real way for a website to tell you apart from a normal user (until you start hitting the server with a million requests heh)
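For reference, here's roughly what that looks like with PHP cURL -- a cookie jar so the session survives between requests, a browser-like user agent, and some default headers. The URL and header values are only examples, adjust to taste:

<?php
// First request picks up the session cookie; later requests reuse the same jar.
$cookieFile = tempnam(sys_get_temp_dir(), 'cookies');

$ch = curl_init('http://www.nofrills.ca/LCLOnline/flyers_landing_page.jsp');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_COOKIEJAR      => $cookieFile,  // write received cookies here
    CURLOPT_COOKIEFILE     => $cookieFile,  // ...and send them back on the next request
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0',
    CURLOPT_ENCODING       => '',           // accept any encoding cURL supports (gzip, deflate)
    CURLOPT_HTTPHEADER     => array(
        'Accept: text/html,application/xhtml+xml',
        'Accept-Language: en-CA,en;q=0.9',
    ),
));
$landing = curl_exec($ch);
curl_close($ch);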
Good point, thanks. I start with as few requirements as possible and then build on them as needed.
A further point to mention from Alex's post: try to use a referer that matches the site's URI structure, e.g.:

$url = "http://example.com/scrape-this-page";
$ref = "http://example.com";
curl_setopt($curl, CURLOPT_URL, $url);
// ... other options
curl_setopt($curl, CURLOPT_REFERER, $ref);
// ... more options, then curl_exec()

This way it appears from their server logs that you've navigated from one link to another (presumably from the index to the page of interest), just like a browser would do. ROOFIS