Hey guys, I need a little help scraping the No Frills website. The main problem I have is sending the headers or cookies needed to set a store. If you've never been to the site, the first time you visit it asks you to select a Province, City, and Store; only then do I have access to the items and prices for that store. I've tried various approaches with cURL, but I keep getting a "Received HTTP code 403 from proxy after CONNECT" error. Here is the link: http://www.nofrills.ca/LCLOnline/flyers_landing_page.jsp - you can select any province, city and store for testing. Please help me. Thank you in advance, - kidatum
It should be just a matter of setting up the fields that need to be submitted and posting the form. You may need to outline what exactly you have tried.
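Something along these lines (the field names below are just guesses -- check the actual form markup or your browser's network tab for the real ones):

<?php
// Post the store-selection form, then read back the resulting page.
$ch = curl_init('http://www.nofrills.ca/LCLOnline/flyers_landing_page.jsp');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query(array(
        'province' => 'ON',      // placeholder names/values -- inspect the real form
        'city'     => 'Toronto',
        'store'    => '1234',
    )),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
));
$html = curl_exec($ch);
if ($html === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);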
The site accepts POST variables but processes them with JavaScript, which makes this difficult. Either way, I found a solution after 4 hours of research. Thanks for looking at the topic though, - kidatum
If you ever run into these issues again, the key is to mimic, as closely as you can, how a popular browser would access the website. You may have to consider cookies, user agents, POST/GET data, encoding, etc. If you do all of that properly, there's no real way for a website to tell you apart from a normal user (until you start hitting the server with a million requests heh)
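For reference, here's roughly what that looks like with PHP cURL -- a cookie jar so the session survives between requests, a browser-like user agent, and some default headers. The URL and header values are only examples, adjust to taste:

<?php
// First request picks up the session cookie; later requests reuse the same jar.
$cookieFile = tempnam(sys_get_temp_dir(), 'cookies');

$ch = curl_init('http://www.nofrills.ca/LCLOnline/flyers_landing_page.jsp');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_COOKIEJAR      => $cookieFile,  // write received cookies here
    CURLOPT_COOKIEFILE     => $cookieFile,  // ...and send them back on the next request
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0',
    CURLOPT_ENCODING       => '',           // accept any encoding cURL supports (gzip, deflate)
    CURLOPT_HTTPHEADER     => array(
        'Accept: text/html,application/xhtml+xml',
        'Accept-Language: en-CA,en;q=0.9',
    ),
));
$landing = curl_exec($ch);
curl_close($ch);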
Good point, thanks. I start with as few requirements as possible and then build on them as needed.
A further point to mention from Alex's post: try to use a referer that matches the site's URI structure, e.g.:

$url = "http://example.com/scrape-this-page";
$ref = "http://example.com";
curl_setopt($curl, CURLOPT_URL, $url);
// ... other options
curl_setopt($curl, CURLOPT_REFERER, $ref);
// ... more options, then curl_exec()

This way it appears from their server logs that you've navigated from one link to another (presumably from the index to the page of interest), just like a browser would do. ROOFIS