Scrape an Ajax generated page

50inches Greenhorn

#1

Hi,

I am trying to scrape from this site:
www.nelly.com/se/skor-kvinna/skor/festskor/

Initially there is no problem, but because there are many products I would like to scrape all of the paginated products. When I click on, for example, page 3 in the pagination, the link looks like this:
http://nelly.com/se/skor-kvinna/skor/festskor/#page=3&hits=36&sort=&imgs=4

It looks like some Ajax or JavaScript is being run in the background.

I am using the Simple HTML DOM class in PHP.

Does anybody have any experience with this?
Thanks in advance.

50inches, Apr 20, 2012

PoPSiCLe Illustrious Member

#2

Can't you just loop through the URLs? If it's as simple as #page=3, #page=4, #page=5, etc., why not loop and pull the info from each URL?

PoPSiCLe, Apr 20, 2012

50inches Greenhorn

#3

PoPSiCLe said: ↑

Can't you just loop through the URLs? If it's as simple as #page=3, #page=4, #page=5, etc., why not loop and pull the info from each URL?

That's what I thought initially as well. But apparently I can't scrape any of those paginated pages. When I try to scrape page 4, for example, I only get the results for page 1. The same goes for all the other pages.

50inches, Apr 20, 2012

Arttu Member

#4

50inches said: ↑

That's what I thought initially as well. But apparently I can't scrape any of those paginated pages. When I try to scrape page 4, for example, I only get the results for page 1. The same goes for all the other pages.

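That's because anything after the # character is never sent to the server. The browser keeps that part (the fragment) to itself and the page's JavaScript reads it, so every one of those URLs requests the same page 1 document.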
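You can check this without touching the network. Here is a minimal sketch in Python (standard library only, not the Simple HTML DOM class from the question) that strips the fragment the way an HTTP client does before sending a request. The helper name `request_target` is made up for illustration.

```python
from urllib.parse import urldefrag

def request_target(url):
    """Return the URL without its fragment, i.e. the part a client actually requests."""
    return urldefrag(url).url

base = "http://nelly.com/se/skor-kvinna/skor/festskor/"

# The paginated links from the question, differing only after the '#'.
urls = [f"{base}#page={n}&hits=36&sort=&imgs=4" for n in (1, 3, 4)]

# Collapse them to what would really be requested.
targets = {request_target(u) for u in urls}
print(targets)
```

All three links collapse to the same request target, which is why the scraper keeps getting page 1.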

Since they have made their code so hard to read, I suggest that you get the Live HTTP Headers plugin for Firefox or something similar and find the request that fetches the next page.
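Once the header log shows the real request, the values sitting in the fragment can often be moved into an ordinary query string and looped over. A rough sketch in Python (standard library only): `AJAX_ENDPOINT` is a made-up placeholder, not nelly.com's actual endpoint, and the code assumes the background request accepts the same parameter names as the fragment, which may well not be the case.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical placeholder: replace with the request URL seen in the header log.
AJAX_ENDPOINT = "http://example.com/ajax/products"

def fragment_params(url):
    """Parse 'page=3&hits=36&sort=&imgs=4' out of the part after '#'."""
    return dict(parse_qsl(urlsplit(url).fragment, keep_blank_values=True))

def page_request_url(page_url, page):
    """Build a query-string URL for the given page number (assumed parameter names)."""
    params = fragment_params(page_url)
    params["page"] = str(page)
    return AJAX_ENDPOINT + "?" + urlencode(params)

link = "http://nelly.com/se/skor-kvinna/skor/festskor/#page=3&hits=36&sort=&imgs=4"

# One candidate URL per page; fetch and parse each with your scraper of choice.
for n in range(1, 4):
    print(page_request_url(link, n))
```

Unlike the # links, these URLs carry the page number in the query string, so it does reach the server.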

Arttu, Apr 21, 2012
