Scrape an Ajax-generated page

Discussion in 'PHP' started by 50inches, Apr 20, 2012.

  1. #1
    Hi,

    I am trying to scrape from this site:
    www.nelly.com/se/skor-kvinna/skor/festskor/

    Initially there is no problem, but because there are many products, I would like to scrape every page of the paginated product list. When I click, for example, page 3 in the pagination, the link looks like this:
    http://nelly.com/se/skor-kvinna/skor/festskor/#page=3&hits=36&sort=&imgs=4

    It looks like some Ajax or JavaScript is being run in the background.

    I am using the Simple HTML DOM class in PHP.

    Does anybody have any experience with this?
    Thanks in advance.
     
    50inches, Apr 20, 2012
  2. PoPSiCLe

    PoPSiCLe Illustrious Member

    #2
    Can't you just loop through the URLs? If it's as simple as #page=3, #page=4, #page=5, etc., why not just loop and pull the info from each URL?
     
    PoPSiCLe, Apr 20, 2012
  3. 50inches

    50inches Greenhorn

    #3
    That's what I thought initially as well, but apparently I can't scrape any of those paginated pages. When I try to scrape page 4, for example, I only get the results for page 1. The same goes for all the other pages.
     
    50inches, Apr 20, 2012
  4. Arttu

    Arttu Member

    #4
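    That's because anything after the # character (the URL fragment) is never sent to the server. The browser keeps it, and the page's JavaScript reads it to decide which page of results to load.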
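
To make this concrete, here is a minimal sketch that splits the page-3 URL from the first post into the part an HTTP client requests and the part that stays in the browser. It uses only PHP's built-in `parse_url()` and `parse_str()`; the variable names are just for illustration.

```php
<?php
// The page-3 URL from the first post.
$url = 'http://nelly.com/se/skor-kvinna/skor/festskor/#page=3&hits=36&sort=&imgs=4';

$parts = parse_url($url);

// What an HTTP client actually requests: scheme, host and path only.
// This is identical for every "page", which is why page 1 always comes back.
$requested = $parts['scheme'] . '://' . $parts['host'] . $parts['path'];

// The fragment never leaves the client. Here it happens to be
// formatted like a query string, so parse_str() can decode it.
parse_str($parts['fragment'], $state);

echo $requested, "\n";
echo 'page=', $state['page'], ' hits=', $state['hits'], "\n";
```

Looping over #page=1, #page=2, and so on therefore repeats the same request each time.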

    Since their code is so hard to read, I suggest you get the Live HTTP Headers plugin for Firefox, or something similar, and find the request that fetches the next page.
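
Once that request has been found, the scraper can call it directly, one page at a time. The sketch below only builds the URLs. The endpoint `http://example.com/ajax/products` is a hypothetical placeholder, and reusing the fragment's parameter names (`page`, `hits`, `sort`, `imgs`) as query parameters is an assumption; the real address and parameters have to come from the captured request.

```php
<?php
// HYPOTHETICAL endpoint: replace it with the request URL seen in the
// HTTP log. The site's real Ajax address is not known from this thread.
const AJAX_ENDPOINT = 'http://example.com/ajax/products';

// Build the URL for one page of results. The parameter names are copied
// from the fragment in the first post; that they match the real request
// is an assumption.
function buildPageUrl(string $endpoint, int $page, int $hits = 36, int $imgs = 4): string
{
    $params = ['page' => $page, 'hits' => $hits, 'sort' => '', 'imgs' => $imgs];

    return $endpoint . '?' . http_build_query($params, '', '&');
}

$urls = [];
for ($page = 1; $page <= 3; $page++) {
    $urls[] = buildPageUrl(AJAX_ENDPOINT, $page);
}

// Each URL could then be loaded with Simple HTML DOM's file_get_html()
// and searched with ->find(), provided the response is HTML.
print_r($urls);
```

If the endpoint returns JSON rather than HTML, `json_decode()` would replace the Simple HTML DOM step.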
     
    Arttu, Apr 21, 2012