Scraping script loop & array query

Discussion in 'PHP' started by frilioth, Jun 6, 2007.

  1. #1
    Hi All

    I need a little help with my Amazon price-scraping script. I currently sell over 3000 items on Amazon and need to stay up to date when prices go down. The script reads my items from a MySQL database, goes to the relevant Amazon page for each one, then scrapes the lowest price. I can then check the prices against my inventory to make sure I'm not too expensive. The problem is that it only uses the first price scraped. I know where the problem is, but after 6 hours staring at this screen my mind has stopped working :confused:. If anyone can help I'd be very grateful.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Untitled Document</title>
    </head>

    <body>
    <?php
    // listing script

    // connect to the server
    mysql_connect( 'localhost','XXXXXXXXX','XXXXXXXXXX' )
    or die( "Error! Could not connect to database: " . mysql_error() );

    // select the database
    mysql_select_db( XXXXXXXX)
    or die( "Error! Could not select the database: " . mysql_error() );


    // retrieve all the rows from the database
    $query = "SELECT * FROM `XXXXXXX`";

    $results = mysql_query( $query );

    // print out the results
    if( $results )
    {
    while( $contact = mysql_fetch_object( $results ) )
    {
    // print out the info
    $id = $contact -> id;
    $sku = $contact -> sku;
    $ASIN = $contact -> ASIN;
    ?>

    <?php

    $url = "http://www.amazon.co.uk/gp/offer-listing/$ASIN/";

    $filepointer = fopen($url,"r");

    if($filepointer){

    while(!feof($filepointer)){

    $buffer = fgets($filepointer);

    $file .= $buffer;

    }

    fclose($filepointer);

    } else {

    die("Could not create a connection to Amazon.co.uk");

    }

    ?>

    <?php

    preg_match("/<span\sclass=\"price\">£(.*)\s/i",$file,$match);

    $result = $match[1];






    ?>
    <table width="100%" border="0" cellspacing="0" cellpadding="0">
    <tr>
    <td width="5%"><?php echo($id) ?></td>
    <td width="23%"><?php echo($sku) ?></td>
    <td width="18%"><?php echo($ASIN) ?></td>
    <td width="39%"><?php echo($url) ?></td>
    <td width="20%"><?php echo($result) ?></td>
    </tr>
    </table>
    <?php



    }
    }
    else
    {
    die( "Trouble getting info from database: " . mysql_error() );
    }

    ?>
    </body>
    </html>
     
    frilioth, Jun 6, 2007
  2. dvd871
    #2
    If you know where the problem is, the first thing to do is add some echo statements to see whether what you suspect is true. Echo out the data and see what your queries are returning, etc. Debug that thing man ;)
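As a rough illustration of that kind of echo debugging, here is a minimal sketch. The helper name `debug_line()` and the sample markup are hypothetical and only stand in for the variables inside the loop; nothing here touches the database or Amazon.

```php
<?php
// Hypothetical helper: builds one debug line per database row.
// $asin, $page and $match stand in for the variables used inside the loop.
function debug_line($asin, $page, $match)
{
    return sprintf(
        "ASIN=%s bytes=%d match=%s",
        $asin,
        strlen($page),
        isset($match[1]) ? $match[1] : '(none)'
    );
}

// Stand-in data so the sketch runs without a database or network access.
$page = '<span class="price">£4.99 </span>';
preg_match('/<span\sclass="price">£(.*)\s/i', $page, $match);
echo debug_line('B0001K9W9Y', $page, $match), "\n";
```

Printing the size of the fetched data next to the match for every row makes it easy to spot a value that should change from row to row but does not.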

    Usually I find that after several hours of beating dead horses, it's best to just step back and think about things; otherwise you just go in circles or get completely off track. Go have a beer (if you are of age!) and try to visualize what you want.
     
    dvd871, Jun 6, 2007
  3. krakjoe
    #3
    Your code, which btw you should always post in [php] tags on forums, because it's too hard to read code formatted the way you posted it:
    
    [php]
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
        <title>
          Untitled Document
        </title>
      </head>
      <body>
        <?php
        // listing script
    
        // connect to the server
        mysql_connect( 'localhost','XXXXXXXXX','XXXXXXXXXX' )
        or die( "Error! Could not connect to database: " . mysql_error() );
    
        // select the database
        mysql_select_db( XXXXXXXX)
        or die( "Error! Could not select the database: " . mysql_error() );
            
    
        // retrieve all the rows from the database
        $query = "SELECT * FROM `XXXXXXX`";
    
        $results = mysql_query( $query );
    
        // print out the results
        if( $results )
        {
            while( $contact = mysql_fetch_object( $results ) )
            {
                // print out the info
                $id = $contact -> id;
                $sku = $contact -> sku;
                $ASIN = $contact -> ASIN;
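                // build the URL once so it can be fetched here and echoed in the table below
                $url = "http://www.amazon.co.uk/gp/offer-listing/$ASIN/";
                if( !( $data = file_get_contents( $url ) ) ) 
                {
                  die("Could not create a connection to Amazon.co.uk");
                }
                // take the first price on the page; leave the cell empty if nothing matched
                $result = preg_match("/<span class=\"price\">£(.*)\s/i", $data, $match ) ? $match[1] : '';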
                ?>
                <table width="100%" border="0" cellspacing="0" cellpadding="0">
                  <tr>
                    <td width="5%">
                      <?php echo($id) ?>
                    </td>
                    <td width="23%">
                      <?php echo($sku) ?>
                    </td>
                    <td width="18%">
                      <?php echo($ASIN) ?>
                    </td>
                    <td width="39%">
                      <?php echo($url) ?>
                    </td>
                    <td width="20%">
                      <?php echo($result) ?>
                    </td>
                  </tr>
                </table>
                <?php
            }
        }
        else
        {
            die( "Trouble getting info from database: " . mysql_error() );
        }
    
        ?>
      </body>
    </html>
    
    [/php]

    You're also using ASIN as a variable name. That's a function name too, which is bad practice.

    Before I can say what's wrong with the script, can you post a link to a working http://www.amazon.co.uk/gp/offer-listing/$ASIN/ page?

    My guess would be that you're supposed to be using preg_match_all if you want more than one result from each page; preg_match stops at the first occurrence.
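To make the difference concrete, here is a minimal sketch. The markup in `$html` is a stand-in, not the real Amazon page, and the tighter `([0-9.]+)` capture is an assumption made for this example rather than the pattern from the script.

```php
<?php
// Stand-in markup; the real page structure is an assumption here.
$html = "<span class=\"price\">£4.99 </span>\n"
      . "<span class=\"price\">£5.49 </span>\n";

// Assumed pattern: capture only digits and dots after the pound sign.
$pattern = '/<span\sclass="price">£([0-9.]+)/i';

// preg_match stops at the first occurrence.
preg_match($pattern, $html, $first);
$lowest = $first[1];

// preg_match_all collects every occurrence; $all[1] holds the captured groups.
preg_match_all($pattern, $html, $all);
$prices = $all[1];

echo $lowest, "\n";
echo implode(', ', $prices), "\n";
```

If only the first price on each page is wanted, `preg_match` is enough; `preg_match_all` only matters when every offer on the page is needed.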
     
    krakjoe, Jun 7, 2007
  4. frilioth
    #4
    Hi krakjoe

    Thanks for the advice, it's always appreciated. Please find a working link below.

    http://www.amazon.co.uk/gp/offer-listing/B0001K9W9Y/

    I think I'm OK using preg_match, as I only want the first price on the page (I need to know what the lowest marketplace price is). I think the problem's near $match[1]. The array doesn't seem to update when a new product is selected from my database.
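One possible explanation, assuming the loop from post #1 is still the one being run: `$file` is built up with `.=` and never cleared, so each new page is appended after the previous ones and `preg_match` keeps finding the price from the first page. The sketch below reproduces that effect with canned data. `fetch_page()`, `first_price()`, the two ASINs and the simplified pattern are all hypothetical stand-ins.

```php
<?php
// Hypothetical stand-in for fetching a page; returns canned markup per ASIN
// so the sketch runs without network access.
function fetch_page($asin)
{
    $pages = array(
        'AAA' => "<span class=\"price\">£4.99 </span>\n",
        'BBB' => "<span class=\"price\">£7.50 </span>\n",
    );
    return $pages[$asin];
}

// Returns the first price found in $html, or null if there is none.
// The pattern is a simplified assumption, not the one from the script.
function first_price($html)
{
    return preg_match('/<span\sclass="price">£([0-9.]+)/i', $html, $match) ? $match[1] : null;
}

$stale = array();
$fresh = array();
$file  = '';
foreach (array('AAA', 'BBB') as $asin) {
    // As in post #1: the buffer is appended to and never cleared.
    $file .= fetch_page($asin);
    $stale[] = first_price($file);

    // Alternative: start from an empty buffer for every row.
    $page = '';
    $page .= fetch_page($asin);
    $fresh[] = first_price($page);
}

echo implode(', ', $stale), "\n";
echo implode(', ', $fresh), "\n";
```

If that is what is happening, setting `$file = '';` at the top of the `while` loop, or assigning the result of `file_get_contents()` to a fresh variable on each pass, should give each row its own price.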
     
    frilioth, Jun 7, 2007