What type of software do I need to pull information such as Prices off a Website?

Discussion in 'Programming' started by NRLMedia, Mar 30, 2008.

  1. #1
    Does anyone know what type of software, script, or program I could use to check prices on a website?

    I'm looking to monitor price changes.

    Thanks in advance.
     
    NRLMedia, Mar 30, 2008
  2. live-cms_com

    live-cms_com Notable Member

    Messages:
    3,128
    Likes Received:
    112
    Best Answers:
    0
    Trophy Points:
    205
    Digital Goods:
    1
    #2
    You use PHP to scrape the HTML of a page, then a REGEX to pull out the values you need, then CRON to run the script every day.
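
    As a rough sketch of how those three steps fit together (not a finished script): the `<span class="price">` markup, the function names `extract_price` and `record_price`, the temp-file location, and the script path in the crontab comment are all made up for illustration. A fixed string stands in for the fetched page so the sketch runs offline.

```php
<?php
// Hypothetical markup: <span class="price">$19.99</span>. Real sites will differ.
function extract_price($html) {
    if (preg_match('/<span class="price">\s*\$?([0-9]+(?:\.[0-9]{2})?)\s*<\/span>/', $html, $m)) {
        return (float) $m[1];
    }
    return null; // no price found
}

// Compare the new price with the one saved by the previous run, then save the new one.
// Returns true when a previous price exists and differs from the current one.
function record_price($file, $price) {
    $previous = is_file($file) ? (float) file_get_contents($file) : null;
    file_put_contents($file, (string) $price);
    return $previous !== null && $previous !== $price;
}

// In a real script $html would come from fetching the page; a fixed string stands in here.
$html  = '<div><span class="price">$19.99</span></div>';
$price = extract_price($html);
$file  = sys_get_temp_dir() . '/last_price.txt'; // hypothetical storage location

if ($price !== null && record_price($file, $price)) {
    echo "Price changed to $price\n";
}

// Example crontab entry to run the script once a day at 06:00 (script path is hypothetical):
// 0 6 * * * php /path/to/check_price.php
```

    The first run only stores the price; later runs print a message when the stored value and the new value differ.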
     
    live-cms_com, Mar 30, 2008
  3. Ikki

    Ikki Peon

    Messages:
    474
    Likes Received:
    34
    Best Answers:
    0
    Trophy Points:
    0
    #3
    ASP, ASP.NET, PHP, Perl, etc... :p there are tons of languages that might do the job for you. Of course, as live-cms said, I'd suggest PHP because it's easy to learn (with some practice and a willingness to learn, of course) and it's free.
     
    Ikki, Mar 30, 2008
  4. NRLMedia

    NRLMedia Peon

    Messages:
    2,462
    Likes Received:
    215
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Do you know of any example scripts?

    I'd appreciate any additional help. I'm lost on this!
     
    NRLMedia, Mar 30, 2008
  5. live-cms_com

    #5
    It's not as simple as downloading a piece of software or a script. At the very least you'll need to know how to write a suitable regular expression (REGEX).

    You could find a PHP sample to do the scraping, but the REGEX varies depending on the website you want to get the data from.
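
    To illustrate why the REGEX changes from site to site, here is a small sketch with two made-up page layouts. The site keys `shop-a` and `shop-b`, the markup, and the helper `find_price` are all hypothetical; a real site needs a pattern written against its own HTML.

```php
<?php
// Hypothetical patterns for two made-up page layouts; each site needs its own.
$patterns = array(
    'shop-a' => '/<span class="price">\$([0-9.]+)<\/span>/',
    'shop-b' => '/<td id="cost">\s*USD\s+([0-9.]+)\s*<\/td>/',
);

// Return the first captured price for the given site, or null if nothing matches.
function find_price($site, $html, $patterns) {
    if (isset($patterns[$site]) && preg_match($patterns[$site], $html, $m)) {
        return $m[1];
    }
    return null;
}

echo find_price('shop-a', '<span class="price">$24.50</span>', $patterns), "\n";
echo find_price('shop-b', '<td id="cost"> USD 24.50 </td>', $patterns), "\n";
```

    Both calls print 24.50, but only because each pattern was written for its own markup; swap the patterns and neither one matches.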
     
    live-cms_com, Mar 30, 2008
  6. NRLMedia

    #6
    Alright, so my best bet is to contact a programming agency on this one.

    Thanks for the information, I appreciate it.
     
    NRLMedia, Mar 30, 2008
  7. live-cms_com

    #7
    The REGEX is only 1 line long, so hopefully you find a good programmer who won't overcharge for it. ;)

    The PHP is copy and paste and the REGEX is 1 line, although that 1 line may take 15 minutes to code; REGEX can be rather complicated sometimes.
     
    live-cms_com, Mar 30, 2008
  8. NRLMedia

    #8
    Could REGEX work on sites that use drop-down selection menus with multiple options?

    For example: Male Shoes and Size 10.5. Both have to be selected before the price appears.
     
    NRLMedia, Mar 30, 2008
  9. live-cms_com

    #9
    Here's an example of the PHP and REGEX. The REGEX would need to be modified for the site you're scraping. (The REGEX is the preg_match_all part.)

    
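    <?php
    // Fetch a URL with cURL and return the response body, or false on failure.
    function file_get_contents_curl($url) {
    	$ch = curl_init();
    	curl_setopt($ch, CURLOPT_HEADER, 0);          // leave response headers out of the result
    	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    	curl_setopt($ch, CURLOPT_URL, $url);
    	$data = curl_exec($ch);
    	curl_close($ch);
    	return $data;
    }

    $results = file_get_contents_curl("http://www.domain.com/index.html");
    if ($results === false) {
    	die("Could not fetch the page.\n");
    }

    // Capture the text of every plain <td>...</td> cell; change this pattern to suit the target site.
    preg_match_all('/<td>([A-Za-z0-9 ]+)<\/td>/', $results, $matches);
    foreach ($matches[1] as $value) {
    	echo $value."<br />\n";
    }
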
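    Yes, REGEX can be used on a dropdown as long as the options and prices are in the page's HTML. It won't work if the page gets the values dynamically with AJAX, because REGEX only searches the static code that was downloaded, like a search engine spider does.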
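
    A minimal sketch of the static case, assuming (hypothetically) that the page source lists every size with its price in a `data-price` attribute. The markup and variable names are invented for illustration. If the site only loads the price after a selection is made, the fetched HTML will not contain it and a pattern like this will find nothing.

```php
<?php
// Hypothetical static markup where each option carries its own price.
$html = '
<select name="size">
  <option value="10" data-price="84.99">Size 10</option>
  <option value="10.5" data-price="89.99">Size 10.5</option>
</select>';

// Capture the option value and its price for every <option> in the page source.
preg_match_all(
    '/<option value="([^"]+)" data-price="([^"]+)">/',
    $html,
    $options,
    PREG_SET_ORDER
);

$prices = array();
foreach ($options as $option) {
    $prices[$option[1]] = $option[2]; // size => price
}

echo isset($prices['10.5']) ? $prices['10.5'] : 'not in the static HTML', "\n";
```

    With PREG_SET_ORDER each match arrives as one array holding the full match and both captured groups, which keeps each size paired with its own price.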
     
    live-cms_com, Mar 30, 2008