Web Automation - Automatically fetching values from a specific part of a web page

Discussion in 'PHP' started by somghosh, Nov 24, 2012.

  1. #1
    Hello

    I am trying to extract data from a specific web page. For example, for this page, I want to find the current price of the product: http://www.ocado.com/webshop/product/Ocado-Smoked-Salmon-Long-Sliced/59534011?

    I am trying to find out the best way to do this.

    1) iMacros extension for Firefox - allows a script to be run in Firefox and does not require a server
    2) Perl - I don't know exactly how to do it, but I think it requires a server
    3) PHP - I don't know exactly how to do it, but I think it requires a server

    I would be glad if someone could comment on these methods, suggest anything easier, and guide me in the right direction.

    Many thanks
     
    somghosh, Nov 24, 2012
  2. goldensea80

    #2
    It depends on which language you know best. For me, the best way is to use a regular expression (preg_match) in PHP to extract the content.
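
    As a rough illustration of the preg_match approach, here is a minimal sketch. The helper name extract_price, the nowPrice class name and the sample markup are all assumptions made up for this example; the real page may be structured differently, and a regular expression like this will break if the markup changes.

```php
<?php
// Hypothetical helper: returns the first price found inside a
// <span class="... nowPrice ..."> element, or null when nothing matches.
function extract_price(string $html): ?string
{
    // Skip any currency symbol or entity after the opening tag, then
    // capture the digits with an optional decimal part.
    $pattern = '%<span[^>]*class="[^"]*\bnowPrice\b[^"]*"[^>]*>[^<\d]*(\d+(?:\.\d+)?)%i';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Made-up sample markup standing in for a fetched page (assumption).
$sample = '<p><span class="wasPrice">&pound;4.99</span> <span class="nowPrice">&pound;3.49</span></p>';
echo extract_price($sample), "\n"; // prints 3.49
```

    In real use, the string passed to extract_price would come from file_get_contents or cURL rather than a hard-coded sample.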
     
    goldensea80, Nov 24, 2012
  3. NetStar

    #3
    I prefer Perl (using WWW::Mechanize and an HTML Extractor library) to scrape and crawl. With PHP, you can use cURL to crawl and the DOM to scrape.
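
    For the PHP route, a minimal sketch might look like the following. It assumes "the DOM" means PHP's built-in DOM extension (DOMDocument and DOMXPath); the function names fetch_page and first_text, the nowPrice class and the sample markup are hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: fetch a page with cURL.
// Returns the body as a string, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: trimmed text of the first node matching an
// XPath query, or null when nothing matches.
function first_text(string $html, string $xpath): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup standing in for a fetched page (assumption).
$sample = '<html><body><span class="nowPrice"> 3.49 </span></body></html>';
echo first_text($sample, '//span[@class="nowPrice"]'), "\n"; // prints 3.49
```

    Against a live page, the two pieces would be combined as first_text(fetch_page($url), $query), after checking that fetch_page did not return null.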
     
    NetStar, Nov 25, 2012
  4. Sano000

    #4
    You can use any programming language. PHP and Perl do not require a server; you can run them locally.

    It is a good idea to use a library to parse the content. For example, http://simplehtmldom.sourceforge.net/

    Then the solution to your problem would look like this (PHP + simple_html_dom):

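    <?php
        // Requires the PHP Simple HTML DOM Parser (simple_html_dom.php) next to this script.
        require_once('simple_html_dom.php');

        $html = file_get_html('http://www.ocado.com/webshop/product/Ocado-Smoked-Salmon-Long-Sliced/59534011?');
        if (!$html) {
            exit("Could not load the page\n");
        }

        // find($selector, 0) returns null when nothing matches, so check before reading plaintext.
        $was = $html->find('span.wasPrice', 0);
        $now = $html->find('span.nowPrice', 0);
        $was_price = $was ? html_entity_decode(trim($was->plaintext), ENT_NOQUOTES, 'UTF-8') : 'n/a';
        $price = $now ? html_entity_decode(trim($now->plaintext), ENT_NOQUOTES, 'UTF-8') : 'n/a';

        print "Old price $was_price\n";
        print "New price $price\n";
    ?>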
     
    Sano000, Nov 25, 2012
  5. yelbom

    #5
    I use PHP's file_get_contents and then preg_match_all to extract data from certain parts of the page:
    
    $page = file_get_contents("http://example.com");
    // $result[1] holds the text captured from each matching span
    preg_match_all('%<span class="item">(.*?)</span>%sim', $page, $result);
     
    yelbom, Dec 9, 2012