How to grab/scrape data from another site?

Discussion in 'PHP' started by KingCobra, Apr 17, 2013.

  1. #1
    Dear friends,

    I would like to grab data from this site (http://dsebd.org/latest_share_price_scroll_l.php). It is a stock market site. In the middle of the page there is an iframe that displays 522 stocks with their values. I need to grab them and display them on my PHP-based site. The page updates every 5 minutes. Would you please help me with code or a script to grab the data and display it on my site? If needed, I can use a MySQL database to store the data and then display it.

    Thanks.
     
    KingCobra, Apr 17, 2013
  2. dontkillme

    #2
    Use PHP's file_get_contents() ;)
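
For readers who want to see what that suggestion might look like, here is a minimal sketch. `fetchPage()` is a hypothetical helper name and the 10-second timeout is an arbitrary assumption; only `stream_context_create()` and `file_get_contents()` are PHP built-ins. Fetching a remote URL this way requires `allow_url_fopen` to be enabled.

```php
<?php
// Fetch a URL with file_get_contents() and return its body, or null on failure.
// fetchPage() is a hypothetical helper name, not a PHP built-in.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' options are used for http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds, // assumed value; tune as needed
        ],
    ]);

    // Suppress the warning and inspect the return value instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Calling `fetchPage('http://dsebd.org/latest_share_price_scroll_l.php')` would then give either the raw HTML or `null`; the HTML still has to be parsed to pull out the prices.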
     
    dontkillme, Apr 17, 2013
  3. sparky21289

    #3
    You should attempt it yourself; a good start is to look through some of these helpful answers:

    http://stackoverflow.com/questions/9172448/scrape-data-from-a-website-with-php

    include 'simple_html_dom.php'; // file_get_html() comes from the PHP Simple HTML DOM Parser library
    $html = file_get_html('http://www.ps3trophies.org/games/psn/1/');
    $otherPages = $html->find('a[href^=/games/psn/]'); // this will get the links for the 7 other pages
     
    sparky21289, Apr 17, 2013
  4. Dangy

    #4
    You're actually better off using cURL and running a regex over the output you receive, because then you can make the requests look as if they are coming from a different browser each time you hit the page.
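
A rough sketch of that idea, assuming the cURL extension is installed. The helper names `pickUserAgent()` and `fetchWithCurl()` and the `USER_AGENTS` list are hypothetical, and the User-Agent strings are placeholders rather than real browser strings; only the `curl_*` functions and `CURLOPT_*` constants are part of PHP.

```php
<?php
// Placeholder User-Agent strings (assumption): substitute real, current ones.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
    'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0',
];

// Pick one User-Agent string at random from the given list.
function pickUserAgent(array $agents): string
{
    return $agents[array_rand($agents)];
}

// Fetch a page with cURL, sending the given User-Agent header.
// Returns the response body, or null if the request fails.
function fetchWithCurl(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent, // the header that identifies the "browser"
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_TIMEOUT        => 10,         // assumed value, in seconds
    ]);
    $body = curl_exec($ch);

    return $body === false ? null : $body;
}
```

A call such as `fetchWithCurl($url, pickUserAgent(USER_AGENTS))` would send a different header on some requests; whether that is appropriate depends on the target site's terms of use.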
     
    Dangy, Apr 17, 2013
  5. Alex Roxon

    #5
    As was mentioned, if it's a quick grab you're probably better off using cURL and regular expressions. If you're going to be doing a lot of scraping, you'd be better off using a third-party parsing library rather than regular expressions. Something like the following will work in your case (it's pretty dirty, so you should add error handling etc.):

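    <?php

    // Fetch the DSE price scroller and return an array of symbol => last price.
    function retrieveStockPrices() {
        $ch = curl_init("http://dsebd.org/latest_share_price_scroll_l.php");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        $html = curl_exec($ch);
        curl_close($ch);

        // curl_exec() returns false on failure; report no prices in that case.
        if ($html === false) {
            return array();
        }

        // Each entry in the markup looks like: top'>SYMBOL &nbsp;PRICE&nbsp;
        preg_match_all('/top\'>([^\s]+) &nbsp;([\d\.]+)&nbsp;/', $html, $res);

        $companies = array();
        for ($x = 0; $x < count($res[1]); $x++) {
            $companies[$res[1][$x]] = $res[2][$x];
        }

        return $companies;
    }

    var_dump(retrieveStockPrices());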
    Output:
    array(74) {
      ["1JANATAMF"]=>
      string(4) "5.90"
      ["AAMRATECH"]=>
      string(5) "32.90"
      ["ACIZCBOND"]=>
      string(6) "838.50"
      ["AFTABAUTO"]=>
      string(5) "70.80"
      ["AIMS1STMF"]=>
      string(5) "38.90"
      ["AL-HAJTEX"]=>
      string(5) "49.10"
      ["ALARABANK"]=>
      string(5) "21.80"
      ["ANWARGALV"]=>
      string(5) "16.00"
      ["APEXFOODS"]=>
      string(5) "68.80"
      ["APEXSPINN"]=>
      string(5) "60.50"
      ["APEXTANRY"]=>
      string(5) "70.00"
      ["ARAMITCEM"]=>
      string(5) "65.30"
      ["ATLASBANG"]=>
      string(6) "145.80"
      ["AZIZPIPES"]=>
      string(5) "15.70"
      ["BDFINANCE"]=>
      string(5) "22.90"
      ["BDWELDING"]=>
      string(5) "20.20"
      ["BENGALWTL"]=>
      string(5) "55.40"
      ["BERGERPBL"]=>
      string(6) "508.50"
      ["BSRMSTEEL"]=>
      string(5) "46.70"
      ["CONFIDCEM"]=>
      string(6) "101.60"
      ["CONTININS"]=>
      string(5) "25.10"
      ["DELTALIFE"]=>
      string(7) "3773.10"
      ["DHAKABANK"]=>
      string(5) "18.20"
      ["FINEFOODS"]=>
      string(5) "15.80"
      ["FLEASEINT"]=>
      string(5) "29.20"
      ["FUWANGCER"]=>
      string(5) "18.40"
      ["GEMINISEA"]=>
      string(6) "116.00"
      ["GLOBALINS"]=>
      string(5) "28.80"
      ["GOLDENSON"]=>
      string(5) "43.00"
      ["GQBALLPEN"]=>
      string(6) "153.20"
      ["GRAMEENS2"]=>
      string(5) "16.50"
      ["GREENDELT"]=>
      string(5) "57.20"
      ["IBBLPBOND"]=>
      string(6) "899.75"
      ["ICB2NDNRB"]=>
      string(5) "11.40"
      ["ICB3RDNRB"]=>
      string(4) "5.80"
      ["IFIC1STMF"]=>
      string(4) "7.20"
      ["ISLAMIINS"]=>
      string(5) "27.20"
      ["JAMUNAOIL"]=>
      string(6) "174.80"
      ["JANATAINS"]=>
      string(6) "208.00"
      ["JUTESPINN"]=>
      string(5) "59.10"
      ["LIBRAINFU"]=>
      string(6) "202.50"
      ["LRGLOBMF1"]=>
      string(4) "8.20"
      ["MALEKSPIN"]=>
      string(5) "18.90"
      ["MEGHNACEM"]=>
      string(5) "92.40"
      ["METROSPIN"]=>
      string(5) "15.50"
      ["MODERNDYE"]=>
      string(5) "57.00"
      ["MONNOCERA"]=>
      string(5) "26.50"
      ["NAVANACNG"]=>
      string(5) "62.80"
      ["ORIONINFU"]=>
      string(5) "38.60"
      ["PADMALIFE"]=>
      string(5) "62.00"
      ["PARAMOUNT"]=>
      string(5) "24.00"
      ["PHENIXINS"]=>
      string(5) "48.90"
      ["POWERGRID"]=>
      string(5) "54.00"
      ["PRIMEBANK"]=>
      string(5) "30.50"
      ["PRIMELIFE"]=>
      string(6) "101.10"
      ["PURABIGEN"]=>
      string(5) "23.40"
      ["RAHIMTEXT"]=>
      string(5) "97.00"
      ["RELIANCE1"]=>
      string(4) "9.30"
      ["RENWICKJA"]=>
      string(5) "75.70"
      ["RUPALIINS"]=>
      string(5) "32.90"
      ["SAIHAMCOT"]=>
      string(5) "25.40"
      ["SAIHAMTEX"]=>
      string(5) "25.40"
      ["SALAMCRST"]=>
      string(5) "36.00"
      ["SALVOCHEM"]=>
      string(5) "21.20"
      ["SEBL1STMF"]=>
      string(4) "8.50"
      ["SONARGAON"]=>
      string(5) "17.00"
      ["TALLUSPIN"]=>
      string(5) "23.80"
      ["TRUSTB1MF"]=>
      string(4) "7.20"
      ["TRUSTBANK"]=>
      string(5) "16.90"
      ["UNIQUEHRL"]=>
      string(5) "80.70"
      ["UNITEDAIR"]=>
      string(5) "18.70"
      ["UNITEDINS"]=>
      string(5) "33.10"
      ["USMANIAGL"]=>
      string(5) "84.50"
      ["UTTARAFIN"]=>
      string(5) "64.00"
    }
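
One way to make the regex step easier to check is to separate parsing from fetching, so the pattern can be tried against a small sample string without hitting the site. The sketch below reuses the regular expression from the post; `parseStockPrices()` is a hypothetical name, and the sample markup is made up to match what the pattern expects, not copied from the real page.

```php
<?php
// Extract symbol => price pairs from markup shaped like: top'>SYMBOL &nbsp;PRICE&nbsp;
// parseStockPrices() is a hypothetical helper; the regex is the one used in the post.
function parseStockPrices(string $html): array
{
    $prices = [];
    $pattern = '/top\'>([^\s]+) &nbsp;([\d\.]+)&nbsp;/';

    // PREG_SET_ORDER yields one array per match: [full match, symbol, price].
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $prices[$match[1]] = (float) $match[2];
        }
    }

    return $prices;
}

// Made-up sample that mimics the structure the regex expects (assumption).
$sample = "<td valign='top'>AAMRATECH &nbsp;32.90&nbsp;</td>"
        . "<td valign='top'>DELTALIFE &nbsp;3773.10&nbsp;</td>";

var_dump(parseStockPrices($sample));
```

Unlike the original function, this version casts prices to floats. If the real page yields fewer entries than expected, testing the pattern against a saved copy of the HTML is a quick way to see which rows it misses.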
     
    Alex Roxon, Apr 17, 2013
  6. KingCobra

    #6
    Dear Alex Roxon,
    Thank you very much for your time and effort. It seems to be working. I will test it now.
    Thank you again.
     
    KingCobra, Apr 17, 2013
  7. kutchbhi

    #7
    QueryPath: for scraping, I can't recommend this library enough.
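
QueryPath's own API is not shown here. As a stand-in for the general "use a parser instead of regex" approach, the sketch below uses PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`). The table markup, the `prices` id, and the name `parsePriceTable()` are all assumptions for illustration; the real page's structure would need to be inspected first.

```php
<?php
// Parse a (hypothetical) two-column table of symbol/price rows with the DOM extension.
function parsePriceTable(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $prices = [];

    // Assumed structure: <table id="prices"> with one <tr> per stock.
    foreach ($xpath->query('//table[@id="prices"]//tr') as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length >= 2) {
            $symbol = trim($cells->item(0)->textContent);
            $prices[$symbol] = (float) trim($cells->item(1)->textContent);
        }
    }

    return $prices;
}

// Made-up sample markup (assumption), not taken from the real site.
$sample = '<table id="prices">'
        . '<tr><td>AAMRATECH</td><td>32.90</td></tr>'
        . '<tr><td>DELTALIFE</td><td>3773.10</td></tr>'
        . '</table>';

print_r(parsePriceTable($sample));
```

A selector-based parser tends to survive small markup changes (extra whitespace, reordered attributes) that would break a regular expression.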
     
    kutchbhi, Apr 21, 2013
  8. ratan1980

    #8
    Use file_get_contents() or Snoopy (a PHP library).
     
    ratan1980, Apr 22, 2013
  9. annaharris

    #9
    Try using the file_get_contents() function; there are other built-in PHP libraries too.
     
    annaharris, Apr 23, 2013