PHP screen scraping specific data

Discussion in 'PHP' started by KingCobra, Sep 11, 2013.

  1. #1
    Dear friends,

    I need to scrape some specific data from a site. Here is the source of the data page:

    <table border="1" cellspacing="1" width="100%">
              <tr>
                <td width="50%" valign="top">
                  <table width="100%" border="0" cellpadding="0" cellspacing="1" bgcolor="#C0C0C0">
                    <tr>
                      <td bgcolor="#FFFFFF" colspan="2"><font face="Verdana" size="2"><font color="#0000FF">Market Information:</font></font></td>
                    </tr>
                    <tr>
                      <td width="54%" bgcolor="#FFFFFF"><font face="Verdana" size="2">Last Trade:</font></td>
                      <td width="46%" bgcolor="#FFFFFF"><font face="Verdana" size="2">
                        43.50                  </font></td>
                    </tr>
                    <tr>
                      <td width="54%" bgcolor="#FFFFFF"><font face="Verdana" size="1" color="#808080">Last Update</font></td>
                      <td width="46%" bgcolor="#FFFFFF"><font color="#808080" face="Verdana"><b> Sep 11, 2013 at 02:30PM </b></font></td>
                    </tr>
                    <tr>
                      <td width="54%" height="36" bgcolor="#FFFFFF"><font face="Verdana" size="2">Change</font></td>
                      <td width="46%" bgcolor="#FFFFFF">
                        <table border="0" cellpadding="0" cellspacing="0" width="100%">
                          <tr>
                            <td width="100%" bgcolor="#FFFFCC"><font face="Verdana" size="2">
                              -0.5                        </font></td>
                          </tr>
                          <tr>
                            <td width="100%" bgcolor="#CCCCFF"><font face="Verdana" size="2">
                              -1.14%                        </font></td>
                          </tr>
                      </table></td>
                    </tr>
                    <tr>
                      <td width="54%" bgcolor="#FFFFFF"><font face="Verdana" size="2">&nbsp;</font></td>
                      <td width="46%" bgcolor="#FFFFFF">&nbsp; </td>
                    </tr>
                    <tr>
                      <td width="54%" bgcolor="#FFFFFF"><font face="Verdana" size="2">Open Price</font></td>
                      <td width="46%" bgcolor="#FFFFFF"><font face="Verdana" size="2">
                        44.5                  </font></td>
                    </tr>
                    <tr>
                      <td width="54%" height="20" bgcolor="#FFFFFF"><font face="Verdana" size="2">Adjusted Open Price</font></td>
                      <td width="46%" bgcolor="#FFFFFF"><font face="Verdana" size="2">44.0</font></td>
                    </tr>
                    <tr>
                      <td width="54%" height="26" bgcolor="#FFFFFF"><font face="Verdana" size="2">Yesterday Close Price</font></td>
                      <td width="46%" bgcolor="#FFFFFF"><font face="Verdana" size="2">
                        44.0                  </font></td>
                    </tr>
                </table></td>
                <td width="50%" valign="top">
                  <table width="100%" border="0" cellpadding="0" bgcolor="#C0C0C0">
                  <tr>
                      <td bgcolor="#FFFFFF" height="15" colspan="2">&nbsp;</td>
                    </tr>
                      <tr>
                      <td bgcolor="#FFFFFF"><font face="Verdana" size="2">Close Price</font></td>
                      <td bgcolor="#FFFFFF"><font face="Verdana" size="2">
                      43.6                            </font></td>
                    </tr>
                    <tr>
                      <td bgcolor="#FFFFFF" height="15">&nbsp;</td>
                      <td bgcolor="#FFFFFF">&nbsp;</td>
                    </tr>
                    <tr>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2">Day's Range</font></td>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2">
                        43.5 - 45.2                  </font></td>
                    </tr>
                    <tr>
                      <td width="50%" bgcolor="#FFFFFF" height="15"><font face="Verdana" size="2">&nbsp;</font></td>
                      <td width="50%" bgcolor="#FFFFFF">&nbsp;</td>
                    </tr>
                    <tr>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2">Volume</font></td>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2">
                        166,200                  </font></td>
                    </tr>
                    <tr>
                      <td width="50%" bgcolor="#FFFFFF" height="15">&nbsp;</td>
                      <td width="50%" bgcolor="#FFFFFF">&nbsp;</td>
                    </tr>
                    <tr>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2">Total Trade</font></td>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2">
                        334                  </font></td>
                    </tr>
                    <tr>
                      <td width="50%" height="15" bgcolor="#FFFFFF"><font face="Verdana" size="2">&nbsp;</font></td>
                      <td width="50%" bgcolor="#FFFFFF">&nbsp;</td>
                    </tr>
                    <tr>
                      <td width="50%" height="24" bgcolor="#FFFFFF"><font face="Verdana" size="2">Market Cap in BDT*</font></td>
                      <td width="50%" bgcolor="#FFFFFF"><font face="Verdana" size="2"> 3,080.000 (mn)</font></td>
                    </tr>
                </table></td>
              </tr>
            </table>
    HTML:
    I need to scrape the values of "Last Trade", "Change" and "Market Cap in BDT*" (for example "5.70, 0.1 1.79%, 1,120.000 (mn)") and display them on my page.

    How can I do it? Please help. Thanks.

    Here is the actual data source page that I am going to scrape: http://dsebd.org/displayCompany.php?name=1JANATAMF. You can use this link or the HTML code above (copy the code and save it as an HTML file).
     
    KingCobra, Sep 11, 2013 IP
  2. xxxize

    #2
    The best way is to use PHP Simple HTML DOM Parser. If you google it, you will find all the information you need.

    For this sample you would do something like this:
    $html = file_get_content($link);
    foreach ($html->find('table', 0)->find('table') as $table) {
      $lastTrade = false;
      $change = false;
      $marketCap = false;
      foreach ($table->find('td') as $td) {
          if ($td->plaintext == "Change") $change = true;
          if ($change) {
                echo $td->plaintext;
                $change = false;
          }
          if ($td->plaintext == "Last Trade") $lastTrade = true;
          if ($lastTrade) {
                echo $td->plaintext;
                $lastTrade = false;
          }
          if ($td->plaintext == "Market Cap in BDT*") $marketCap = true;
          if ($marketCap) {
                echo $td->plaintext;
                $marketCap = false;
          }
      }
    }
    PHP:
     
    xxxize, Sep 11, 2013 IP
  3. KingCobra

    #3
    Dear @xxxize ,
    Thank you for your reply. Unfortunately the code gives me an error. Is it "file_get_content" or "file_get_contents"? And what should "$link" be?
    Please test it against the HTML code above if you have some free time.
     
    KingCobra, Sep 11, 2013 IP
  4. EricBruggema

    #4
    KingCobra, maybe it would be an idea to start by looking the functions up on php.net?
     
    EricBruggema, Sep 11, 2013 IP
  5. xxxize

    #5
    Sorry!
    This is the correct code. You must download simple_html_dom.php first.

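    include_once('simple_html_dom.php');

    $html = file_get_html("testhtm.html");

    // Walk every table nested inside the first table of the page.
    foreach ($html->find('table', 0)->find('table') as $table) {
      // A flag set to true means "the previous cell held this label,
      // so the current cell holds its value".
      $lastTrade = false;
      $change = false;
      $marketCap = false;
      foreach ($table->find('td') as $td) {
        if ($change) {
          echo $td->plaintext;
          $change = false;
        }
        if ($lastTrade) {
          echo $td->plaintext;
          $lastTrade = false;
        }
        if ($marketCap) {
          echo $td->plaintext;
          $marketCap = false;
        }
        // The sample page writes "Last Trade:" with a colon and pads cells
        // with whitespace, so normalize the text before comparing.
        $label = rtrim(trim($td->plaintext), ':');
        if ($label == "Change") $change = true;
        if ($label == "Last Trade") $lastTrade = true;
        if ($label == "Market Cap in BDT*") $marketCap = true;
      }
    }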
    PHP:
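If you would rather not depend on an extra library, the same "find the label cell, then read the cell next to it" idea can be sketched with PHP's bundled DOM extension. This is only a sketch under assumptions: `valueAfterLabel` is a made-up helper name, the markup is a reduced copy of the sample from the first post, and the label text is assumed to contain no double quotes (they would break the XPath expression).

```php
<?php
// Sketch only: uses PHP's bundled DOM extension instead of simple_html_dom.

/**
 * Return the text of the cell that follows the cell labelled $label,
 * or null when no such label exists. (Hypothetical helper.)
 */
function valueAfterLabel(DOMXPath $xpath, string $label): ?string
{
    // Match a <td> whose whitespace-normalized text is the label, with or
    // without a trailing colon, then take its next sibling <td>.
    $query = sprintf(
        '//td[normalize-space(.)="%1$s" or normalize-space(.)="%1$s:"]/following-sibling::td[1]',
        $label
    );
    $cells = $xpath->query($query);
    if ($cells === false || $cells->length === 0) {
        return null;
    }
    // Collapse the runs of whitespace left over from the page's indentation.
    return trim(preg_replace('/\s+/', ' ', $cells->item(0)->textContent));
}

// Reduced copy of the sample markup from the first post.
$sample = <<<HTML
<table>
  <tr><td><font>Last Trade:</font></td><td><font>
      43.50   </font></td></tr>
  <tr><td><font>Change</font></td><td>
      <table><tr><td><font> -0.5 </font></td></tr>
             <tr><td><font> -1.14% </font></td></tr></table></td></tr>
  <tr><td><font>Market Cap in BDT*</font></td><td><font> 3,080.000 (mn)</font></td></tr>
</table>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings about old markup quiet
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

foreach (['Last Trade', 'Change', 'Market Cap in BDT*'] as $label) {
    echo $label, ': ', valueAfterLabel($xpath, $label), "\n";
}
```

Because the lookup starts from the label cell itself, it does not matter how many tables are wrapped around it, and the "Change" cell, which holds a nested table, comes back as one whitespace-collapsed string.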
     
    xxxize, Sep 11, 2013 IP
  6. KingCobra

    #6
    Dear @xxxize
    Thank you again for your time and code. It's working now.
    But if I use the actual target link instead of "testhtm.html", it shows nothing, only a blank page.
    My actual target link is - http://dsebd.org/displayCompany.php?name=1JANATAMF
     
    KingCobra, Sep 12, 2013 IP
  7. xxxize

    #7
    You must find the "path" to every element.
    In the HTML sample, the path to the TD elements that hold the important content is: take the first table, then take every TD of each table nested inside it.

    This version is simpler: it checks all the tables on the page.
    The first two statements turn on the display of all errors and warnings.

    <?php

    // Show every error and warning while debugging.
    ini_set('display_errors', '1');
    error_reporting(E_ALL);

    include_once('simple_html_dom.php');

    $html = file_get_html("http://dsebd.org/displayCompany.php?name=1JANATAMF");

    // Check every table on the page, wherever it sits.
    foreach ($html->find('table') as $table) {
      // A flag set to true means "the previous cell held this label,
      // so the current cell holds its value".
      $lastTrade = false;
      $change = false;
      $marketCap = false;
      if ($table->find('td')) {
        foreach ($table->find('td') as $td) {
          if ($change) {
            echo $td->plaintext;
            $change = false;
          }
          if ($lastTrade) {
            echo $td->plaintext;
            $lastTrade = false;
          }
          if ($marketCap) {
            echo $td->plaintext;
            $marketCap = false;
          }
          // Normalize padding and a trailing colon before comparing labels.
          $label = rtrim(trim($td->plaintext), ':');
          if ($label == "Change") $change = true;
          if ($label == "Last Trade") $lastTrade = true;
          if ($label == "Market Cap in BDT*") $marketCap = true;
        }
      }
    }

    ?>
    PHP:
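A blank page usually means the script stopped with an error that was not displayed. One possible cause (an assumption, not something confirmed for this site) is that the remote fetch failed and the code then called find() on something that is not an object. Below is a minimal sketch of checking the fetch before parsing, using only built-in functions; `fetchPage` is a hypothetical helper and the 10-second timeout is an arbitrary choice. Reading a URL this way also needs allow_url_fopen to be enabled in php.ini.

```php
<?php
/**
 * Fetch a page (URL or local file) and fail loudly instead of silently.
 * Hypothetical helper; the 10-second timeout is an arbitrary choice.
 */
function fetchPage(string $source): string
{
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    // "@" hides the PHP warning; the false return value is checked instead.
    $body = @file_get_contents($source, false, $context);
    if ($body === false || trim($body) === '') {
        throw new RuntimeException("Could not read any HTML from $source");
    }
    return $body;
}

try {
    // Replace with the real URL; a missing file is used here to show the failure path.
    $htmlText = fetchPage(__DIR__ . '/no-such-file.html');
    echo strlen($htmlText), " bytes fetched\n";
} catch (RuntimeException $e) {
    echo 'Fetch failed: ', $e->getMessage(), "\n";
}
```

When the fetch succeeds, the returned string can be handed to str_get_html() from simple_html_dom.php in place of calling file_get_html() on the URL directly.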
    Please like this post or mark it as best answer if you can!
    Thank you.
     
    xxxize, Sep 12, 2013 IP
  8. KingCobra

    #8
    Dear @xxxize,
    Here is the actual source link:
    http://dsebd.org/company_details_nav.php?name=1JANATAMF

    Your script works, but it gives me three sets of duplicate results.

    Please check. Thanks.
     
    KingCobra, Nov 25, 2013 IP
  9. NetStar

    #9
    You are just looking for someone to do it for you. He wrote the code for you. If it needs tweaking that is up to you. Otherwise, consider hiring a PHP Programmer to do it for you if you are unfamiliar with the language.
     
    NetStar, Nov 25, 2013 IP
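
On the duplicate results reported in #8: a likely cause is that the loop in #7 visits every table on the page, and the cells of a nested table are also returned when its parent tables are searched, so the same label is seen once per enclosing table. A minimal sketch of one way around it, assuming PHP's bundled DOM extension instead of simple_html_dom: walk every cell exactly once in document order and keep only the first value per label. `firstValues` is a hypothetical helper, and the doubled sample block stands in for a page that repeats its tables.

```php
<?php
/**
 * Collect the first value found for each wanted label.
 * Hypothetical helper built on PHP's bundled DOM extension.
 */
function firstValues(string $html, array $labels): array
{
    libxml_use_internal_errors(true);   // keep parser warnings about old markup quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $found = [];
    $pending = null;                    // label seen in the previous cell, if any
    // getElementsByTagName() yields every <td> exactly once, in document order,
    // so nested tables are not visited again through their parents.
    foreach ($doc->getElementsByTagName('td') as $td) {
        $text = trim(preg_replace('/\s+/', ' ', $td->textContent));
        if ($pending !== null) {
            // Keep only the first value per label; later repeats are ignored.
            if (!isset($found[$pending])) {
                $found[$pending] = $text;
            }
            $pending = null;
            continue;
        }
        $label = rtrim($text, ':');
        if (in_array($label, $labels, true)) {
            $pending = $label;
        }
    }
    return $found;
}

$block = <<<HTML
<table>
  <tr><td>Last Trade:</td><td> 43.50 </td></tr>
  <tr><td>Change</td><td>
    <table><tr><td> -0.5 </td></tr>
           <tr><td> -1.14% </td></tr></table></td></tr>
</table>
HTML;

// The same block twice, standing in for a page that repeats its tables.
print_r(firstValues($block . $block, ['Last Trade', 'Change']));
```

Keying the results by label is what removes the repeats; the single pass over the cells is what stops nested tables from being counted through each of their parents.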