How to extract variables from HTML tables (using PHP)

Discussion in 'PHP' started by goldensea80, Jan 30, 2007.

  1. #1
    The problem is that I read an HTML file in PHP; the main content of the HTML file is a table with fields as columns and records as rows. How can I extract the values into a variable, say, an associative array?
    I figure that I must strip out all the HTML tags except TR and TD, then use some regular expression, but I haven't found the final solution yet.
     
    goldensea80, Jan 30, 2007
  2. picouli

    #2
    Yes, you can first do
    $text = strip_tags($html, '<td>');
    and then run a regexp, something like this:

    $pattern = '/<td>([^<]+)<\/td>/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    You should end up with all your matches in $matches: with PREG_SET_ORDER, each entry holds the full match at index 0 and the cell text at index 1.
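To make the shape of the result concrete, here is a minimal, self-contained sketch of the strip_tags + preg_match_all approach. The sample table in $html and the $cells variable are made up for illustration; a real script would load the HTML from a file instead.

```php
<?php
// Hypothetical sample input; a real script would read this from a file,
// for example with file_get_contents().
$html = '<table><tr><td>Alice</td><td>30</td></tr>'
      . '<tr><td>Bob</td><td>25</td></tr></table>';

// Drop every tag except <td>, so only the cells stay marked up.
$text = strip_tags($html, '<td>');

$pattern = '/<td>([^<]+)<\/td>/';
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each entry of $matches is one match:
// index 0 is the whole "<td>...</td>" string, index 1 is the cell text.
$cells = array();
foreach ($matches as $match) {
    $cells[] = $match[1];
}

print_r($cells);
```

Note that this gives one flat list of cells: because <tr> is stripped too, the row boundaries are lost, and this pattern skips cells that are empty or that carry attributes.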

    http://php.net/strip_tags
    http://php.net/preg_match_all

    HTH, cheers!
     
    picouli, Jan 30, 2007
  3. rays

    #3
    Read the file into a string, then use the explode() function.

    Note: the code is not precise.
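The post above gives no code for the explode approach, so the following is only one possible reading of it, not rays's own code. It assumes the closing tags are written exactly as lowercase "</tr>" and "</td>"; the sample $html and the names $rows and $cells are made up for illustration.

```php
<?php
// Hypothetical sample; in practice the string would come from a file,
// for example via file_get_contents().
$html = '<table><tr><td>Alice</td><td>30</td></tr>'
      . '<tr><td></td><td>25</td></tr></table>';

$rows = array();

// Splitting on "</tr>" leaves one chunk per row, plus trailing leftovers.
foreach (explode('</tr>', $html) as $rowChunk) {
    // Splitting on "</td>" leaves one chunk per cell, plus trailing leftovers.
    $cellChunks = explode('</td>', $rowChunk);
    array_pop($cellChunks); // whatever follows the last "</td>" is not a cell

    $cells = array();
    foreach ($cellChunks as $cellChunk) {
        // Remove the opening tags that precede the cell text.
        $cells[] = trim(strip_tags($cellChunk));
    }

    if (count($cells) > 0) {
        $rows[] = $cells;
    }
}

print_r($rows);
```

Unlike the flat regexp result, this keeps one sub-array per table row and preserves empty cells, but it will miss rows or cells whose closing tags are uppercase or omitted.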
     
    rays, Jan 30, 2007
  4. goldensea80

    #4
    Thanks guys,
    I found that picouli's method is pretty useful. However, in my case it works with PREG_PATTERN_ORDER (or with no flag, since that is the default), and, with the pattern below, the usable array is $matches[2].
    In fact, since the <td> tags have attributes, we need to allow for them in the regexp. Further, we need to include empty cells as well, so + is replaced by *.
    $pattern = '/<td([^<]*)>([^<]*)<\/td>/';
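As a rough sketch of how this pattern could lead to the associative array asked for in the first post: the example below keeps <tr> as well as <td> so that rows survive, and assumes the first row holds the field names in plain <td> cells. The sample $html and the names $fields, $records and $cellPattern are made up for illustration.

```php
<?php
// Hypothetical sample; the first row is assumed to hold the field names.
$html = '<table>'
      . '<tr><td class="h">name</td><td class="h">age</td></tr>'
      . '<tr><td>Alice</td><td align="right">30</td></tr>'
      . '<tr><td>Bob</td><td></td></tr>'
      . '</table>';

// Keep <tr> as well as <td>, so the row boundaries survive.
$text = strip_tags($html, '<tr><td>');

$cellPattern = '/<td([^<]*)>([^<]*)<\/td>/';
$fields = null;
$records = array();

// Each chunk before a "</tr>" is one table row.
foreach (explode('</tr>', $text) as $rowChunk) {
    // Default PREG_PATTERN_ORDER: $m[1] = attributes, $m[2] = cell text.
    if (!preg_match_all($cellPattern, $rowChunk, $m)) {
        continue; // no cells in this chunk
    }
    if ($fields === null) {
        $fields = $m[2]; // first row: remember the field names
        continue;
    }
    if (count($m[2]) === count($fields)) {
        // Pair each field name with the value in the same column.
        $records[] = array_combine($fields, $m[2]);
    }
}

print_r($records);
```

Rows whose cell count differs from the header row are skipped here, which is a design choice for the sketch rather than a requirement. Cells that contain nested tags or a literal "<" will not match this pattern.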
     
    goldensea80, Jan 30, 2007