The problem is that I read an HTML file in PHP, and the main content of the HTML file is a table with fields as columns and records as rows. How can I extract the values into a variable, say, an associative array? I figure that I must strip out all the HTML tags except TR and TD and then use some regular expression, but I haven't found the final solution yet.
Yes, you can strip the tags with $text = strip_tags($html, '<td>'); and then run a regexp along these lines: $pattern = '/<td>([^<]+)<\/td>/'; preg_match_all($pattern, $text, $matches, PREG_SET_ORDER); With PREG_SET_ORDER, $matches holds one entry per matched cell, and the cell text is in $matches[$i][1]. http://php.net/strip_tags http://php.net/preg_match_all HTH, cheers!
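For anyone who wants to try the approach above, here is a minimal sketch. The sample HTML and the $cells variable are made up for illustration; the real input would come from the file being read.

```php
<?php
// Hypothetical sample input (an assumption, not the poster's real file).
$html = '<html><body><table>'
      . '<tr><td>Alice</td><td>30</td></tr>'
      . '<tr><td>Bob</td><td>25</td></tr>'
      . '</table></body></html>';

// Keep only the <td> tags; every other tag is removed, its text is kept.
$text = strip_tags($html, '<td>');

// Capture the non-empty text between each <td> and </td>.
$pattern = '/<td>([^<]+)<\/td>/';
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each $matches[$i] is one match:
// index 0 is the full "<td>...</td>" string, index 1 is the captured cell text.
$cells = array();
foreach ($matches as $match) {
    $cells[] = $match[1];
}

print_r($cells);
```

Note that this flat list loses the row boundaries, and the pattern skips empty cells and any <td> that carries attributes.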
Thanks guys, I found picouli's method pretty useful. However, in my case it works with PREG_PATTERN_ORDER (or with no flag at all, since that is the default), and the usable array is $matches[2]. In fact, since the <td> tags have attributes, we need to include them in the regexp. Further, we need to include empty cells as well, so + is replaced by *: $pattern = '/<td([^<]*)>([^<]*)<\/td>/';
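To get from there to the associative array asked about in the first post, one possible sketch is below. It assumes the first table row holds the field names and that <tr> tags are kept alongside <td> so the rows can be told apart; the sample HTML, the row pattern, and the variable names are all illustrative, not something from the posts above.

```php
<?php
// Hypothetical sample; the first row is assumed to hold the field names.
$html = '<table border="1">'
      . '<tr><td class="h">name</td><td class="h">age</td></tr>'
      . '<tr><td align="left">Alice</td><td>30</td></tr>'
      . '<tr><td>Bob</td><td></td></tr>'
      . '</table>';

// Keep <tr> as well as <td> so the row boundaries survive.
$text = strip_tags($html, '<tr><td>');

// Split into rows first (this row pattern is an assumption),
// then apply the cell pattern from the post to each row.
preg_match_all('/<tr[^>]*>(.*?)<\/tr>/s', $text, $rowMatches);
$cellPattern = '/<td([^<]*)>([^<]*)<\/td>/';

$rows = array();
foreach ($rowMatches[1] as $rowHtml) {
    // Default PREG_PATTERN_ORDER: $m[1] = attributes, $m[2] = cell text.
    preg_match_all($cellPattern, $rowHtml, $m);
    $rows[] = $m[2];
}

// Use the first row as keys for every following row.
$fields = $rows ? array_shift($rows) : array();
$records = array();
foreach ($rows as $row) {
    // Skip rows whose cell count does not match the header.
    if (count($row) === count($fields)) {
        $records[] = array_combine($fields, $row);
    }
}

print_r($records);
```

This only holds up for simple tables: cells containing nested tags, a literal < character, or nested tables would need a real HTML parser rather than a regexp.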