I am having trouble getting this regex to work the way I want. Let's say I have a bunch of tables that I would like to extract one by one, and the input looks something like this:

some irrelevant stuff here
<table class="bla1">
somestuff here with newline characters
</table>
some irrelevant stuff in between
<table class="bla1">
some more stuff here with newline characters
</table>
some irrelevant stuff after

I am using this:

preg_match_all('/(<table\s+class=\"bla1\"[\\s\\S]+<\/table>)/i',$s,$matches,PREG_SET_ORDER);

and I get this:

<table class="bla1">
somestuff here with newline characters
</table>
some irrelevant stuff in between
<table class="bla1">
some more stuff here with newline characters
</table>

So instead of matching from the first table tag to the first </table>, it matches all the way to the very last </table>. I would like each table in its own slot of the results array, not one match that runs from the first table tag to the last </table> with all the useless stuff in between. I would really appreciate your help. I believe [\\s\\S] matches everything including </table>, so maybe I need to exclude it somehow? However, I have only found out how to negate single characters.
Try this: '/<table\sclass="bla1">([^<]+)<\/table>/' $matches[1] should hold the wanted content.
Thanks, but in all honesty I was not really after tables (that was just a simple example). My stopping expression is a characteristic sentence, one that stands at the end of a certain paragraph of text. Also, what about <td> and <tr>? Wouldn't those break your pattern? As far as I can tell, [^<]+ won't allow any < characters at all. So basically I would need to negate more than just one character. I tried ^(sentence\sto\sscan\sfor) but that didn't work.
I think I found the answer. I added the U modifier (where the pattern ended in /i it now ends in /iU) to switch to non-greedy pattern matching. Go figure.
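
For anyone landing here later, here is a minimal sketch of what the U modifier changes, compared with the original greedy pattern and with a lazy quantifier (+?), which as far as I know is the more common way to get the same effect. The sample string is made up to resemble the one in the question, and the variable names ($greedy, $m1 and so on) are only for illustration.

```php
<?php
// Sample input modeled on the question; the exact text is an assumption.
$s = "some irrelevant stuff here\n"
   . "<table class=\"bla1\">\nsomestuff here\n</table>\n"
   . "some irrelevant stuff in between\n"
   . "<table class=\"bla1\">\nsome more stuff here\n</table>\n"
   . "some irrelevant stuff after";

// Greedy (the original behaviour): [\s\S]+ grabs as much as possible,
// so the match runs to the LAST </table> and there is only one match.
$greedy = preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/i', $s, $m1, PREG_SET_ORDER);

// U modifier: inverts the greediness of every quantifier in the pattern,
// so [\s\S]+ now stops at the FIRST </table> it can reach.
$ungreedy = preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/iU', $s, $m2, PREG_SET_ORDER);

// Lazy quantifier: +? makes only this one quantifier non-greedy,
// leaving the rest of the pattern with its default behaviour.
$lazy = preg_match_all('/<table\s+class="bla1"[\s\S]+?<\/table>/i', $s, $m3, PREG_SET_ORDER);

echo $greedy, "\n";   // 1
echo $ungreedy, "\n"; // 2
echo $lazy, "\n";     // 2
```

With either non-greedy variant, $m2[0][0] and $m2[1][0] each hold one complete table. The same idea should carry over when the stopping expression is a whole sentence rather than </table>, since a lazy quantifier stops at the first place where the rest of the pattern can match, however many characters that terminator has.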