regex driving me nuts

andre75 Peon

#1

I am having trouble making this regex work the way I want it to.

Let's say I have a bunch of tables that I would like to extract, one by one.
Let's say the input looks something like this:
some irrelevant stuff here

<table class="bla1"> somestuff here with newline characters </table>

some irrelevant stuff in between

<table class="bla1"> some more stuff here with newline characters </table>

some irrelevant stuff after
HTML:
So I am using this:
preg_match_all('/(<table\s+class=\"bla1\"[\\s\\S]+<\/table>)/i',$s,$matches,PREG_SET_ORDER);
PHP:
and I get this:
<table class="bla1"> somestuff here with newline characters </table>

some irrelevant stuff in between

<table class="bla1"> some more stuff here with newline characters </table>
HTML:
So instead of stopping at the first </table>, the match runs from the first table tag to the very last </table>. I would like each table in its own entry of the results array, without all the useless stuff in between.
I would really appreciate your help.

I believe [\\s\\S] matches everything, including </table>, so maybe I need to exclude it somehow? However, I have only found out how to negate single characters.

andre75, Dec 27, 2006

nico_swd Prominent Member

#2

'/<table\sclass="bla1">([^<]+)<\/table>/'
PHP:
Try this.

$matches[1] should hold the wanted content.
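
For anyone trying this, here is a minimal sketch of how the result array is laid out with preg_match_all. The sample input and the class name bla1 are assumptions taken from the first post, not real data. It also shows the limitation of [^<]+: it cannot cross any nested tag.

```php
<?php
// Sample input modelled on the question; the text itself is an assumption.
$s = <<<EOT
some irrelevant stuff here
<table class="bla1">first
table</table>
some irrelevant stuff in between
<table class="bla1">second
table</table>
some irrelevant stuff after
EOT;

// [^<]+ matches any run of characters that contains no "<", newlines included.
$pattern = '/<table\sclass="bla1">([^<]+)<\/table>/';

// Default flag (PREG_PATTERN_ORDER): $matches[1] lists every group-1 capture.
$count = preg_match_all($pattern, $s, $matches);
print_r($matches[1]);

// With PREG_SET_ORDER each element is one complete match; group 1 sits at index 1.
preg_match_all($pattern, $s, $sets, PREG_SET_ORDER);
echo $sets[0][1], "\n";

// Nested markup such as <tr><td> contains "<", so [^<]+ cannot span it.
$nested = '<table class="bla1"><tr><td>cell</td></tr></table>';
$nestedCount = preg_match_all($pattern, $nested, $none);
var_dump($nestedCount);
```

So $matches[1] holds the content only when the default flag is used; with PREG_SET_ORDER, as in the original call, the content of the first table is at $matches[0][1].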

nico_swd, Dec 28, 2006

andre75 Peon

#3

nico_swd said:
'/<table\sclass="bla1">([^<]+)<\/table>/'
PHP:
Try this.

$matches[1] should hold the wanted content.

Thanks, but in all honesty I am not really after tables; that was just a simple example. My real stopping expression is a characteristic sentence that stands at the end of a certain paragraph of text.

Also, what about <td> and <tr>? Wouldn't those break your pattern? As far as I can tell, [^<]+ does not allow any < characters between the tags.
So basically I need to negate more than just one character. I tried ^(sentence\sto\sscan\sfor), but that didn't work.
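
A note on why that attempt fails: ^ only means "not" as the first character inside a character class [...]; anywhere else it is a start-of-subject anchor. One common way to express "anything that is not the start of this phrase" is a negative lookahead tested at every position. The sketch below is only an illustration: the input text, the start marker "Intro" and the stop sentence "End of report." are all made up.

```php
<?php
// Hypothetical input: each paragraph ends with the same closing sentence.
$text = "Intro one.\nBody one. End of report.\nIntro two.\nBody two. End of report.\n";

// preg_quote() escapes regex metacharacters in the literal stop sentence.
$stop = preg_quote('End of report.', '/');

// (?:(?!STOP).)* consumes one character at a time, but only while the
// stop sentence does not begin at the current position.
// The s modifier lets "." match newlines as well.
$pattern = '/Intro(?:(?!' . $stop . ').)*' . $stop . '/s';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Each entry of $matches[0] then runs from one start marker to the nearest following stop sentence, even though the stop expression is longer than a single character.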

andre75, Dec 28, 2006

andre75 Peon

#4

I think I found the answer. I added the U modifier (where it said /i it now says /iU) to switch to non-greedy pattern matching. Go figure.
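
For reference, a small sketch comparing the greedy pattern, the U modifier, and a lazy quantifier. The sample input is an assumption modelled on the first post, and the pattern is written with single backslashes and without the capture group for brevity.

```php
<?php
// Sample input modelled on the question; the surrounding text is an assumption.
$s = "before\n<table class=\"bla1\">one\nA</table>\nbetween\n"
   . "<table class=\"bla1\">two\nB</table>\nafter";

// Greedy (the original behaviour): [\s\S]+ runs on to the LAST </table>.
preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/i', $s, $greedy);

// U modifier: every quantifier in the pattern is non-greedy by default
// (a trailing ? would then make a quantifier greedy again).
preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/iU', $s, $ungreedy);

// Without U: a ? after the quantifier makes only that one quantifier lazy.
preg_match_all('/<table\s+class="bla1"[\s\S]+?<\/table>/i', $s, $lazy);

echo count($greedy[0]), ' ', count($ungreedy[0]), ' ', count($lazy[0]), "\n";
```

The +? form is often preferred because it changes a single quantifier, whereas U flips the default for the whole pattern.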

andre75, Dec 28, 2006
