regex driving me nuts

Discussion in 'PHP' started by andre75, Dec 27, 2006.

  1. #1
    I am having trouble getting this regex to work the way I would like.

    Let's say I have a bunch of tables that I would like to extract (one by one).
    So let's say the input goes something like this:
    
    some irrelevant stuff here
    
    <table class="bla1"> somestuff here with newline characters </table>
    
    some irrelevant stuff in between
    
    <table class="bla1"> some more stuff here with newline characters </table>
    
    some irrelevant stuff after
    
    So I am using this:
    preg_match_all('/<table\s+class=\"bla1\"[\\s\\S]+<\/table>/i', $s, $matches, PREG_SET_ORDER);
    and I get this:

    
    <table class="bla1"> somestuff here with newline characters </table>
    
    some irrelevant stuff in between
    
    <table class="bla1"> some more stuff here with newline characters </table>
    
    So instead of stopping at the first </table>, the match runs from the first table tag all the way to the very last </table>. I would like each table to end up in its own entry of the results array, not one big match with all the useless stuff in between.
    I would really appreciate your help.

    I believe [\\s\\S] matches everything, including </table>, so maybe I need to exclude it somehow? However, I have only found out how to negate single characters.
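
    One way to "negate" a whole string rather than a single character is a negative lookahead placed in front of each character consumed. The snippet below is only a sketch: the input text is made up to resemble the example in this post, and it assumes the closing tag is always spelled </table>.

```php
<?php
// Sample input modelled on the post above (the text itself is made up).
$s = "some irrelevant stuff here\n"
   . "<table class=\"bla1\"> first\ntable </table>\n"
   . "some irrelevant stuff in between\n"
   . "<table class=\"bla1\"> second\ntable </table>\n"
   . "some irrelevant stuff after\n";

// (?!<\/table>) is a negative lookahead: it succeeds only where the
// text ahead does not start with "</table>". Pairing it with [\s\S]
// consumes one character at a time until the closing tag is reached.
$pattern = '/<table\s+class="bla1"(?:(?!<\/table>)[\s\S])+<\/table>/i';

$count = preg_match_all($pattern, $s, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each $matches[n] is one match; [0] is the full text.
foreach ($matches as $m) {
    echo $m[0], "\n---\n";
}
```

    Each table should come back as its own entry, because the repeated group can never step over a closing tag.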
     
    andre75, Dec 27, 2006
  2. nico_swd

    #2
    
    
    '/<table\sclass="bla1">([^<]+)<\/table>/'
    
    
    Try this.

    $matches[1] should hold the wanted content.
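
    As a rough illustration of how that pattern could be used with preg_match_all, here is a small sketch. The input string is invented, the class name bla1 is taken from the first post, and the tables are assumed to contain plain text only.

```php
<?php
// Made-up input: two tables that contain plain text only, no nested tags.
$s = "intro\n<table class=\"bla1\"> first\ntable </table>\n"
   . "filler\n<table class=\"bla1\"> second\ntable </table>\noutro\n";

// [^<]+ stops at the first "<", so it cannot run past </table>.
$pattern = '/<table\sclass="bla1">([^<]+)<\/table>/';

// Without PREG_SET_ORDER the default is PREG_PATTERN_ORDER:
// $matches[0] lists the full matches, $matches[1] the first capture group.
preg_match_all($pattern, $s, $matches);

print_r($matches[1]);
```

    Note that the negated class [^<] also matches newline characters, so multi-line table contents are not a problem here.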
     
    nico_swd, Dec 28, 2006
  3. andre75

    #3
    Thanks, but in all honesty I was not really after tables (it was a simple example). My stopping expression is a characteristic sentence (one which stands at the end of a certain paragraph of text).

    Also, what about <td> and <tr>? Wouldn't those break your pattern? As far as I can tell, [^<]+ won't allow any < characters.
    So basically I would need to negate more than just one character. I tried ^(sentence\sto\sscan\sfor), but that didn't work.
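
    For what it's worth, ^ only means "not" inside square brackets; outside them it anchors the match to the start of the string, which is why ^(...) does not negate anything. The sketch below first shows the [^<]+ pattern failing on nested tags, then captures text up to a stop sentence with a lazy quantifier. The markers "Start:" and "End of report." are hypothetical placeholders, and the input is made up.

```php
<?php
// Made-up input: a table with nested tags, then two paragraphs that each
// end with the same characteristic sentence.
$table = '<table class="bla1"><tr><td>cell</td></tr></table>';
$text  = "Start: first paragraph,\nover two lines. End of report.\n"
       . "noise\n"
       . "Start: second paragraph. End of report.\n";

// [^<]+ cannot cross the "<" of <tr>, so the table pattern finds nothing.
$n = preg_match_all('/<table\sclass="bla1">([^<]+)<\/table>/', $table, $m1);

// Hypothetical markers; preg_quote() escapes regex metacharacters such as ".".
$start = preg_quote('Start:', '/');
$stop  = preg_quote('End of report.', '/');

// [\s\S]+? is lazy: it expands one character at a time and stops at the
// first place where the stop sentence can match.
preg_match_all('/' . $start . '([\s\S]+?)' . $stop . '/', $text, $m2);

var_dump($n);
print_r($m2[1]);
```

    Running the stop sentence through preg_quote keeps its punctuation from being read as regex syntax.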
     
    andre75, Dec 28, 2006
  4. andre75

    #4
    I think I found the answer. I added the U modifier (where it said /i it now says /iU) to switch to non-greedy pattern matching. Go figure.
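
    To make the difference concrete, here is a small comparison of the greedy pattern, the U modifier, and a lazy quantifier (+?), which has the same effect for a single quantifier. The input string is made up for the example.

```php
<?php
// Made-up input with two tables separated by unrelated text.
$s = "before\n<table class=\"bla1\"> one </table>\n"
   . "between\n<table class=\"bla1\"> two </table>\nafter\n";

// Greedy: [\s\S]+ runs to the last </table>, giving one long match.
$greedy = preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/i', $s, $a);

// U modifier (PCRE_UNGREEDY): quantifiers become lazy by default.
$ungreedy = preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/iU', $s, $b);

// Appending ? to one quantifier makes only that quantifier lazy.
$lazy = preg_match_all('/<table\s+class="bla1"[\s\S]+?<\/table>/i', $s, $c);

printf("greedy: %d, U modifier: %d, lazy quantifier: %d\n", $greedy, $ungreedy, $lazy);
```

    The U modifier flips every quantifier in the pattern, so +? may be the safer choice when only one part of the pattern should be non-greedy.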
     
    andre75, Dec 28, 2006