regex driving me nuts

Discussion in 'PHP' started by andre75, Dec 27, 2006.

  1. #1
    I am having trouble making this regex work the way I'd like it to.

    Let's say I have a bunch of tables that I would like to extract (one by one).
    So let's say it goes something like this:
    
    some irrelevant stuff here
    
    <table class="bla1"> somestuff here with newline characters </table>
    
    some irrelevant stuff in between
    
    <table class="bla1"> some more stuff here with newline characters </table>
    
    some irrelevant stuff after
    
    HTML:
    So I am using this:
    preg_match_all('/<table\s+class=\"bla1\"[\\s\\S]+<\/table>/i',$s,$matches,PREG_SET_ORDER)
    PHP:
    and I get this:

    
    <table class="bla1"> somestuff here with newline characters </table>
    
    some irrelevant stuff in between
    
    <table class="bla1"> some more stuff here with newline characters </table>
    
    HTML:
    So instead of extracting from the first table tag to its own </table>, it extracts all the way to the very last </table>. I would like each table in its own slot in the results array, rather than one match running from the first table tag to the very last </table> with all the useless stuff in between.
    I would really appreciate your help.

    I believe [\\s\\S] matches everything including </table>, so maybe I need to exclude it somehow? However, I have only found out how to negate single characters.
     
    andre75, Dec 27, 2006 IP
  2. nico_swd

    #2
    
    
    '/<table\sclass="bla1">([^<]+)<\/table>/'
    
    
    PHP:
    Try this.

    $matches[1] should hold the wanted content.
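
    A minimal sketch of how a pattern like this might be run, assuming the class name matches the sample markup ("bla1") and that the tables contain no nested tags. The input string $s below is made up for illustration.

```php
<?php
// Hypothetical input modelled on the sample in the first post.
$s = "junk\n<table class=\"bla1\">first\ncell</table>\nmore junk\n"
   . "<table class=\"bla1\">second</table>\njunk";

// [^<]+ cannot run past a '<', so each match has to stop at its own </table>.
// It does match newlines, since a newline is just another non-'<' character.
preg_match_all('/<table\sclass="bla1">([^<]+)<\/table>/', $s, $matches);

// With the default PREG_PATTERN_ORDER, $matches[1] lists every capture of group 1.
print_r($matches[1]);
```

    Here $matches[1] ends up with two entries, one per table.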
     
    nico_swd, Dec 28, 2006 IP
  3. andre75

    #3
    Thanks, but in all honesty I was not really after tables (it was a simple example). My stopping expression is a characteristic sentence (one which stands at the end of a certain paragraph of text).

    Also, what about <td> and <tr>? Wouldn't those trip up your pattern? As far as I can tell, it won't allow any < characters.
    So basically I would need to negate more than just one character. I tried ^(sentence\sto\sscan\sfor) but that didn't work.
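
    For reference, ^ only negates inside a character class such as [^<]; outside one it anchors to the start of the subject, which is likely why the attempt above failed. One common way to "negate" a whole phrase is a negative lookahead. The sketch below is an assumption about what the real data looks like: the "Start:" marker, the stop sentence "End of part." and the sample text are all invented for illustration.

```php
<?php
// Hypothetical text with a characteristic stop sentence ending each block.
$text = "intro\nStart: first paragraph\nspans lines. End of part.\nnoise\n"
      . "Start: second one. End of part.\ntail";

// Escape the stop sentence so its '.' is treated literally.
$stop = preg_quote('End of part.', '/');

// (?:(?!STOP).)* takes one character at a time, but only while STOP does not
// begin at the current position. The s modifier lets '.' match newlines too.
$pattern = '/Start:((?:(?!' . $stop . ').)*)' . $stop . '/s';

preg_match_all($pattern, $text, $m);

// $m[1] holds the text between each "Start:" and the next stop sentence.
print_r($m[1]);
```

    Each captured block stops at the first occurrence of the stop sentence rather than the last.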
     
    andre75, Dec 28, 2006 IP
  4. andre75

    #4
    I think I found the answer. I added the U modifier (where it said /i it now says /iU) to switch to non-greedy pattern matching. Go figure.
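
    A small sketch comparing the greedy pattern, the /U variant, and a lazy +? quantifier, which is another way to get the same effect for a single quantifier. The input string is made up to resemble the sample from the first post.

```php
<?php
// Hypothetical input resembling the sample markup from the first post.
$s = "before\n<table class=\"bla1\"> one\nrow </table>\nbetween\n"
   . "<table class=\"bla1\"> two </table>\nafter";

// Greedy: [\s\S]+ runs on to the LAST </table>, so everything is one match.
$greedy = preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/i', $s, $g, PREG_SET_ORDER);

// U modifier: inverts the greediness of every quantifier in the pattern.
$ungreedy = preg_match_all('/<table\s+class="bla1"[\s\S]+<\/table>/iU', $s, $u, PREG_SET_ORDER);

// +? makes only this one quantifier lazy and leaves the others greedy.
$lazy = preg_match_all('/<table\s+class="bla1"[\s\S]+?<\/table>/i', $s, $l, PREG_SET_ORDER);

echo $greedy, ' ', $ungreedy, ' ', $lazy, "\n"; // prints "1 2 2"
```

    With PREG_SET_ORDER, $u[0][0] and $u[1][0] each hold one complete table.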
     
    andre75, Dec 28, 2006 IP