Need some help with regex

Discussion in 'PHP' started by AHA7, May 22, 2007.

  1. #1
    Hello,

    I am struggling with a regex format and I am starting to lose it :rofl:

    I want to use PHP's preg_match_all() function to search HTML files for <img> and <embed> tags and extract all the src URLs from those tags in a given HTML document. I want to cover all the possible forms in which those tags may be formatted.

    Here's an example with all the matches highlighted:

    <html>
    <body>
    <h1>Multimedia Page</h1>
    <img src="http://ex.com/img.jpg"> this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example this is a flash object <embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> this is another flash object <embed
    (a newline, a tab, and a space character separate the rest of this tag from its opening <embed) type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> Here is another image tag <IMG
    (newline)
    (new line and tab)
    (new line)

    SRC="http://ex.com/img.jpg" HEIGHT="10">...
    </body>
    </html>

    The regex in words:

    MATCH THE FOLLOWING: <img (or <IMG) followed by any characters (including any number of spaces, tabs, and newlines), followed by src= (or SRC=), which may be followed by a single or double quotation mark, followed by anything (this is the URL part, which will be the first set of matches stored in the multi-dimensional array generated by preg_match_all()), followed by an optional single or double quotation mark, followed by optional anything :D (including any number of spaces, tabs, and newlines) until the first > (not greedy) OR (|) MATCH THE FOLLOWING: the same scenario, but this time for the <embed> tag, with the URL (the "anything" part after src=) as the second set of matches.

    I know that the regex would be only one line long or so, but writing all the above is much simpler, at least to me!
     
    AHA7, May 22, 2007
  2. #2
    Try this:
    
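    // s lets . match newlines; i makes the tag and attribute names case-insensitive.
    preg_match_all('/<(img|embed).*?src\s*=\s*["\']([^"\'<]+)/si', $text, $matches);
    
    // $matches[1] holds the tag names, $matches[2] the src URLs.
    echo '<pre>' . print_r($matches[2], true) . '</pre>';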
    
    PHP:
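
    A possible refinement, offered as a sketch and not part of the original reply: because .*? with the s flag can run past a closing >, a tag without a src can end up paired with the src of a later tag. The variant below keeps the search inside one tag with [^>]*?, accepts double-quoted, single-quoted, or unquoted URLs, and groups the results by tag name, which is close to the "two sets of matches" asked for in the first post. The function name extract_src_urls() and the sample $html are hypothetical, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): collects src URLs per tag name.
function extract_src_urls(string $html): array
{
    // \b rejects names such as <imgfoo; [^>]*? keeps the search inside one tag;
    // \s before src avoids matching attributes such as data-src.
    // The URL may be double-quoted, single-quoted, or unquoted.
    $pattern = '/<(img|embed)\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';

    $urls = ['img' => [], 'embed' => []];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Only one of groups 2-4 takes part in a match; the others are empty or absent.
            $url = ($m[2] ?? '') . ($m[3] ?? '') . ($m[4] ?? '');
            $urls[strtolower($m[1])][] = $url;
        }
    }
    return $urls;
}

// Sample input: a quoted src, an upper-case tag split across lines,
// an <img> without src, and an <embed> with an unquoted src.
$html = "<img src=\"http://ex.com/a.jpg\"> <IMG\n\t SRC='http://ex.com/b.jpg' > "
      . "<img alt=\"no source\"> <embed type=\"x\" src=http://ex.com/c.swf width=\"1\"></embed>";

print_r(extract_src_urls($html));
```

    For the sample input, the img list contains the a.jpg and b.jpg URLs and the embed list contains the c.swf URL; the <img> without a src is skipped. The pattern still assumes that no attribute value before src contains a literal > character.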
     
    nico_swd, May 22, 2007