Hello, I am struggling with a regex format and I am starting to lose it :rofl:

I want to use PHP's preg_match_all() function to search HTML files for <img> and <embed> tags and extract all the src URLs from those tags in a given HTML document. I want to cover all the forms in which those tags may be formatted.

Here's an example; the src URLs are the parts that should match:

<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg">
this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example
this is a flash object
<embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed>
this is another flash object
<embed (there is a newline, a tab and a space character separating the rest of this tag from its opening <embed) type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed>
Here is another image tag
<IMG (newline) (new line and tab) (new line) SRC="http://ex.com/img.jpg" HEIGHT="10">...
</body>
</html>

The regex in words:

MATCH THE FOLLOWING: <img (or <IMG) followed by any characters (including any number of spaces, tabs and newlines), followed by src= (or SRC=), which may be followed by a single or double quotation mark, followed by anything (this is the URL part, which will be the first set of matches stored in the multidimensional array generated by preg_match_all()), followed by an optional single or double quotation mark, followed by optional anything (including any number of spaces, tabs and newlines) until the first > (not greedy)

OR (|)

MATCH THE FOLLOWING: the same scenario, but this time for the <embed> tag, with the URL (the "anything" part after src=) as the second set of matches.

I know that the regex would be only one line long or so, but writing all the above is much simpler, at least to me!
Try this:

preg_match_all('/<(img|embed).*?src\s*=\s*["\']([^"\'<]+)/si', $text, $matches);
echo '<pre>' . print_r($matches[2], true) . '</pre>';
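A possible tightening of the pattern above, offered as a sketch rather than a definitive answer. With the s modifier, .*? can run past the end of one tag and pick up the src of a later one (for example an <img> with no src followed by an <embed>), and the pattern also requires a quote, while the question allows unquoted values. The variant below keeps the search inside a single tag with [^>]*?, makes the quote optional, and groups the URLs by tag name. The function name extract_media_srcs is hypothetical, and the pattern assumes that attribute values contain no > and that src values contain no whitespace.

```php
<?php
// Hypothetical helper (not from the thread): collect src URLs from <img> and
// <embed> tags, grouped by lowercase tag name.
function extract_media_srcs(string $html): array
{
    // <(img|embed)\b   tag name, case-insensitive via the i modifier
    // [^>]*?           any attributes, including newlines, but never past ">"
    // \ssrc\s*=\s*     a src attribute preceded by whitespace (skips data-src)
    // (["\']?)         optional opening quote, captured as group 2
    // ([^"\'\s>]+)     the URL, captured as group 3
    // \2               the same quote again (or nothing if unquoted)
    $pattern = '/<(img|embed)\b[^>]*?\ssrc\s*=\s*(["\']?)([^"\'\s>]+)\2/i';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = ['img' => [], 'embed' => []];
    foreach ($matches as $m) {
        $result[strtolower($m[1])][] = $m[3];
    }
    return $result;
}
```

Calling extract_media_srcs($text) returns an array with an 'img' list and an 'embed' list, which is close to the "first set / second set" layout asked for in the question. This is still a regex heuristic; for arbitrary real-world HTML, PHP's DOMDocument parser is likely to be more robust.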