I want to extract the src attribute from an <embed> tag. What regular expression should I use? I've tried:

    preg_match("/<embed\s[^>]*src=(\"?)([^\" >]*?)\\1[^>]*>(.*)<\/embed>/i", $url, $m)
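It's not a regular expression, but it does the same trick without blowing a gasket over preg_match.

    $fpos = stripos($url, "src=");
    if ($fpos !== false) {
        // The character right after src= is the opening quote (" or ').
        $quote = substr($url, $fpos + 4, 1);
        // Everything after the opening quote.
        $nstr = substr($url, $fpos + 5);
        // The matching closing quote marks the end of the value.
        $end = strpos($nstr, $quote);
        if ($end !== false) {
            $m = substr($nstr, 0, $end);
        }
    }

Basically, it finds src=, snips off everything up to and including the opening quote, finds the matching closing quote, and strips off everything from there on. What remains is the src URL. Note that it assumes the value is quoted and that the first src= in the string belongs to the <embed> tag.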
Alternatively, you can use DOM.

    <?php
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body><p>Test</p><embed src="html.mov" width="50" height="100" /></body></html>');
    $embeds = $doc->getElementsByTagName('embed');
    foreach ($embeds as $embed) {
        echo $embed->getAttribute('src');
    }
    ?>
Yep, that can work quite well for the purpose, but not everyone has the DOM/XML module installed for PHP. As of PHP 5, though, the DOM extension ships with the core distribution and is enabled by default, unless your host built PHP without it. To check, save a PHP script containing <?php phpinfo(); ?>, run it, and look for a section headed "dom". If it's there, you've got it.
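If you'd rather check from code than read through the phpinfo() page, something like the following should do. It's only a rough sketch: has_dom_support() is a name made up for this example, while extension_loaded() and class_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper (the name is invented for this example):
// reports whether the DOM extension is usable on this install.
function has_dom_support()
{
    // extension_loaded() checks the compiled-in/loaded extensions;
    // class_exists() confirms DOMDocument itself is defined.
    return extension_loaded('dom') && class_exists('DOMDocument');
}

if (has_dom_support()) {
    echo "DOM is available\n";
} else {
    echo "DOM is not available; fall back to string functions or a regex\n";
}
```

That way a script can pick the DOM route when it's there and fall back to one of the other approaches when it isn't.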
Hey, my regex is quite a bit rusty, and I would have suggested that if I had remembered the '?'. To those unfamiliar: <embed.+? means the match begins with <embed and is followed by one or more characters before src=. The ? after the + does not make that part optional; it makes the quantifier lazy, so it consumes as few characters as possible and stops at the first src= rather than running on to the last one. The second (.+?) works the same way for the value of src: one or more characters, stopping at the first closing quote. The same goes for the rest. Since we're not validating the URL and simply trying to retrieve it, the (.+?) matt mentioned makes more sense.
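To make the lazy-versus-greedy point concrete, here is a small sketch built on the <embed.+? idea. The function name embed_src() and the exact pattern are my own assumptions for illustration, not necessarily what matt posted, and it only handles quoted src values.

```php
<?php
// Hypothetical helper: returns the src of the first <embed> tag, or null.
function embed_src($html)
{
    // <embed    literal start of the tag
    // .+?       one or more characters, as few as possible, up to src=
    // (["\'])   the opening quote, captured as group 1
    // (.+?)     the value, as few characters as possible (group 2)
    // \1        the same quote character that opened the value
    // /i = case-insensitive, /s = let . match newlines too
    $pattern = '/<embed.+?src=(["\'])(.+?)\1/is';
    if (preg_match($pattern, $html, $m)) {
        return $m[2];
    }
    return null;
}

$tag = '<embed src="a.swf" width="1">';

// Lazy value: stops at the first closing quote.
echo embed_src($tag), "\n";   // a.swf

// Greedy value, for comparison: runs on to the last quote in the tag.
preg_match('/<embed.+?src=(["\'])(.+)\1/is', $tag, $greedy);
echo $greedy[2], "\n";        // a.swf" width="1
```

The only difference between the two patterns is the ? after the second .+, which is what keeps the capture from swallowing the following attributes.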