Hi, I'm trying to extract the URL from an anchor tag using preg_match_all, but it seems to be ignoring the part of the pattern that comes after the URL. Here are the calls:

preg_match_all( '/(href)(\s)*(=)(\"|\'|\s)*(.*?)(\"|\'|\s)*(>)/is', '<a href="http://localhost/?w=title_quote" title=some > localhost2 </a>', $s );
preg_match_all( '/(href)(\s)*(=)(\"|\'|\s)*(.*?)(\"|\'|\s)*(>)/is', '<a href="index.php?w=title_quote" title=some > localhost2 </a>', $s );

They both give the URL along with: " title= etc. I just want the URL. Could someone please point out the mistake? Thanks
A few heads-ups. Anything in parentheses () is a capture group, and each group adds its own entry to the results. You have parentheses all over the place, which is why $s is returning so many results.

Another one: the pattern string itself is enclosed in single quotes, so \' is essentially just ' by the time it reaches the regex engine. That is harmless here, since a quote has no special meaning in a regular expression; if you ever do need a literal \' in the pattern, \\\' should work.

Finally, your pattern needs to be optimised. The following should work, with the URL ending up in $s[1]:

preg_match_all( '/href\s*=["\'\s](.*?)["\'\s][^>]*>/is', '<a href="http://localhost/?w=title_quote" title=some > localhost2 </a>', $s );
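To make the cause of the over-capture concrete, here is a minimal sketch comparing the pattern from the question with a tightened one. The variable names and the tightened pattern are illustrative assumptions, not something posted in the thread. The point it tries to show: in the original, the closing (\"|\'|\s)* group is allowed to match nothing, so the lazy (.*?) keeps growing until the final > can match.

```php
<?php
$html = '<a href="http://localhost/?w=title_quote" title=some > localhost2 </a>';

// Pattern from the question: the closing (\"|\'|\s)* may match nothing,
// so the lazy (.*?) in group 5 keeps growing until the final ">" can match.
$original = '/(href)(\s)*(=)(\"|\'|\s)*(.*?)(\"|\'|\s)*(>)/is';
preg_match($original, $html, $m);
$overCaptured = $m[5];

// Hypothetical tightened variant: a single capture group and a closing
// quote that is required rather than optional.
$tightened = '/href\s*=\s*["\'](.*?)["\']/is';
preg_match($tightened, $html, $m);
$url = $m[1];

echo $overCaptured, "\n"; // http://localhost/?w=title_quote" title=some
echo $url, "\n";          // http://localhost/?w=title_quote
```

Making the closing delimiter mandatory is what stops the lazy group at the end of the URL; dropping the extra parentheses only tidies up the result array.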
Yes, generally it's a much better idea to access such things through the DOM, for example using Zend_Dom_Query: http://framework.zend.com/manual/en/zend.dom.query.html

A pattern for capturing the href attribute would be something like this:

/href="([^"]+)"/
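For anyone who wants to try the DOM route without pulling in a framework, here is a rough sketch using PHP's built-in DOMDocument rather than Zend_Dom_Query (a swapped-in technique, and it assumes the dom extension is enabled). The variable names are made up for illustration.

```php
<?php
$html = '<a href="http://localhost/?w=title_quote" title=some > localhost2 </a>';

// Collect parser warnings internally instead of emitting them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the href of every <a> element that actually has one.
$urls = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $urls[] = $anchor->getAttribute('href');
    }
}

print_r($urls);
```

The parser deals with quoting styles and attribute order itself, so there is no pattern to maintain when the markup changes.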
Try the following code:

$str = '<a href="http://localhost/?w=title_quote" title=some > localhost2 </a>';
$pattern = '#href="([^"]+)"#si';
if (preg_match_all($pattern, $str, $m)) {
    $matches = $m[1];
    print_r($matches);
}
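A pattern hard-wired to double quotes will miss href='...' written with single quotes. One possible extension, sketched here under the assumption that attribute values are always quoted, uses a backreference so the closing quote has to be the same character as the opening one. extractHrefs is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: returns every quoted href value found in $html.
function extractHrefs(string $html): array
{
    // (["\']) remembers the opening quote; \1 requires the same quote to close.
    $pattern = '#href\s*=\s*(["\'])(.*?)\1#si';
    preg_match_all($pattern, $html, $m);

    // Group 2 holds the URLs; it is an empty array when nothing matched.
    return $m[2];
}

$html = '<a href="http://localhost/?w=title_quote" title=some > localhost2 </a>'
      . "<a href='index.php?w=title_quote'>home</a>";

$hrefs = extractHrefs($html);
print_r($hrefs);
```

Unquoted values such as href=index.php are deliberately not handled here; supporting them would need a separate alternative in the pattern.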