Any generic regular expression for links

Discussion in 'PHP' started by indianseo, Feb 20, 2008.

  1. #1
    Hi,

    Is there a generic regular expression that will match all URLs in anchor tags, whether the href value is enclosed in single quotes ('), double quotes (") or no quotes at all?
     
    indianseo, Feb 20, 2008
  2. mvl

    #2
    
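    <?php
        // Sample text: href values in single quotes, without quotes, and in double quotes.
        $txt="blabla <A HREF='http://www.bla.com'>bla</a>skkddkdl<a href=http://bla.mobi/bladiebla.html>hjgkdllls</a>
    
    jgfjdhdkd<a HREF=\"http://www.bla.nl/\" title=\"bla\">hgjkdldl</a>jfjfj";
    
        // Single-quoted PHP string, so the backslash escapes reach the regex engine unchanged.
        $pattern='/<a\b[^>]*\bhref\s*=\s*([\'"]?)([^\'"\s>]+)/i';
        preg_match_all($pattern, $txt, $matches);
        var_dump($matches[2]);
    
    This will output an array with the three URLs, without their quotes: http://www.bla.com, http://bla.mobi/bladiebla.html and http://www.bla.nl/
    
    Explanation of the pattern:

    $pattern='/<a\b[^>]*\bhref\s*=\s*([\'"]?)([^\'"\s>]+)/i';

    <a\b matches the opening of an anchor tag. The word boundary \b keeps it from matching other tags that start with "a", such as <abbr>.
    [^>]* matches zero or more occurrences of characters that are not the closing bracket of a tag (>).
    \bhref\s*=\s* matches the attribute name and the equals sign, with optional whitespace around the equals sign.
    ([\'"]?) matches an optional opening quote ( ' or " ). This is capture group 1; it is captured only so that the URL ends up in group 2. The backslash before the single quote is there because the pattern is written in a single-quoted PHP string.
    ([^\'"\s>]+) matches the URL. It says: "one or more consecutive characters that are not a quote ( ' or " ), not whitespace and not a tag closing bracket ( > )". Both quote characters are listed explicitly because a backreference such as \1 does not work inside a character class; there it is read as an octal character code.
    i tells us to match case-insensitively.

    Effectively $matches[2] will contain all URLs inside anchor tags, whether enclosed in single quotes, double quotes, or no quotes at all. A quoted URL that contains whitespace will be cut off at the first whitespace character.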
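
    If quoted URLs may contain spaces or the other kind of quote, one possible variation is to give each quoting style its own branch. The sketch below is only an illustration: the function name extract_hrefs is made up for this example, and it assumes reasonably well-formed markup (it does not decode entities such as &amp; and can be fooled by the text "href=" appearing inside another attribute value). It uses a PCRE branch reset group, (?|...), so that whichever branch matches, the URL lands in capture group 1.

```php
<?php
// Hypothetical helper, not part of the original post: returns the href
// values of all <a> tags found in $html.
function extract_hrefs(string $html): array
{
    // (?| ... ) is a branch reset group: each alternative numbers its
    // capture group from 1 again, so the URL is always in group 1.
    //   "([^"]*)"        double-quoted value, may contain spaces and '
    //   '([^']*)'        single-quoted value, may contain spaces and "
    //   ([^\s>"']+)      unquoted value, ends at whitespace or >
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))~i';

    preg_match_all($pattern, $html, $matches);

    return $matches[1];
}

// Usage: one link per quoting style.
$html = '<a href="my page.html">a</a> '
      . '<A HREF=\'http://www.bla.com\'>b</A> '
      . '<a href=http://bla.mobi/>c</a>';
print_r(extract_hrefs($html));
```

    For the sample string above this should list my page.html, http://www.bla.com and http://bla.mobi/ in that order.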
     
    mvl, Feb 20, 2008