How to find a particular URL's hyperlink in a remote Webpage / URL

Discussion in 'PHP' started by php_freelancer, Feb 1, 2008.

  1. #1
    How to find a particular URL in a webpage and check its hyperlink

    Say my site is www.rankingoogle.com and I want to check whether every site that is supposed to link back to me really does so correctly. I have the list of backlinking sites in a database. I want to fetch each backlinking URL in turn and check the page against a regex pattern.

    Say a common link is <a href="http://www.rankingoogle.com" title="SEO" class='class_name' target='_blank'> SEO Ranking </a>

    Note that an attribute value in the link may be wrapped in ", wrapped in ', or left unquoted and ended by a space.

    I want to check for this with a regular expression that handles any general form of the hyperlink. I have written the following regex:

    $regex = "<[a][[:space:]]+([a-zA-Z]*[[:space:]]*=?[[:space:]]*(\"[^\"]*\"|'[^']*')?[[:space:]]+)*href[[:space:]]*=[[:space:]]*((\"http://(www\\.)?".$murl."/?[[:space:]]*\")|('http://(www\\.)?".$murl."/?')|(http://(www\\.)?".$murl."/?))[[:space:]]*([a-zA-Z]*[[:space:]]*=?[[:space:]]*(\"[^\"]*\"|'[^']*')?[[:space:]]*)*>.+";

    But it fails in some cases..

    I want a regex that matches the following, in order:
    <a (any amount of whitespace) (zero or more optional attributes, each of the form: attribute (optional whitespace) = (optional whitespace) (optional " or ') (optional attribute value) (optional " or ', but an opening quote must be closed by the same kind of quote) (any amount of whitespace)) href= (any amount of whitespace) (optional " or ') http:// (optional www.) URL (optional trailing slash /) (optional " or ', which must match the opening quote) (optional whitespace) (any further characters or spaces except >)

    That is the main part. After it, the pattern should match up to the closing </a>.
     
    php_freelancer, Feb 1, 2008
  2. php_freelancer

    #2
    In the regex above, $murl = "rankingoogle.com".
     
    php_freelancer, Feb 1, 2008
  3. zerostar07

    #3
    Try doing it with preg_match; Perl-compatible regular expressions are more flexible:

    preg_match("#<a\s+(?:\w+\s*=\s*(\".*?\"|'.*?')\s+)*href\s*=\s*(\"http://.*?\"|'http://.*?').*?>#si", $url, $regs);

    Here $url has to hold the HTML of the page being checked, not its address. I'm not sure I followed the whole URL requirement, but you can try variations.
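
Building on the preg_match suggestion, the rules listed in the first post could be sketched roughly as below. This is only an illustration, not code from the thread: the function name hasBacklink and its parameters are hypothetical, the page HTML is assumed to have been fetched already, and only the site's home URL (with optional www. and optional trailing slash) is accepted. A backreference (\1) is used so that a closing quote has to match the opening one.

```php
<?php
// Hypothetical helper (not from the thread): returns true when $html contains
// an <a>...</a> link whose href is http://$domain, with or without "www."
// and with or without a trailing slash.
function hasBacklink(string $html, string $domain): bool
{
    // Escape the domain so that "." is matched literally; "#" is the delimiter.
    $host = preg_quote($domain, '#');

    // One attribute: a name, optionally followed by = and a quoted or bare value.
    $attr = '[\w:-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))?';

    $pattern = '#<a\s+(?:' . $attr . '\s+)*?'   // any attributes before href
        . 'href\s*=\s*'
        . '(["\']?)'                            // group 1: opening quote, if any
        . 'http://(?:www\.)?' . $host . '/?\s*'
        . '\1'                                  // closing quote must equal group 1
        . '(?=[\s>])'                           // the href value must end here
        . '[^>]*>'                              // remaining attributes, end of tag
        . '.*?</a>#is';                         // link text and closing tag

    return preg_match($pattern, $html) === 1;
}
```

It could be called once per backlinking page, for example hasBacklink($pageHtml, $murl), where $pageHtml is whatever was downloaded for that page. Links that point to a deeper path, a different host, or use mismatched quotes are rejected by this sketch; loosen the pattern if those should count.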
     
    zerostar07, Feb 2, 2008