How to find a particular URL in a web page and check the hyperlink

Say my site is www.rankingoogle.com. I want to check whether all the sites that link back to me are doing so properly. I have the list of backlinking sites in a database, and I want to take each backlinking URL and check it with a regex pattern.

Say a typical link is:

    <a href="http://www.rankingoogle.com" title="SEO" class='class_name' target='_blank'> SEO Ranking </a>

The pattern has to handle the " or ' quotes and the spaces in the link. I want to do this with a regular expression that works as a general hyperlink check. I have written the following regex:

    $regex = "<[a][[:space:]]+([a-zA-Z]*[[:space:]]*=?[[:space:]]*(\"[^\"]*\"|'[^']*')?[[:space:]]+)*href[[:space:]]*=[[:space:]]*((\"http://(www\\.)?".$murl."/?[[:space:]]*\")|('http://(www\\.)?".$murl."/?')|(http://(www\\.)?".$murl."/?))[[:space:]]*([a-zA-Z]*[[:space:]]*=?[[:space:]]*(\"[^\"]*\"|'[^']*')?[[:space:]]*)*>.+";

But it fails in some cases. I want a regex that matches the following:

    <a (any number of spaces)
    (this whole set optional: attribute (space optional) = (space optional) (" or ' optional) (attribute value optional) (" or ' optional, but a single or double quote must be closed by the matching quote) (any number of spaces))
    href= (any number of spaces) (" or ' optional) http:// (www. optional) URL (trailing slash optional) (" or ' optional, but must match the opening " or ') (any number of spaces, optional) (any number of characters or spaces except >)

That is the main part; after it, the match should continue up to </a>.
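A minimal sketch of the pattern described above, using PHP's preg_match (PCRE) instead of the POSIX classes. The function name backlink_pattern() is hypothetical; it assumes the domain is passed without "www." (e.g. rankingoogle.com) and, following the spec, only accepts http:// links to the home page with an optional trailing slash. Each quoting style gets its own alternative, so an opening " or ' has to be closed by the same character, and preg_quote() keeps the dots in the domain literal.

```php
<?php
// Hypothetical helper: builds a PCRE pattern for a link to $domain, following
// the spec in the question. Assumes $domain is given without "www.".
function backlink_pattern(string $domain): string
{
    // Literal URL: http://, optional "www.", the escaped domain, optional "/".
    $url = 'http://(?:www\.)?' . preg_quote($domain, '#') . '/?';

    // Each quoting style is its own alternative, so an opening " or ' must be
    // closed by the same character; an unquoted URL must end at a space or ">".
    $href = '(?:"' . $url . '"|\'' . $url . '\'|' . $url . '(?=[\s>]))';

    // Optional attribute before href: a name, optionally "=" and a
    // double-quoted, single-quoted or unquoted value.
    $attr = '[a-z][\w-]*(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s"\'>]+))?';

    // Group 1 captures the link text between the opening tag and </a>.
    return '#<a(?:\s+' . $attr . ')*?\s+href\s*=\s*' . $href . '[^>]*>(.*?)</a>#is';
}

$html = '<p>Partner: <a href="http://www.rankingoogle.com" title="SEO" '
      . 'class=\'class_name\' target=\'_blank\'> SEO Ranking </a></p>';

if (preg_match(backlink_pattern('rankingoogle.com'), $html, $m) === 1) {
    // prints: Backlink found, link text: SEO Ranking
    echo 'Backlink found, link text: ' . trim($m[1]) . "\n";
}
```

Splitting the quoted and unquoted cases into separate alternatives, rather than making the quotes optional on both sides, is what stops mismatched quotes such as "http://www.rankingoogle.com' from matching.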
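Try doing it with preg_match; Perl-compatible regular expressions are more flexible:

    preg_match("#<a\s+((\w+\s*=\s*(\".*?\"|'.*?')\s*)*)href\s*=\s*(\"http://.*?\"|'http://.*?').*?>#si", $html, $regs);

Here $html is the source of the backlinking page, and $regs[4] holds the matched href value, quotes included. I'm not sure I followed the whole URL pattern, but you can try variations.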
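One possible variation, as a minimal sketch: instead of building the domain into the regex, capture every href with preg_match_all and compare only the host returned by parse_url(). The function name links_to_site() is hypothetical. Because it compares hosts, it also accepts https:// links and deep links such as http://www.rankingoogle.com/page, which the stricter pattern in the question would reject; whether that is wanted depends on what counts as a proper backlink.

```php
<?php
// Hypothetical helper: true if $html contains an <a> tag whose href points at
// $site, comparing only the host part (a leading "www." is ignored on both sides).
function links_to_site(string $html, string $site): bool
{
    // Capture every href value: group 1 double-quoted, group 2 single-quoted,
    // group 3 unquoted. Requiring whitespace before "href" skips e.g. data-href.
    $pattern = '#<a\s(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))#i';
    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        return false; // no matches, or the pattern failed
    }

    $site = strtolower(preg_replace('/^www\./i', '', $site));
    foreach ($matches as $m) {
        // Only one group participates per match; the others may be empty or
        // absent from $m, hence the ?? fallbacks.
        $href = ($m[1] ?? '') . ($m[2] ?? '') . ($m[3] ?? '');
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative link, mailto:, or unparsable URL
        }
        if (strtolower(preg_replace('/^www\./i', '', $host)) === $site) {
            return true;
        }
    }
    return false;
}

var_dump(links_to_site('<a title="SEO" href="http://www.rankingoogle.com">SEO</a>', 'rankingoogle.com'));
```

For each backlinking page URL from the database, you could fetch its source (for example with file_get_contents(), which needs allow_url_fopen enabled for http:// URLs) and pass it in as $html.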