I have this pattern for matching URLs: $pattern = "@<\s*a\s+href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si"; It matches "<a href" URLs, but I also need it to match "<a class='something' href" URLs, where the class definition sits between the a and the href. I thought I could just stick a * in before the "+href", but that doesn't seem to work. I tried sticking it in some other spots, but that didn't work either. Any ideas?
A messy solution would be to use string replace to remove the class=xxxxx bit before the regex and then add it in again afterwards.
For what it's worth, I'm not sure where the HTML you're regexing comes from, but to handle all HTML you have to parse it. Also, try .* instead of just *, btw: on its own, * is only a quantifier that repeats whatever comes right before it.
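To illustrate the "you have to parse" point, here is a minimal sketch using PHP's bundled DOM extension instead of a regex. The sample HTML and the variable names are made up for illustration, and it assumes the DOM extension is enabled in your PHP build.

```php
<?php
// Hypothetical sample input; not taken from the original post.
$html = '<p><a class="ext" href="http://example.com/a">A</a> '
      . "<a href='http://example.com/b'>B</a> "
      . '<a href="/local">C</a></p>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is often malformed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep only absolute http:// links, like the regex in the question.
    if (stripos($href, 'http://') === 0) {
        $urls[] = $href;
    }
}

print_r($urls);
```

The parser does not care about attribute order or quote style, so a class before the href needs no special handling.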
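Try this for the class part: [ ]*(class='[^\']*')?[ ]* Or, modified to work with your situation: $pattern = "@<\s*a\s+(?:class='[^\']*')?\s*href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si"; Note that in the full pattern the class group has to be non-capturing, (?:...), otherwise it becomes group 1 and the \\1 backreference no longer points at the quote around the URL.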
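A quick way to check the optional-class idea is to run it against a few sample links. The sketch below is one possible variant, not the only way to write it: it assumes the class attribute may use either quote style, and it keeps the class group non-capturing so that \\1 still refers to the quote around the URL. The sample HTML is invented for the test.

```php
<?php
// (?: ... )? makes the class attribute optional without adding a capture
// group, so group 1 is still the quote, group 2 the full URL.
$pattern = "@<\s*a\s+(?:class\s*=\s*(?:\"[^\"]*\"|'[^']*')\s+)?"
         . "href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";

// Hypothetical sample input: no class, double-quoted class, single-quoted class.
$html = '<a href="http://example.com/plain">one</a> '
      . '<a class="nav" href="http://example.com/with-class">two</a> '
      . "<a class='nav' href='http://example.com/single'>three</a>";

preg_match_all($pattern, $html, $matches);

// $matches[2] holds the full URLs, one per matched link.
$urls = $matches[2];
print_r($urls);
```

This only covers a class attribute directly before the href. Any other attribute in that position (id, title, and so on) would need a more general pattern, or a parser.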
Yeah, but will that work if there is no class? Shouldn't I just be able to stick a .* in there somewhere and cover both bases? I swear I have stuck .*, (.*), and [.*] in every spot that seems to make sense. By the way, that doesn't seem to work at all.
Avoid using .* if the source you're checking has more than one link in it. It is greedy, so it will read all the way to the last place in the page where the rest of your regex can still match. (class='[^\']*')? The ? means that the whole thing in () is optional, so the pattern still matches when there is no class. You may need to change the single ' to " if that's what you're using.
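The greedy behaviour is easy to see on a string with two links. This is a small sketch with deliberately simplified patterns and made-up sample HTML, comparing .* with its lazy form .*?:

```php
<?php
// Hypothetical sample input with two links on one line.
$html = '<a class="x" href="http://example.com/1">one</a> '
      . '<a href="http://example.com/2">two</a>';

// Greedy: .* runs as far as it can, so a single match starts at the
// first <a and ends at the href of the last link.
preg_match_all('@<a.*href="([^"]+)"@si', $html, $greedy);

// Lazy: .*? stops at the first href it reaches, giving one match per link.
preg_match_all('@<a.*?href="([^"]+)"@si', $html, $lazy);

print_r($greedy[1]); // only the second URL
print_r($lazy[1]);   // both URLs
```

A negated class such as [^>]* is tighter still than .*?, because it cannot run past the end of the tag it started in.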