regex question

ferret77 Heretic

#1

if I have this pattern
$pattern = "@<\s*a\s+href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";
to match URLs, but I need to match not only "<a href " URLs but also "<a class='something' href" URLs, where the class definition sits between the a and the href.

I thought I could just stick a * in before the "+href", but that doesn't seem to work. I tried sticking it in some other spots, but that didn't work either. Any ideas?

ferret77, Nov 24, 2005 IP

dave487 Peon

#2

A messy solution would be to use string replace to remove the class=xxxxx bit before the regex and then add it in again afterwards.

dave487, Nov 25, 2005 IP

jbw Peon

#3

For what it's worth, I'm not sure where the HTML you are regexing comes from, but to handle all HTML you have to parse it.

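Try .* instead of just *, by the way. In these regexes a * only repeats whatever comes right before it; the . is what matches any character.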
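On the parsing point, here is a minimal sketch of what that could look like in PHP with the DOM extension. The sample markup and the extract_http_links() helper name are made up for illustration, and the type hints assume a reasonably recent PHP version.

```php
<?php
// Hypothetical sample markup; attribute order and quoting vary on purpose.
$html = <<<'HTML'
<p>
  <a href="http://example.com/one">one</a>
  <a class='something' href='http://example.com/two'>two</a>
  <a href="/relative/path">skipped</a>
</p>
HTML;

// Collect absolute http:// links by parsing the markup instead of pattern matching.
// extract_http_links() is an illustrative helper, not part of any library.
function extract_http_links(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Keep only links that start with http://, like the regex in the first post.
        if (stripos($href, 'http://') === 0) {
            $urls[] = $href;
        }
    }
    return $urls;
}

$links = extract_http_links($html);
print_r($links);
```

Because the parser handles attribute order and quoting, the class attribute (or any other attribute) in front of href stops being a special case.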

jbw, Nov 25, 2005 IP

ferret77 Heretic

#4

oh shit

I keep thinking I am writing UltraEdit expressions instead of real ones

ferret77, Nov 25, 2005 IP

hdpinn Peon

#5

ferret77 said:

class='something'

[ ]*(class='[^\']*')?[ ]*

try this (or modify it to work with your situation):

$pattern = "@<\s*a\s+(class='[^\']*')?\s*href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";

hdpinn, Nov 26, 2005 IP

ferret77 Heretic

#6

Yeah, but will that work if there is no class?

Shouldn't I just be able to stick a .* in there somewhere and cover both bases?

I swear I have stuck .*, (.*), [.*] in every spot that seems to make sense.

By the way, that doesn't seem to work at all.

ferret77, Nov 26, 2005 IP

hdpinn Peon

#7

Avoid using .* if the source you're checking has more than one link in it. It is greedy, so it will read all the way to the last place on the page where the rest of your regex can still match.
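A small sketch of the difference, assuming PHP's preg_match_all and a made-up one-line page with two links. The patterns are simplified on purpose and only handle double-quoted hrefs.

```php
<?php
// Hypothetical one-line page with two links.
$page = '<a class="x" href="http://a.example/">A</a> <a href="http://b.example/">B</a>';

// Greedy: .* runs as far as it can, then backs up to the LAST href on the page.
preg_match_all('@<a\s.*href="(http://[^"]+)"@si', $page, $greedy);

// Bounded: [^>]* cannot leave the current tag, so each link is matched on its own.
preg_match_all('@<a\s[^>]*href="(http://[^"]+)"@si', $page, $bounded);

print_r($greedy[1]);  // only http://b.example/
print_r($bounded[1]); // http://a.example/ and http://b.example/
```

The greedy version swallows the first link entirely, which is easy to miss when testing against a page that has only one link.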

(class='[^\']*')?

The ? means that the whole thing in () is optional. You may need to change the single ' to " if that's what you're using.
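One thing worth checking in the full pattern from post #5: the plain parentheses around the class part add a capturing group in front of the quote group, so \\1 then refers to the class attribute instead of the opening quote. That may be why it appeared not to work at all. A sketch of one possible fix, assuming PCRE through preg_match_all and made-up sample links, is to make the class group non-capturing with (?: ... ) so the numbering stays as it was:

```php
<?php
// Same pattern as post #5, except the class group is non-capturing, so \1 is still the quote.
$pattern = "@<\s*a\s+(?:class='[^\']*')?\s*href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";

// Hypothetical sample input: one link without a class, one with.
$html = "<a href=\"http://example.com/one\">one</a>\n"
      . "<a class='something' href='http://example.com/two'>two</a>";

preg_match_all($pattern, $html, $matches);

// Group 2 holds the full URL, exactly as in the original pattern.
$urls = $matches[2];
print_r($urls);
```

This still only covers a single-quoted class attribute sitting directly before href. Other attributes, or a class in double quotes, would need the group adjusted.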

hdpinn, Nov 28, 2005 IP
