regex question

Discussion in 'PHP' started by ferret77, Nov 24, 2005.

  1. #1
    I have this pattern

    
    $pattern = "@<\s*a\s+href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";
    to match URLs, but I need to match not only "<a href " URLs but also "<a class='something' href" URLs, where the class definition sits between the a and the href.

    I thought I could just stick a * in before the "+href", but that doesn't seem to work. I tried sticking it in some other spots, but that didn't work either. Any ideas?
     
    ferret77, Nov 24, 2005
  2. dave487

    #2
    A messy solution would be to use string replace to remove the class=xxxxx bit before the regex and then add it in again afterwards.
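
    A rough sketch of that idea, assuming the class attribute is quoted and that matching against a stripped copy is acceptable (so nothing has to be added back afterwards). The sample HTML and the variable names $html, $stripped, and $urls are made up for illustration; the pattern is the one from the first post.

```php
<?php
// Original pattern from the first post, unchanged.
$pattern = "@<\s*a\s+href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";

// Hypothetical sample input, not taken from the thread.
$html = "<a class='nav' href='http://example.com/a'>A</a> <a href=\"http://example.com/b\">B</a>";

// Work on a copy: drop any quoted class attribute before matching.
// $html itself is left untouched, so there is nothing to restore.
$stripped = preg_replace("@\s+class\s*=\s*([\"\'])[^\"\']*\\1@i", '', $html);

// Group 2 of the original pattern holds the full URL.
preg_match_all($pattern, $stripped, $matches);
$urls = $matches[2];
print_r($urls);
```

    This only removes class attributes; any other attribute sitting between the a and the href would still block the match.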
     
    dave487, Nov 25, 2005
  3. jbw

    #3
    For what it's worth, I'm not sure where the HTML you are regexing comes from, but to handle all HTML you have to parse it.

    Try .* instead of just *, by the way.
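
    As an illustration of the parsing route, here is a minimal sketch using PHP's DOM extension, assuming it is enabled. The sample HTML and the variable names are hypothetical, and filtering on the http:// prefix is only there to mirror what the original pattern accepts.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = "<p><a class='nav' href='http://example.com/a'>A</a> "
      . "<a href=\"http://example.com/b\">B</a> <a href='/local'>C</a></p>";

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep only absolute http:// links, as the original pattern does.
    if (stripos($href, 'http://') === 0) {
        $urls[] = $href;
    }
}
print_r($urls);
```

    With a parser, attribute order and quoting style stop mattering, so a class before the href needs no special handling.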
     
    jbw, Nov 25, 2005
  4. ferret77

    #4
    oh shit

    I keep thinking I'm writing UltraEdit expressions instead of real ones.
     
    ferret77, Nov 25, 2005
  5. hdpinn

    #5
    [ ]*(class='[^\']*')?[ ]*

    try this (or modify it to work with your situation):

    $pattern = "@<\s*a\s+(class='[^\']*')?\s*href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";
     
    hdpinn, Nov 26, 2005
  6. ferret77

    #6
    Yeah, but will that work if there is no class?

    Shouldn't I just be able to stick a .* in there somewhere and cover both bases?

    I swear I have stuck .*, (.*), [.*] in every spot that seems to make sense.

    By the way, that doesn't seem to work at all.
     
    ferret77, Nov 26, 2005
  7. hdpinn

    #7
    Avoid using .* if the source you're checking has more than one link in it. Because .* is greedy, it will read through to the last occurrence in the page of whatever ends your regex.
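
    A small demonstration of that greedy behaviour, using a made-up two-link string and simplified patterns (not the full pattern from this thread):

```php
<?php
// Hypothetical sample input with two links, not taken from the thread.
$html = "<a href='http://example.com/a'>A</a> <a href='http://example.com/b'>B</a>";

// Greedy: .* runs on to the last single quote in the string.
preg_match("@href='(.*)'@s", $html, $greedy);

// Negated character class: stops at the first closing quote.
preg_match("@href='([^']*)'@", $html, $bounded);

echo $greedy[1], "\n";   // spans both links
echo $bounded[1], "\n";  // just the first URL
```

    The negated class [^']* cannot cross a quote at all, which is why it is used inside the class group as well.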

    (class='[^\']*')?

    The ? means that the whole thing in () is optional. You may need to change the single ' to " if that's what you're using.
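
    One thing to watch when dropping an optional group into the full pattern: a plain ( ) group placed before the quote group becomes group 1, so the \\1 backreference would then point at the class group instead of the opening quote, and the pattern would not match the links it is meant to. A sketch that sidesteps this by making the class group non-capturing with (?: ), assuming a single-quoted class value; the sample HTML and variable names are hypothetical:

```php
<?php
// (?:...) groups without capturing, so ([\"\'])? is still group 1
// and \\1 still refers to the opening quote of the href value.
$pattern = "@<\s*a\s+(?:class='[^\']*')?\s*href\s*=\s*([\"\'])?(http://([^>\"\']+))\\1.*?>@si";

// Hypothetical sample input: one link with a class, one without.
$html = "<a class='nav' href='http://example.com/a'>A</a> <a href=\"http://example.com/b\">B</a>";

// Group 2 holds the full URL, as in the original pattern.
preg_match_all($pattern, $html, $matches);
$urls = $matches[2];
print_r($urls);
```

    The alternative is to keep the capturing group and change the backreference to \\2, at the cost of shifting the numbers of the URL groups as well.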
     
    hdpinn, Nov 28, 2005