Simple Regular Expression Issue

Discussion in 'PHP' started by ColorWP.com, Oct 20, 2011.

  1. #1
    Hello.

    I am using this pattern to match anchor texts of files in a string:
    $pattern = '#<a href\=\'([^\']*)\'>(.*?)</a>#';
    PHP:
    However, if I have a link which contains a single quote in its anchor text, it is not included:
    <a href='http://google.com'>Google's Website</a>
    <!--Note the ' after the word Google-->
    HTML:
    Note that I cannot modify the input string; it is acquired from a remote page (so I can't change href=' ' to href=" "), and I have to make this regular expression work.

    Any ideas?
     
    ColorWP.com, Oct 20, 2011
  2. HuggyEssex

    #2
    Have you tried to take the ^\' out and just have it as ^' instead? Or you could just replace the loaded content like $content = str_replace("'",'"',$content);

    Just a thought.
     
    HuggyEssex, Oct 20, 2011
  3. ColorWP.com

    #3
    Replacing is not an option because it will replace both the single quotes in the href attribute and the anchor text itself.
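
To illustrate the concern, here is a minimal sketch using the sample link from post #1 as a hypothetical input. It only shows what a blanket str_replace() does to that one string; the variable names are made up for the example.

```php
<?php
// Hypothetical input, copied from the sample link in post #1.
$content = "<a href='http://google.com'>Google's Website</a>";

// str_replace() swaps every single quote, including the apostrophe
// inside the anchor text, not just the quotes around the href value.
$replaced = str_replace("'", '"', $content);

echo $replaced, PHP_EOL;
// <a href="http://google.com">Google"s Website</a>
```

The href quotes are converted as intended, but the anchor text is altered too, which is why the replacement approach does not fit here.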
     
    ColorWP.com, Oct 20, 2011
  4. P1raten

    #4
    Why use single quotes in the href attribute?
     
    P1raten, Oct 20, 2011
  5. ColorWP.com

    #5
    Again, I don't input the string. It's a block of HTML code which I scrape from a remote page. It is formatted this way: single quotes for the href attribute and single quotes (occasionally) in the anchor. I need a way to find only the anchors.
     
    ColorWP.com, Oct 20, 2011
  6. ColorWP.com

    #6
    It seems like the solution was very simple: adding a regular expression modifier, m, to the end.
    $pattern = '#<a href\=\'([^\']*)\'>(.*?)</a>#im'; // "i" makes the match case-insensitive; "m" is multiline mode (^ and $ also match at line breaks)
    PHP:
    This regular expression cheat sheet was of huge help:
    http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
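
For reference, a minimal sketch of running this pattern against a block that contains several links. The sample HTML, the second URL, and the use of preg_match_all() are assumptions for illustration; the thread does not say which matching function was used.

```php
<?php
// Hypothetical scraped HTML: two links on separate lines,
// one of them with an apostrophe in the anchor text.
$html = "<a href='http://google.com'>Google's Website</a>\n"
      . "<a href='http://example.com'>Example</a>";

$pattern = '#<a href\=\'([^\']*)\'>(.*?)</a>#im';

// preg_match_all() collects every match in the string;
// PREG_SET_ORDER groups the captures per link.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[1] is the href value, $match[2] is the anchor text.
    echo $match[1], ' => ', $match[2], PHP_EOL;
}
// http://google.com => Google's Website
// http://example.com => Example
```

Note that, as the PCRE documentation describes it, "m" only changes how ^ and $ behave. This pattern uses neither, so finding every link probably depends on calling preg_match_all() rather than preg_match().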
     
    ColorWP.com, Oct 21, 2011