PHP Regular Expression Issues with preg_match_all

Discussion in 'PHP' started by ColorWP.com, Jan 16, 2010.

  1. #1
    Hello.

    I am fetching a remote page via cURL. Then I try to parse all the URLs, which are listed in the following manner:
    <option  value="google.com">google.com
    HTML:
    but also as:
    <option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com
    HTML:
    and also as:
    <option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com
    HTML:
    This means there may or may not be additional attributes between "<option" and "value=", and also between the value attribute and the end of the <option> tag.

    Currently I use something like this:
    $content = get_data("http://mysourcesite.com");
    preg_match_all('#value=\"(.*)\"#',$content,$list);
    foreach($list[0] as $item) {
        echo $item."<br>";
    }
    
    PHP:
    However, since preg_match_all fills $list with a multidimensional array, I get confused and can't get it working with the code above. Also, the above code returns the whole text as 'value="google.com">' and I want to extract only the 'google.com' part.
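To make the two problems concrete, here is a minimal sketch run against the three sample lines from this post (the `$content` string is an assumption standing in for whatever `get_data()` returns). `$list[0]` holds the complete matches, `value="` included, while `$list[1]` holds only what the parentheses captured. It also shows that the greedy `.*` runs on to the last quote on the line when another attribute follows `value`; a negated class such as `[^"]*` is one common way to stop at the first closing quote.

```php
<?php
// The three sample lines from the question; in the real script this
// would be the page returned by the poster's own get_data() helper.
$content = '<option  value="google.com">google.com' . "\n"
    . '<option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com' . "\n"
    . '<option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com';

// Original pattern: the greedy .* runs to the LAST quote on each line.
preg_match_all('#value=\"(.*)\"#', $content, $greedy);

// Negated character class: the capture stops at the first closing quote.
preg_match_all('#value="([^"]*)"#', $content, $list);

print_r($greedy[1]); // third entry is: digitalpoint.com" name="digitalpoint
print_r($list[0]);   // complete matches, e.g. value="google.com"
print_r($list[1]);   // capture group only: the bare domain names
```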
     
    ColorWP.com, Jan 16, 2010
  2. JAY6390

    JAY6390 Peon

    #2
    You should really use the DOM extension (DOMDocument) and get these values that way.
    Otherwise, you could use this regex:
    %<option\b[^>]+?\bvalue="\K[^"]+%
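A rough sketch of both suggestions side by side, assuming the sample `<option>` lines from the first post wrapped in a `<select>` (the markup string is made up for the demo; a real page would come from cURL). The DOM branch only runs if PHP's dom extension is loaded. Both should end up with the same three domain names; the regex version relies on `\K`, so `$matches[0]` already holds just the values.

```php
<?php
// Sample markup from the first post; a real page would come from cURL.
$html = '<select>'
    . '<option  value="google.com">google.com'
    . '<option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com'
    . '<option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com'
    . '</select>';

// 1) DOM approach (needs the dom extension): libxml's HTML parser copes with
//    attribute order and the unclosed <option> tags, and we read each
//    value attribute directly.
$fromDom = array();
if (extension_loaded('dom')) {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('option') as $option) {
        $fromDom[] = $option->getAttribute('value');
    }
}

// 2) Regex approach: \K drops everything matched before it, so each full
//    match in $matches[0] is just the attribute value (PHP 5.2.4+).
preg_match_all('%<option\b[^>]+?\bvalue="\K[^"]+%', $html, $matches);
$fromRegex = $matches[0];

print_r($fromDom);
print_r($fromRegex);
```

The DOM route is the sturdier choice if the page's markup changes (attribute order, quoting style, extra whitespace), since the parser handles those instead of the pattern.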
     
    JAY6390, Jan 16, 2010
  3. unigogo

    unigogo Peon

    #3
    unigogo, Jan 16, 2010
  4. joebert

    joebert Well-Known Member

    #4
    \K isn't available in PHP versions prior to 5.2.4.

    #<option\s.*value=[\'"](.+)[\'"]#Uis
    Code (markup):
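A minimal sketch of how this pattern behaves, using made-up variations of the sample lines (single quotes, an upper-case tag) to exercise the modifiers. Because there is no `\K` here, the values end up in capture group 1, i.e. `$matches[1]`. One caveat of `[\'"]` on both sides: the closing quote is not required to match the opening one.

```php
<?php
// Assumed sample input: variations on the lines from the first post.
$data = '<option  value="google.com">google.com' . "\n"
    . "<option style='font-weight:bold' value='yahoo.com'>yahoo.com" . "\n"
    . '<OPTION value="digitalpoint.com" name="digitalpoint">digitalpoint.com';

// U: quantifiers are lazy by default, so .* and (.+) stop as early as possible
// i: case-insensitive, so <OPTION> matches too
// s: . also matches newlines, in case a tag is split across lines
preg_match_all('#<option\s.*value=[\'"](.+)[\'"]#Uis', $data, $matches);

// Without \K, the value lives in capture group 1.
print_r($matches[1]);
```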
     
    joebert, Jan 17, 2010
  5. JAY6390

    JAY6390 Peon

    #5
    Yes, I know, but hopefully the OP has an up-to-date PHP version, as they should :)
     
    JAY6390, Jan 17, 2010
  6. ColorWP.com

    ColorWP.com Notable Member

    #6
    Yes, my PHP is up to date. Thank you all for the contributions. I have yet to try the expressions, and I will post the results here (I would guess they will be satisfactory for me).

    All contributors will get some green reputation points from me. ;)
     
    ColorWP.com, Jan 17, 2010
  7. ColorWP.com

    ColorWP.com Notable Member

    #7
    The regular expression worked, but displaying the multidimensional array was giving me hell. However, I got it working with this code (it uses two nested foreach loops):

    $data = get_data("http://source.com"); // get_data is my custom function that uses cURL to download the page's content
    preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
    $matches = array($matches[0]);
    $domains = array();

    foreach($matches as $item) {
        foreach($item as $items) {
            $items = explode("value=\"",$items); // drop everything up to value="
            $items = explode("\"",$items[1]);    // cut at the closing quote
            $items = $items[0];
            $domains[]=$items;
            // echo $items." inserted into array<br>";
        }
    }
    // Now $domains is a normal array containing all the URLs I need
    // To echo all the entries I can use php.net/foreach

    foreach($domains as $item) { echo $item; echo "<br>"; }
    PHP:
     
    ColorWP.com, Jan 18, 2010
  8. joebert

    joebert Well-Known Member

    #8
    I have no idea what you were thinking with the following line, but whatever it was, there's definitely a cleaner way to do it.

    $matches = array($matches[0]);
    Code (markup):
    Try this just as a test, so you can get a better idea of what your results look like.

    $data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
    preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
    echo '<pre>', print_r($matches, true), '</pre>';
    Code (markup):
    Since your call to preg_match_all doesn't include the 4th argument (the flags), it uses the default result order, PREG_PATTERN_ORDER. This means $matches[0] contains an array of the complete matches, and $matches[1] contains an array of only the text captured by the first parenthesized subpattern (the (.+) part) of your pattern.
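To see the difference the 4th argument makes, here is a small sketch (sample markup assumed) that runs the same pattern with the default `PREG_PATTERN_ORDER` and with `PREG_SET_ORDER`. Both routes lead to the same list of domains; only the nesting of the result array changes. The pattern carries the `Uis` modifiers from post #4 so the capture stops at the closing quote.

```php
<?php
// Assumed sample input.
$data = '<option value="google.com">google.com' . "\n"
    . '<option value="yahoo.com">yahoo.com';

$pattern = '#<option\s.*value=[\'"](.+)[\'"]#Uis';

// Default (PREG_PATTERN_ORDER): grouped by subpattern.
//   $byPattern[0] = all complete matches, $byPattern[1] = all group-1 captures
preg_match_all($pattern, $data, $byPattern);
$domains = $byPattern[1];

// PREG_SET_ORDER: grouped by match.
//   $bySet[n][0] = nth complete match, $bySet[n][1] = its group-1 capture
preg_match_all($pattern, $data, $bySet, PREG_SET_ORDER);
$domainsToo = array();
foreach ($bySet as $set) {
    $domainsToo[] = $set[1];
}

print_r($domains);
print_r($domainsToo);
```

PREG_SET_ORDER tends to be handier when a pattern has several groups that belong together per match; for a single group, the default order plus `[1]` is the shortest route.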

    By the looks of that code you have there, you're really looking for $matches[1].

    I bet if you do something like the following, you'll be left with a $matches array that only contains the things you want.

    $data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
    preg_match_all('#option\s.*value=[\'"](.+)[\'"]#Uis',$data,$matches); // U = ungreedy, so (.+) stops at the first closing quote
    $matches = $matches[1];
    Code (markup):
     
    joebert, Jan 18, 2010
  9. ColorWP.com

    ColorWP.com Notable Member

    Messages:
    3,120
    Likes Received:
    100
    Best Answers:
    1
    Trophy Points:
    270
    #9
    @joebert: I didn't think of that. I tried numerous combinations like $matches[0] and $matches[0][0], but never thought that it should be $matches[1].
     
    ColorWP.com, Jan 19, 2010
  10. joebert

    joebert Well-Known Member

    #10
    I've been using the preg_* functions for years, and I still have to consult the manual from time to time to refresh my memory about the order the results come in.

    That \K escape sequence, which resets the start of the reported match so only what follows it ends up in $matches[0], is looking nice right about now. I never knew about it until this thread.
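For anyone stuck on PHP older than 5.2.4, a lookbehind gives a similar result: the `value="` prefix is checked but not included in `$matches[0]`. A minimal comparison on assumed sample markup follows. The trade-off is that PCRE lookbehinds can't contain an open-ended quantifier such as `[^>]+?`, so the lookbehind version can't also check for the `<option ...` prefix and would pick up `value="..."` on any tag.

```php
<?php
// Assumed sample input.
$html = '<option  value="google.com">google.com'
    . '<option style="font-weight:bold;" value="yahoo.com">yahoo.com';

// \K (PHP 5.2.4+): match the prefix, then reset the start of the reported match.
preg_match_all('%<option\b[^>]+?\bvalue="\K[^"]+%', $html, $withK);

// Fixed-length lookbehind: also keeps value=" out of the complete match and
// works with older PCRE, but only the literal value=" prefix is checked.
preg_match_all('%(?<=value=")[^"]+%', $html, $withLookbehind);

print_r($withK[0]);
print_r($withLookbehind[0]);
```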
     
    joebert, Jan 19, 2010
  11. JAY6390

    JAY6390 Peon

    #11
    Yeah, I found it purely by chance one day too.
     
    JAY6390, Jan 19, 2010