Hello. I am fetching a remote page via cURL. Then I try to parse all the URLs, which are listed in the following manner:

    <option value="google.com">google.com

but also as:

    <option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com

and also as:

    <option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com

Meaning that there might or might not be additional junk between "<option" and "value=", and also between the value attribute and the end of the <option> tag. Currently I use something like this:

    $content = get_data("http://mysourcesite.com");
    preg_match_all('#value=\"(.*)\"#',$content,$list);
    foreach($list[0] as $item) {
        echo $item."<br>";
    }

However, since preg_match_all returns a multidimensional array, I get confused and can't get it working with the code above. Also, the above code returns the whole text, as in 'value="google.com">', and I want to extract only the 'google.com' part.
You should really use the XML DOM and get these values that way. Otherwise, you could use this regex:

    %<option\b[^>]+?\bvalue="\K[^"]+%
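For anyone who wants to try the DOM route, here is a minimal sketch using PHP's DOMDocument. The sample markup (including the surrounding <select> and the closing </option> tags) is an assumption standing in for the downloaded page; in the original setup the string would come from get_data().

```php
<?php
// Sample markup standing in for the downloaded page (an assumption for illustration).
$html = <<<'HTML'
<select>
<option value="google.com">google.com</option>
<option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com</option>
<option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com</option>
</select>
HTML;

$dom = new DOMDocument();
// Real-world pages are rarely valid markup, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Collect the value attribute of every <option> element, whatever else the tag contains.
$domains = array();
foreach ($dom->getElementsByTagName('option') as $option) {
    if ($option->hasAttribute('value')) {
        $domains[] = $option->getAttribute('value');
    }
}

foreach ($domains as $domain) {
    echo $domain, "<br>\n";
}
```

The parser deals with attribute order and quoting, so extra attributes before or after value should not matter here.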
Try your regular expression in the PHP regular expression tester. I think your regexp returns what you want.
Yes, the PHP is up to date. Thank you all for the contributions. I have yet to try the expressions and will post the results here (I expect they will be satisfactory for me). All contributors will get some green reputation points from me.
The regular expression worked, but displaying the multidimensional array was giving me hell. However, I got it working with this code (two nested foreach loops are used):

    $data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
    preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
    $matches = array($matches[0]);
    $domains = array();
    foreach($matches as $item) {
        foreach($item as $items) {
            $items = explode("value=\"",$items);
            $items = explode("\"",$items[1]);
            $items = $items[0];
            $domains[]=$items;
            // echo $items." inserted into array<br>";
        }
    }
    // Now $domains is a normal array containing all the URLs I need
    // To echo all the entries I can use php.net/foreach
    foreach($domains as $item) {
        echo $item;
        echo "<br>";
    }
I have no idea what you were thinking with the following line, but whatever it was there's definitely a cleaner way to do it.

    $matches = array($matches[0]);

Try this just as a test, so you can get a better idea of what your results look like.

    $data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
    preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
    echo '<pre>', print_r($matches, true), '</pre>';

Since your call to preg_match_all isn't including the 4th argument, it's using the default return order, which is PREG_PATTERN_ORDER. What this means is that $matches[0] contains an array of complete matches, and in this case $matches[1] contains an array of only the match parts contained in the sub-pattern of your pattern. By the looks of that code you have there, you're really looking for $matches[1]. I bet if you do something like the following, you'll be left with a $matches array that only contains the things you want.

    $data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
    preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
    $matches = $matches[1];
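To make the two return orders concrete, here is a small sketch comparing the default PREG_PATTERN_ORDER with PREG_SET_ORDER. The sample input is made up for illustration, and the pattern is a tighter variant rather than the one posted above (my own assumption): `.*` and `.+` are greedy and can run past the end of a tag, so this one uses negated character classes instead.

```php
<?php
// Sample input standing in for the downloaded page (an assumption for illustration).
$data = '<option value="google.com">google.com' . "\n"
      . '<option style="font-weight:bold;" value="yahoo.com">yahoo.com';

// [^>]* and [^\'"]+ keep each match inside a single tag and a single attribute value.
$pattern = '#<option\b[^>]*\bvalue=[\'"]([^\'"]+)[\'"]#';

// Default order (PREG_PATTERN_ORDER): results are grouped by sub-pattern.
preg_match_all($pattern, $data, $matches);
// $matches[0] holds the complete matches, $matches[1] holds the first capture group.
$domains = $matches[1];

// PREG_SET_ORDER: results are grouped by match instead.
preg_match_all($pattern, $data, $sets, PREG_SET_ORDER);
// Each $set is array(complete match, capture 1) for one match.
$domainsFromSets = array();
foreach ($sets as $set) {
    $domainsFromSets[] = $set[1];
}

echo '<pre>', print_r($matches, true), '</pre>';
```

With the default order a single `$matches[1]` is enough; PREG_SET_ORDER is mostly useful when each match carries several capture groups that belong together.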
@joebert: I didn't think of that. I've tried numerous combinations like $matches[0], $matches[0][0] and $matches[0][0], but never thought that it should be $matches[1].
I've been using the preg_* functions for years and I still have to consult the manual from time to time to refresh my memory about the order results come in. That \K escape sequence to reset the beginning of the complete matches is looking nice right about now. I never knew about it until this thread.
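For comparison, here is a short sketch of what \K changes, using the regex posted earlier in the thread next to a capture-group version of it. The sample input is an assumption, and \K needs a PCRE build that supports it.

```php
<?php
// Sample input (an assumption for illustration).
$data = '<option style="font-weight:bold;" value="yahoo.com" name="yahoo">yahoo.com';

// Without \K: the complete match includes everything from "<option" onward,
// so the value has to be taken from a capture group.
preg_match('%<option\b[^>]+?\bvalue="([^"]+)%', $data, $withGroup);

// With \K: everything matched before \K is dropped from the reported match,
// so the complete match is the value itself and no capture group is needed.
preg_match('%<option\b[^>]+?\bvalue="\K[^"]+%', $data, $withK);

echo $withGroup[0], "\n"; // complete match, starting at "<option"
echo $withGroup[1], "\n"; // captured value
echo $withK[0], "\n";     // complete match is just the value
```

With preg_match_all and the \K version, `$matches[0]` would already be the list of values, which sidesteps the question of which index to read.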