PHP Regular Expressions Issues with preg_match_all

ColorWP.com Notable Member

Messages: 3,120

Likes Received: 100

Best Answers: 1

Trophy Points: 270

#1

Hello.

I am fetching a remote page via cURL and then trying to parse out all the URLs, which are listed in the following manner:
<option  value="google.com">google.com
HTML:
but also as:
<option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com
HTML:
and also as:
<option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com
HTML:
Meaning that there might or might not be additional junk between "<option" and "value=", and also between the value attribute and the closing ">" of the <option> tag.

Currently I use something like this:
$content = get_data("http://mysourcesite.com");
preg_match_all('#value=\"(.*)\"#',$content,$list);
foreach($list[0] as $item) {
echo $item."<br>";
}
PHP:
However, since preg_match_all fills $list with a multidimensional array, I get confused and can't get it working with the code above. Also, the above code returns the whole text as 'value="google.com">' and I want to extract only the 'google.com' part.

ColorWP.com, Jan 16, 2010

JAY6390 Peon

Messages: 918

Likes Received: 31

Best Answers: 0

Trophy Points: 0

#2

You should really use the XML DOM and get these values that way.
Otherwise you could use this regex:
%<option\b[^>]+?\bvalue="\K[^"]+%
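
For readers who want to see what the DOM route might look like, here is a minimal sketch. It assumes PHP's bundled DOM extension (DOMDocument) is available, and it uses the sample markup from the first post in place of the downloaded page; the surrounding <select> wrapper and the variable names are illustrative only.

```php
<?php
// Sample markup copied from the first post; a real page would come from cURL.
$html = <<<'HTML'
<select>
<option  value="google.com">google.com
<option style="font-weight:bold; background-color:#00FF00;" value="yahoo.com">yahoo.com
<option style="font-weight:bold; background-color:#00FF00;" value="digitalpoint.com" name="digitalpoint">digitalpoint.com
</select>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$domains = array();
foreach ($doc->getElementsByTagName('option') as $option) {
    // getAttribute() returns an empty string when the attribute is missing.
    $value = $option->getAttribute('value');
    if ($value !== '') {
        $domains[] = $value;
    }
}

print_r($domains);
```

The parser copes with attribute order and extra attributes on its own, which is why this approach tends to be more forgiving than a regex when the markup varies.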

JAY6390, Jan 16, 2010

unigogo Peon

Messages: 286

Likes Received: 8

Best Answers: 0

Trophy Points: 0

#3

Try your regular expression in the PHP regular expression tester.

I think your regexp returns what you want.

unigogo, Jan 16, 2010

joebert Well-Known Member

Messages: 2,150

Likes Received: 88

Best Answers: 0

Trophy Points: 145

#4

\K isn't available prior to PHP 5.2.4. This version uses a capturing group instead:
#<option\s.*value=[\'"](.+)[\'"]#Uis
Code (markup):

joebert, Jan 17, 2010

JAY6390 Peon

#5

Yes, I know, but hopefully the OP has an up-to-date PHP version, as they should.

JAY6390, Jan 17, 2010

ColorWP.com Notable Member

#6

Yes, the PHP is up to date. Thank you all for the contributions. I have yet to try the expressions, and I will post the results here (my guess is they will be satisfactory for me).

All contributors will get some green reputation points from me.

ColorWP.com, Jan 17, 2010

ColorWP.com Notable Member

#7

The regular expression worked, but displaying the multidimensional array was giving me hell. However, I got it working with this code (two nested foreach() loops are used):

$data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
$matches = array($matches[0]);
$domains = array();

foreach($matches as $item) {
    foreach($item as $items) {
        $items = explode("value=\"",$items);
        $items = explode("\"",$items[1]);
        $items = $items[0];
        $domains[]=$items;
        // echo $items." inserted into array<br>";
    }
}
// Now $domains is a normal array containing all the URLs I need
// To echo all the entries I can use php.net/foreach

foreach($domains as $item) { echo $item; echo "<br>"; }

PHP:

ColorWP.com, Jan 18, 2010

joebert Well-Known Member

#8

I have no idea what you were thinking with the following line, but whatever it was there's definitely a cleaner way to do it.
$matches = array($matches[0]);
Code (markup):
Try this just as a test, so you can get a better idea of what your results look like.
$data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
echo '<pre>', print_r($matches, true), '</pre>';
Code (markup):
Since your call to preg_match_all isn't including the 4th argument, it's using the default return order which is PREG_PATTERN_ORDER. What this means is that $matches[0] contains an array of complete matches, and in this case $matches[1] contains an array of only the match parts contained in the sub-pattern of your pattern.
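
To make the two return orders concrete, here is a small sketch that runs the same pattern with PREG_PATTERN_ORDER (the default) and with PREG_SET_ORDER. The two-line input and the tightened pattern are assumptions made for illustration, not the thread's exact data or regex.

```php
<?php
// Illustrative input: two options, the second with an extra attribute.
$html = '<option value="google.com">google.com' . "\n"
      . '<option class="x" value="yahoo.com">yahoo.com';

// [^>]*? keeps the match inside a single tag; ([^"]+) is sub-pattern 1.
$pattern = '#<option\b[^>]*?\bvalue="([^"]+)"#i';

// Default order: one array per sub-pattern.
// $byPattern[0] holds the full matches, $byPattern[1] the captured values.
preg_match_all($pattern, $html, $byPattern, PREG_PATTERN_ORDER);

// Alternative order: one array per match.
// $bySet[0] holds the full match and the captured value of the first option.
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

print_r($byPattern);
print_r($bySet);
```

With the default order, the list of captured values is a single index away ($byPattern[1]); with PREG_SET_ORDER each entry has to be unpacked individually, which suits loops that need the full match and the capture together.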

By the looks of that code you have there, you're really looking for $matches[1].

I bet if you do something like the following, you'll be left with a $matches array that only contains the things you want.
$data = get_data("http://source.com"); // get_data is my custom function that uses CURL to download the page's content
preg_match_all('#option\s.*value=[\'"](.+)[\'"]#',$data,$matches);
$matches = $matches[1];
Code (markup):
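
One caveat worth sketching: the pattern above has no U modifier, so .* and .+ are greedy, and it only behaves when each <option> sits on its own line. The example below uses a hypothetical one-line input (not taken from the original page) to show the difference, alongside a tighter variant that is an assumption of this sketch rather than anything posted in the thread.

```php
<?php
// Hypothetical input: two options on a single line.
$data = '<option value="a.com">a.com<option style="color:red" value="b.com">b.com';

// Pattern from the thread: greedy .* skips ahead to the last value= on the line.
preg_match_all('#option\s.*value=[\'"](.+)[\'"]#', $data, $greedy);

// Tighter variant (an assumption): never leave the tag or the quoted value.
preg_match_all('#<option\b[^>]*?\bvalue=[\'"]([^\'"]+)[\'"]#i', $data, $tight);

// Sub-pattern 1 only, as suggested above.
$domains = $tight[1];

print_r($greedy[1]);
print_r($domains);
```

On this input the greedy pattern reports a single value while the tighter one reports both, so it may be worth checking how the real page is laid out before settling on either.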

joebert, Jan 18, 2010

ColorWP.com Notable Member

#9

@joebert: I didn't think of that. I've tried numerous combinations like $matches[0] and $matches[0][0], but never thought that it should be $matches[1].

ColorWP.com, Jan 19, 2010

joebert Well-Known Member

#10

I've been using the preg_* functions for years and I still have to consult the manual from time to time to refresh my memory about the order results come in.

That \K escape sequence to reset the beginning of the complete matches is looking nice right about now. I never knew about it until this thread.
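
For anyone else meeting \K for the first time, here is a short sketch of the pattern from post #2 in use. It assumes PHP 5.2.4 or later, and the two-line input is trimmed from the samples in the first post.

```php
<?php
// Two of the sample options from the first post (style attribute shortened).
$html = '<option  value="google.com">google.com' . "\n"
      . '<option style="font-weight:bold;" value="yahoo.com">yahoo.com';

// Everything matched before \K is dropped from the reported match,
// so no capture group is needed.
preg_match_all('%<option\b[^>]+?\bvalue="\K[^"]+%', $html, $matches);

// The complete matches in $matches[0] are already just the attribute values.
$domains = $matches[0];

print_r($domains);
```

Because the match start is reset, $matches[0] can be used directly and there is no $matches[1] to remember.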

joebert, Jan 19, 2010

JAY6390 Peon

#11

Yeah I found it purely by chance one day too

JAY6390, Jan 19, 2010
