I need a regex that finds every element whose class attribute equals g-pb25. Something must be wrong with this one, because it doesn't work:

    preg_match_all('|CLASS="g-pb25">|U', $contents_of_page, $classout, PREG_SET_ORDER);
I'm a bit confused. What exactly are you trying to achieve here? Can you provide the rest of the code that you have?
OK. I have two preg_match_all calls. The one that works is commented out with //. The other, for some reason, doesn't work:

    //CLASS
    echo "<table style=\"border: 1px solid black;\"> <tr> <td>";
    preg_match_all('#class=\"g-pb25\">(.+?)<#Ui', $contents_of_page, $classout, PREG_SET_ORDER);
    //print_r($classout)."\n";
    /*
    foreach ($classout as $val) {
        echo "matched: " . $val[0] . "\n";
        echo "part 1: " . $val[1] . "\n";
        echo "part 2: " . $val[3] . "\n";
        echo "part 3: " . $val[4] . "\n\n";
    }
    */
    echo " </td> </tr> </table>";

    echo "<table style=\"border: 1px solid red;\"> <tr> <td>";
    preg_match_all('#class=\"g-asm\">(.+?)<#Ui', $contents_of_page, $classgasm, PREG_SET_ORDER);
    print_r($classgasm)."\n";
    foreach ($classgasm as $val) {
        echo "<span style:\"color: red;\">matched: " . $val[0] . "</span>\n";
        echo "part 1: " . $val[1] . "\n";
        echo "part 2: " . $val[3] . "\n";
        echo "part 3: " . $val[4] . "\n\n";
    }
    echo " </td> </tr> </table>";

The foreach doesn't work either.
First, I don't know what your content looks like, but add this to the top of your script:

    $contents_of_page = '<span CLASS="g-pb25">this is the content</span> <span CLASS="g-asm">this is the other content</span>';

Both of your patterns are syntactically valid, so I think the problem is in where you get the content for $contents_of_page. Let me know how it turns out with the test string above.
    $contents_of_page = '<span CLASS="mbg">this is the content</span><span CLASS="g-asm">this is the other content</span>';

    echo "<table style=\"border: 1px solid black;\"> <tr> <td>";
    preg_match_all('#class=\"mbg\">(.+?)<#Ui', $contents_of_page, $classmbg, PREG_SET_ORDER);
    print_r($classmbg)."\n\n";
    echo " </td> </tr> </table>";

    echo "<table style=\"border: 1px solid red;\"> <tr> <td>";
    preg_match_all('#class=\"g-asm\">(.+?)<#Ui', $contents_of_page, $classgasm, PREG_SET_ORDER);
    print_r($classgasm)."\n";
    echo " </td> </tr> </table>";

It seems to be ignoring the selected tags. It's looking for the contents of every tag.
This would work too:

    CLASS="g-pb25">(.+)<\/

The group (.+) should capture what's inside the element with CLASS="g-pb25", although it will not span multiple lines, because . does not match a newline unless the s modifier is set.
Get rid of the U modifier. U inverts greediness, so with it (.+?) becomes greedy and matches as much as possible. I think you meant the lowercase u modifier for UTF-8 compatibility; if not, just remove the ? from inside your group. So change:

    preg_match_all('#class="mbg">(.+?)<#Ui', $contents_of_page, $classmbg, PREG_SET_ORDER);

to:

    preg_match_all('#class="mbg">(.+?)<#ui', $contents_of_page, $classmbg, PREG_SET_ORDER);

or this:

    preg_match_all('#class="mbg">(.+)<#Ui', $contents_of_page, $classmbg, PREG_SET_ORDER);

And change this:

    preg_match_all('#class=\"g-asm\">(.+?)<#Ui', $contents_of_page, $classgasm, PREG_SET_ORDER);

to this:

    preg_match_all('#class=\"g-asm\">(.+?)<#ui', $contents_of_page, $classgasm, PREG_SET_ORDER);

or this:

    preg_match_all('#class=\"g-asm\">(.+)<#Ui', $contents_of_page, $classgasm, PREG_SET_ORDER);

Just a matter of being greedy, I guess.
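To see the effect of U in isolation, here is a minimal sketch. The test string is hypothetical, modeled on the one used earlier in the thread; only the modifiers and the ? differ between the three calls.

```php
<?php
// Hypothetical test string, modeled on the one used earlier in the thread.
$html = '<span class="mbg">first</span><span class="g-asm">second</span>';

// Lazy quantifier, no U: stops at the first "<" after the content.
preg_match('#class="mbg">(.+?)<#i', $html, $lazy);

// Same lazy quantifier plus U: U flips it back to greedy,
// so the group runs up to the last "<" in the string.
preg_match('#class="mbg">(.+?)<#Ui', $html, $flipped);

// Plain quantifier plus U: U makes it lazy.
preg_match('#class="mbg">(.+)<#Ui', $html, $ungreedy);

echo $lazy[1], "\n";     // first
echo $flipped[1], "\n";  // first</span><span class="g-asm">second
echo $ungreedy[1], "\n"; // first
```

The second call reproduces the "contents of every tag" symptom: U and ? cancel each other out, so the match is greedy again.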
Try this:

    preg_match_all('#<(\w+) +class="mbg">(.+?)</\1>#is', $contents_of_page, $classmbg, PREG_SET_ORDER);
I wonder why it comes back as two arrays (one array within another):

    Array
    (
        [0] => Array
            (
                [0] => class="mbg"><a href="http://.../wrath_of_the_lamb_rev616">wrath_of_the_lamb_rev616<
                [1] => </a><a href="http://.../wrath_of_the_lamb_rev616">wrath_of_the_lamb_rev616
            )
        [1] => Array
            (
                [0] => class="mbg"></a><a href="http://.../80s_toyz/">80s_toyz<
                [1] => </a><a href="http://.../80s_toyz/">80s_toyz
            )
        [2] => Array
            (
                [0] => class="mbg"></a><a href="http://.../80s_toyz/">80s_toyz<
                [1] => </a><a href="http://.../80s_toyz/">80s_toyz
            )
        [3] => Array
            (
                [0] => class="mbg"></a><a href="http://.../80s_toyz/">80s_toyz<
                [1] => </a><a href="http://.../80s_toyz/">80s_toyz
            )

From this PHP code:

    $contents_of_page = file_get_contents($booklink);
    //$contents_of_page = '<span class="mbg">this is the content</span><span class="g-asm">This is the other content</span>';

    echo "<table style=\"border: 1px solid black;\"> <tr> <td>";
    preg_match_all('#class=\"mbg\">(.+?)<#i', $contents_of_page, $classmbg, PREG_SET_ORDER);
    print_r($classmbg)."\n";
    echo " </td> </tr> </table>";

    echo "<table style=\"border: 1px solid red;\"> <tr> <td>";
    preg_match_all('#class=\"g-asm\">(.+?)<#i', $contents_of_page, $classgasm, PREG_SET_ORDER);
    print_r($classgasm)."\n";
    echo " </td> </tr> </table>";
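On the nested arrays: that shape comes from the PREG_SET_ORDER flag, which returns one inner array per match. A minimal sketch comparing it with PREG_PATTERN_ORDER (the default), using a hypothetical input string that is not the real page:

```php
<?php
// Hypothetical input, for illustration only.
$html = '<span class="mbg">alice</span><span class="mbg">bob</span>';
$pattern = '#class="mbg">(.+?)<#i';

// PREG_SET_ORDER: one inner array per match.
// $sets[0] is the first match: [0] => full matched text, [1] => group 1.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);

// PREG_PATTERN_ORDER (the default): one inner array per group.
// $cols[0] holds every full match, $cols[1] holds every group-1 capture.
preg_match_all($pattern, $html, $cols, PREG_PATTERN_ORDER);

echo $sets[1][1], "\n";            // bob
echo implode(',', $cols[1]), "\n"; // alice,bob
```

With PREG_SET_ORDER, a foreach over the result visits one match at a time, which is why index [1] inside the loop is the first capture group.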
Yours worked:

    $contents_of_page = file_get_contents($booklink);
    //$contents_of_page = '<span class="mbg">this is the content</span><span class="g-asm">This is the other content</span>';

    echo "<table style=\"border: 1px solid black;\"> <tr> <td>";
    preg_match_all('#<(\w+) +class="mbg">(.+?)</\1>#is', $contents_of_page, $classmbg, PREG_SET_ORDER);
    print_r($classmbg)."\n";
    echo " </td> </tr> </table>";

This is the result:

    qwer0230 ( 171 ) [1] => div [2] => qwer0230 )

There are <a> tags around these that I want to strip.
Hmm... can you give an example of the string you want to match? It's hard to do without actually seeing the string.
Since you didn't give an example string, I don't know how I can help more; the result you copied is pretty weird. You can remove any HTML tags with the strip_tags function. I tried the string below and it worked; I hope the string you want to match is similar:

    $contents_of_page = '<span class="mbg"><a href="">content1</a></span><span class="mbg"><a href="">content2</a></span>';
    preg_match_all('#<(\w+) +class="mbg">(.+?)</\1>#is', $contents_of_page, $classmbg, PREG_SET_ORDER);
    foreach ($classmbg as $mbg) {
        echo $mbg[2].": ";
        // strip tags
        echo strip_tags($mbg[2])."<br/>";
    }
Oh, well, I did a little searching and I think I know what you were trying to accomplish. Here is the code I just wrote:

    preg_match_all('#<(\w+) +class="mbg"> *<a[^>]*>(.+?)</a>(.*?)</\1>#is', $contents_of_page, $classmbg, PREG_SET_ORDER);
    // grab it
    $newclassmbg = array(); // initialize so print_r works even with no matches
    foreach ($classmbg as $mbg) {
        echo "Matched: ". $mbg[0] ."<br/>";
        echo "The starting tags: ". $mbg[1] ."<br />";
        echo "The string we want: ". $mbg[2] ."<br/>";
        echo "The rest of the string: ". $mbg[3] ."<br/>";
        // want to grab the [2] only, here...
        $newclassmbg[] = $mbg[2];
    }
    print_r($newclassmbg);

Hope that works.
Works. But I wanted to do the same thing, this time for class=g-asm:

    /*******************************username**********************************************************************************/
    preg_match_all('#<(\w+) +class="mbg"> *<a[^>]*>(.+?)</a>(.*?)</\1>#is', $contents_of_page, $classmbg, PREG_SET_ORDER);
    // grab it
    $i = 1;
    foreach ($classmbg as $mbg) {
        //echo "Matched: ". $mbg[0] ."<br/>";
        //echo "The starting tags: ". $mbg[1] ."<br />";
        echo "<span style=\"font-weight: bold;\">".$i." Username:</span> ". $mbg[2] ."<br/>";
        //echo "The rest of the string: ". $mbg[3] ."<br/>";
        // want to grab the [2] only, here...
        //$newclassmbg[] = $mbg[2];
        $i++;
    }
    print_r($newclassmbg);

    /*******************************item name**********************************************************************************/
    preg_match_all('#<(\w+) +class="g-asm"> *<a[^>]*>(.+?)</a>(.*?)</\1>#is', $contents_of_page, $classgasm, PREG_SET_ORDER);
    // grab it
    $i = 1;
    foreach ($classgasm as $gasm) {
        //echo "Matched: ". $gasm[0] ."<br/>";
        //echo "The starting tags: ". $gasm[1] ."<br />";
        echo "<span style=\"font-weight: bold;\">".$i." Item name:</span> ". $gasm[2] ."<br/>";
        //echo "The rest of the string: ". $gasm[3] ."<br/>";
        // want to grab the [2] only, here...
        //$newclassmbg[] = $gasm[2];
        $i++;
    }
    print_r($newclassgasm);

But it's returning nothing. Actually, the class name is on the a tag itself: <a class="g-asm"
That makes it simpler. Try this one:

    preg_match_all('#<a[^>]*class="g-asm"[^>]*>(.+?)</a>#is', $contents_of_page, $classgasm, PREG_SET_ORDER);
    foreach ($classgasm as $gasm) {
        echo "Grab: ". $gasm[1] ."<br/>";
    }

Hope that works.
Right on... and how about the href of the a tag? How can I strip off everything except the href? And how can I learn all these regexps?
There are plenty of resources for learning regex; a Google search will turn them up. Here is an explanation of the previous regex:

    #<a[^>]*class="g-asm"[^>]*>(.+?)</a>#is

The two # characters are delimiters marking the start and end of the regex. The "is" at the end are modifiers: i makes the match case-insensitive, and s treats the subject as a single line, so . also matches newlines. Some people write many newlines in their HTML, so this matters.

<a[^>]*class="g-asm"[^>]*>: this matches <a, then [^>]* matches any characters except >, which makes sure we don't leave the tag we want. class="g-asm" matches literally. We end with >, so this part matches the entire <a ...> opening tag.

(.+?): this groups and matches any characters; the ? tells it to match as few as possible until the next part of the expression is found. Once it's grouped, we can get its value later using \1 (or \2, \3, and so on) in the same expression, and also in the array we passed to preg_match_all. Note the difference between + and *: + matches one or more characters, and * matches zero or more.

</a> also matches literally.

Sorry, I'm not a good English speaker, but I hope you can understand that.
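As for the href question, one possible approach is to capture the tag's attributes as a group and then run a second, smaller regex over them. This is only a sketch under assumptions: the markup and URLs below are made up, href is assumed to be double-quoted, and the variable names ($links, $results) are just for illustration.

```php
<?php
// Hypothetical markup; the URLs and the attribute order are assumptions.
$html = '<a class="g-asm" href="http://example.com/item/1">First item</a>'
      . '<a href="http://example.com/item/2" class="g-asm">Second item</a>';

// Step 1: capture the tag's attributes (group 1) and the link text (group 2).
preg_match_all('#<a([^>]*class="g-asm"[^>]*)>(.+?)</a>#is', $html, $links, PREG_SET_ORDER);

$results = array();
foreach ($links as $link) {
    // Step 2: pull href out of the captured attributes, wherever it sits.
    if (preg_match('#href="([^"]*)"#i', $link[1], $href)) {
        $results[] = array('href' => $href[1], 'text' => strip_tags($link[2]));
    }
}

foreach ($results as $r) {
    echo $r['href'], ' => ', $r['text'], "\n";
}
```

Doing it in two steps means it doesn't matter whether href comes before or after class inside the tag.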