Hello, could you please explain why this part doesn't work:
$bg = preg_match_all("/<TD width=\"5\"><IMG src=\"images\/blue_box_R.gif\" width=5 height=24><\/TD>(.*)<td><table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\">/U", $str, $res3);
I am trying to extract the information between <TD width="5"><IMG src="images/blue_box_R.gif" width=5 height=24></TD> and <td><table border="0" cellpadding="0" cellspacing="0" width="100%">, but the part to be extracted is a description that includes HTML tags, and I am having trouble extracting it. What can I do about that? Thank you.
Add an "s" pattern modifier if that ".*" will ever need to match across a newline; without it, the dot does not match newline characters. You might also want to add an "i" flag if you don't want upper/lower case to make a difference. http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php You should get in the habit of using a pattern delimiter that suits your pattern. For instance, your pattern contains forward slashes but no pound sign (#), so # makes a better delimiter: you won't have to escape the slashes. The same goes for the quotes. Since your pattern contains no apostrophes (single quotes) or variables, it makes more sense to wrap it in single quotes instead of double quotes, so you don't have to escape all of the double quotes inside it.
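Here is a rough sketch of the pattern from the question rewritten along those lines: "#" as the delimiter, single quotes around it, and the U, i and s flags. The sample $str is made up for illustration (a description with HTML tags and a newline between the two markers). $pattern and $count are just illustrative names; $str and $res3 are the ones from the question.

```php
<?php
// Hypothetical sample input: the description between the two markers
// contains HTML tags and a newline, like the page in the question.
$str = '<TD width="5"><IMG src="images/blue_box_R.gif" width=5 height=24></TD>'
     . "<b>Blue widget</b>\n<p>Ships in 2 days.</p>"
     . '<td><table border="0" cellpadding="0" cellspacing="0" width="100%">';

// Same markers as in the question, but:
//  - '#' as the delimiter, so the forward slashes need no escaping
//  - single quotes, so the double quotes need no escaping
//  - U (ungreedy), i (case-insensitive), s (dot also matches newlines)
$pattern = '#<TD width="5"><IMG src="images/blue_box_R.gif" width=5 height=24></TD>'
         . '(.*)'
         . '<td><table border="0" cellpadding="0" cellspacing="0" width="100%">#Uis';

$count = preg_match_all($pattern, $str, $res3);

echo $count, "\n";        // number of matches found
echo $res3[1][0], "\n";   // text captured by (.*) in the first match
```

The HTML tags inside the description are not a problem on their own; the newline is. If you take the "s" out of the flags, $count drops to 0 for this sample, because ".*" can then no longer cross the newline.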
To joebert: Thank you. After I added i and s to the end, the pattern started working in a pattern-testing tool, but it doesn't work in my PHP script:
/Product Details<\/TD> <TD width=\"5\"><IMG src=\"images\/blue_box_R.gif\" width=5 height=24><\/TD> <\/tr> <tr> <td colspan=3 valign=top class=\"medium\">(.*)<\/tr> <\/table> <\/td> <\/tr> <tr>/Uis
I think it's not working in PHP because of all the newlines and tabs. What can I do about that? The pattern was copied from the HTML, where those newlines and tabs are present.
If you have a newline followed by two tabs, replace all three characters with a single \s*. If you have two newlines in a row, replace them with a single \s*. Basically, \s* means "whitespace such as tabs, spaces, and newlines, zero or more times", so the pattern no longer depends on exactly how the HTML is indented.
'#Product Details</TD>\s*<TD width="5"><IMG src="images/blue_box_R.gif" width=5 height=24></TD>\s*</tr>\s*<tr>\s*<td colspan=3 valign=top class="medium">(.*)</tr>\s*</table>\s*</td>\s*</tr>\s*<tr>#Uis'
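A minimal sketch of that \s* pattern in a script, assuming a made-up piece of copied markup with newlines and tabs between the tags (the sample $str is hypothetical; $pattern and $count are illustrative names). The pattern itself is the one above, only split across lines with string concatenation for readability.

```php
<?php
// Hypothetical sample: markup as copied from the page, with newlines and
// tabs between the tags, and a description that itself spans two lines.
$str = "Product Details</TD>\n\t\t"
     . '<TD width="5"><IMG src="images/blue_box_R.gif" width=5 height=24></TD>' . "\n\t"
     . "</tr>\n\t<tr>\n\t\t"
     . '<td colspan=3 valign=top class="medium">' . "Solid oak.<br>\nSeats four.</td>\n"
     . "</tr>\n</table>\n</td>\n</tr>\n<tr>";

// Each run of whitespace between tags is replaced by \s*. Single quotes keep
// the backslash in \s intact for PCRE. With the U flag, \s* becomes lazy, but
// it still expands as far as it must for the following tag to match.
$pattern = '#Product Details</TD>\s*'
         . '<TD width="5"><IMG src="images/blue_box_R.gif" width=5 height=24></TD>\s*'
         . '</tr>\s*<tr>\s*'
         . '<td colspan=3 valign=top class="medium">(.*)'
         . '</tr>\s*</table>\s*</td>\s*</tr>\s*<tr>#Uis';

$count = preg_match_all($pattern, $str, $res3);

echo $count, "\n";              // number of matches found
echo trim($res3[1][0]), "\n";   // captured description, surrounding whitespace trimmed
```

Because the capture runs up to "</tr>", it also picks up the closing "</td>" and the newline before "</tr>"; trim() removes the whitespace, and you could end the capture at "</td>" instead if you don't want that tag.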