Hi guys,

I am trying to write a little parser function for HTML, like this:

function getHtmlContent(string $url, string $head, string $tail)

The function should go to the page specified by $url and selectively grab from there any content that is surrounded by the first occurrence of $head and the first occurrence of $tail.

For example, if I have HTML like this:

begin 111 end begin 222 end begin 333 end ...

it should either grab only "begin \n111 end" on the first pass, or grab them all at once and put each one in a separate array element. So at the end I will end up either with "begin \n111 end" or with an array like:

result[0]="begin \n111 end", result[1]="begin 222\n end", result[2]="begin 333 end"

The array case is preferable. Can anyone please help me with this?

Right now I have come up with the following code:

$url = "http://us2.php.net/preg_match_all";
$html = file_get_contents($url);
$head = "<option";
$tail = "<\/option>";

function getHtmlContent($page, $head, $tail) {
    $regex="/$head(.*\n*)*$tail/";
    preg_match_all($regex, $page, $m);
    return $m[0];
}

foreach ( getHtmlContent($html, $head, $tail) as $match) {
    echo $match;
}

It works for some sites and some values of $head and $tail, but with the values above, for example, it won't work.
Thanks for your valuable idea. Can anyone else please help me? I guess I wrote the regexp in my function incorrectly. What I want is a regexp that will match anything that may appear in a webpage's source, including all the special symbols, newlines, and everything else. So I wrote (.*\n*)*, but I guess that's not enough.
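One possible direction, offered as a sketch rather than a definitive fix: the nested quantifiers in (.*\n*)* are greedy and can backtrack very heavily on a large page, which is a likely reason the match fails on some inputs. A common alternative is a lazy .*? combined with the s modifier, so that the dot also matches newlines and each match stops at the first $tail after its $head. The function name getHtmlBlocks and the sample string below are made up for illustration. Note that preg_quote() does the escaping here, so $tail would be passed as plain "</option>" without the backslash.

```php
<?php
// Hypothetical helper (not from the original post): returns every block
// that starts with $head and ends at the nearest following $tail.
function getHtmlBlocks(string $page, string $head, string $tail): array
{
    // preg_quote() escapes regex metacharacters in the literal markers,
    // including the "/" used as the pattern delimiter.
    $regex = '/' . preg_quote($head, '/') . '.*?' . preg_quote($tail, '/') . '/s';

    // "s" lets "." match newlines; ".*?" is lazy, so each match ends at
    // the first $tail instead of running on to the last one.
    if (preg_match_all($regex, $page, $m) === false) {
        return []; // the regex engine reported an error
    }
    return $m[0];
}

// Illustrative input modelled on the example in the question.
$sample = "begin \n111 end begin 222\n end begin 333 end";
$result = getHtmlBlocks($sample, 'begin', 'end');
print_r($result);
```

If only the first block is wanted, the same pattern should work with preg_match() in place of preg_match_all(). For anything beyond simple marker pairs, an HTML parser such as PHP's DOMDocument is usually more robust than a regexp.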