parser for HTML

stats Well-Known Member


Hi guys

I am trying to write a little HTML parser function like this:

function getHtmlContent(string $url, string $head, string $tail)

The function should fetch the page at $url and grab ANY content that starts at the FIRST occurrence of $head and ends at the FIRST occurrence of $tail after it.

For example, if I have HTML like this:

begin
111 end
begin 222
end begin 333 end ...

It should grab only "begin \n111 end" on the first pass, OR grab them all at once and put each one in a separate array element.

So in the end I will have either "begin \n111 end" or an array like result[0]="begin \n111 end", result[1]="begin 222\n end", result[2]="begin 333 end".

The array case is preferable.
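One way to get the array result described above, sketched with plain strpos() instead of a regex so that $head and $tail need no escaping. getHtmlSegments is a hypothetical name, and it takes the page content rather than the URL (the same split the posted code uses):

```php
<?php
/**
 * Sketch: return every substring that starts with $head and ends with the
 * first $tail found after it, delimiters included.
 * (getHtmlSegments is a hypothetical helper name, not an existing API.)
 */
function getHtmlSegments(string $page, string $head, string $tail): array
{
    $result = [];
    $offset = 0;
    while (($start = strpos($page, $head, $offset)) !== false) {
        // Look for the closing marker only after the opening marker.
        $end = strpos($page, $tail, $start + strlen($head));
        if ($end === false) {
            break; // an opening marker with no closing marker: stop scanning
        }
        $stop = $end + strlen($tail);
        $result[] = substr($page, $start, $stop - $start);
        $offset = $stop; // continue scanning after this segment
    }
    return $result;
}

$sample = "begin\n111 end\nbegin 222\nend begin 333 end";
print_r(getHtmlSegments($sample, "begin", "end"));
```

For a live page the call would be getHtmlSegments(file_get_contents($url), $head, $tail). Because it searches for literal strings, it never backtracks the way a regex can.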

Can anyone please help me with this?

Right now I have come up with the following code:
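$url = "http://us2.php.net/preg_match_all";
$html = file_get_contents($url);
$head = "<option";
$tail = "<\/option>"; // "/" escaped by hand because "/" is the regex delimiter

// Intended to return every piece of $page that starts with $head and ends with $tail
function getHtmlContent($page, $head, $tail) {
    $regex = "/$head(.*\n*)*$tail/"; // (.*\n*)* is meant to match anything, newlines included
    preg_match_all($regex, $page, $m);
    return $m[0]; // full matches
}

foreach (getHtmlContent($html, $head, $tail) as $match) {
    echo $match;
}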
It works for SOME sites and SOME values of $head and $tail, but with the values above, for example, it doesn't work.

stats, Jul 6, 2007
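A possible revision of the posted getHtmlContent(), assuming the goal is the array of separate matches. It quotes $head and $tail with preg_quote(), uses a lazy (.*?) so each match stops at the nearest $tail, and adds the s modifier so "." also matches newlines. A likely (not confirmed) reason the original fails on a large page like the one above: the nested quantifier in (.*\n*)* can make PCRE backtrack heavily, and when pcre.backtrack_limit is exceeded preg_match_all() returns false instead of a match count.

```php
<?php
/**
 * Sketch of a revised getHtmlContent(): quoted markers, lazy match,
 * and the "s" modifier so "." also matches newlines.
 */
function getHtmlContent(string $page, string $head, string $tail): array
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter,
    // so $tail can be passed as "</option>" without hand-escaping.
    $regex = '/' . preg_quote($head, '/') . '(.*?)' . preg_quote($tail, '/') . '/s';
    if (preg_match_all($regex, $page, $m) === false) {
        return []; // the regex engine reported an error (e.g. a backtracking limit)
    }
    return $m[0]; // full matches, delimiters included
}

$html = '<select><option value="1">One</option>' . "\n"
      . '<option value="2">Two</option></select>';
print_r(getHtmlContent($html, '<option', '</option>'));
```

With the lazy (.*?) each <option ...>...</option> pair becomes its own array element, which is the array case asked for above.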

SuperMarketer Peon


I have no idea what that is.
You are a smart one.

SuperMarketer, Jul 6, 2007

stats Well-Known Member


Thanks for your valuable idea...

Can anyone else please help me?

I guess I wrote the regexp in my function incorrectly. What I want is a regexp that will match ANYTHING that may appear in a web page's source, including all the special symbols, newlines and everything else.

So I wrote (.*\n*)*, but I guess that's not enough.

stats, Jul 6, 2007
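The reason (.*\n*)* seemed necessary is that, by default, "." in a PCRE pattern matches any character except a newline. A quick check of the difference; the [\s\S] character class is a common alternative to the s modifier and is not something used in the thread:

```php
<?php
// By default "." does not match "\n"; the "s" (dotall) modifier changes that.
$text = "begin\n111 end";

var_dump(preg_match('/begin.*end/', $text));      // int(0): "." stops at the newline
var_dump(preg_match('/begin.*end/s', $text));     // int(1): "s" lets "." cross newlines
var_dump(preg_match('/begin[\s\S]*end/', $text)); // int(1): character-class alternative
```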

Barti1987 Well-Known Member


$regex="/$head(.*)$tail/s";
Source.

Peace,

Barti1987, Jul 6, 2007
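The s modifier in the answer above is what makes "." match newlines. The (.*) in that pattern is still greedy, though, so when a page contains several $head...$tail blocks it produces one match from the first $head to the last $tail. A lazy (.*?) gives one match per pair, which is what the array case asks for. A small comparison on sample text:

```php
<?php
// With "s", "." matches newlines, but a greedy ".*" still runs to the LAST "end".
$page = "begin\n111 end\nbegin 222\nend";

preg_match_all('/begin(.*)end/s', $page, $greedy);
preg_match_all('/begin(.*?)end/s', $page, $lazy);

print_r($greedy[0]); // one match spanning both blocks
print_r($lazy[0]);   // two matches, one per begin...end pair
```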
