OK, so I want to be able to enter a URL in a form and process the page with PHP/DOM. I want to extract certain information from it based on HTML tags. For example, I want to enter a URL and extract the text content of the H1 tag. I'm running into two difficulties: 1) I extract the whole H1 tag but just want the innerHTML. 2) If my H1 contains a link, I extract the <a href...> tag as well. I'm using this parser: http://simplehtmldom.sourceforge.net/manual.htm Here is my code so far:
<?php
include("simple_html_dom.php");
$html = file_get_html("http://" . $_POST["url"]);
// Print every H1 element found on the page (outer HTML, tag included)
foreach ($html->find('H1') as $element) {
    echo $element;
}
?>
Code (markup):
Any help is appreciated, as I'm trying to learn all this as an experimental project.
Hi, I'm not sure if this will help, but why not use jQuery to access and manipulate the page on the client side?
jQuery is a good way to handle this. You can also use XPath, which treats the DOM as an XML tree of elements.
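For anyone who wants to try the XPath route on the server side, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes instead of Simple HTML DOM. The function name h1_texts and the sample markup are made up for illustration; in the original question the markup would come from the fetched page.

```php
<?php
// Hypothetical helper: returns the text content of every <h1> in the markup.
function h1_texts(string $markup): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//h1') as $node) {
        // textContent drops all nested tags, including a wrapping <a href>.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample markup standing in for a downloaded page (an assumption).
$markup = '<html><body><h1><a href="/home">Welcome</a> back</h1></body></html>';
print_r(h1_texts($markup));
```

Because textContent ignores child tags, this covers both difficulties from the first post in one step.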
You might also look into PHP's file_get_contents to fetch the page. You can then use strpos to find the start and end positions of the tag you want.
Seems like an interesting object. To get the innerHTML:
echo $element->innertext;
PHP:
To get the href of an enclosing link, I would read the parent's href attribute:
echo $element->parent()->href;
PHP:
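If the inner HTML still contains the link markup, one way to reduce it to plain text is PHP's built-in strip_tags. This is only a sketch: the function name inner_text and the sample string are assumptions, standing in for whatever innertext returned. The Simple HTML DOM manual also lists a plaintext property on elements, which may be the shorter route.

```php
<?php
// Hypothetical helper: turns an inner-HTML fragment into plain text.
function inner_text(string $innerHtml): string
{
    // Remove every tag, then decode entities such as &amp; back to characters.
    $text = html_entity_decode(strip_tags($innerHtml), ENT_QUOTES, 'UTF-8');
    return trim($text);
}

// Sample fragment, as innertext might return it for a linked heading.
echo inner_text('<a href="http://example.com/">My Title</a>');
```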
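Alright, I just tested this and it works. Enjoy:
<?php
// Download the page as one string
$content = file_get_contents("http://" . $_POST['url']);
$start_limiter = '<h1>';
$end_limiter = '</h1>';
// Locate the opening tag
$start_pos = strpos($content, $start_limiter);
if ($start_pos === FALSE) {
    die("Starting limiter " . $start_limiter . " not found ");
}
// Skip past the opening tag so only the inner content is returned
$start_pos += strlen($start_limiter);
// Locate the closing tag, searching from the opening tag onwards
$end_pos = strpos($content, $end_limiter, $start_pos);
if ($end_pos === FALSE) {
    die("Ending limiter " . $end_limiter . " not found ");
}
$h1tag = substr($content, $start_pos, $end_pos - $start_pos);
echo $h1tag;
?>
PHP: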
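One caveat with the strpos approach: it only matches a bare, lowercase <h1>, so a heading written as <H1> or <h1 class="..."> is reported as not found. A rough sketch of a more tolerant variant using preg_match follows; the function name first_h1_inner is made up, and a regular expression is still no substitute for a real parser on messy pages.

```php
<?php
// Hypothetical helper: inner HTML of the first <h1>, or null if there is none.
function first_h1_inner(string $html): ?string
{
    // i = case-insensitive, s = let "." match newlines inside the heading.
    // [^>]* allows attributes on the opening tag; .*? stops at the first </h1>.
    if (preg_match('/<h1\b[^>]*>(.*?)<\/h1\s*>/is', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Sample markup (an assumption) with an upper-case tag and an attribute.
var_dump(first_h1_inner('<H1 class="title">Hello <b>world</b></H1>'));
```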