Hi! I'm using file_get_contents() to get everything (the HTML) from a URL. However, I would like to get only what's in the <h1> tag. I have read about and searched for DOMDocument, which seems to be the best way to do this, but I'm not sure exactly how to do it with PHP. I have seen some tutorials for JavaScript, but I need to write the content to a database, so I need to use PHP.
Check the following URL; it should answer most of your questions: http://docstore.mik.ua/orelly/webprog/pcook/ch13_08.htm
Hi, you can try this. I hope it helps.

    <?php
    $myFile = "myfile.html";
    $fh = fopen($myFile, 'r');
    $htmlData = fread($fh, filesize($myFile));
    fclose($fh);

    /* Get all contents from <h1> .... </h1> */
    preg_match_all("/<h1>?.*?<\/h1>/", $htmlData, $matches);
    print_r($matches);
    ?>

The output should look like this:

    Array
    (
        [0] => Array
            (
                [0] => <h1>Chroot Bind FreeBSD</h1>
                [1] => <h1>MySQL on FreeBSD</h1>
                [2] => <h1>10 Best Linux Distro</h1>
                [3] => <h1>Top 4 Virtualization Platforms</h1>
                [4] => <h1>You can access all above information from my blog site </h1>
                [5] => <h1>www.techbabu.com</h1>
            )
    )
Although the answer above is good, you will have to consider whether the tag carries style or class attributes. A pattern that requires a literal <h1> will not return any results for such tags. Consider looking for just the opening <h1 (without the closing >) at first; then you can get further into array manipulation. Best of luck.
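Since DOMDocument came up in the original question and attributes are a concern, here is a minimal sketch of that route. The function name `extractH1Texts` and the sample HTML are made up for illustration; only `DOMDocument`, `loadHTML()`, `getElementsByTagName()` and the libxml error functions are standard PHP. Treat it as a starting point rather than a finished solution.

```php
<?php
// Hypothetical helper: returns the text of every <h1> in an HTML string.
function extractH1Texts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        // textContent drops nested markup such as <a> and keeps only the text.
        $texts[] = trim($h1->textContent);
    }
    return $texts;
}

// Made-up sample: one heading with a class attribute, one with a nested link.
$sample = '<html><body><h1 class="title">Chroot Bind FreeBSD</h1>'
        . '<h1><a href="/mysql">MySQL on FreeBSD</a></h1></body></html>';
print_r(extractH1Texts($sample));
```

Both headings in the sample are found, with the class attribute and the nested link ignored. The returned strings can then be passed to whatever database code you already have.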
I got it to work on an external .html file with techbabu's code. However, I don't understand why this doesn't work. Shouldn't it be basically the same thing, whether the HTML comes from a website or from a file?

    $url = "http://www.example.com";
    $testing = file_get_contents($url);

    /* Get all contents from <h1> .... </h1> */
    preg_match_all("/<h1>?.*?<\/h1>/", $testing, $matches);
    print_r($matches);
    $url = "http://www.example.com";
    $content = file_get_contents($url);
    // Captures the text of each <h1> that has no attributes and no nested tags
    preg_match_all('%<h1>([^<]+)</h1>%s', $content, $matches);
    echo '<pre>'.print_r($matches, true).'</pre>';
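A pattern like the one above only matches a bare <h1> with plain text inside it. If you want to stay with regular expressions, a slightly more tolerant variant might look like the sketch below. The function name `matchH1Texts` is hypothetical, and regex parsing of HTML remains fragile (comments, odd spacing in closing tags, and so on), so take it as an illustration only.

```php
<?php
// Hypothetical helper: regex-based variant that tolerates attributes and nested tags.
function matchH1Texts(string $html): array
{
    // <h1\b[^>]*> allows attributes on the opening tag;
    // (.*?) with the s flag spans nested tags and newlines; i ignores case.
    preg_match_all('%<h1\b[^>]*>(.*?)</h1>%is', $html, $matches);

    // Strip inner markup such as <a href="..."> and trim whitespace.
    return array_map(
        static function ($inner) {
            return trim(strip_tags($inner));
        },
        $matches[1]
    );
}

// Made-up sample with an attribute and a nested link.
print_r(matchH1Texts('<h1 class="x"><a href="/a">Best 10 Linux Distros</a></h1><h1>Plain</h1>'));
```

The capture group means the text ends up in `$matches[1]`, while `$matches[0]` still holds the full tags.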
Hi, it works here. I think your variable $testing is empty, or there is no <h1> in it.

    <?php
    $url = "http://www.techbabu.com";
    $testing = file_get_contents($url);

    /* Get all contents from <h1> .... </h1> */
    preg_match_all("/<h1>?.*?<\/h1>/", $testing, $matches);
    print_r($matches);

Output:

    Array
    (
        [0] => Array
            (
                [0] => <h1><a href="http://www.techbabu.com/2009/10/best-10-linux-distros/" rel="bookmark">Best 10 Linux Distros</a></h1>
                [1] => <h1><a href="http://www.techbabu.com/2009/10/microsoft-windows-7-launched/" rel="bookmark">Microsoft Windows 7 Launched</a></h1>
                [2] => <h1><a href="http://www.techbabu.com/2009/10/samsung-mobile-phone-t401g/" rel="bookmark">Samsung Mobile Phone – T401G</a></h1>
                [3] => <h1><a href="http://www.techbabu.com/2009/10/motorola-mobile-phone-dext-mb220/" rel="bookmark">Motorola Mobile Phone – DEXT MB220</a></h1>
                [4] => <h1><a href="http://www.techbabu.com/2009/10/samsung-mobile-phone-t939-behold-2/" rel="bookmark">Samsung Mobile Phone – T939 Behold 2</a></h1>
                [5] => <h1><a href="/privacy-policy">Privacy Policy</a> | <a href="/sitemap/">Sitemap</a> | <a href="/contact/">Contact Us</a></h1>
            )
    )
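One way to test the "empty variable" theory is to look at what file_get_contents() actually returned: it gives back false on failure, and it can only open URLs when the allow_url_fopen setting is enabled. The sketch below wraps that check in a hypothetical helper called `fetchHtml`; the name and the error wording are assumptions, not anything from the posts above.

```php
<?php
// Hypothetical helper: reads a URL or file and fails loudly instead of returning false.
function fetchHtml(string $source): string
{
    // @ suppresses the PHP warning; the return value is checked instead.
    $html = @file_get_contents($source);
    if ($html === false) {
        $error = error_get_last();
        throw new RuntimeException(
            "Could not read $source: " . ($error['message'] ?? 'unknown error')
        );
    }
    return $html;
}

// URLs only work with file_get_contents() when this setting is on.
echo 'allow_url_fopen: ', ini_get('allow_url_fopen') ? 'on' : 'off', "\n";

// Demonstrates the failure path with a local file that does not exist.
try {
    fetchHtml(__DIR__ . '/does-not-exist.html');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

With a real page you would call `fetchHtml($url)` inside the same try block and run preg_match_all() on the result. If the exception fires, the problem is the download rather than the pattern.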