Finding a tag in an HTML document

theblackjacker Peon

#1

Hi!

I'm using file_get_contents() to get everything (the HTML) from a URL. However, I would like to get only what's in the <h1> tag.

I have read about and searched for the DOM (DOMDocument), which seems to be the best way to do this, but I'm not sure exactly how to do it with PHP.

I have seen some tutorials for JavaScript, but I need to write the content to a database, so I need to use PHP.

theblackjacker, Oct 23, 2009

mastermunj Well-Known Member

#2

Check the following URL; it should answer most of your questions.

http://docstore.mik.ua/orelly/webprog/pcook/ch13_08.htm
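
For reference, here is a minimal sketch of the DOM approach the first post asks about. It assumes PHP 7 or later with the DOM extension enabled; the helper name extractH1Texts() and the sample markup are made up for illustration and do not come from the linked page.

```php
<?php
// Sketch only: extractH1Texts() is a hypothetical helper name, not from the thread.
function extractH1Texts(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        // textContent drops nested tags such as <a> and keeps only the text.
        $texts[] = trim($h1->textContent);
    }
    return $texts;
}

// Illustrative input; in practice this would be the result of file_get_contents($url).
$html = '<html><body><h1 class="title"><a href="/post">First post</a></h1><p>Body</p><h1>Second</h1></body></html>';
print_r(extractH1Texts($html));
```

Because the parser works on elements rather than raw text, attributes on the <h1> tag and nested links do not need special handling here.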

mastermunj, Oct 23, 2009

techbabu Peon

#3

theblackjacker said: ↑

Hi!

I'm using file_get_contents() to get everything (the HTML) from a URL. However, I would like to get only what's in the <h1> tag.

I have read about and searched for the DOM (DOMDocument), which seems to be the best way to do this, but I'm not sure exactly how to do it with PHP.

I have seen some tutorials for JavaScript, but I need to write the content to a database, so I need to use PHP.

Hi,

You can try this; I hope it helps.
<?php

$myFile = "myfile.html";
$fh = fopen($myFile, 'r');
$htmlData = fread($fh, filesize($myFile));
fclose($fh);

/* Get all contents from <h1> .... </h1> */
preg_match_all("/<h1>?.*?<\/h1>/", $htmlData, $matches);
print_r($matches);

?>
The output should look like this:
Array
(
    [0] => Array
        (
            [0] => <h1>Chroot Bind FreeBSD</h1>
            [1] => <h1>MySQL on FreeBSD</h1>
            [2] => <h1>10 Best Linux Distro</h1>
            [3] => <h1>Top 4 Virtualization Platforms</h1>
            [4] => <h1>You can access all above information from my blog site </h1>
            [5] => <h1>www.techbabu.com</h1>
        )

)

techbabu, Oct 23, 2009

FCM Well-Known Member

#4

Although the answer above is good, you will have to consider whether the page uses style or class attributes within its <h1> tags. If it does, a pattern that requires a literal <h1> will not return any results for those headings.

Consider looking for just the start of the tag (for example <h1 without the closing >) at first; then you can start to get more into array manipulation.

Best of luck
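
To illustrate the point about attributes, here is a rough sketch of a pattern that matches the opening of the tag and skips whatever attributes follow. The pattern and the sample markup are illustrative assumptions, not code from this thread, and the approach still breaks if an attribute value itself contains a > character.

```php
<?php
// Sketch only: the pattern and sample markup below are illustrative, not from the thread.
$html = "<h1>Plain</h1>\n<h1 class=\"big\" style=\"color:red\">Styled</h1>\n<H1>\nMulti-line\n</H1>";

// \b ends the match at the tag name, [^>]* skips any attributes,
// the i modifier ignores case, and the s modifier lets . span line breaks.
preg_match_all('/<h1\b[^>]*>(.*?)<\/h1>/is', $html, $matches);

// $matches[1] holds the captured inner HTML of each heading.
$headings = array_map('trim', $matches[1]);
print_r($headings);
```

The capture group keeps only what sits between the tags, so the result can be stored without stripping the <h1> wrapper afterwards.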

FCM, Oct 24, 2009

theblackjacker Peon

#5

I got it to work with an external .html file using techbabu's code.

However, I don't understand why the following doesn't work. Shouldn't it be basically the same thing, whether the HTML comes from a website or from a file?

$url = "http://www.example.com";
$testing = file_get_contents($url);

/* Get all contents from <h1> .... </h1> */
preg_match_all("/<h1>?.*?<\/h1>/", $testing, $matches);
print_r($matches);

theblackjacker, Oct 24, 2009

JAY6390 Peon

#6

$url = "http://www.example.com";
$content = file_get_contents($url);
preg_match_all('%<h1>([^<]+)</h1>%s', $content, $matches);
echo '<pre>'.print_r($matches, true).'</pre>';

JAY6390, Oct 24, 2009

techbabu Peon

#7

theblackjacker said: ↑

I got it to work with an external .html file using techbabu's code.

However, I don't understand why the following doesn't work. Shouldn't it be basically the same thing, whether the HTML comes from a website or from a file?

$url = "http://www.example.com";
$testing = file_get_contents($url);

/* Get all contents from <h1> .... </h1> */
preg_match_all("/<h1>?.*?<\/h1>/", $testing, $matches);
print_r($matches);

Hi,

It works here. I think your variable $testing is empty, or the fetched page contains no <h1> tag.
<?php

$url = "http://www.techbabu.com";
$testing = file_get_contents($url);

/* Get all contents from <h1> .... </h1> */
preg_match_all("/<h1>?.*?<\/h1>/", $testing, $matches);
print_r($matches);
Output:
Array
(
    [0] => Array
        (
            [0] => <h1><a href="http://www.techbabu.com/2009/10/best-10-linux-distros/" rel="bookmark">Best 10 Linux Distros</a></h1>
            [1] => <h1><a href="http://www.techbabu.com/2009/10/microsoft-windows-7-launched/" rel="bookmark">Microsoft Windows 7 Launched</a></h1>
            [2] => <h1><a href="http://www.techbabu.com/2009/10/samsung-mobile-phone-t401g/" rel="bookmark">Samsung Mobile Phone – T401G</a></h1>
            [3] => <h1><a href="http://www.techbabu.com/2009/10/motorola-mobile-phone-dext-mb220/" rel="bookmark">Motorola Mobile Phone – DEXT MB220</a></h1>
            [4] => <h1><a href="http://www.techbabu.com/2009/10/samsung-mobile-phone-t939-behold-2/" rel="bookmark">Samsung Mobile Phone – T939 Behold 2</a></h1>
            [5] => <h1><a href="/privacy-policy">Privacy Policy</a> &nbsp; | &nbsp; <a href="/sitemap/">Sitemap</a> &nbsp; | &nbsp; <a href="/contact/">Contact Us</a></h1>
        )

)
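
One way to test the guess that $testing is empty is to separate a failed fetch from a page that simply has no <h1>. The sketch below assumes file_get_contents() returns false on failure, which is its documented behaviour; describeH1Result() is a hypothetical helper, and its pattern is an illustrative variant rather than the one used above.

```php
<?php
// Sketch only: describeH1Result() is a hypothetical helper, not from the thread.
// $html is whatever file_get_contents() returned: a string, or false on failure.
function describeH1Result($html): string
{
    if ($html === false) {
        return 'fetch failed';
    }
    if ($html === '') {
        return 'empty response';
    }
    // Illustrative pattern: tolerates attributes, upper case, and line breaks.
    $count = preg_match_all('/<h1\b[^>]*>.*?<\/h1>/is', $html, $matches);
    if ($count === false) {
        return 'regex error';
    }
    return $count === 0 ? 'no <h1> found' : $count . ' <h1> found';
}

// With a real request (needs network access and allow_url_fopen enabled):
// echo describeH1Result(@file_get_contents('http://www.example.com'));

echo describeH1Result(false), "\n";
echo describeH1Result('<p>no heading</p>'), "\n";
echo describeH1Result('<h1 id="t">Title</h1>'), "\n";
```

If the helper reports a failed fetch, the problem lies in the request (for example allow_url_fopen being disabled) and not in the regular expression.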

techbabu, Oct 24, 2009
