I need PHP code that gets the title, description and keywords of a webpage. Is it possible to write code for this? I searched Google and other PHP resources but couldn't find anything! I tried a preg_match pattern but did not get the required results. What should I use to get the text between the tags?
PHP:
<?php
// Fetch a page and pull out its title, keywords and description.
function getMeta( $url )
{
    $rets = new stdClass;
    $data = file_get_contents( $url );
    if ( !$data ) {
        return false;
    }

    // Fall back to '' when a pattern does not match, so $matches[1] is never read while undefined.
    preg_match( '#<title>(.*?)</title>#si', $data, $matches );
    $rets->title = trim( $matches[1] ?? '' );

    preg_match( '#<meta ?+name=["|\']keywords["|\'] ?+content=["|\'](.*?)["|\'].*?/>#si', $data, $matches );
    $rets->keywords = trim( $matches[1] ?? '' );

    preg_match( '#<meta ?+name=["|\']description["|\'] ?+content=["|\'](.*?)["|\'].*?/>#si', $data, $matches );
    $rets->description = trim( $matches[1] ?? '' );

    return $rets;
}

echo "<pre>";
echo "krakjoe.com meta object\r\n";
print_r( getMeta( 'http://krakjoe.com' ) );
echo "your thread meta object\r\n";
print_r( getMeta( 'http://forums.digitalpoint.com/showthread.php?t=300405' ) );
?>
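Code (markup):
$filestring = file_get_contents( $the_url );
// eregi() was removed in PHP 7; preg_match() with the i modifier does the same case-insensitive match.
if ( preg_match( '#<title>(.*)</title>#si', $filestring, $out ) ) {
    $titletag = $out[1];
}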
Although this code is fine, it does not work for www.rediff.com, due to too many characters. Anyway, thanks to KRAKJOE (joewatkins) for his code; I would rate him as one of the best in PHP coding! Some greens for you! Also, I need to store this data (description, title) in separate variables. What should I do?
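On the separate-variables question, here is a minimal sketch. It assumes the object has the same three properties that getMeta() fills in; the $meta object below is a hand-built stand-in with invented values, so the snippet runs without fetching anything.

```php
<?php
// Stand-in for the object getMeta() returns; the values are invented for illustration.
$meta = new stdClass;
$meta->title = 'Example page';
$meta->keywords = 'php, meta, tags';
$meta->description = 'A short description.';

// getMeta() returns false when the page cannot be fetched, so check before reading.
if ( $meta !== false ) {
    $title       = $meta->title;
    $keywords    = $meta->keywords;
    $description = $meta->description;

    echo $title, "\n", $description, "\n";
}
```

With the real function you would replace the stand-in with $meta = getMeta( $url ); and keep the rest as it is.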
I worked that part out myself. But now I get only the title. What is the reason? There are meta tags in the page source!
Try inserting unset($matches); after each assignment. Once you have put $title = trim....; insert that line right after it. As $matches is an array, it may be the case that you can't overwrite it by assigning new data to it. It's also best practice to have the function return the data before you start doing things with it, the way krakjoe did in the first place. That way the function is reusable in future programs and not tied to the one you are coding at the moment.
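Whether the unset is needed can be checked directly. The sketch below runs two patterns against a made-up HTML string (the string and variable names are only for illustration) and keeps a copy of $matches after each call. When a pattern finds nothing, preg_match() leaves $matches as an empty array, which would also explain getting the title but no keywords or description.

```php
<?php
// Made-up input: it has a title but no <h1>.
$html = '<title>First</title><p>no heading here</p>';

preg_match( '#<title>(.*?)</title>#si', $html, $matches );
$afterFirst = $matches;   // full match plus the captured title text

// Same $matches variable, different pattern, and this one does not match.
$found = preg_match( '#<h1>(.*?)</h1>#si', $html, $matches );
$afterSecond = $matches;  // nothing from the first call is left over

var_dump( $afterFirst[1], $found, $afterSecond );
```

So an empty result points at the pattern not matching the page, not at stale data in the array.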
No need to unset the array; preg_match() overwrites $matches on every call.
PHP:
<?php
function getMeta( $url )
{
    $rets = new stdClass;
    $data = file_get_contents( $url );
    if ( !$data ) {
        return false;
    }

    // Fall back to '' when a pattern does not match.
    preg_match( '#<title>(.*?)</title>#si', $data, $matches );
    $rets->title = $matches[1] ?? '';

    // The quotes around the name value are now optional (["|\']?).
    preg_match( '#<meta ?+name=["|\']?keywords["|\']? ?+content=["|\'](.*?)["|\'].*?/>#si', $data, $matches );
    $rets->keywords = trim( $matches[1] ?? '' );

    preg_match( '#<meta ?+name=["|\']?description["|\']? ?+content=["|\'](.*?)["|\'].*?/>#si', $data, $matches );
    $rets->description = trim( $matches[1] ?? '' );

    return $rets;
}

echo "<pre>";
echo "krakjoe.com meta object\r\n";
print_r( getMeta( 'http://krakjoe.com' ) );
echo "your thread meta object\r\n";
print_r( getMeta( 'http://forums.digitalpoint.com/showthread.php?t=300405' ) );
echo "Your website - non standard tags\r\n";
print_r( getMeta( 'http://www.rediff.com' ) );
?>
Your site has non-standard meta tags; the regex is fixed to include them as well, by making the quotes around the name value optional.
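The patterns above still expect name to come before content and the tag to end in />. If a page writes its meta tags differently, a two-step approach may be more forgiving: grab each meta tag first, then read its attributes as key/value pairs. This is only a sketch; parseMetaTags() is a hypothetical helper, not part of the code posted in this thread, and it will still trip over a > inside an attribute value.

```php
<?php
// Hypothetical helper: collect name => content pairs from every <meta> tag,
// whatever the attribute order or quoting style.
function parseMetaTags( string $html ): array
{
    $found = [];

    // Step 1: grab each complete <meta ...> tag.
    preg_match_all( '#<meta\b[^>]*>#i', $html, $tags );

    foreach ( $tags[0] as $tag ) {
        // Step 2: read attributes; a value may be double-quoted (group 2),
        // single-quoted (group 3) or unquoted (group 4).
        preg_match_all(
            '#([a-z][a-z0-9:-]*)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))#i',
            $tag,
            $attrs,
            PREG_SET_ORDER
        );

        $pairs = [];
        foreach ( $attrs as $a ) {
            // Trailing unmatched groups are absent, so take the last group present.
            $pairs[ strtolower( $a[1] ) ] = $a[4] ?? $a[3] ?? $a[2];
        }

        // Only tags that carry both name and content are of interest here.
        if ( isset( $pairs['name'], $pairs['content'] ) ) {
            $found[ strtolower( $pairs['name'] ) ] = trim( $pairs['content'] );
        }
    }

    return $found;
}

// Made-up markup: content before name, an unquoted value, upper-case attributes.
$html = '<head><meta charset="utf-8">'
      . '<meta content="php, meta tags" name=keywords>'
      . "<meta NAME='Description' CONTENT='A short demo.' />"
      . '</head>';

$meta = parseMetaTags( $html );
print_r( $meta );
```

PHP also ships get_meta_tags(), which reads the named meta tags of a file or URL into an array, so that may be worth trying before hand-written patterns.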
It works fine now. Thanks, Joe, for your help! Also, is there any way to separate the keywords and store them in separate variables?
PHP:
<?php
function getMeta( $url )
{
    $rets = new stdClass;
    $data = file_get_contents( $url );
    if ( !$data ) {
        return false;
    }

    // Fall back to '' when a pattern does not match.
    preg_match( '#<title>(.*?)</title>#si', $data, $matches );
    $rets->title = $matches[1] ?? '';

    // keywords becomes an array, one entry per comma-separated keyword.
    // explode() replaces split(), which was removed in PHP 7.
    preg_match( '#<meta ?+name=["|\']?keywords["|\']? ?+content=["|\'](.*?)["|\'].*?/>#si', $data, $matches );
    $rets->keywords = explode( ",", trim( $matches[1] ?? '' ) );

    preg_match( '#<meta ?+name=["|\']?description["|\']? ?+content=["|\'](.*?)["|\'].*?/>#si', $data, $matches );
    $rets->description = trim( $matches[1] ?? '' );

    return $rets;
}

echo "<pre>";
echo "krakjoe.com meta object\r\n";
print_r( getMeta( 'http://krakjoe.com' ) );
echo "your thread meta object\r\n";
print_r( getMeta( 'http://forums.digitalpoint.com/showthread.php?t=300405' ) );
echo "Your website - non standard tags\r\n";
print_r( getMeta( 'http://www.rediff.com' ) );
?>
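Splitting on the comma alone leaves the spaces around each keyword and produces empty entries wherever there is a stray or trailing comma. A small sketch of a cleanup step follows; splitKeywords() is a hypothetical helper name and the input string is made up.

```php
<?php
// Hypothetical helper: turn a comma-separated keywords string into a clean list.
function splitKeywords( string $keywords ): array
{
    // Split on commas, then strip the whitespace around every piece.
    $parts = array_map( 'trim', explode( ',', $keywords ) );

    // Drop empty pieces and re-index from 0.
    return array_values( array_filter( $parts, function ( $part ) {
        return $part !== '';
    } ) );
}

$list = splitKeywords( ' php, meta tags ,, regex, ' );
print_r( $list );

// If you really want separate variables, the first few entries can be unpacked:
[ $first, $second ] = $list;
```

Keeping the keywords in one array is usually easier than separate variables, since the number of keywords differs from page to page.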
Thanks for the info. This is exactly what I was looking for! I modified it for my own use; this is the output for your site address: