Need help getting the title and description of a website

Discussion in 'PHP' started by Freewebspace, Apr 16, 2007.

  1. #1
    I need PHP code that gets the title, description and keywords of a webpage.

    Is it possible to write code for this?

    I searched Google and other PHP resources but couldn't find anything.

    I tried preg_match() but did not get the results I needed.

    What should I use to get the text between the tags?
     
    Freewebspace, Apr 16, 2007 IP
  2. krakjoe

    krakjoe Well-Known Member

    Messages:
    1,795
    Likes Received:
    141
    Best Answers:
    0
    Trophy Points:
    135
    #2
    
    <?php
    // Fetch a page and pull out its <title>, meta keywords and meta description.
    function getMeta( $url )
    {
        $rets = new stdClass;
        $data = file_get_contents( $url );
        if ( !$data ) {
            return false;
        }
        // Each field falls back to '' so a missing tag does not raise an undefined-index notice.
        $rets->title = preg_match( '#<title>(.*?)</title>#si', $data, $matches ) ? trim( $matches[1] ) : '';
        $rets->keywords = preg_match( '#<meta\s+name=["\']keywords["\']\s+content=["\'](.*?)["\'][^>]*>#si', $data, $matches ) ? trim( $matches[1] ) : '';
        $rets->description = preg_match( '#<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\'][^>]*>#si', $data, $matches ) ? trim( $matches[1] ) : '';
        return $rets;
    }
    echo "<pre>";
    echo "krakjoe.com meta object\r\n";
    print_r( getMeta('http://krakjoe.com') );
    echo "your thread meta object\r\n";
    print_r( getMeta('http://forums.digitalpoint.com/showthread.php?t=300405') );
    ?>
    PHP:
     
    krakjoe, Apr 16, 2007 IP
  3. mad4

    mad4 Peon

    Messages:
    6,986
    Likes Received:
    493
    Best Answers:
    0
    Trophy Points:
    0
    #3
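    $filestring = file_get_contents( $the_url );
    // eregi() was removed in PHP 7; preg_match() with the i modifier does the same job.
    if ( preg_match( '#<title>(.*?)</title>#si', $filestring, $out ) ) {
        $titletag = $out[1];
    }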
    Code (markup):
     
    mad4, Apr 16, 2007 IP
  4. Freewebspace

    Freewebspace Notable Member

    Messages:
    6,213
    Likes Received:
    370
    Best Answers:
    0
    Trophy Points:
    275
    #4
    The code is fine, but it does not work for this site:

    www.rediff.com

    due to too many characters.

    Anyway, thanks to krakjoe (joewatkins) for the code.

    I would rate him as one of the best at PHP coding!

    Some greens for you!


    Also, I need to store this data (description, title) in separate variables.

    What should I do?
     
    Freewebspace, Apr 16, 2007 IP
  5. Weirfire

    Weirfire Language Translation Company

    Messages:
    6,979
    Likes Received:
    365
    Best Answers:
    0
    Trophy Points:
    280
    #5
    $title = $rets->title;
    PHP:
    Just do

    var_dump($rets);
    PHP:
    to see all the properties of the object.
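
    A minimal sketch of copying the fields into separate variables, assuming an object shaped like the one getMeta() returns. The sample object is built by hand with made-up values so the snippet runs without fetching a page. In real use, check that getMeta() did not return false before reading the properties.

```php
<?php
// Hypothetical stand-in for the object getMeta() returns; the values are invented.
$rets = new stdClass;
$rets->title = 'Example page';
$rets->keywords = 'php, meta, title';
$rets->description = 'A sample description.';

// Copy each property into its own variable.
$title = $rets->title;
$keywords = $rets->keywords;
$description = $rets->description;

// get_object_vars() lists every property as name => value pairs.
foreach ( get_object_vars( $rets ) as $name => $value ) {
    echo $name . ': ' . $value . "\n";
}
```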
     
    Weirfire, Apr 16, 2007 IP
  6. Freewebspace

    #6
    I sorted that out myself.

    But I only get the title.

    What is the reason? The source page does contain meta tags!
     
    Freewebspace, Apr 16, 2007 IP
  7. Weirfire

    #7

    Try inserting

    unset($matches);
    PHP:
    after each assignment: once you have written $title = trim(...);, add the line above. Since $matches is an array, it may be that assigning new data does not overwrite it.

    It's best practice to have the function return the data before you start working with it, the way krakjoe did originally. That keeps the function reusable in future programs rather than tied to the one you are writing at the moment.
     
    Weirfire, Apr 16, 2007 IP
  8. krakjoe

    #8
    No need to unset the array; preg_match() overwrites $matches on every call.
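
    A small sketch of why unset() is unnecessary here: each preg_match() call fills $matches afresh. The sample HTML string is made up for illustration.

```php
<?php
// Two unrelated patterns reusing the same $matches variable.
$html = '<title>First</title><h1>Second</h1>';

preg_match( '#<title>(.*?)</title>#si', $html, $matches );
$first = $matches[1];

// No unset() in between: preg_match() replaces the previous contents.
preg_match( '#<h1>(.*?)</h1>#si', $html, $matches );
$second = $matches[1];

// A pattern that does not match leaves $matches as an empty array,
// so nothing from the earlier calls lingers.
preg_match( '#<h2>(.*?)</h2>#si', $html, $matches );
$third = $matches;

var_dump( $first, $second, $third );
```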

    
    <?php
    // Fetch a page and pull out its <title>, meta keywords and meta description.
    function getMeta( $url )
    {
        $rets = new stdClass;
        $data = file_get_contents( $url );
        if ( !$data ) {
            return false;
        }
        // Each field falls back to '' so a missing tag does not raise an undefined-index notice.
        $rets->title = preg_match( '#<title>(.*?)</title>#si', $data, $matches ) ? $matches[1] : '';
        // The quotes around the name value are now optional (["\']?).
        $rets->keywords = preg_match( '#<meta\s+name=["\']?keywords["\']?\s+content=["\'](.*?)["\'][^>]*>#si', $data, $matches ) ? trim( $matches[1] ) : '';
        $rets->description = preg_match( '#<meta\s+name=["\']?description["\']?\s+content=["\'](.*?)["\'][^>]*>#si', $data, $matches ) ? trim( $matches[1] ) : '';
        return $rets;
    }
    echo "<pre>";
    echo "krakjoe.com meta object\r\n";
    print_r( getMeta('http://krakjoe.com') );
    echo "your thread meta object\r\n";
    print_r( getMeta('http://forums.digitalpoint.com/showthread.php?t=300405') );
    echo "Your website - non standard tags\r\n";
    print_r( getMeta('http://www.rediff.com') );
    ?>
    
    PHP:
    The site has non-standard meta tags; the regex now makes the quotes around the name value optional so those match too.
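
    As an alternative to chasing markup variations with regular expressions, here is a hedged sketch that parses the page with PHP's DOM extension instead. This is a different technique from the regex code in this thread. It assumes the DOM extension is enabled, and getMetaFromHtml() and the sample markup are hypothetical names and data made up for illustration. It works on an HTML string, so the result of file_get_contents() could be passed in.

```php
<?php
// Sketch: parse the markup with DOMDocument instead of regular expressions.
// getMetaFromHtml() is a hypothetical helper, not part of the thread's code.
function getMetaFromHtml( $html )
{
    $rets = new stdClass;
    $rets->title = '';
    $rets->keywords = '';
    $rets->description = '';

    // Real-world pages are rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors( true );
    $doc = new DOMDocument();
    $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $titles = $doc->getElementsByTagName( 'title' );
    if ( $titles->length > 0 ) {
        $rets->title = trim( $titles->item( 0 )->textContent );
    }

    foreach ( $doc->getElementsByTagName( 'meta' ) as $meta ) {
        // The name value may differ in case ("Keywords"), so normalise it.
        $name = strtolower( $meta->getAttribute( 'name' ) );
        if ( $name === 'keywords' || $name === 'description' ) {
            $rets->$name = trim( $meta->getAttribute( 'content' ) );
        }
    }
    return $rets;
}

// Made-up sample: unquoted name, content before name, no self-closing slash.
$sample = '<html><head><title> Sample </title>'
    . '<meta name=keywords content="php, regex">'
    . '<meta content="A test page" name="description">'
    . '</head><body></body></html>';
$meta = getMetaFromHtml( $sample );
print_r( $meta );
```

    The parser does not care about attribute order or quoting style, which is where the regex versions tend to need another patch for each new site.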
     
    krakjoe, Apr 16, 2007 IP
  9. Freewebspace

    #9
    It works fine now.

    Thanks for your help, Joe!

    Also, is there a way to split the keywords and store them in separate variables?
     
    Freewebspace, Apr 16, 2007 IP
  10. krakjoe

    #10
    
    <?php
    // Fetch a page and pull out its <title>, meta keywords (as an array) and meta description.
    function getMeta( $url )
    {
        $rets = new stdClass;
        $data = file_get_contents( $url );
        if ( !$data ) {
            return false;
        }
        // Each field falls back to an empty value so a missing tag does not raise a notice.
        $rets->title = preg_match( '#<title>(.*?)</title>#si', $data, $matches ) ? $matches[1] : '';
        // split() was removed in PHP 7; explode() is the replacement for a plain delimiter.
        $rets->keywords = preg_match( '#<meta\s+name=["\']?keywords["\']?\s+content=["\'](.*?)["\'][^>]*>#si', $data, $matches ) ? array_map( 'trim', explode( ',', $matches[1] ) ) : array();
        $rets->description = preg_match( '#<meta\s+name=["\']?description["\']?\s+content=["\'](.*?)["\'][^>]*>#si', $data, $matches ) ? trim( $matches[1] ) : '';
        return $rets;
    }
    echo "<pre>";
    echo "krakjoe.com meta object\r\n";
    print_r( getMeta('http://krakjoe.com') );
    echo "your thread meta object\r\n";
    print_r( getMeta('http://forums.digitalpoint.com/showthread.php?t=300405') );
    echo "Your website - non standard tags\r\n";
    print_r( getMeta('http://www.rediff.com') );
    ?>
    
    PHP:
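
    The keyword-splitting step on its own, as a small sketch that runs without fetching a page. splitKeywords() is a hypothetical helper name and the input string is made up. It also drops the empty entries that a trailing comma would otherwise leave behind.

```php
<?php
// splitKeywords() is a hypothetical helper name used for illustration.
function splitKeywords( $keywords )
{
    // explode() cuts on commas; trim() removes the padding around each item.
    $parts = array_map( 'trim', explode( ',', $keywords ) );
    // Drop empty entries (e.g. from a trailing comma) and reindex from 0.
    return array_values( array_filter( $parts, 'strlen' ) );
}

$list = splitKeywords( ' php , meta tags,title, ' );
print_r( $list );

// Individual keywords are then available by index.
$firstKeyword = $list[0];
```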
     
    krakjoe, Apr 16, 2007 IP
  11. Freewebspace

    #11
    Thanks for the info.


    This is exactly what I was looking for!

    I modified it for my own use; this is the output for your site address:
     
    Freewebspace, Apr 16, 2007 IP