How to extract the TITLE and DESCRIPTION of a URL

rakibtg Peon

#1

Hello,
Here is some code that correctly extracts the TITLE and DESCRIPTION of the page it is on.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>My title</title>
    <meta http-equiv="content-type" content="text/html;charset=utf-8" />
    <meta name="description" content="Description of everything" />
    <meta name="keywords" content="eta ki, googl , bd, in" />
    <style type="text/css">
    #Box{border:1px solid #000044;text-align:center;background-color : #fff;}
    #green {color:green }
    </style>
    </head>
    <body><h1>See below</h1><br />
    <div id="Box">
    <script type="text/javascript">
    //<![CDATA[
    var a = document.title; // page title
    var b = document.location.href; // page URL
    var c = document.getElementsByTagName('meta');
    var description = ''; // stays empty if the page has no description meta tag
    for (var x = 0, y = c.length; x < y; x++) {
    if (c[x].name.toLowerCase() == "description") {
    description = c[x].content;
    }
    }
    document.write('<a href="' + b + '"> ' + a + '<\/a> <br /> ' + description + '<br /> ');
    document.write('<span id="green"> ' + b + '<\/span> ');
    //]]>
    </script>
    </div>
    </body>
    </html>

Code (markup):

How would I change the code above so that it extracts the TITLE and DESCRIPTION of a given URL on my website? What changes do I need to make?
Hoping for a great response!
Thank you

rakibtg, Jun 21, 2011

badmas Well-Known Member

#2

May we know how you are processing the new URL?

Ajax is a great way to do it.

badmas, Jun 21, 2011
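
For the Ajax route, the core step is pulling the title and description out of the fetched HTML text. Below is a minimal sketch in JavaScript, not a definitive implementation. The names `parseAttributes`, `extractTitleAndDescription`, and `fetchPageInfo` are made up for illustration, and the sketch assumes attribute values are quoted. Note that browsers generally block reading pages from other origins unless the target site allows it via CORS, so in practice the HTML usually has to come through a script on your own server.

```javascript
// Parse the quoted attributes of a single tag string into a plain object.
function parseAttributes(tag) {
  const attrs = {};
  const re = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
  }
  return attrs;
}

// Extract the <title> text and the description meta tag from an HTML string.
// Works whether name= comes before or after content= in the meta tag.
function extractTitleAndDescription(html) {
  const titleMatch = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i);
  const result = {
    title: titleMatch ? titleMatch[1].trim() : '',
    description: ''
  };
  for (const tag of html.match(/<meta\b[^>]*>/gi) || []) {
    const attrs = parseAttributes(tag);
    if ((attrs.name || '').toLowerCase() === 'description') {
      result.description = attrs.content || '';
      break;
    }
  }
  return result;
}

// Hypothetical wrapper: fetch a URL and extract from its HTML.
// Assumes a global fetch (modern browsers, Node 18+).
async function fetchPageInfo(url) {
  const response = await fetch(url);
  return extractTitleAndDescription(await response.text());
}
```

Calling `fetchPageInfo(url)` returns a promise that resolves to an object with `title` and `description` properties. Either one is an empty string if the page does not provide it.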

rakibtg Peon

#3

It is $attn.

rakibtg, Jun 21, 2011

Limetreeonline Peon

#4

I hope it is OK to paste someone else's code here. I found this code on the forkaya site. It fetches the title, description, and keywords from a URL you enter:

<!--
<?php
/*
 * URL Fetch Script
 * 
 * This script fetches/extracts Title, Description, and Keywords from webpages
 * using specified URL
 * 
 * Provided by www.forkaya.com
 * 
 */


/*
 * INITIALIZATION SECTION ***************************************************************************************************************************
 */

	$isError = false;
	$submitted = false;
	$sourceVisible = false;
	$sourceText_Visible = 'View Source Code';
	$sourceText_NotVisible = 'Hide Source Code';
	$eMsg = '';
	$aValues = array(
				'url'=>'',
				'title'=>'',
				'description'=>'',
				'keywords'=>''
				);
	//script support charsets needed for encoding purposes
	//add others if needed; it will require custom coding; look for 'charset custom' comments below
	//keep charsets lowercase			
	$aCharsets = array(
				'utf-8', //Unicode
				'iso-8859-1' //Western Europe
				//charset custom: add other charsets as needed
				//'windows-1258' //Vietnamese
				);

/*
 * FUNCTIONS SECTION ***************************************************************************************************************************
 */
	//this function will determine the website's charset
	function get_charset($aCS,$website) {
		
		$result = '';
		$website = strtolower($website); 
		
		//check the http header first
		$pos = strpos($website,'<html');
		if ($pos) {
		    $wsHeader = substr($website,0,$pos);
			//loop through array of charsets
			foreach ($aCS as $val) {
			
				if (strpos($wsHeader,$val) > 0) {
					$result = $val;
					break;
				}
			}
		}

		if (empty($result)) {
			
			//supported charset was not found in the http header

			$wsContentType = '';
			
			$wsDOM = new DOMDocument();
			@$wsDOM->loadHTML($website);
			
			$meta_elements = $wsDOM->getElementsByTagName('meta');
			foreach ($meta_elements as $meta_element) {
				if (strtolower($meta_element->getAttribute('http-equiv')) == 'content-type') {
			    	$wsContentType = strtolower($meta_element->getAttribute('content'));
				}
			}
			
			if ($wsContentType === '') {
				//return empty
			} else {
				// look for specific charsets
				
				//loop through array of charsets
				foreach ($aCS as $val) {
				
					if (strpos($wsContentType,$val) > 0) {
						$result = $val;
						break;
					}
				}
			}
		}
		
		return $result;
	}
	
/*
 * VALIDATION AND ACTION SECTION ********************************************************************************************************************
 */
	
	if (isset($_POST['submit'])) {

		$submitted = true;
		$aValues['url'] = $_POST['url'];
		$aValues['title'] = 'No title';
		$aValues['description'] = 'No description';
		$aValues['keywords'] = 'No keywords';
		
		if (strlen($_POST['url']) == 0) {
			$eMsg .= 'URL cannot be blank.<br />';
			$isError = true;
		}

		if(!$isError) {

			//create a new cURL resource pointing to specified url
			$cURL = curl_init($aValues['url']);
			//include the header in the output. 
			curl_setopt($cURL,CURLOPT_HEADER,true);
			//return the transfer as a string of the return value of curl_exec()
			//instead of outputting it out directly. 
			curl_setopt($cURL,CURLOPT_RETURNTRANSFER,true);
			//set the request timeout in sec.
			curl_setopt($cURL,CURLOPT_TIMEOUT,60);
			//go after redirected pages
			curl_setopt($cURL, CURLOPT_FOLLOWLOCATION, true);
			
			//grab URL and assign it as string to variable
			$reply_page = curl_exec($cURL);

			//echo('<--'.$reply_page.'-->');
			
			//close cURL resource, and free up system resources
			curl_close($cURL);
			
			if (strlen($reply_page) == 0) {
				$eMsg .= 'Website unavailable.<br />';
				$isError = true;
			} else {
				
				//determine the website's charset
				$wbCharset = get_charset($aCharsets,$reply_page);
				
				//we do not need header anymore
				//(case-insensitive search, since get_charset() also matches <HTML ...>)
				$reply_page = stristr($reply_page,'<html');
				
				//we need to convert to utf-8 because DOMDocument expects it
				switch ($wbCharset) {
					case '':
						//do nothing
						break;
						
					case 'utf-8':
						
						//for the purpose of this script, we can replace 'iso-8859-1' strings with 'utf-8' (if there are any) in the whole website
						$reply_page = str_ireplace('iso-8859-1','utf-8',$reply_page);
						break;
						
					case 'iso-8859-1':

						//for the purpose of this script, we can replace 'iso-8859-1' with 'utf-8' in the whole website
						$reply_page = str_ireplace('iso-8859-1','utf-8',$reply_page);
						
						//encode the website into utf-8
						$reply_page = utf8_encode($reply_page);
						break;
						
					//charset custom: add logic for other charsets as needed
					//case 'windows-1258': //Vietnamese
					//	$reply_page = str_ireplace('windows-1258','utf-8',$reply_page);
					//	write or find a code to encode the charset to utf-8
					//	break;
				}

				//for the purpose of this script, 
				//we can add <meta http-equiv=Content-Type content="text/html; charset=utf-8"> tag
				//right after <head> tag to make DOM 'happy'
				$reply_page = str_ireplace(
					'<head>',
					'<head><meta http-equiv=Content-Type content="text/html; charset=utf-8">',
					$reply_page);
				
				$pageDOM = new DOMDocument();
				@$pageDOM->loadHTML($reply_page);
				
				//Title
				$title_elements = $pageDOM->getElementsByTagName('title');
				if ($title_elements->length <> 0) {
					$aValues['title'] = $title_elements->item(0)->nodeValue;
				}
				
				$meta_elements = $pageDOM->getElementsByTagName('meta');
				foreach ($meta_elements as $meta_element) {
					if (strtolower($meta_element->getAttribute('name')) == 'description') {
				    	$aValues['description'] = $meta_element->getAttribute('content');
					}
					if (strtolower($meta_element->getAttribute('name')) == 'keywords') {
				    	$aValues['keywords'] = $meta_element->getAttribute('content');
					}
				}
			}
		}
		
	}
	
	if (isset($_GET['source'])) {
		if ($_GET['source'] == 1) {
			$sourceStr = file_get_contents('url-fetch-source.php');
			$sourceVisible = true;
			$sourceText = $sourceText_NotVisible;
			$sourceValue = 0;
		} else {
			$sourceVisible = false;
			$sourceText = $sourceText_Visible;
			$sourceValue = 1;
		}
	} else {
		$sourceText = $sourceText_Visible;
		$sourceValue = 1;
	}
	
	header('Content-Type: text/html; charset=utf-8');
	
/*
 * DISPLAY SECTION **********************************************************************************************************************************
 */
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
		<meta name="description" content="This PHP script extracts Title, Description, and Keywords from specified URL"/>
		<meta name="keywords" content="php scripting, php, extract, fetch, meta"/>
		<meta name="author" content="forkaya" />
		<link rel="stylesheet" href="../style.css" type="text/css">
		<title>Forkaya - PHP Scripts - URL Fetch - Extract Title, Description, and Keywords from URL</title>
	</head>
	<body>
		<form action="" method="post">
			<table 	align="left" class="bb">
				<tr>
					<td colspan="2" height="30" align="center"  class="aa"><h3><a href="..">Forkaya</a> - <a href=".">PHP Scripts</a> - <a href="./url-fetch.php">URL Fetch</a></h3></td>
				</tr>
				<tr>
					<td colspan="2" height="30" align="left">This script fetches/extracts Title, Description, and Keywords from webpages using specified URL</td>
				</tr>
				<tr>
					<td colspan="2" height="30"></td>
				</tr>
<?php

	if($isError) { 
		echo('
				<tr>
					<td colspan="2" align="left" class="cc">'.$eMsg.'</td>
				</tr>
		'); 
	}

?>
				<tr>
					<td align="left">Enter URL:</td>
					<td align="left"><input type="text" name="url" maxlength="256" size="56" value="<?php echo(htmlspecialchars($aValues['url'], ENT_QUOTES, 'UTF-8'));?>"/></td>
				</tr>
				<tr>
					<td align="left"><input type="submit" name="submit" value="Submit"/></td>
					<td align="left"><a href="url-fetch.php?source=<?php echo($sourceValue);?>" class="ff"><?php echo($sourceText); ?></a></td>
				</tr>
				<tr>
					<td colspan="2" align="left"></td>
				</tr>
<?php

	if($submitted and !$isError) { 
		echo('
				<tr>
					<td align="left" valign="top" class="aa">Title: </td>
					<td align="left">'.htmlspecialchars($aValues['title'], ENT_QUOTES, 'UTF-8').'</td>
				</tr>
				<tr>
					<td align="left" valign="top" class="aa">Description:</td>
					<td>'.htmlspecialchars($aValues['description'], ENT_QUOTES, 'UTF-8').'</td>
				</tr>
				<tr>
					<td align="left" valign="top" class="aa">Keywords:</td>
					<td align="left">'.htmlspecialchars($aValues['keywords'], ENT_QUOTES, 'UTF-8').'</td>
				</tr>
		'); 
	}

	if($sourceVisible) { 
		echo('
				<tr>
					<td align="left" colspan="2"><textarea rows="174" cols="210" readonly="readonly" class="ee">'.$sourceStr.'

HTML:

Limetreeonline, Jun 22, 2011
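
The `get_charset` function in the script above checks the HTTP header first and only then falls back to the page's meta tag. The same two-step lookup can be sketched in JavaScript (the language of the first post). This is an illustration under assumptions, not a port of the forkaya script: `detectCharset` is a made-up name, and it takes the `Content-Type` header value and the HTML as separate arguments instead of one combined string.

```javascript
// Return the charset of a page as a lowercase string, or '' if none of the
// supported charsets is declared. The header wins over the meta tag.
function detectCharset(contentTypeHeader, html, supported = ['utf-8', 'iso-8859-1']) {
  const candidates = [];

  // 1. HTTP header, e.g. "text/html; charset=ISO-8859-1"
  const fromHeader = (contentTypeHeader || '').match(/charset\s*=\s*["']?([\w-]+)/i);
  if (fromHeader) candidates.push(fromHeader[1]);

  // 2. Meta tag, either <meta charset="..."> or
  //    <meta http-equiv="Content-Type" content="text/html; charset=...">
  const fromMeta = (html || '').match(/<meta[^>]*charset\s*=\s*["']?([\w-]+)/i);
  if (fromMeta) candidates.push(fromMeta[1]);

  for (const candidate of candidates) {
    const name = candidate.toLowerCase();
    if (supported.includes(name)) return name;
  }
  return '';
}
```

An empty result means the caller has to pick a default, which mirrors the "do nothing" branch of the `switch` in the PHP script.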

rakibtg Peon

#5

Is this code complete?

Last edited: Jun 23, 2011

rakibtg, Jun 23, 2011

BRUm Well-Known Member

#6

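That's a lot of code. Unless you really need the client's browser to extract these things, just use PHP's regex functions and return the result.
<?php
    $html = file_get_contents("http://awebsite.com");
    // Non-greedy, case-insensitive, and allowed to span lines; group 1 is the title text
    preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $title); // Gets title
    // The description is a <meta name="description" content="..."> tag, not a <description> element.
    // This pattern assumes name= comes directly before content=; group 2 is the description text
    preg_match('/<meta\s+name=["\']description["\']\s+content=(["\'])(.*?)\1/is', $html, $desc); // Gets description
    print isset($title[1]) ? $title[1] : '';
    print isset($desc[2]) ? $desc[2] : '';
?>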
PHP:

BRUm, Jun 25, 2011
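
One thing to watch with a pattern of the form `<title>.*</title>`: `.*` is greedy, so it runs to the last `</title>` on the line, and by default `.` does not match a newline, so a title spread over several lines is missed. The short JavaScript demonstration below uses made-up sample strings; the same behaviour applies to PHP's `preg_match`.

```javascript
// Sample page with a second <title> inside an inline SVG (hypothetical input).
const sampleHtml = '<head><title>Page title</title></head><body><svg><title>Icon</title></svg></body>';

// Greedy: .* runs to the LAST </title> on the line.
const greedy = sampleHtml.match(/<title>.*<\/title>/)[0];

// Lazy with a capture group: stops at the first </title> and returns only the text.
const lazy = sampleHtml.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i)[1];

// A title spread over several lines: "." does not cross newlines, so no match.
const multiLine = '<title>\n  Spread over lines\n</title>';
const dotMatch = multiLine.match(/<title>.*<\/title>/);
const anyCharMatch = multiLine.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i);

console.log(greedy);                 // the whole span up to the SVG's closing </title>
console.log(lazy);                   // Page title
console.log(dotMatch);               // null
console.log(anyCharMatch[1].trim()); // Spread over lines
```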

charlessconcepts Peon

#7

I have used this as well.

$metatagarray = get_meta_tags( $URL );
$keywords = $metatagarray[ "keywords" ];
$description = $metatagarray[ "description" ];
$author = $metatagarray[ "author" ];

But I have a small issue. I have been trying to extract the first image within the content/body section, with no success.

preg_match("/<img[^>]+>/i", $html, $image);

I want the image URL itself, so that I can put it into the page as an <img src=
Any ideas?

Thanks
Charles.

charlessconcepts, Aug 6, 2011
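
A pattern like `<img[^>]+>` returns the whole tag, so a second step is needed to pull out the `src` value. Here is a minimal sketch of that two-step idea in JavaScript (the language of the first post), offered as one possible approach, not a tested drop-in. The function name `firstImageUrl` and the `baseUrl` parameter are made up for illustration; `baseUrl` is assumed to be the address of the page the HTML came from, so that relative paths can be resolved.

```javascript
// Return the absolute URL of the first <img> in the body, or null if there is none.
function firstImageUrl(html, baseUrl) {
  // Only look at the part of the document from <body ...> onwards, if present.
  const bodyStart = html.search(/<body\b/i);
  const scope = bodyStart >= 0 ? html.slice(bodyStart) : html;

  // Step 1: find the first complete <img ...> tag.
  const tag = scope.match(/<img\b[^>]*>/i);
  if (!tag) return null;

  // Step 2: pull out the src value (double-quoted, single-quoted, or unquoted).
  // The leading \s avoids matching attributes such as data-src.
  const src = tag[0].match(/\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i);
  if (!src) return null;
  const raw = src[1] ?? src[2] ?? src[3];

  // Resolve relative paths such as "/pics/a.png" against the page address.
  return new URL(raw, baseUrl).href;
}
```

The returned string can then be used directly as the `src` of a new image tag.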
