How to edit PHP code to excerpt only the first 200 words?

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#1

I have some code from Five Filters that converts an RSS feed to full text. To avoid copyright issues, I would like to include only the first 200 words of each item. What would I have to edit so that only the first 200 words are displayed?

////////////////////////////////
// Check for feed URL
////////////////////////////////
if (!isset($_GET['url'])) { 
	die('No URL supplied'); 
}
$url = $_GET['url'];
if (!preg_match('!^https?://.+!i', $url)) {
	$url = 'http://'.$url;
}
$valid_url = filter_var($url, FILTER_VALIDATE_URL);
if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
	$url = filter_var($url, FILTER_SANITIZE_URL);
} else {
	die('Invalid URL supplied');
}

///////////////////////////////////////////////
// Check if the request is explicitly for an HTML page
///////////////////////////////////////////////
$html_only = (isset($_GET['html']) && $_GET['html'] == 'true');

////////////////////////////////
// Check for valid format
////////////////////////////////
$format = 'rss';

//////////////////////////////////
// Check for cached copy
//////////////////////////////////
$cache_file = 'cache/'.md5($url).'.xml';
if (file_exists($cache_file)) {
	$cache_mtime = filemtime($cache_file);
	$diff = time() - $cache_mtime;
	$diff = $diff / 60;
	if ($diff < 10) { // cache created less than 10 minutes ago
		header("Content-type: text/xml; charset=UTF-8");
		if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
		readfile($cache_file);
		exit;
	}
}

////////////////////////////////
// Get RSS/Atom feed
////////////////////////////////
if (!$html_only) {
	$feed = new SimplePie();
	$feed->set_feed_url($url);
	$feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
	$feed->set_timeout(20);
	$feed->enable_cache(false);
	$feed->set_stupidly_fast(true);
	$feed->enable_order_by_date(false); // we don't want to do anything to the feed
	$feed->set_url_replacements(array());
	$result = $feed->init();
	//$feed->handle_content_type();
	//$feed->get_title();
	if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
		die('Sorry, no feed items found');
	}
}

////////////////////////////////////////////////////////////////////////////////
// Extract content from HTML (if URL is not feed or explicit HTML request has been made)
////////////////////////////////////////////////////////////////////////////////
if ($html_only || !$result) {
	$html = @file_get_contents($url);
	if (!$html) die('Error retrieving '.$url);
	$node = grabArticle($html);
	$title = $node->firstChild->textContent;
	$content = $node->ownerDocument->saveXML($node->lastChild);
	unset($node, $html);
	$output = new FeedWriter(); //ATOM an option
	$output->setTitle($title);
	$output->setDescription("Content extracted by fivefilters.org from $url");
	if ($format == 'atom') {
		$output->setChannelElement('updated', date(DATE_ATOM));
		$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
	}
	$output->setLink($url);
	$newitem = $output->createNewItem();
	$newitem->setTitle($title);
	$newitem->setLink($url);
	if ($format == 'atom') {
		$newitem->setDate(time());
		$newitem->addElement('content', $content);
	} else {
		$newitem->setDescription($content);
	}
	$output->addItem($newitem);
	$output->genarateFeed(); 
	exit;
}

////////////////////////////////////////////
// Create full-text feed
////////////////////////////////////////////

$output = new FeedWriter(); //ATOM an option
$output->setTitle($feed->get_title());
$output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
$output->setLink($feed->get_link());
if ($img_url = $feed->get_image_url()) {
	$output->setImage($feed->get_title(), $feed->get_link(), $img_url);
}
if ($format == 'atom') {
	$output->setChannelElement('updated', date(DATE_ATOM));
	$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
}

////////////////////////////////////////////
// Loop through feed items
////////////////////////////////////////////
$items = $feed->get_items(0, 15);	 
foreach ($items as $item) {
	// some URLs appear to have characters HTML encoded - does decoding affect other URLs?
	$permalink = htmlspecialchars_decode($item->get_permalink());
	$permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
	if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
		$permalink = filter_var($permalink, FILTER_SANITIZE_URL);
	} else {
		$permalink = false;
	}
	$newitem = $output->createNewItem();
	$newitem->setTitle(htmlspecialchars_decode($item->get_title()));
	if ($permalink !== false) {
		$newitem->setLink($permalink);
	} else {
		$newitem->setLink($item->get_permalink());
	}
	
	if ($permalink && $html = @file_get_contents($permalink)) {
		$html = grabArticleHtml($html, false);
	} else {
		$html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
		$html .= $item->get_description();
	}
	if ($format == 'atom') {
		$newitem->addElement('content', $html);
		$newitem->setDate((int)$item->get_date('U'));
		if ($author = $item->get_author()) {
			$newitem->addElement('author', array('name'=>$author->get_name()));
		}
	} else {
		$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
		$newitem->setDescription($html);
		if ((int)$item->get_date('U') > 0) {
			$newitem->setDate((int)$item->get_date('U'));
		}
		if ($author = $item->get_author()) {
			$newitem->addElement('dc:creator', $author->get_name());
		}
	}
	$output->addItem($newitem);
	unset($html);
}
// output feed
ob_start();
$output->genarateFeed();
$output = ob_get_contents();
ob_end_clean();
file_put_contents($cache_file, $output);
echo $output;
?>

Code (markup):

sheldon365, Dec 26, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#2

60+ views and yet not a single reply.

sheldon365, Dec 27, 2010 IP

mastermunj Well-Known Member

Messages:: 687

Likes Received:: 13

Best Answers:: 0

Trophy Points:: 110

#3

I am not sure how accurate the following solution will be, but try it and let me know if you run into any difficulty.

Replace
$html = grabArticleHtml($html, false);
PHP:
with
$html = grabArticleHtml($html, false);
$html = str_n_words($html, 200);
PHP:
Also, copy the following function into the same file.
function str_n_words($str, $word_count)
{
	$str_split = explode(' ', $str);
	if(count($str_split) <= $word_count)
	{
		return $str;
	}
	
	array_splice($str_split, $word_count);
	return implode(' ', $str_split);
}
PHP:
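
One caveat with splitting on single spaces: the string returned by grabArticleHtml() is HTML, so tag attributes are counted as words and the cut may land inside a tag or leave tags unclosed. A minimal sketch of a tag-free alternative follows, assuming a plain-text excerpt is acceptable. The name first_n_words_text is hypothetical and not part of Five Filters, and words separated only by tags (no whitespace) would be joined together.

```php
<?php
// Hypothetical helper (not part of Five Filters): returns the first
// $limit words of an HTML fragment as plain text wrapped in a paragraph.
function first_n_words_text($html, $limit)
{
    // Drop tags so attributes such as class="..." are not counted as words.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Split on any run of whitespace and ignore empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) > $limit) {
        // Keep the first $limit words and mark the cut.
        $text = implode(' ', array_slice($words, 0, $limit)) . '...';
    } else {
        $text = implode(' ', $words);
    }
    // Re-escape the text so the result is valid inside the feed.
    return '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</p>';
}

echo first_n_words_text('<div class="post"><p>One two <b>three</b> four five</p></div>', 3), "\n";
```

The trade-off is that links and formatting are lost in the excerpt; keeping them would need a DOM-based truncation instead.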

mastermunj, Dec 27, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#4

// Include SimplePie for RSS/Atom parsing
require_once('libraries/simplepie/simplepie.inc');
// Include FeedCreator for RSS/Atom creation
//require_once('libraries/feedcreator/include/feedcreator.class.php');
require_once('libraries/feedwriter/FeedWriter.php');
require_once('libraries/feedwriter/FeedItem.php');
// Include readability.php for identifying and extracting content from URLs
require_once('readability.php');

////////////////////////////////
// Check for feed URL
////////////////////////////////
if (!isset($_GET['url'])) { 
	die('No URL supplied'); 
}
$url = $_GET['url'];
if (!preg_match('!^https?://.+!i', $url)) {
	$url = 'http://'.$url;
}
$valid_url = filter_var($url, FILTER_VALIDATE_URL);
if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
	$url = filter_var($url, FILTER_SANITIZE_URL);
} else {
	die('Invalid URL supplied');
}

///////////////////////////////////////////////
// Check if the request is explicitly for an HTML page
///////////////////////////////////////////////
$html_only = (isset($_GET['html']) && $_GET['html'] == 'true');

////////////////////////////////
// Check for valid format
////////////////////////////////
$format = 'rss';

//////////////////////////////////
// Check for cached copy
//////////////////////////////////
$cache_file = 'cache/'.md5($url).'.xml';
if (file_exists($cache_file)) {
	$cache_mtime = filemtime($cache_file);
	$diff = time() - $cache_mtime;
	$diff = $diff / 60;
	if ($diff < 10) { // cache created less than 10 minutes ago
		header("Content-type: text/xml; charset=UTF-8");
		if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
		readfile($cache_file);
		exit;
	}
}

////////////////////////////////
// Get RSS/Atom feed
////////////////////////////////
if (!$html_only) {
	$feed = new SimplePie();
	$feed->set_feed_url($url);
	$feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
	$feed->set_timeout(20);
	$feed->enable_cache(false);
	$feed->set_stupidly_fast(true);
	$feed->enable_order_by_date(false); // we don't want to do anything to the feed
	$feed->set_url_replacements(array());
	$result = $feed->init();
	//$feed->handle_content_type();
	//$feed->get_title();
	if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
		die('Sorry, no feed items found');
	}
}

////////////////////////////////////////////////////////////////////////////////
// Extract content from HTML (if URL is not feed or explicit HTML request has been made)
////////////////////////////////////////////////////////////////////////////////
if ($html_only || !$result) {
	$html = @file_get_contents($url);
	if (!$html) die('Error retrieving '.$url);
	$node = grabArticle($html);
	$title = $node->firstChild->textContent;
	$content = $node->ownerDocument->saveXML($node->lastChild);
	unset($node, $html);
	$output = new FeedWriter(); //ATOM an option
	$output->setTitle($title);
	$output->setDescription("Content extracted by fivefilters.org from $url");
	if ($format == 'atom') {
		$output->setChannelElement('updated', date(DATE_ATOM));
		$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
	}
	$output->setLink($url);
	$newitem = $output->createNewItem();
	$newitem->setTitle($title);
	$newitem->setLink($url);
	if ($format == 'atom') {
		$newitem->setDate(time());
		$newitem->addElement('content', $content);
	} else {
		$newitem->setDescription($content);
	}
	$output->addItem($newitem);
	$output->genarateFeed(); 
	exit;
}

////////////////////////////////////////////
// Create full-text feed
////////////////////////////////////////////

$output = new FeedWriter(); //ATOM an option
$output->setTitle($feed->get_title());
$output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
$output->setLink($feed->get_link());
if ($img_url = $feed->get_image_url()) {
	$output->setImage($feed->get_title(), $feed->get_link(), $img_url);
}
if ($format == 'atom') {
	$output->setChannelElement('updated', date(DATE_ATOM));
	$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
}

////////////////////////////////////////////
// Loop through feed items
////////////////////////////////////////////
$items = $feed->get_items(0, 1);	 
foreach ($items as $item) {
	// some URLs appear to have characters HTML encoded - does decoding affect other URLs?
	$permalink = htmlspecialchars_decode($item->get_permalink());
	$permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
	function str_n_words($str, $word_count)
{
    $str_split = explode(' ', $str);
    if(count($str_split) <= $word_count)
    {
        return $str;
    }
    
    array_splice($str_split, $word_count);
    return implode(' ', $str_split);
}
        if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
		$permalink = filter_var($permalink, FILTER_SANITIZE_URL);
	} else {
		$permalink = false;
	}
	$newitem = $output->createNewItem();
	$newitem->setTitle(htmlspecialchars_decode($item->get_title()));
	if ($permalink !== false) {
		$newitem->setLink($permalink);
	} else {
		$newitem->setLink($item->get_permalink());
	}
	
	if ($permalink && $html = @file_get_contents($permalink)) {
		$html = grabArticleHtml($html, false);
                $html = str_n_words($html, 200);

	} else {
		$html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
		$html .= $item->get_description();
	}
	if ($format == 'atom') {
		$newitem->addElement('content', $html);
		$newitem->setDate((int)$item->get_date('U'));
		if ($author = $item->get_author()) {
			$newitem->addElement('author', array('name'=>$author->get_name()));
		}
	} else {
		$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
		$newitem->setDescription($html);
		if ((int)$item->get_date('U') > 0) {
			$newitem->setDate((int)$item->get_date('U'));
		}
		if ($author = $item->get_author()) {
			$newitem->addElement('dc:creator', $author->get_name());
		}
	}
	$output->addItem($newitem);
	unset($html);
}
// output feed
ob_start();
$output->genarateFeed();
$output = ob_get_contents();
ob_end_clean();
file_put_contents($cache_file, $output);
echo $output;
?>

Code (markup):

Please let me know if I have placed the function in the right place. Also, there are no errors when I convert the feed to full-text RSS.

sheldon365, Dec 27, 2010 IP

mastermunj Well-Known Member

Messages:: 687

Likes Received:: 13

Best Answers:: 0

Trophy Points:: 110

#5

Place the function in the same file as this code.

mastermunj, Dec 27, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#6

Please check the code in my post above and see if it is right. If it is wrong, please put the function in the right place and post the corrected code.

sheldon365, Dec 27, 2010 IP

mastermunj Well-Known Member

Messages:: 687

Likes Received:: 13

Best Answers:: 0

Trophy Points:: 110

#7

That is the wrong placement.

Keep the function either at the beginning of the file after "<?" or at the end of the file before "?>".
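
A minimal sketch of why the placement matters, using made-up sample strings. PHP registers unconditional top-level functions before the script runs, so the function can sit at the end of the file and still be called earlier. A function declared inside the foreach body is instead defined each time the body executes, so a second feed item would trigger a "Cannot redeclare" fatal error. With get_items(0, 1) only one item is processed, which is probably why the earlier attempt showed no error.

```php
<?php
// The calls appear before the definition. This works because the
// function below is declared at the top level, outside any loop or if.
$samples = array('one two three four', 'five six', 'seven eight nine');
foreach ($samples as $sample) {
    echo str_n_words($sample, 2), "\n";
}

// Defined once, at the end of the file.
function str_n_words($str, $word_count)
{
    $str_split = explode(' ', $str);
    if (count($str_split) <= $word_count) {
        return $str;
    }
    // Drop everything from position $word_count onwards.
    array_splice($str_split, $word_count);
    return implode(' ', $str_split);
}
```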

mastermunj, Dec 27, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#8

This is the error I get when I place it like this.
Parse error: syntax error, unexpected T_STRING on line 33

<?function str_n_words($str, $word_count)
{
    $str_split = explode(' ', $str);
    if(count($str_split) <= $word_count)
    {
        return $str;
    }
    
    array_splice($str_split, $word_count);
    return implode(' ', $str_split);
}
php
// Create Full-Text Feeds
// Author: Keyvan Minoukadeh
// License: AGPLv3
// Date: 2009-08-03
// How to use: request this file passing it your feed in the querystring: makefulltextfeed.php?url=http://mysite.org

/*
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/
error_reporting(E_ALL ^ E_NOTICE);
ini_set("display_errors", 1);
@set_time_limit(120);

// Include SimplePie for RSS/Atom parsing
require_once('libraries/simplepie/simplepie.inc');
// Include FeedCreator for RSS/Atom creation
//require_once('libraries/feedcreator/include/feedcreator.class.php');
require_once('libraries/feedwriter/FeedWriter.php');
require_once('libraries/feedwriter/FeedItem.php');
// Include readability.php for identifying and extracting content from URLs
require_once('readability.php');

////////////////////////////////
// Check for feed URL
////////////////////////////////
if (!isset($_GET['url'])) { 
	die('No URL supplied'); 
}
$url = $_GET['url'];
if (!preg_match('!^https?://.+!i', $url)) {
	$url = 'http://'.$url;
}
$valid_url = filter_var($url, FILTER_VALIDATE_URL);
if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
	$url = filter_var($url, FILTER_SANITIZE_URL);
} else {
	die('Invalid URL supplied');
}

///////////////////////////////////////////////
// Check if the request is explicitly for an HTML page
///////////////////////////////////////////////
$html_only = (isset($_GET['html']) && $_GET['html'] == 'true');

////////////////////////////////
// Check for valid format
////////////////////////////////
$format = 'rss';

//////////////////////////////////
// Check for cached copy
//////////////////////////////////
$cache_file = 'cache/'.md5($url).'.xml';
if (file_exists($cache_file)) {
	$cache_mtime = filemtime($cache_file);
	$diff = time() - $cache_mtime;
	$diff = $diff / 60;
	if ($diff < 10) { // cache created less than 10 minutes ago
		header("Content-type: text/xml; charset=UTF-8");
		if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
		readfile($cache_file);
		exit;
	}
}

////////////////////////////////
// Get RSS/Atom feed
////////////////////////////////
if (!$html_only) {
	$feed = new SimplePie();
	$feed->set_feed_url($url);
	$feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
	$feed->set_timeout(20);
	$feed->enable_cache(false);
	$feed->set_stupidly_fast(true);
	$feed->enable_order_by_date(false); // we don't want to do anything to the feed
	$feed->set_url_replacements(array());
	$result = $feed->init();
	//$feed->handle_content_type();
	//$feed->get_title();
	if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
		die('Sorry, no feed items found');
	}
}

////////////////////////////////////////////////////////////////////////////////
// Extract content from HTML (if URL is not feed or explicit HTML request has been made)
////////////////////////////////////////////////////////////////////////////////
if ($html_only || !$result) {
	$html = @file_get_contents($url);
	if (!$html) die('Error retrieving '.$url);
	$node = grabArticle($html);
	$title = $node->firstChild->textContent;
	$content = $node->ownerDocument->saveXML($node->lastChild);
	unset($node, $html);
	$output = new FeedWriter(); //ATOM an option
	$output->setTitle($title);
	$output->setDescription("Content extracted by fivefilters.org from $url");
	if ($format == 'atom') {
		$output->setChannelElement('updated', date(DATE_ATOM));
		$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
	}
	$output->setLink($url);
	$newitem = $output->createNewItem();
	$newitem->setTitle($title);
	$newitem->setLink($url);
	if ($format == 'atom') {
		$newitem->setDate(time());
		$newitem->addElement('content', $content);
	} else {
		$newitem->setDescription($content);
	}
	$output->addItem($newitem);
	$output->genarateFeed(); 
	exit;
}

////////////////////////////////////////////
// Create full-text feed
////////////////////////////////////////////

$output = new FeedWriter(); //ATOM an option
$output->setTitle($feed->get_title());
$output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
$output->setLink($feed->get_link());
if ($img_url = $feed->get_image_url()) {
	$output->setImage($feed->get_title(), $feed->get_link(), $img_url);
}
if ($format == 'atom') {
	$output->setChannelElement('updated', date(DATE_ATOM));
	$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
}

////////////////////////////////////////////
// Loop through feed items
////////////////////////////////////////////
$items = $feed->get_items(0, 1);	 
foreach ($items as $item) {
	// some URLs appear to have characters HTML encoded - does decoding affect other URLs?
	$permalink = htmlspecialchars_decode($item->get_permalink());
	$permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
	
        if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
		$permalink = filter_var($permalink, FILTER_SANITIZE_URL);
	} else {
		$permalink = false;
	}
	$newitem = $output->createNewItem();
	$newitem->setTitle(htmlspecialchars_decode($item->get_title()));
	if ($permalink !== false) {
		$newitem->setLink($permalink);
	} else {
		$newitem->setLink($item->get_permalink());
	}
	
	if ($permalink && $html = @file_get_contents($permalink)) {
		$html = grabArticleHtml($html, false);
                $html = str_n_words($html, 100);

	} else {
		$html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
		$html .= $item->get_description();
	}
	if ($format == 'atom') {
		$newitem->addElement('content', $html);
		$newitem->setDate((int)$item->get_date('U'));
		if ($author = $item->get_author()) {
			$newitem->addElement('author', array('name'=>$author->get_name()));
		}
	} else {
		$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
		$newitem->setDescription($html);
		if ((int)$item->get_date('U') > 0) {
			$newitem->setDate((int)$item->get_date('U'));
		}
		if ($author = $item->get_author()) {
			$newitem->addElement('dc:creator', $author->get_name());
		}
	}
	$output->addItem($newitem);
	unset($html);
}
// output feed
ob_start();
$output->genarateFeed();
$output = ob_get_contents();
ob_end_clean();
file_put_contents($cache_file, $output);
echo $output;

?>

Code (markup):

sheldon365, Dec 27, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#9

Any Ideas Guys??

sheldon365, Dec 30, 2010 IP

drctaccess Peon

Messages:: 62

Likes Received:: 1

Best Answers:: 1

Trophy Points:: 0

#10

Use this:


<?php
error_reporting(E_ALL ^ E_NOTICE);
ini_set("display_errors", 1);
@set_time_limit(120);

// Include SimplePie for RSS/Atom parsing
require_once('libraries/simplepie/simplepie.inc');
// Include FeedCreator for RSS/Atom creation
//require_once('libraries/feedcreator/include/feedcreator.class.php');
require_once('libraries/feedwriter/FeedWriter.php');
require_once('libraries/feedwriter/FeedItem.php');
// Include readability.php for identifying and extracting content from URLs
require_once('readability.php');

////////////////////////////////
// Check for feed URL
////////////////////////////////
if (!isset($_GET['url'])) {
        die('No URL supplied');
}
$url = $_GET['url'];
if (!preg_match('!^https?://.+!i', $url)) {
        $url = 'http://'.$url;
}
$valid_url = filter_var($url, FILTER_VALIDATE_URL);
if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
        $url = filter_var($url, FILTER_SANITIZE_URL);
} else {
        die('Invalid URL supplied');
}

///////////////////////////////////////////////
// Check if the request is explicitly for an HTML page
///////////////////////////////////////////////
$html_only = (isset($_GET['html']) && $_GET['html'] == 'true');

////////////////////////////////
// Check for valid format
////////////////////////////////
$format = 'rss';

//////////////////////////////////
// Check for cached copy
//////////////////////////////////
$cache_file = 'cache/'.md5($url).'.xml';
if (file_exists($cache_file)) {
        $cache_mtime = filemtime($cache_file);
        $diff = time() - $cache_mtime;
        $diff = $diff / 60;
        if ($diff < 10) { // cache created less than 10 minutes ago
                header("Content-type: text/xml; charset=UTF-8");
                if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
                readfile($cache_file);
                exit;
        }
}

////////////////////////////////
// Get RSS/Atom feed
////////////////////////////////
if (!$html_only) {
        $feed = new SimplePie();
        $feed->set_feed_url($url);
        $feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
        $feed->set_timeout(20);
        $feed->enable_cache(false);
        $feed->set_stupidly_fast(true);
        $feed->enable_order_by_date(false); // we don't want to do anything to the feed
        $feed->set_url_replacements(array());
        $result = $feed->init();
        //$feed->handle_content_type();
        //$feed->get_title();
        if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
                die('Sorry, no feed items found');
        }
}

////////////////////////////////////////////////////////////////////////////////
// Extract content from HTML (if URL is not feed or explicit HTML request has been made)
////////////////////////////////////////////////////////////////////////////////
if ($html_only || !$result) {
        $html = @file_get_contents($url);
        if (!$html) die('Error retrieving '.$url);
        $node = grabArticle($html);
        $title = $node->firstChild->textContent;
        $content = $node->ownerDocument->saveXML($node->lastChild);
        unset($node, $html);
        $output = new FeedWriter(); //ATOM an option
        $output->setTitle($title);
        $output->setDescription("Content extracted by fivefilters.org from $url");
        if ($format == 'atom') {
                $output->setChannelElement('updated', date(DATE_ATOM));
                $output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
        }
        $output->setLink($url);
        $newitem = $output->createNewItem();
        $newitem->setTitle($title);
        $newitem->setLink($url);
        if ($format == 'atom') {
                $newitem->setDate(time());
                $newitem->addElement('content', $content);
        } else {
                $newitem->setDescription($content);
        }
        $output->addItem($newitem);
        $output->genarateFeed();
        exit;
}

////////////////////////////////////////////
// Create full-text feed
////////////////////////////////////////////

$output = new FeedWriter(); //ATOM an option
$output->setTitle($feed->get_title());
$output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
$output->setLink($feed->get_link());
if ($img_url = $feed->get_image_url()) {
        $output->setImage($feed->get_title(), $feed->get_link(), $img_url);
}
if ($format == 'atom') {
        $output->setChannelElement('updated', date(DATE_ATOM));
        $output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
}

////////////////////////////////////////////
// Loop through feed items
////////////////////////////////////////////
$items = $feed->get_items(0, 1);
foreach ($items as $item) {
        // some URLs appear to have characters HTML encoded - does decoding affect other URLs?
        $permalink = htmlspecialchars_decode($item->get_permalink());
        $permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);

        if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
                $permalink = filter_var($permalink, FILTER_SANITIZE_URL);
        } else {
                $permalink = false;
        }
        $newitem = $output->createNewItem();
        $newitem->setTitle(htmlspecialchars_decode($item->get_title()));
        if ($permalink !== false) {
                $newitem->setLink($permalink);
        } else {
                $newitem->setLink($item->get_permalink());
        }

        if ($permalink && $html = @file_get_contents($permalink)) {
                $html = grabArticleHtml($html, false);
                $html = str_n_words($html, 100);

        } else {
                $html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
                $html .= $item->get_description();
        }
        if ($format == 'atom') {
                $newitem->addElement('content', $html);
                $newitem->setDate((int)$item->get_date('U'));
                if ($author = $item->get_author()) {
                        $newitem->addElement('author', array('name'=>$author->get_name()));
                }
        } else {
                $newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
                $newitem->setDescription($html);
                if ((int)$item->get_date('U') > 0) {
                        $newitem->setDate((int)$item->get_date('U'));
                }
                if ($author = $item->get_author()) {
                        $newitem->addElement('dc:creator', $author->get_name());
                }
        }
        $output->addItem($newitem);
        unset($html);
}
// output feed
ob_start();
$output->genarateFeed();
$output = ob_get_contents();
ob_end_clean();
file_put_contents($cache_file, $output);
echo $output;


function str_n_words($str, $word_count)
{
    $str_split = explode(' ', $str);
    if(count($str_split) <= $word_count)
    {
        return $str;
    }

    array_splice($str_split, $word_count);
    return implode(' ', $str_split);
}
?>

Code (markup):

and you will not get the syntax error.
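
The parse error in the previous listing most likely comes from the opening tag being split: the function was pasted between "<?" and "php", so the leftover "php" on its own line is read as code, and line 33 of that listing is the error_reporting() call that follows it. This assumes the pasted listing matches the file that was run. A sketch of the top-of-file variant with the tag kept in one piece (the sample string is only for illustration):

```php
<?php
// The opening tag above stays intact; the helper comes right after it.
function str_n_words($str, $word_count)
{
    $str_split = explode(' ', $str);
    if (count($str_split) <= $word_count) {
        return $str;
    }
    array_splice($str_split, $word_count);
    return implode(' ', $str_split);
}

// ... the rest of the script would follow here ...
echo str_n_words('alpha beta gamma', 2), "\n"; // illustration only
```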

I hope this helps

drctaccess, Dec 30, 2010 IP

mastermunj Well-Known Member

Messages:: 687

Likes Received:: 13

Best Answers:: 0

Trophy Points:: 110

#11

@drctaccess, Thanks, that was the change needed.

@sheldon365, try the changes given by drctaccess and let us know if you run into any difficulty.

mastermunj, Dec 30, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#12

It works without any errors. How many words is it set to? If I wanted to increase the number of words displayed to, say, 350, what would I have to change?

sheldon365, Dec 30, 2010 IP

drctaccess Peon

Messages:: 62

Likes Received:: 1

Best Answers:: 1

Trophy Points:: 0

#13

Right now it is set to 100 words. If you want to change the number, find this line
$html = str_n_words($html, 100);
Code (markup):
and replace 100 with your desired number.
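
If the limit needs to change often, one option is to read it from the query string instead of editing the file. This is only a sketch: the "words" parameter and the word_limit_from_query helper are hypothetical and not part of the original script, and the default of 100 and cap of 1000 are arbitrary.

```php
<?php
// Hypothetical helper: reads an optional "words" query parameter
// (not part of the original script) and clamps it to a sane range.
function word_limit_from_query(array $query, $default = 100, $max = 1000)
{
    if (!isset($query['words']) || !is_string($query['words'])
        || !ctype_digit($query['words'])) {
        return $default; // missing or non-numeric: fall back
    }
    return max(1, min($max, (int) $query['words']));
}

// In the feed loop this would replace the hard-coded number:
// $html = str_n_words($html, word_limit_from_query($_GET));
echo word_limit_from_query(array('words' => '350')), "\n";
```

Note that the cache file name in the script depends only on the feed URL, so a changed limit may not show up until the cached copy is more than 10 minutes old.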

I hope this helps

drctaccess, Dec 30, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#14

Thank you so much. Will test it for a few days and get back to you.

sheldon365, Dec 30, 2010 IP

sheldon365 Greenhorn

Messages:: 60

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 16

#15

It Works!!!

sheldon365, Feb 3, 2011 IP
