How to edit PHP code to excerpt only the first 200 words?

Discussion in 'PHP' started by sheldon365, Dec 26, 2010.

  1. #1
    I have some code from Five Filters. It converts an RSS feed to full text, but to avoid copyright issues I would like to include only the first 200 words of each item. What would I have to edit so that only the first 200 words are displayed?
    ////////////////////////////////
    // Check for feed URL
    ////////////////////////////////
    if (!isset($_GET['url'])) { 
    	die('No URL supplied'); 
    }
    $url = $_GET['url'];
    if (!preg_match('!^https?://.+!i', $url)) {
    	$url = 'http://'.$url;
    }
    $valid_url = filter_var($url, FILTER_VALIDATE_URL);
    if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
    	$url = filter_var($url, FILTER_SANITIZE_URL);
    } else {
    	die('Invalid URL supplied');
    }
    
    ///////////////////////////////////////////////
    // Check if the request is explicitly for an HTML page
    ///////////////////////////////////////////////
    $html_only = (isset($_GET['html']) && $_GET['html'] == 'true');
    
    ////////////////////////////////
    // Check for valid format
    ////////////////////////////////
    $format = 'rss';
    
    //////////////////////////////////
    // Check for cached copy
    //////////////////////////////////
    $cache_file = 'cache/'.md5($url).'.xml';
    if (file_exists($cache_file)) {
    	$cache_mtime = filemtime($cache_file);
    	$diff = time() - $cache_mtime;
    	$diff = $diff / 60;
    	if ($diff < 10) { // cache created less than 10 minutes ago
    		header("Content-type: text/xml; charset=UTF-8");
    		if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
    		readfile($cache_file);
    		exit;
    	}
    }
    
    ////////////////////////////////
    // Get RSS/Atom feed
    ////////////////////////////////
    if (!$html_only) {
    	$feed = new SimplePie();
    	$feed->set_feed_url($url);
    	$feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
    	$feed->set_timeout(20);
    	$feed->enable_cache(false);
    	$feed->set_stupidly_fast(true);
    	$feed->enable_order_by_date(false); // we don't want to do anything to the feed
    	$feed->set_url_replacements(array());
    	$result = $feed->init();
    	//$feed->handle_content_type();
    	//$feed->get_title();
    	if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
    		die('Sorry, no feed items found');
    	}
    }
    
    ////////////////////////////////////////////////////////////////////////////////
    // Extract content from HTML (if URL is not feed or explicit HTML request has been made)
    ////////////////////////////////////////////////////////////////////////////////
    if ($html_only || !$result) {
    	$html = @file_get_contents($url);
    	if (!$html) die('Error retrieving '.$url);
    	$node = grabArticle($html);
    	$title = $node->firstChild->textContent;
    	$content = $node->ownerDocument->saveXML($node->lastChild);
    	unset($node, $html);
    	$output = new FeedWriter(); //ATOM an option
    	$output->setTitle($title);
    	$output->setDescription("Content extracted by fivefilters.org from $url");
    	if ($format == 'atom') {
    		$output->setChannelElement('updated', date(DATE_ATOM));
    		$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    	}
    	$output->setLink($url);
    	$newitem = $output->createNewItem();
    	$newitem->setTitle($title);
    	$newitem->setLink($url);
    	if ($format == 'atom') {
    		$newitem->setDate(time());
    		$newitem->addElement('content', $content);
    	} else {
    		$newitem->setDescription($content);
    	}
    	$output->addItem($newitem);
    	$output->genarateFeed(); 
    	exit;
    }
    
    ////////////////////////////////////////////
    // Create full-text feed
    ////////////////////////////////////////////
    
    $output = new FeedWriter(); //ATOM an option
    $output->setTitle($feed->get_title());
    $output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
    $output->setLink($feed->get_link());
    if ($img_url = $feed->get_image_url()) {
    	$output->setImage($feed->get_title(), $feed->get_link(), $img_url);
    }
    if ($format == 'atom') {
    	$output->setChannelElement('updated', date(DATE_ATOM));
    	$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    }
    
    ////////////////////////////////////////////
    // Loop through feed items
    ////////////////////////////////////////////
    $items = $feed->get_items(0, 15);	 
    foreach ($items as $item) {
    	// some URLs appear to have characters HTML encoded - does decoding affect other URLs?
    	$permalink = htmlspecialchars_decode($item->get_permalink());
    	$permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
    	if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
    		$permalink = filter_var($permalink, FILTER_SANITIZE_URL);
    	} else {
    		$permalink = false;
    	}
    	$newitem = $output->createNewItem();
    	$newitem->setTitle(htmlspecialchars_decode($item->get_title()));
    	if ($permalink !== false) {
    		$newitem->setLink($permalink);
    	} else {
    		$newitem->setLink($item->get_permalink());
    	}
    	
    	if ($permalink && $html = @file_get_contents($permalink)) {
    		$html = grabArticleHtml($html, false);
    	} else {
    		$html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
    		$html .= $item->get_description();
    	}
    	if ($format == 'atom') {
    		$newitem->addElement('content', $html);
    		$newitem->setDate((int)$item->get_date('U'));
    		if ($author = $item->get_author()) {
    			$newitem->addElement('author', array('name'=>$author->get_name()));
    		}
    	} else {
    		$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
    		$newitem->setDescription($html);
    		if ((int)$item->get_date('U') > 0) {
    			$newitem->setDate((int)$item->get_date('U'));
    		}
    		if ($author = $item->get_author()) {
    			$newitem->addElement('dc:creator', $author->get_name());
    		}
    	}
    	$output->addItem($newitem);
    	unset($html);
    }
    // output feed
    ob_start();
    $output->genarateFeed();
    $output = ob_get_contents();
    ob_end_clean();
    file_put_contents($cache_file, $output);
    echo $output;
    ?>
    Code (markup):

     
    sheldon365, Dec 26, 2010 IP
  2. sheldon365

    sheldon365 Greenhorn

    Messages:
    60
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    16
    #2
    60+ views and yet not a single reply.
     
    sheldon365, Dec 27, 2010 IP
  3. mastermunj

    mastermunj Well-Known Member

    Messages:
    687
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    110
    #3
    I am not sure how accurate the following solution will be, but try it and let me know if you run into any difficulty.

    Replace
    
    $html = grabArticleHtml($html, false);
    
    PHP:
    with

    
    $html = grabArticleHtml($html, false);
    $html = str_n_words($html, 200);
    
    PHP:

    Also, copy the following function into the same file.

    
    function str_n_words($str, $word_count)
    {
    	$str_split = explode(' ', $str);
    	if(count($str_split) <= $word_count)
    	{
    		return $str;
    	}
    	
    	array_splice($str_split, $word_count);
    	return implode(' ', $str_split);
    }
    
    PHP:
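One caveat, offered as a sketch rather than a tested fix for this script: `grabArticleHtml()` appears to return HTML, so splitting on single spaces counts tag attributes as words and can cut the string in the middle of a tag. The variant below strips the markup first and returns plain text. The name `text_n_words` and the trailing `...` are my own choices, not part of the Five Filters code.

```php
<?php
// Hypothetical helper: returns the first $word_count words of an HTML
// fragment as plain text, so no tag can be cut in half.
function text_n_words($html, $word_count)
{
    // Put a space before every tag so that adjacent blocks such as
    // "</p><p>" do not glue two words together once the tags are gone.
    $text = strip_tags(str_replace('<', ' <', $html));

    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) <= $word_count) {
        return implode(' ', $words);
    }

    // Keep the first $word_count words and mark the cut.
    return implode(' ', array_slice($words, 0, $word_count)) . '...';
}

echo text_n_words('<p>one two</p><p>three <a href="x y">four</a> five</p>', 3), "\n";
// prints "one two three..."
```

In the script it would be called in the same place, for example `$html = text_n_words($html, 200);`. The trade-off is that links and formatting inside the excerpt are lost.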
     
    mastermunj, Dec 27, 2010 IP
  4. sheldon365

    #4
    // Include SimplePie for RSS/Atom parsing
    require_once('libraries/simplepie/simplepie.inc');
    // Include FeedCreator for RSS/Atom creation
    //require_once('libraries/feedcreator/include/feedcreator.class.php');
    require_once('libraries/feedwriter/FeedWriter.php');
    require_once('libraries/feedwriter/FeedItem.php');
    // Include readability.php for identifying and extracting content from URLs
    require_once('readability.php');
    
    ////////////////////////////////
    // Check for feed URL
    ////////////////////////////////
    if (!isset($_GET['url'])) { 
    	die('No URL supplied'); 
    }
    $url = $_GET['url'];
    if (!preg_match('!^https?://.+!i', $url)) {
    	$url = 'http://'.$url;
    }
    $valid_url = filter_var($url, FILTER_VALIDATE_URL);
    if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
    	$url = filter_var($url, FILTER_SANITIZE_URL);
    } else {
    	die('Invalid URL supplied');
    }
    
    ///////////////////////////////////////////////
    // Check if the request is explicitly for an HTML page
    ///////////////////////////////////////////////
    $html_only = (isset($_GET['html']) && $_GET['html'] == 'true');
    
    ////////////////////////////////
    // Check for valid format
    ////////////////////////////////
    $format = 'rss';
    
    //////////////////////////////////
    // Check for cached copy
    //////////////////////////////////
    $cache_file = 'cache/'.md5($url).'.xml';
    if (file_exists($cache_file)) {
    	$cache_mtime = filemtime($cache_file);
    	$diff = time() - $cache_mtime;
    	$diff = $diff / 60;
    	if ($diff < 10) { // cache created less than 10 minutes ago
    		header("Content-type: text/xml; charset=UTF-8");
    		if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
    		readfile($cache_file);
    		exit;
    	}
    }
    
    ////////////////////////////////
    // Get RSS/Atom feed
    ////////////////////////////////
    if (!$html_only) {
    	$feed = new SimplePie();
    	$feed->set_feed_url($url);
    	$feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
    	$feed->set_timeout(20);
    	$feed->enable_cache(false);
    	$feed->set_stupidly_fast(true);
    	$feed->enable_order_by_date(false); // we don't want to do anything to the feed
    	$feed->set_url_replacements(array());
    	$result = $feed->init();
    	//$feed->handle_content_type();
    	//$feed->get_title();
    	if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
    		die('Sorry, no feed items found');
    	}
    }
    
    ////////////////////////////////////////////////////////////////////////////////
    // Extract content from HTML (if URL is not feed or explicit HTML request has been made)
    ////////////////////////////////////////////////////////////////////////////////
    if ($html_only || !$result) {
    	$html = @file_get_contents($url);
    	if (!$html) die('Error retrieving '.$url);
    	$node = grabArticle($html);
    	$title = $node->firstChild->textContent;
    	$content = $node->ownerDocument->saveXML($node->lastChild);
    	unset($node, $html);
    	$output = new FeedWriter(); //ATOM an option
    	$output->setTitle($title);
    	$output->setDescription("Content extracted by fivefilters.org from $url");
    	if ($format == 'atom') {
    		$output->setChannelElement('updated', date(DATE_ATOM));
    		$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    	}
    	$output->setLink($url);
    	$newitem = $output->createNewItem();
    	$newitem->setTitle($title);
    	$newitem->setLink($url);
    	if ($format == 'atom') {
    		$newitem->setDate(time());
    		$newitem->addElement('content', $content);
    	} else {
    		$newitem->setDescription($content);
    	}
    	$output->addItem($newitem);
    	$output->genarateFeed(); 
    	exit;
    }
    
    ////////////////////////////////////////////
    // Create full-text feed
    ////////////////////////////////////////////
    
    $output = new FeedWriter(); //ATOM an option
    $output->setTitle($feed->get_title());
    $output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
    $output->setLink($feed->get_link());
    if ($img_url = $feed->get_image_url()) {
    	$output->setImage($feed->get_title(), $feed->get_link(), $img_url);
    }
    if ($format == 'atom') {
    	$output->setChannelElement('updated', date(DATE_ATOM));
    	$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    }
    
    ////////////////////////////////////////////
    // Loop through feed items
    ////////////////////////////////////////////
    $items = $feed->get_items(0, 1);	 
    foreach ($items as $item) {
    	// some URLs appear to have characters HTML encoded - does decoding affect other URLs?
    	$permalink = htmlspecialchars_decode($item->get_permalink());
    	$permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
    	function str_n_words($str, $word_count)
    {
        $str_split = explode(' ', $str);
        if(count($str_split) <= $word_count)
        {
            return $str;
        }
        
        array_splice($str_split, $word_count);
        return implode(' ', $str_split);
    }
            if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
    		$permalink = filter_var($permalink, FILTER_SANITIZE_URL);
    	} else {
    		$permalink = false;
    	}
    	$newitem = $output->createNewItem();
    	$newitem->setTitle(htmlspecialchars_decode($item->get_title()));
    	if ($permalink !== false) {
    		$newitem->setLink($permalink);
    	} else {
    		$newitem->setLink($item->get_permalink());
    	}
    	
    	if ($permalink && $html = @file_get_contents($permalink)) {
    		$html = grabArticleHtml($html, false);
                    $html = str_n_words($html, 200);
    
    	} else {
    		$html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
    		$html .= $item->get_description();
    	}
    	if ($format == 'atom') {
    		$newitem->addElement('content', $html);
    		$newitem->setDate((int)$item->get_date('U'));
    		if ($author = $item->get_author()) {
    			$newitem->addElement('author', array('name'=>$author->get_name()));
    		}
    	} else {
    		$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
    		$newitem->setDescription($html);
    		if ((int)$item->get_date('U') > 0) {
    			$newitem->setDate((int)$item->get_date('U'));
    		}
    		if ($author = $item->get_author()) {
    			$newitem->addElement('dc:creator', $author->get_name());
    		}
    	}
    	$output->addItem($newitem);
    	unset($html);
    }
    // output feed
    ob_start();
    $output->genarateFeed();
    $output = ob_get_contents();
    ob_end_clean();
    file_put_contents($cache_file, $output);
    echo $output;
    ?>
    Code (markup):
    Please let me know if I have put the function in the right place. Also, there are no errors when I convert a feed to full-text RSS.
     
    sheldon365, Dec 27, 2010 IP
  5. mastermunj

    #5
    Place the function in the same file as this code.
     
    mastermunj, Dec 27, 2010 IP
  6. sheldon365

    #6
    Please check the code in my post above and see if it is right. If it is wrong, please put the function in the right place and post the result in your next reply.
     
    sheldon365, Dec 27, 2010 IP
  7. mastermunj

    #7
    That is the wrong placement.

    Keep the function either at the beginning of the file, after "<?", or at the end of the file, before "?>".
     
    mastermunj, Dec 27, 2010 IP
  8. sheldon365

    #8
    This is the error I get when I place it like this.
    Parse error: syntax error, unexpected T_STRING on line 33

    <?function str_n_words($str, $word_count)
    {
        $str_split = explode(' ', $str);
        if(count($str_split) <= $word_count)
        {
            return $str;
        }
        
        array_splice($str_split, $word_count);
        return implode(' ', $str_split);
    }
    php
    // Create Full-Text Feeds
    // Author: Keyvan Minoukadeh
    // License: AGPLv3
    // Date: 2009-08-03
    // How to use: request this file passing it your feed in the querystring: makefulltextfeed.php?url=http://mysite.org
    
    /*
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
    
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Affero General Public License for more details.
    
    You should have received a copy of the GNU Affero General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
    */
    error_reporting(E_ALL ^ E_NOTICE);
    ini_set("display_errors", 1);
    @set_time_limit(120);
    
    // Include SimplePie for RSS/Atom parsing
    require_once('libraries/simplepie/simplepie.inc');
    // Include FeedCreator for RSS/Atom creation
    //require_once('libraries/feedcreator/include/feedcreator.class.php');
    require_once('libraries/feedwriter/FeedWriter.php');
    require_once('libraries/feedwriter/FeedItem.php');
    // Include readability.php for identifying and extracting content from URLs
    require_once('readability.php');
    
    ////////////////////////////////
    // Check for feed URL
    ////////////////////////////////
    if (!isset($_GET['url'])) { 
    	die('No URL supplied'); 
    }
    $url = $_GET['url'];
    if (!preg_match('!^https?://.+!i', $url)) {
    	$url = 'http://'.$url;
    }
    $valid_url = filter_var($url, FILTER_VALIDATE_URL);
    if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
    	$url = filter_var($url, FILTER_SANITIZE_URL);
    } else {
    	die('Invalid URL supplied');
    }
    
    ///////////////////////////////////////////////
    // Check if the request is explicitly for an HTML page
    ///////////////////////////////////////////////
    $html_only = (isset($_GET['html']) && $_GET['html'] == 'true');
    
    ////////////////////////////////
    // Check for valid format
    ////////////////////////////////
    $format = 'rss';
    
    //////////////////////////////////
    // Check for cached copy
    //////////////////////////////////
    $cache_file = 'cache/'.md5($url).'.xml';
    if (file_exists($cache_file)) {
    	$cache_mtime = filemtime($cache_file);
    	$diff = time() - $cache_mtime;
    	$diff = $diff / 60;
    	if ($diff < 10) { // cache created less than 10 minutes ago
    		header("Content-type: text/xml; charset=UTF-8");
    		if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
    		readfile($cache_file);
    		exit;
    	}
    }
    
    ////////////////////////////////
    // Get RSS/Atom feed
    ////////////////////////////////
    if (!$html_only) {
    	$feed = new SimplePie();
    	$feed->set_feed_url($url);
    	$feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
    	$feed->set_timeout(20);
    	$feed->enable_cache(false);
    	$feed->set_stupidly_fast(true);
    	$feed->enable_order_by_date(false); // we don't want to do anything to the feed
    	$feed->set_url_replacements(array());
    	$result = $feed->init();
    	//$feed->handle_content_type();
    	//$feed->get_title();
    	if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
    		die('Sorry, no feed items found');
    	}
    }
    
    ////////////////////////////////////////////////////////////////////////////////
    // Extract content from HTML (if URL is not feed or explicit HTML request has been made)
    ////////////////////////////////////////////////////////////////////////////////
    if ($html_only || !$result) {
    	$html = @file_get_contents($url);
    	if (!$html) die('Error retrieving '.$url);
    	$node = grabArticle($html);
    	$title = $node->firstChild->textContent;
    	$content = $node->ownerDocument->saveXML($node->lastChild);
    	unset($node, $html);
    	$output = new FeedWriter(); //ATOM an option
    	$output->setTitle($title);
    	$output->setDescription("Content extracted by fivefilters.org from $url");
    	if ($format == 'atom') {
    		$output->setChannelElement('updated', date(DATE_ATOM));
    		$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    	}
    	$output->setLink($url);
    	$newitem = $output->createNewItem();
    	$newitem->setTitle($title);
    	$newitem->setLink($url);
    	if ($format == 'atom') {
    		$newitem->setDate(time());
    		$newitem->addElement('content', $content);
    	} else {
    		$newitem->setDescription($content);
    	}
    	$output->addItem($newitem);
    	$output->genarateFeed(); 
    	exit;
    }
    
    ////////////////////////////////////////////
    // Create full-text feed
    ////////////////////////////////////////////
    
    $output = new FeedWriter(); //ATOM an option
    $output->setTitle($feed->get_title());
    $output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
    $output->setLink($feed->get_link());
    if ($img_url = $feed->get_image_url()) {
    	$output->setImage($feed->get_title(), $feed->get_link(), $img_url);
    }
    if ($format == 'atom') {
    	$output->setChannelElement('updated', date(DATE_ATOM));
    	$output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    }
    
    ////////////////////////////////////////////
    // Loop through feed items
    ////////////////////////////////////////////
    $items = $feed->get_items(0, 1);	 
    foreach ($items as $item) {
    	// some URLs appear to have characters HTML encoded - does decoding affect other URLs?
    	$permalink = htmlspecialchars_decode($item->get_permalink());
    	$permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
    	
            if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
    		$permalink = filter_var($permalink, FILTER_SANITIZE_URL);
    	} else {
    		$permalink = false;
    	}
    	$newitem = $output->createNewItem();
    	$newitem->setTitle(htmlspecialchars_decode($item->get_title()));
    	if ($permalink !== false) {
    		$newitem->setLink($permalink);
    	} else {
    		$newitem->setLink($item->get_permalink());
    	}
    	
    	if ($permalink && $html = @file_get_contents($permalink)) {
    		$html = grabArticleHtml($html, false);
                    $html = str_n_words($html, 100);
    
    	} else {
    		$html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
    		$html .= $item->get_description();
    	}
    	if ($format == 'atom') {
    		$newitem->addElement('content', $html);
    		$newitem->setDate((int)$item->get_date('U'));
    		if ($author = $item->get_author()) {
    			$newitem->addElement('author', array('name'=>$author->get_name()));
    		}
    	} else {
    		$newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
    		$newitem->setDescription($html);
    		if ((int)$item->get_date('U') > 0) {
    			$newitem->setDate((int)$item->get_date('U'));
    		}
    		if ($author = $item->get_author()) {
    			$newitem->addElement('dc:creator', $author->get_name());
    		}
    	}
    	$output->addItem($newitem);
    	unset($html);
    }
    // output feed
    ob_start();
    $output->genarateFeed();
    $output = ob_get_contents();
    ob_end_clean();
    file_put_contents($cache_file, $output);
    echo $output;
    
    ?>
    Code (markup):
     
    sheldon365, Dec 27, 2010 IP
  9. sheldon365

    #9
    Any ideas, guys?
     
    sheldon365, Dec 30, 2010 IP
  10. drctaccess

    drctaccess Peon

    Messages:
    62
    Likes Received:
    1
    Best Answers:
    1
    Trophy Points:
    0
    #10
    Use this:

    
    <?php
    error_reporting(E_ALL ^ E_NOTICE);
    ini_set("display_errors", 1);
    @set_time_limit(120);
    
    // Include SimplePie for RSS/Atom parsing
    require_once('libraries/simplepie/simplepie.inc');
    // Include FeedCreator for RSS/Atom creation
    //require_once('libraries/feedcreator/include/feedcreator.class.php');
    require_once('libraries/feedwriter/FeedWriter.php');
    require_once('libraries/feedwriter/FeedItem.php');
    // Include readability.php for identifying and extracting content from URLs
    require_once('readability.php');
    
    ////////////////////////////////
    // Check for feed URL
    ////////////////////////////////
    if (!isset($_GET['url'])) {
            die('No URL supplied');
    }
    $url = $_GET['url'];
    if (!preg_match('!^https?://.+!i', $url)) {
            $url = 'http://'.$url;
    }
    $valid_url = filter_var($url, FILTER_VALIDATE_URL);
    if ($valid_url !== false && $valid_url !== null && preg_match('!^https?://!', $valid_url)) {
            $url = filter_var($url, FILTER_SANITIZE_URL);
    } else {
            die('Invalid URL supplied');
    }
    
    ///////////////////////////////////////////////
    // Check if the request is explicitly for an HTML page
    ///////////////////////////////////////////////
    $html_only = (isset($_GET['html']) && $_GET['html'] == 'true');
    
    ////////////////////////////////
    // Check for valid format
    ////////////////////////////////
    $format = 'rss';
    
    //////////////////////////////////
    // Check for cached copy
    //////////////////////////////////
    $cache_file = 'cache/'.md5($url).'.xml';
    if (file_exists($cache_file)) {
            $cache_mtime = filemtime($cache_file);
            $diff = time() - $cache_mtime;
            $diff = $diff / 60;
            if ($diff < 10) { // cache created less than 10 minutes ago
                    header("Content-type: text/xml; charset=UTF-8");
                    if (headers_sent()) die('Some data has already been output to browser, can\'t send RSS file');
                    readfile($cache_file);
                    exit;
            }
    }
    
    ////////////////////////////////
    // Get RSS/Atom feed
    ////////////////////////////////
    if (!$html_only) {
            $feed = new SimplePie();
            $feed->set_feed_url($url);
            $feed->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
            $feed->set_timeout(20);
            $feed->enable_cache(false);
            $feed->set_stupidly_fast(true);
            $feed->enable_order_by_date(false); // we don't want to do anything to the feed
            $feed->set_url_replacements(array());
            $result = $feed->init();
            //$feed->handle_content_type();
            //$feed->get_title();
            if ($result && (!is_array($feed->data) || count($feed->data) == 0)) {
                    die('Sorry, no feed items found');
            }
    }
    
    ////////////////////////////////////////////////////////////////////////////////
    // Extract content from HTML (if URL is not feed or explicit HTML request has been made)
    ////////////////////////////////////////////////////////////////////////////////
    if ($html_only || !$result) {
            $html = @file_get_contents($url);
            if (!$html) die('Error retrieving '.$url);
            $node = grabArticle($html);
            $title = $node->firstChild->textContent;
            $content = $node->ownerDocument->saveXML($node->lastChild);
            unset($node, $html);
            $output = new FeedWriter(); //ATOM an option
            $output->setTitle($title);
            $output->setDescription("Content extracted by fivefilters.org from $url");
            if ($format == 'atom') {
                    $output->setChannelElement('updated', date(DATE_ATOM));
                    $output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
            }
            $output->setLink($url);
            $newitem = $output->createNewItem();
            $newitem->setTitle($title);
            $newitem->setLink($url);
            if ($format == 'atom') {
                    $newitem->setDate(time());
                    $newitem->addElement('content', $content);
            } else {
                    $newitem->setDescription($content);
            }
            $output->addItem($newitem);
            $output->genarateFeed();
            exit;
    }
    
    ////////////////////////////////////////////
    // Create full-text feed
    ////////////////////////////////////////////
    
    $output = new FeedWriter(); //ATOM an option
    $output->setTitle($feed->get_title());
    $output->setDescription('[full-text feed from fivefilters.org]: '.$feed->get_description());
    $output->setLink($feed->get_link());
    if ($img_url = $feed->get_image_url()) {
            $output->setImage($feed->get_title(), $feed->get_link(), $img_url);
    }
    if ($format == 'atom') {
            $output->setChannelElement('updated', date(DATE_ATOM));
            $output->setChannelElement('author', array('name'=>'Five Filters', 'uri'=>'http://fivefilters.org'));
    }
    
    ////////////////////////////////////////////
    // Loop through feed items
    ////////////////////////////////////////////
    $items = $feed->get_items(0, 1);
    foreach ($items as $item) {
            // some URLs appear to have characters HTML encoded - does decoding affect other URLs?
            $permalink = htmlspecialchars_decode($item->get_permalink());
            $permalink = filter_var($permalink, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED);
    
            if ($permalink !== false && $permalink !== null && preg_match('!^https?://!', $permalink)) {
                    $permalink = filter_var($permalink, FILTER_SANITIZE_URL);
            } else {
                    $permalink = false;
            }
            $newitem = $output->createNewItem();
            $newitem->setTitle(htmlspecialchars_decode($item->get_title()));
            if ($permalink !== false) {
                    $newitem->setLink($permalink);
            } else {
                    $newitem->setLink($item->get_permalink());
            }
    
            if ($permalink && $html = @file_get_contents($permalink)) {
                    $html = grabArticleHtml($html, false);
                    $html = str_n_words($html, 100);
    
            } else {
                    $html = '<p><em>[fivefilters.org: unable to retrieve full-text content]</em></p>';
                    $html .= $item->get_description();
            }
            if ($format == 'atom') {
                    $newitem->addElement('content', $html);
                    $newitem->setDate((int)$item->get_date('U'));
                    if ($author = $item->get_author()) {
                            $newitem->addElement('author', array('name'=>$author->get_name()));
                    }
            } else {
                    $newitem->addElement('guid', $item->get_permalink(), array('isPermaLink'=>'true'));
                    $newitem->setDescription($html);
                    if ((int)$item->get_date('U') > 0) {
                            $newitem->setDate((int)$item->get_date('U'));
                    }
                    if ($author = $item->get_author()) {
                            $newitem->addElement('dc:creator', $author->get_name());
                    }
            }
            $output->addItem($newitem);
            unset($html);
    }
    // output feed
    ob_start();
    $output->genarateFeed();
    $output = ob_get_contents();
    ob_end_clean();
    file_put_contents($cache_file, $output);
    echo $output;
    
    
    function str_n_words($str, $word_count)
    {
        $str_split = explode(' ', $str);
        if(count($str_split) <= $word_count)
        {
            return $str;
        }
    
        array_splice($str_split, $word_count);
        return implode(' ', $str_split);
    }
    ?>
    
    Code (markup):
    and you will not get the syntax error.
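A note on why this layout works, as far as I can tell from the code posted in #8: that file began with `<?function ...` and left a stray `php` on its own line, so (with short open tags enabled) PHP read `php` as a bare word and then hit `error_reporting` on line 33, hence the unexpected T_STRING. The opening tag has to stay in one piece as `<?php`. The position of the function matters much less, because a function declared at the top level of a file can be called before the line where it is defined. A minimal sketch, with a sample string standing in for the real feed content:

```php
<?php
// The call comes first...
echo str_n_words('one two three four', 2), "\n"; // prints "one two"

// ...and the definition later in the same file. PHP registers top-level
// functions when it compiles the file, so the call above still works.
function str_n_words($str, $word_count)
{
    $str_split = explode(' ', $str);
    if (count($str_split) <= $word_count) {
        return $str;
    }

    // Drop everything from position $word_count onwards.
    array_splice($str_split, $word_count);
    return implode(' ', $str_split);
}
```

This only holds for functions declared unconditionally at the top level. A function declared inside a loop or an `if` block, as in #4, is defined only when that line runs, and defining it a second time is a fatal error.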

    I hope this helps
     
    drctaccess, Dec 30, 2010 IP
  11. mastermunj

    #11
    @drctaccess, Thanks, that was the change needed.

    @sheldon365, try the changes given by drctaccess and let us know if you run into any difficulty.
     
    mastermunj, Dec 30, 2010 IP
  12. sheldon365

    #12
    It works without any errors. How many words has it been set to? If I want to increase the number of words displayed to, say, 350, what would I have to change?
     
    sheldon365, Dec 30, 2010 IP
  13. drctaccess

    #13
    Right now it is set to 100 words. If you want to change the number, find this line:
    
    $html = str_n_words($html, 100);
    
    Code (markup):
    and replace 100 with your desired number.
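If the limit needs to change often, one option is to read it from the query string instead of editing the file each time. This is only a sketch under my own assumptions: the `words` parameter, the helper name `word_limit_from_query`, and the default and maximum values are all made up for illustration and are not part of the Five Filters script.

```php
<?php
// Hypothetical helper: reads an optional "words" value from a query array
// and keeps it within sensible bounds.
function word_limit_from_query(array $query, $default = 100, $max = 500)
{
    // Fall back to the default when the value is missing or not a
    // plain non-negative integer.
    if (!isset($query['words']) || !ctype_digit((string) $query['words'])) {
        return $default;
    }

    // Clamp the requested value to the range 1..$max.
    $n = (int) $query['words'];
    return max(1, min($n, $max));
}

echo word_limit_from_query(array('words' => '350')), "\n"; // prints 350

// In the feed script the call would then look like:
// $html = str_n_words($html, word_limit_from_query($_GET));
```

Either way, keep in mind that the script serves a cached copy for 10 minutes and names the cache file after the feed URL alone, so a changed limit may not show up until the cache file expires or is deleted.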

    I hope this helps
     
    drctaccess, Dec 30, 2010 IP
  14. sheldon365

    #14
    Thank you so much. Will test it for a few days and get back to you.
     
    sheldon365, Dec 30, 2010 IP
  15. sheldon365

    #15
    It Works!!!
     
    sheldon365, Feb 3, 2011 IP