Weird characters throwing off RSS feed scrape

I was hoping someone could help modify this.

The PHP below scrapes fine until it hits a headline containing a special character such as "&" (and a few others), at which point the feed breaks. Is there a fix for the code below?

Thanks in advance.

<?php

// Screen scraping your way into RSS
// Example script, by Dennis Pallett
// http://www.phpit.net/tutorials/screenscrap-rss

// Get page
$url = "http://www.urlgoeshere.com/";
$data = implode("", file($url)); 

// Get content items
preg_match_all ("/<div class=\"headline\">([^`]*?)<\/a/", $data, $matches);

// Begin feed
header ("Content-Type: text/xml; charset=ISO-8859-1");
echo "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";
?>
<rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel>
        <title>News</title>
        <description>The latest news from</description>
        <link>http://www.urlgoeshere.com</link>
        <language>en-us</language>


<?php
// Loop through each content item
foreach ($matches[0] as $match) {
    // First, get title
    preg_match ("/\>([^`]*?)<\/a/", $match, $temp);
    $title = $temp['1'];
    $title = strip_tags($title);
    $title = trim($title);

    // Second, get url
    preg_match ("/<a href=\"([^`]*?)\">/", $match, $temp);
    $url = $temp['1'];
    $url = trim($url);

    // Echo RSS XML
    echo "<item>\n";
        echo "\t\t\t<title>" . strip_tags($title) . "</title>\n";
        echo "\t\t\t<link>http://www.urlgoeshere.com" . strip_tags($url) . "</link>\n";
        echo "\t\t\t<description>" . strip_tags($text) . "</description>\n";
        echo "\t\t\t<content:encoded><![CDATA[ \n";
        echo $text . "\n";
        echo " ]]></content:encoded>\n";
        echo "\t\t\t<dc:creator>" . strip_tags($author) . "</dc:creator>\n";
    echo "\t\t</item>\n";
}
?>
</channel>
</rss>
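
A likely cause, assuming the feed fails in the reader as invalid XML: characters such as "&", "<" and ">" are reserved in XML, and the script echoes the scraped title and URL without escaping them. Below is a minimal sketch of one possible fix. The helper name rss_escape is hypothetical (it is not part of the original tutorial script), and the charset is assumed to be ISO-8859-1 because that is what the feed header declares.

```php
<?php
// Hypothetical helper, not part of the original script: makes scraped
// text safe to place inside an XML element such as <title> or <link>.
// The default charset is an assumption taken from the feed's header.
function rss_escape($text, $charset = 'ISO-8859-1')
{
    // Drop leftover markup first, then turn HTML entities such as &amp;
    // or &quot; back into plain characters so they are not escaped twice.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, $charset);

    // Escape the characters XML reserves: & < > " '
    return htmlspecialchars(trim($text), ENT_QUOTES, $charset);
}

// Example with a made-up headline containing "&" and quotes.
echo "<title>" . rss_escape('Q&A: Tom & Jerry "live"') . "</title>\n";
```

Inside the loop, the same idea would mean echoing rss_escape($title) and rss_escape($url) instead of strip_tags($title) and strip_tags($url). Separately, $text and $author are never assigned anywhere in the script, so those lines either need values or should be removed; if PHP notices are displayed, they would also end up inside the XML output.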

soggy Active Member

imvain2 Peon
