RSS parser... strange characters!
AnKaRoTH
Nov 11th 2005, 3:42 am
Hi, I'm using MagpieRSS to parse some RSS feeds and store their titles and links in a database. My problem is with non-English characters, such as accented letters (á, é, ó...) and other characters such as "¿".
If I set MagpieRSS to run with ISO-8859-1, only feeds in ISO-8859-1 display all characters correctly, but I also need some feeds that are in UTF-8, and in those all non-English characters are replaced by "¿".
If I set MagpieRSS to run with UTF-8, feeds in ISO-8859-1 still work with no problems, but UTF-8 feeds don't: their accented letters (for example "ó") are replaced by strange characters.
I could use lots of eregi_replace calls to swap the strange characters for the correct ones... but each feed shows different characters for the same letters, so I would need far too many eregi_replace calls.
How can I configure MagpieRSS correctly so that Spanish characters come through?
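The "strange characters" symptom is typical of UTF-8 bytes being displayed as ISO-8859-1: each accented letter is two bytes in UTF-8, so it shows up as two Latin-1 characters. A minimal sketch in plain PHP (no mbstring or iconv needed) shows this, plus one alternative to a per-feed eregi_replace table: check whether a string is valid UTF-8 and, if it isn't, assume ISO-8859-1 and convert it. The helpers latin1_to_utf8, is_valid_utf8 and normalize_to_utf8 are hypothetical, not MagpieRSS functions, and the fallback is a heuristic.

```php
<?php
// Hypothetical helpers (not part of MagpieRSS).

// Re-encode an ISO-8859-1 byte string as UTF-8. Each Latin-1 byte maps to the
// Unicode code point with the same value; bytes >= 0x80 become two UTF-8 bytes.
function latin1_to_utf8($s)
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i]; // ASCII is identical in both encodings
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// True if $s is well-formed UTF-8 (PCRE's /u mode rejects invalid sequences).
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Heuristic: keep valid UTF-8 as-is, otherwise assume ISO-8859-1 and convert.
function normalize_to_utf8($s)
{
    return is_valid_utf8($s) ? $s : latin1_to_utf8($s);
}

$utf8_o   = "\xC3\xB3"; // "ó" encoded as UTF-8 (two bytes)
$latin1_o = "\xF3";     // "ó" encoded as ISO-8859-1 (one byte)

// Treating UTF-8 bytes as Latin-1 turns one letter into two ("Ã³").
echo bin2hex(latin1_to_utf8($utf8_o)), "\n";      // c383c2b3
echo bin2hex(normalize_to_utf8($utf8_o)), "\n";   // c3b3
echo bin2hex(normalize_to_utf8($latin1_o)), "\n"; // c3b3
```

The validity check comes first because converting text that is already UTF-8 produces exactly the two-character garbage from the first echo. A Latin-1 string that happens to also be valid UTF-8 would pass through unchanged; that is rare for real Spanish text, but possible.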
Thanks.
Edit: Sorry, I posted this twice because I got an error and thought it hadn't gone through.
garysims
Nov 11th 2005, 5:39 am
What do you mean by "if I set MagpieRSS to run with ISO-8859-1"? MagpieRSS is only a parser.
Do you mean you set the encoding of the HTML page you create after using MagpieRSS to parse the feed?
Which version of PHP are you using? Which version of MagpieRSS?
AnKaRoTH
Nov 11th 2005, 6:08 am
PHP Version: 4.3.11
Magpie Version: 0.72
Magpie has 3 constants you can use to set the encoding:
MAGPIE_OUTPUT_ENCODING
MAGPIE_INPUT_ENCODING
MAGPIE_DETECT_ENCODING
In the code, they appear as:
if ( !defined('MAGPIE_OUTPUT_ENCODING') ) {
    define('MAGPIE_OUTPUT_ENCODING', 'UTF-8');
}
if ( !defined('MAGPIE_INPUT_ENCODING') ) {
    define('MAGPIE_INPUT_ENCODING', 'UTF-8');
}
if ( !defined('MAGPIE_DETECT_ENCODING') ) {
    define('MAGPIE_DETECT_ENCODING', false);
}
garysims
Nov 11th 2005, 6:33 am
What happens if you set MAGPIE_DETECT_ENCODING to true and don't set the input and output encodings?
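One detail worth checking: according to the defaults quoted above, not defining MAGPIE_INPUT_ENCODING does not leave it empty; it falls back to 'UTF-8'. A minimal sketch, assuming your own defines run before Magpie's files are included so that the `!defined()` guards skip the defaults. Defining the input encoding as null is an untested assumption; it would mirror, at configuration level, the `$in_enc = null` change made at the end of this thread.

```php
<?php
// Your settings: these must run before Magpie's files are included.
define('MAGPIE_DETECT_ENCODING', true); // request encoding detection
define('MAGPIE_INPUT_ENCODING', null);  // assumption: null means "no fixed input encoding"
// Without the line above, the defaults below would set the input encoding to 'UTF-8'.

// Defaults block as quoted in the thread: define() only runs for constants
// that do not exist yet, so the values set above take precedence.
if ( !defined('MAGPIE_OUTPUT_ENCODING') ) {
    define('MAGPIE_OUTPUT_ENCODING', 'UTF-8');
}
if ( !defined('MAGPIE_INPUT_ENCODING') ) {
    define('MAGPIE_INPUT_ENCODING', 'UTF-8');
}
if ( !defined('MAGPIE_DETECT_ENCODING') ) {
    define('MAGPIE_DETECT_ENCODING', false);
}

// Prints string(5) "UTF-8", then NULL, then bool(true).
var_dump(MAGPIE_OUTPUT_ENCODING, MAGPIE_INPUT_ENCODING, MAGPIE_DETECT_ENCODING);
```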
garysims
Nov 11th 2005, 6:37 am
Reading the code, there is a section in rss_parse.inc that initializes the XML parser, with separate code for PHP 4 and PHP 5.
For PHP 4 it says:
Unfortunately PHP4's support for character encodings and especially XML and character encodings sucks. As long as the documents you parse only contain characters from the ISO-8859-1 character set (a superset of ASCII, and a subset of UTF-8) you're fine. However once you step out of that comfy little world things get mad, bad, and dangerous to know.
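The "subset of UTF-8" remark is about characters, not bytes: ISO-8859-1's 256 characters are the first 256 Unicode code points, but above 0x7F the two encodings store them differently. A small illustration; `utf8_small_codepoint` is a hypothetical helper that decodes only 1- and 2-byte sequences and does no validation.

```php
<?php
// Decode a 1- or 2-byte UTF-8 sequence (code points up to U+07FF) to its code point.
// Illustration only: the input bytes are not validated.
function utf8_small_codepoint($bytes)
{
    $b0 = ord($bytes[0]);
    if ($b0 < 0x80) {
        return $b0; // one byte: plain ASCII
    }
    // Two bytes: 110xxxxx 10xxxxxx -> 11 payload bits.
    return (($b0 & 0x1F) << 6) | (ord($bytes[1]) & 0x3F);
}

$latin1_e = "\xE9";     // "é" in ISO-8859-1: one byte
$utf8_e   = "\xC3\xA9"; // "é" in UTF-8: two bytes

// Same character (U+00E9), different byte lengths.
printf("ISO-8859-1: %d byte(s), U+%04X\n", strlen($latin1_e), ord($latin1_e));
printf("UTF-8:      %d byte(s), U+%04X\n", strlen($utf8_e), utf8_small_codepoint($utf8_e));
```

ASCII bytes are identical in both encodings, which is consistent with only the non-English characters breaking in the feeds described above.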
garysims
Nov 11th 2005, 6:44 am
The PHP4 code is based on work by Steve Minutillo for http://feedonfeeds.com/
There is a pointer to the following weblog post:
http://minutillo.com/steve/weblog/2004/6/17/php-xml-and-character-encodings-a-tale-of-sadness-rage-and-data-loss
It might be worth reading.
Beyond that I can't help much more, sorry. My website still uses a version of MagpieRSS older than 0.7, and I don't have time right now to try the new version.
I would be interested to hear how you get on... Looking at the code, I still think auto-detect is your best option.
Good luck!
AnKaRoTH
Nov 11th 2005, 6:56 am
I think I've tried every combination of those three constants, and I always get the same problem.
AnKaRoTH
Nov 11th 2005, 7:12 am
In that weblog it says:
"Update: This code has been finalized and debugged, and is now shipped as part of MagpieRSS 0.7! Sadness and rage no more!"
So my Magpie version (0.72) should already include that code... maybe it's a problem with my server configuration?
AnKaRoTH
Nov 11th 2005, 7:38 am
I've found the problem!! Don't ask me why, though, because I don't know. Playing a bit with the code in rss_parse.inc, I realized that the function php4_create_parser wasn't getting the correct values for $in_enc and $detect, so I assigned them directly at the start of the function:
function php4_create_parser($source, $in_enc, $detect) {
    $detect = true;  // always detect the feed's encoding
    $in_enc = null;  // ignore whatever input encoding was passed in
    // ... rest of the original function body, unchanged ...
}
I really don't know why the function didn't receive the correct values, as the code looks fine.
So, now the parser detects the RSS encoding and the function xml_parser_create does its job.
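For reference, encoding detection for an XML feed generally means reading the encoding attribute of the `<?xml ... ?>` declaration and falling back to UTF-8, XML's default when no encoding is declared and there is no byte-order mark. This is a hypothetical sketch (`choose_input_encoding` is not a Magpie function). In it, a supplied input encoding skips detection entirely; if Magpie's php4_create_parser behaves similarly (an assumption to check in rss_parse.inc), that would fit both the default MAGPIE_INPUT_ENCODING of 'UTF-8' and the fact that forcing `$in_enc = null` fixed the problem.

```php
<?php
// Hypothetical sketch of the idea behind encoding detection; NOT Magpie's code.
function choose_input_encoding($source, $in_enc, $detect)
{
    if (!$detect || $in_enc) {
        // Detection off, or an encoding was supplied: use it as-is
        // without looking at the feed at all.
        return $in_enc;
    }
    // Read the encoding attribute of the XML declaration, if present.
    if (preg_match('/<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']/', $source, $m)) {
        return strtoupper($m[1]);
    }
    // No declared encoding (and no BOM handling here): XML defaults to UTF-8.
    return 'UTF-8';
}

$feed = '<?xml version="1.0" encoding="iso-8859-1"?><rss version="2.0"></rss>';
echo choose_input_encoding($feed, null, true), "\n";    // ISO-8859-1
echo choose_input_encoding($feed, 'UTF-8', true), "\n"; // UTF-8 (declaration never read)
echo choose_input_encoding('<rss/>', null, true), "\n"; // UTF-8 (XML default)
```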
Thanks garysims.