RSS parser... strange characters!

Discussion in 'XML & RSS' started by AnKaRoTH, Nov 11, 2005.

  1. #1
    Hi, I'm using MagpieRSS to parse some RSS feeds and store their titles and links in a database. My problem is with non-English characters, such as accented letters (á é ó...) and other characters such as "¿".
    If I set MagpieRSS to run with ISO-8859-1, only the ISO-8859-1 feeds display all their characters correctly. But I also need some feeds that are in UTF-8, and in those all non-English characters are replaced by "¿".
    If I set MagpieRSS to run with UTF-8, the ISO-8859-1 feeds still work fine, but the UTF-8 feeds don't: their characters are replaced by strange sequences, e.g. "ó".
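Where a sequence like "ó" comes from: in UTF-8, "ó" is the two bytes C3 B3. If those bytes are read as ISO-8859-1, each byte becomes its own character ("Ã" and "³"), and re-encoding those two characters as UTF-8 for output produces "ó". The PHP sketch below reproduces this. latin1_to_utf8() is a hypothetical helper written only for illustration; it performs the same conversion as PHP's old utf8_encode().

```php
<?php
// latin1_to_utf8() is a hypothetical helper for illustration: it reads each
// byte as an ISO-8859-1 character (code point = byte value) and encodes that
// code point as UTF-8, like PHP's old utf8_encode().
function latin1_to_utf8($bytes) {
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII: one byte, unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // U+0080..U+00FF: two UTF-8 bytes
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$utf8_o  = "\xC3\xB3";                // "ó" already encoded as UTF-8
$garbled = latin1_to_utf8($utf8_o);   // mistakenly treated as ISO-8859-1
echo $garbled, "\n";                  // prints "ó" (bytes C3 83 C2 B3)

$latin1_o = "\xF3";                   // "ó" really encoded as ISO-8859-1
echo latin1_to_utf8($latin1_o), "\n"; // prints "ó", the correct conversion
```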

    So I could use a lot of eregi_replace calls to replace the strange characters with the correct ones... but each RSS feed shows different characters for the same letters! So I'd need far too many eregi_replace calls.
    How can I configure MagpieRSS correctly to get Spanish characters?

    Thanks.

    Edit: Sorry, I posted this twice because I got an error and thought it hadn't been posted.
     
    AnKaRoTH, Nov 11, 2005
  2. garysims

    #2
    What do you mean by "if I set MagpieRSS to run with ISO-8859-1"? MagpieRSS is only a parser.

    Do you mean you set the encoding of the HTML page you create after using MagpieRSS to parse the feed?

    What version of PHP are you using? What version of MagpieRSS are you using?
     
    garysims, Nov 11, 2005
  3. AnKaRoTH

    #3
    PHP Version: 4.3.11
    Magpie Version: 0.72

    Magpie has 3 constants you can use to set the encoding:

    MAGPIE_OUTPUT_ENCODING
    MAGPIE_INPUT_ENCODING
    MAGPIE_DETECT_ENCODING

    In the code, they appear as:

    if ( !defined('MAGPIE_OUTPUT_ENCODING') ) {
        define('MAGPIE_OUTPUT_ENCODING', 'UTF-8');
    }

    if ( !defined('MAGPIE_INPUT_ENCODING') ) {
        define('MAGPIE_INPUT_ENCODING', 'UTF-8');
    }

    if ( !defined('MAGPIE_DETECT_ENCODING') ) {
        define('MAGPIE_DETECT_ENCODING', false);
    }
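The quoted block uses PHP's "define only if not already defined" pattern. Because a PHP constant cannot be redefined, a value you define() yourself is kept as long as your define() runs before the library's default block does. A standalone sketch of the pattern; set_default_constant() is a hypothetical helper, not part of MagpieRSS:

```php
<?php
// set_default_constant() is a hypothetical helper mirroring the quoted
// "if (!defined(...)) define(...)" blocks: it only sets a constant that
// does not exist yet, then returns the constant's current value.
function set_default_constant($name, $value) {
    if (!defined($name)) {
        define($name, $value);
    }
    return constant($name);
}

// A user setting made BEFORE the default block runs...
define('MAGPIE_DETECT_ENCODING', true);

// ...is kept, because the default is skipped for an existing constant.
$detect = set_default_constant('MAGPIE_DETECT_ENCODING', false);

// A constant the user never set falls back to the default value.
$input = set_default_constant('MAGPIE_INPUT_ENCODING', 'UTF-8');

var_dump($detect); // bool(true)
var_dump($input);  // string(5) "UTF-8"
```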
     
    AnKaRoTH, Nov 11, 2005
  4. garysims

    #4
    What happens if you set MAGPIE_DETECT_ENCODING to true and don't set the input and output encodings?
     
    garysims, Nov 11, 2005
  5. garysims

    #5
    Reading the code, there is a section in rss_parse.inc that creates the XML parser, and it has different code for PHP4 and PHP5.

    For PHP4, the comment in the code says:

    Unfortunately PHP4's support for character encodings and especially XML and character encodings sucks. As long as the documents you parse only contain characters from the ISO-8859-1 character set (a superset of ASCII, and a subset of UTF-8) you're fine. However once you step out of that comfy little world things get mad, bad, and dangerous to know.
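The "subset of UTF-8" in that comment refers to the set of characters, not the bytes. ASCII text is byte-for-byte identical in ASCII, ISO-8859-1 and UTF-8, but a character like "ó" is one byte (F3) in ISO-8859-1 and two bytes (C3 B3) in UTF-8, so an ISO-8859-1 feed is not even valid UTF-8. A minimal check in PHP; is_valid_utf8() is a hypothetical helper that relies on preg_match() with the /u modifier returning false when the subject is not well-formed UTF-8:

```php
<?php
// is_valid_utf8() is a hypothetical helper: with the /u modifier,
// preg_match() returns false (instead of 0 or 1) on malformed UTF-8.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

$ascii  = "Hola";            // same bytes in ASCII, ISO-8859-1 and UTF-8
$latin1 = "Canci\xF3n";      // "Canción" in ISO-8859-1: "ó" is one byte, F3
$utf8   = "Canci\xC3\xB3n";  // "Canción" in UTF-8: "ó" is two bytes, C3 B3

var_dump(is_valid_utf8($ascii));  // bool(true)
var_dump(is_valid_utf8($latin1)); // bool(false)
var_dump(is_valid_utf8($utf8));   // bool(true)
```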
     
    garysims, Nov 11, 2005
  6. garysims

    #6
    The PHP4 code is based on work by Steve Minutillo for http://feedonfeeds.com/

    The code also points to the following weblog post:
    http://minutillo.com/steve/weblog/2004/6/17/php-xml-and-character-encodings-a-tale-of-sadness-rage-and-data-loss

    It might be worth you reading that.

    After that I can't help much more... Sorry... My web site still uses a version of MagpieRSS < 0.7 and I don't have time now to play with the new version.

    I would be interested to hear what success you have. Looking at the code, I still think auto-detect is your best option...

    Good luck!
     
    garysims, Nov 11, 2005
  7. AnKaRoTH

    #7
    I think I've tried every combination of those 3 constants and I always get the same problem.
     
    AnKaRoTH, Nov 11, 2005
  8. AnKaRoTH

    #8
    In that weblog it says:

    "Update: This code has been finalized and debugged, and is now shipped as part of MagpieRSS 0.7! Sadness and rage no more!"

    So my Magpie version (0.72) already has the needed code... maybe it's a problem with my server configuration?
     
    AnKaRoTH, Nov 11, 2005
  9. AnKaRoTH

    #9
    I've found the problem! Don't ask me why it works, though, because I don't know. Playing a bit with the code in the rss_parse.inc file, I realized that the function php4_create_parser wasn't getting the correct values for the variables $in_enc and $detect, so I set them directly at the start of the function:

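    function php4_create_parser($source, $in_enc, $detect) {
        // Override whatever values were passed in:
        $detect = true;   // turn encoding detection on
        $in_enc = null;   // don't force an input encoding
        // ... rest of the original function body unchanged ...
    }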

    I really don't know why the function didn't get the correct values from the variables, as the code seems to be ok.
    So, now the parser detects the RSS encoding and the function xml_parser_create does its job.
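With detection forced on, the parser works out the feed's encoding before calling xml_parser_create(). One common way to do that is to read the encoding attribute of the XML declaration; the standalone sketch below assumes that approach and is not MagpieRSS's code. sniff_xml_encoding() and parse_text_as_utf8() are hypothetical helpers. Note that in PHP 4 the argument of xml_parser_create() set the input/output encoding, while from PHP 5 on the manual says the input encoding is detected automatically and the argument mainly sets the output encoding, so treat this as an illustration rather than a drop-in fix.

```php
<?php
// Hypothetical helper: return the encoding named in the XML declaration,
// or UTF-8 (XML's default when no encoding is declared and there is no BOM).
function sniff_xml_encoding($xml) {
    if (preg_match('/^\s*<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }
    return 'UTF-8';
}

// Hypothetical helper: create the parser with the sniffed encoding,
// ask for UTF-8 output, and collect all character data.
function parse_text_as_utf8($xml) {
    $parser = xml_parser_create(sniff_xml_encoding($xml));
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
    $text = '';
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data;   // character data may arrive in several chunks
    });
    xml_parse($parser, $xml, true);
    return $text;
}

// A Spanish title in an ISO-8859-1 feed: "ó" is the single byte F3.
$feed = '<?xml version="1.0" encoding="ISO-8859-1"?>'
      . "<rss><channel><title>Canci\xF3n</title></channel></rss>";

echo sniff_xml_encoding($feed), "\n";  // ISO-8859-1
echo parse_text_as_utf8($feed), "\n";  // Canción, as UTF-8 bytes
```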

    Thanks, garysims.
     
    AnKaRoTH, Nov 11, 2005