Unknown Characters in an RSS Feed

Discussion in 'Site & Server Administration' started by yfs1, Jul 25, 2005.

  1. #1
    I just finished the first round of RSS feeds for Article Depot and have come across an issue: sometimes IE shows an error when it displays a feed. The error appears whether I use UTF-8 or ISO encoding for the feed.

    Here is one displaying properly:
    http://www.articledepot.co.uk/rss/advertisingnew.xml

    And here is one that comes across an unrecognized character:
    http://www.articledepot.co.uk/rss/generalrandom.xml

    They display correctly in feed readers, but how can I get them to display correctly in IE?
    (The main site uses UTF-8 and displays without any problems.)
     
    yfs1, Jul 25, 2005
  2. DangerMouse

    #2
    Hi yfs1,

    You could try adding this to the top of your feed:

    <!DOCTYPE rss [<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent"> %HTMLlat1;]>
    It declares the HTML Latin-1 entities, so it should fix your Internet Explorer problems here... I think ;)
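    The idea behind the DOCTYPE above is that an XML parser only knows the five predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;), so anything like &ndash; must be declared in a DTD. A minimal PHP sketch of that principle follows, with some assumptions: it swaps the external Latin-1 entity set for an internal declaration of just `ndash` (which, as far as I know, lives in the XHTML "special" set, xhtml-special.ent, rather than in xhtml-lat1.ent), it uses libxml2 through SimpleXML rather than IE's parser, and `parse_feed` is a hypothetical helper name. It needs PHP 7 or later.

```php
<?php
// Hypothetical helper: parse a feed string with SimpleXML (libxml2).
function parse_feed(string $xml)
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    // LIBXML_NOENT replaces entity references with their declared text.
    // Use it only on trusted input, since it also expands external entities.
    return simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOENT);
}

$body = '<rss version="2.0"><channel><title>How to prepare &ndash; a guide</title></channel></rss>';

// Internal DTD subset declaring ndash as U+2013 (EN DASH).
$withDtd = '<?xml version="1.0" encoding="utf-8"?>'
    . '<!DOCTYPE rss [<!ENTITY ndash "&#8211;">]>' . $body;

// Same document with no declaration for ndash.
$withoutDtd = '<?xml version="1.0" encoding="utf-8"?>' . $body;

var_dump((string) parse_feed($withDtd)->channel->title); // entity resolved to a real en dash
var_dump(parse_feed($withoutDtd));                       // not well-formed: ndash is undeclared
```

    The second call fails because, without a declaration, &ndash; violates XML's "entity declared" well-formedness rule. This is the same class of error a browser reports on an undeclared entity.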
     
    DangerMouse, Jul 25, 2005
  3. yfs1

    #3
    I had thought the following function in my feed builder, which outputs the XML, would take care of that:
    function xmlentities($string) {
    	// Converts every character that has an HTML entity (e.g. an en dash becomes &ndash;)
    	return htmlentities($string, ENT_QUOTES, 'UTF-8');
    }
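    One caveat with that function: htmlentities() emits HTML named entities such as &ndash;, which XML does not predefine. A common alternative for XML output, sketched here under the assumption of PHP 5.4 or later (for ENT_XML1) and UTF-8 input, is htmlspecialchars(), which escapes only the markup characters and leaves non-ASCII characters as raw UTF-8. `xml_escape` is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: escape text for an XML (e.g. RSS) document.
function xml_escape(string $text): string
{
    // Escapes only &, <, >, " and '; ENT_XML1 makes ' come out as &apos;.
    // Characters such as the en dash stay as plain UTF-8 bytes.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$title = "How to prepare \u{2013} Tom & Jerry's <guide>";

// htmlentities() produces &ndash;, which an XML parser will not recognise
// unless a DTD declares it.
echo htmlentities($title, ENT_QUOTES, 'UTF-8'), "\n";

// xml_escape() keeps the dash and escapes only the markup characters.
echo xml_escape($title), "\n";
```

    Note that either function expects valid UTF-8 when called with 'UTF-8'; a stray Windows-1252 byte still has to be converted first.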
     
    yfs1, Jul 25, 2005
  4. J.D.

    #4
    The problem is your dash character (the one following "How to prepare"). Try adding an XML version line to the beginning of the output:

    <?xml version="1.0" encoding="utf-8" ?>

    J.D.
     
    J.D., Jul 25, 2005
  5. yfs1

    #5
    I already have the version line (the feed is basically the same in structure as the first link).
     
    yfs1, Jul 25, 2005
  6. gemini

    #6
    I had to replace all the special characters in my RSS output, like this:

    
    function removespecial($str) {
       // Assumes $str is UTF-8 and this file is saved as UTF-8
       $str = str_replace('“', '"', $str);  // make clean: typographic quotes to ASCII
       $str = str_replace('’', "'", $str);
       $str = str_replace('”', '"', $str);
       $str = str_replace('Ä', 'A', $str);  // transliterate accented letters
       $str = str_replace('ä', 'a', $str);
       $str = str_replace('Å', 'A', $str);
       $str = str_replace('å', 'a', $str);
       $str = str_replace('Ö', 'O', $str);
       $str = str_replace('ö', 'o', $str);
       $str = str_replace('Ü', 'U', $str);
       $str = str_replace('ü', 'u', $str);
       $str = str_replace('ß', 'ss', $str);
       $str = str_replace('ô', 'o', $str);
       $str = str_replace('—', '-', $str);  // em dash to hyphen
       $str = str_replace('&amp;', '&', $str);  // undo existing escapes to prevent &amp;amp;
       $str = str_replace('&', '&amp;', $str);  // escape every ampersand for XML
       $str = str_replace('®', '&#174;', $str);  // after the & escaping, so this reference is not escaped again
       return $str;
    }
    It works fine now.
     
    gemini, Jul 25, 2005
  7. J.D.

    #7
    Not RSS version, XML version. I saved your feed and I don't get this error in IE with the version line I quoted.

    J.D.
     
    J.D., Jul 25, 2005
  8. J.D.

    #8
    OK, I figured it out. The dash in your text is the byte 0x96 (hexadecimal), which is how the Windows-1252 character set encodes the en dash (&ndash;, U+2013). Windows-1252 is probably the encoding of the text your feed is generated from.

    What you need to do is convert this string to UTF-8 (e.g. try using utf8_encode). In general, though, you need to know the encoding of your source, because some conversions will not work without additional effort: in ISO-8859-1, for example, 0x96 is a control character rather than a dash, and ISO-8859-1 is the encoding utf8_encode assumes.
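    A minimal sketch of the conversion described above, assuming the article text is Windows-1252 whenever it is not already valid UTF-8, and assuming PHP 7+ with the iconv extension. `to_utf8` is a hypothetical helper. It uses iconv with an explicit Windows-1252 source instead of utf8_encode, because utf8_encode (deprecated as of PHP 8.2) treats the input as ISO-8859-1 and turns 0x96 into the control character U+0096 rather than an en dash.

```php
<?php
// Hypothetical helper: make sure a string is UTF-8, assuming any
// non-UTF-8 input is Windows-1252.
function to_utf8(string $text): string
{
    // preg_match() with the /u modifier returns false on invalid UTF-8,
    // so text that is already valid UTF-8 passes through unchanged.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    // Windows-1252 leaves a few bytes (e.g. 0x81, 0x8D) undefined;
    // iconv returns false if it meets one of them.
    $converted = iconv('Windows-1252', 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('Input is neither UTF-8 nor Windows-1252');
    }
    return $converted;
}

// 0x96 read as Windows-1252 becomes U+2013 EN DASH (UTF-8 bytes e2 80 93).
echo bin2hex(to_utf8("\x96")), "\n";

// The same byte read as ISO-8859-1 becomes U+0096, a C1 control character
// (UTF-8 bytes c2 96); this is what utf8_encode would produce.
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', "\x96")), "\n";
```

    The UTF-8 check matters because running an already-converted string through the Windows-1252 conversion a second time would garble it.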

    J.D.
     
    J.D., Jul 25, 2005
  9. yfs1

    #9
    That seems to have done the trick:
    http://www.articledepot.co.uk/rss/generalnew.xml
    Thanks everyone!
     
    yfs1, Jul 26, 2005