WordPress RSS parsing weird characters

Discussion in 'Programming' started by fadetoblack22, Jul 27, 2009.

  1. #1
    When I parse WordPress feeds onto other pages of my site, some of the characters change.

    A dash "-" changes to "?".

    Also, the text for each article ends with [...]

    Does anyone know how to remove this?

    Thanks.
     
    fadetoblack22, Jul 27, 2009 IP
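
On the trailing [...] part of the question: a minimal sketch in Python of stripping the marker from each excerpt before printing it. The function name `strip_excerpt_marker` and the list of marker spellings are assumptions for illustration, not something the feed or WordPress defines; check what the feed actually emits and adjust the pattern.

```python
import re

# Assumed spellings of the excerpt marker: "[...]", "[…]", "[&#8230;]", "[&hellip;]".
# Only a marker at the very end of the text is removed.
_TRAILING_MARKER = re.compile(r"\s*\[(?:\.\.\.|\u2026|&#8230;|&hellip;)\]\s*$")


def strip_excerpt_marker(text: str) -> str:
    """Return the excerpt without a trailing marker such as "[...]"."""
    return _TRAILING_MARKER.sub("", text)


if __name__ == "__main__":
    print(strip_excerpt_marker("First lines of the post [...]"))
```

Anchoring the pattern to the end of the string leaves a literal "[...]" in the middle of an article untouched.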
  2. kblessinggr

    #2
    Sounds like a character encoding mismatch. For example, your page is declared as latin1 but the characters are in UTF-8, or something of the sort. Make sure your data's character encoding (you can check this in phpMyAdmin, under the field's collation) matches your output (you can check this in the wp-config.php file).

    Basically, if WordPress is showing the data fine but you're parsing RSS feeds outside of WordPress, then you need to make sure you have something like this in the head of your HTML:

    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    	<head>
    		<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    
    Code (markup):
    Note the meta line and how it defines the character set. Once the character sets match, that problem should disappear. UTF-8 only agrees with latin1 over the 7-bit ASCII range, so plain ASCII content on your existing page shouldn't be affected, but accented or other non-ASCII characters saved as latin1 will need to be re-saved as UTF-8.
     
    kblessinggr, Jul 27, 2009 IP
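
To make the mismatch concrete, here is a small Python sketch of the two ways a dash from the feed can go wrong. It assumes the "dash" is really an en dash (U+2013) rather than an ASCII hyphen, and that the browser treats a page labelled iso-8859-1 as Windows-1252; both are assumptions about this particular site, and the function names are made up for the example.

```python
# Assumption: the feed contains an en dash (U+2013), not the ASCII "-".
EN_DASH = "\u2013"


def utf8_read_as(text: str, page_encoding: str = "cp1252") -> str:
    """Show how UTF-8 bytes appear when the page declares a single-byte charset."""
    return text.encode("utf-8").decode(page_encoding)


def forced_into_latin1(text: str) -> str:
    """Show what a lossy conversion to latin1 does: unmappable characters become '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


if __name__ == "__main__":
    sample = "News " + EN_DASH + " July"
    # Three UTF-8 bytes shown as three unrelated characters.
    print(ascii(utf8_read_as(sample)))
    # The en dash has no latin1 code point, so it is replaced by "?".
    print(forced_into_latin1(sample))
```

A "?" in place of the dash suggests the second case: something in the parsing chain is converting the feed text to latin1 before it reaches the page.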
  3. fadetoblack22

    #3
    Thanks for the reply. I had charset=iso-8859-1 on my site for some reason. I don't even know what that means.

    My WordPress wp-config.php is set to UTF-8 and the content displays correctly in WordPress, so I am guessing the db is fine.

    Adding what you suggested to the head of the file didn't help :(
     
    fadetoblack22, Jul 27, 2009 IP
  4. kblessinggr

    #4
    What is the page? Maybe the character encoding is getting reset somewhere else.

    Also, iso-8859-1 is the ISO name for latin1, basically the standard Western European single-byte character set (whereas UTF-8 is a variable-width encoding of Unicode that can represent far more characters).

    Also check whether the way you're fetching the RSS feed involves a cache of any sort.
     
    kblessinggr, Jul 27, 2009 IP
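
One place the encoding can get "reset elsewhere" is the HTTP response itself: a charset in the server's Content-Type header takes precedence over the meta tag. The Python sketch below compares the two for a given page. The helper names are hypothetical, the meta lookup is a simple regular expression rather than a real HTML parser, and byte-order marks are ignored.

```python
import re
from email.message import Message
from typing import Optional

# Loose match for charset=... inside a <meta> tag; good enough for a quick check.
_META_CHARSET = re.compile(r"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def header_charset(content_type: str) -> Optional[str]:
    """Charset named in an HTTP Content-Type header value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: str) -> Optional[str]:
    """Charset named in the page's meta tag, lowercased, or None."""
    match = _META_CHARSET.search(html)
    return match.group(1).lower() if match else None


def effective_charset(content_type: str, html: str) -> Optional[str]:
    """The header wins when it names a charset; otherwise fall back to the meta tag."""
    return header_charset(content_type) or meta_charset(html)


if __name__ == "__main__":
    page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
    print(effective_charset("text/html; charset=ISO-8859-1", page))
```

If the header still says ISO-8859-1, changing the meta line alone will not change how the browser decodes the page; the server or script configuration that sets the header would need to change as well.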
  5. fadetoblack22

    #5
    I don't want to list the page here because I am trying to get it out of Google's index, as it is a test page for my main site. I can PM it to you if you want to see it.

    Yes, the RSS feed does have a cache, but as soon as I add charset=UTF-8 I lose other characters as well. It happens straight away, so it is not affected by the cache.
     
    fadetoblack22, Jul 27, 2009 IP
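
Losing other characters as soon as the page is declared UTF-8 is what you would expect if the page's own text is stored as latin1 bytes. A hedged Python sketch of two possible ways out, assuming that is the cause: convert the existing page content to UTF-8, or keep the page as iso-8859-1 and turn the feed's non-ASCII characters into numeric character references. Both function names are invented for the example.

```python
def latin1_page_to_utf8(raw: bytes) -> bytes:
    """Re-encode page content stored as latin1 so it is valid UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def escape_for_latin1_page(feed_text: str) -> str:
    """Replace every non-ASCII character with a numeric reference such as &#8211;."""
    return feed_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    old_page = b"caf\xe9"  # "café" saved as latin1
    # Read as UTF-8, the lone 0xE9 byte is invalid and becomes U+FFFD.
    print(ascii(old_page.decode("utf-8", errors="replace")))
    # After conversion the same text is valid UTF-8.
    print(latin1_page_to_utf8(old_page))
    # Alternative: leave the page alone and escape the feed text instead.
    print(escape_for_latin1_page("News \u2013 July"))
```

The first option fixes the page once and for all; the second is a smaller change, because numeric references are plain ASCII and display the same under either charset.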
  6. MayaLocke

    #6
    A lot of people on WordPress just paste their content into FeedForAll (http://www.feedforall.com) to make their RSS feed, since the WP feed tends to be buggy.

    HTH
     
    MayaLocke, Jul 29, 2009 IP
  7. fadetoblack22

    #7
    That's not really what I'm looking for.
     
    fadetoblack22, Jul 29, 2009 IP