Hello, in some cases a strange � sign appears in syndicated content. How can we get rid of it? I tried UTF-8, since this could have been a character from another charset, but that did not help. Also, what effect does this have on search engines? Thanks, Ruslan
It means that the web browser does not understand what the symbol is. I tend to accept the content from a news feed as it comes; the script that uploads it to the database then scans the file for characters I know to be problematic and replaces them with Unicode character references.
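A minimal sketch of that replacement step, assuming the feed text has already been decoded into a Python string. The function name `to_char_refs` is made up for illustration, and this version converts every non-ASCII character rather than a hand-picked list of problematic ones:

```python
def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler: any
    # character that cannot be encoded as ASCII becomes &#NNNN; instead.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_char_refs("café"))  # caf&#233;
```

The output is pure ASCII, so it should display the same whatever charset the page declares, at the cost of a larger and less readable source.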
There are two possibilities: 1) change your encoding from UTF-8 to ASCII, or 2) try changing the HTML code of your web page and check the result in a browser before uploading. If that still doesn't solve it, give me your site's URL.
You need a character encoding that fits the language, so we can't help you with that unless you tell us the language in question.
1- Bad call IMHO: you cannot use ASCII if you know your site is multilingual, as it will never show anything but English letters. 2- Maybe better. I can make some changes if you give me any clues. Checking before uploading is not an option, as my website is as big as Technorati itself and has about 10K feeds syndicated every minute; I can't check each of those. At the moment I am digging deeper by experimenting with the XML parser itself. What I discovered is that the XML parser gets correct data and also saves it correctly to the cache (local filesystem). The problem is in the rendering process. Regarding the URL you asked for, I promise you will be the first to know about it (within 12 hours). The site is not officially released yet; we are still preparing PR. Thanks, and still waiting for clues. Ruslan
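One way a rendering-stage problem like this can arise, sketched under assumptions: the cached bytes are in one encoding (Latin-1 here, purely as an example) while the page is decoded as another (UTF-8). The `render` helper is hypothetical and only mimics how a browser substitutes � (U+FFFD) for bytes it cannot decode; it is not taken from the site in question.

```python
def render(cached_bytes: bytes, declared_charset: str) -> str:
    """Decode cached bytes the way a lenient renderer would."""
    # errors="replace" substitutes U+FFFD for any undecodable byte sequence.
    return cached_bytes.decode(declared_charset, errors="replace")


# Assumed example data: the cache holds Latin-1 bytes.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'

if __name__ == "__main__":
    print(render(latin1_bytes, "utf-8"))    # charset mismatch: the last character turns into U+FFFD
    print(render(latin1_bytes, "latin-1"))  # matching charset: café
```

If this is what is happening, the data itself is fine and the fix is either to declare the charset the bytes are really in, or to convert each feed to a single encoding (for example UTF-8) before it is cached.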