I'm trying to help a friend with a PHP problem. While I'm much more experienced with PHP, MySQL, and XML than he is, I, like him, have not previously run into a situation where a character gets converted into something else on its way into a variable. I tried searching for this but had little luck. Here are excerpts of his post at another forum. Any help would be appreciated.

The XML sample is:

    <specification name="Image Quality">
      <spec-value>
        <name>Camera Resolution</name>
        <value>6.2 Megapixel •</value>
      </spec-value>
      <spec-value>
        <name>Image Resolutions</name>
        <value>640 x 480 • 2816 x 2112 • 2272 x 1704 • 1600 x 1200 •</value>
      </spec-value>
    </specification>

He handles all XML parsing with the typical setup of three handler routines:

    function startElement
    function characterData
    function endElement

The section of code where this happens looks roughly like this:

    function characterData($xmlParser, $data) {
        global $tagFlag, $specsData;
        $specsData[$tagFlag] .= $data;
    }

The output ends up looking like this:

    640 x 480 ? 2816 x 2112 ? 2272 x 1704 ? 1600 x 1200 ?

He could run a preg_replace on "/\?/", but he's concerned that it would also strip valid ?'s that appear within elements in the XML. Any suggestions or assistance is appreciated.
There's some black magic called character encoding that you're suffering from. I don't really understand it (hell, I may even have the name wrong), but I'll take a stab at explaining it anyway.

Each encoding allots a certain number of bits to each character. Say you have an 8-bit encoding: the letter A might be 01000001, B might be 01000010, C might be 01000011, and so on. More bits means more room for characters in the encoding; fewer bits means fewer characters, and the weird ones get axed. Hopefully I'm not too far from the mark on this.

Anyway, your browser supports lots of encodings, but PHP's XML parser only outputs a few, so your dots come across as ? when the encoding it is writing to has no bit assignment for the dot.

I found a function on PHP.net that sort of works. It turns weird characters like that into numeric HTML entities, which are plain ASCII. It doesn't quite work on yours as-is, but one more str_replace should do it:

    // Assumes $txt is single-byte text (ISO-8859-1 / Windows-1252), where the
    // bullet is the single byte 0x95. It will not work on UTF-8 input.
    $txt = "640 x 480 \x95 2816 x 2112 \x95 2272 x 1704 \x95 1600 x 1200 \x95";
    $utf8 = '';
    $max = strlen($txt);
    for ($i = 0; $i < $max; $i++) {
        if ($txt[$i] == "&") {
            $neu = "&#x26;";
        } elseif (ord($txt[$i]) < 32 || ord($txt[$i]) > 127) {
            // utf8_encode() is deprecated as of PHP 8.2
            $neu = urlencode(utf8_encode($txt[$i]));
            // turn every %XX byte into a &#xXX; entity
            $neu = preg_replace('#\%(..)\%(..)\%(..)#', '&#x\1;&#x\2;&#x\3;', $neu);
            $neu = preg_replace('#\%(..)\%(..)#', '&#x\1;&#x\2;', $neu);
            $neu = preg_replace('#\%(..)#', '&#x\1;', $neu);
        } else {
            $neu = $txt[$i];
        }
        $utf8 .= $neu;
    }
    // drop the stray lead-byte entity that would otherwise show up as Â
    $textnew = str_replace('&#xC2;', '', $utf8);
    echo $textnew;
    // prints 640 x 480 &#x95; 2816 x 2112 &#x95; 2272 x 1704 &#x95; 1600 x 1200 &#x95;
    // (browsers render &#x95; as •)

Hopefully that gets you going in the right direction.

-the mole
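For what it's worth, the ? characters most likely come from the XML parser itself, before characterData ever sees the text. PHP's XML parser converts character data to a target encoding, and the PHP manual says characters that cannot be represented in that encoding are replaced with a question mark. A bullet has no slot in ISO-8859-1, so if that is the target (it was the default on older PHP versions) the bullet is already gone by the time a cleanup function runs. Below is a minimal sketch of that idea, not the friend's actual code: parseSpecs() and the one-element sample XML are made up for illustration, and it assumes the incoming XML is UTF-8.

```php
<?php
// Hypothetical helper: parses $xml and collects character data per element name,
// in the spirit of the startElement / characterData / endElement trio above.
function parseSpecs(string $xml, string $targetEncoding): array
{
    $specs = [];
    $current = null;

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    // The encoding the handlers receive their text in.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$current) { $current = $name; },
        function ($p, $name) use (&$current) { $current = null; }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$specs, &$current) {
            // The handler can fire several times per text node, so append.
            if ($current !== null) {
                $specs[$current] = ($specs[$current] ?? '') . $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $specs;
}

// "\u{2022}" is the bullet, written as an escape so the encoding this file
// is saved in does not matter.
$xml = "<value>640 x 480 \u{2022} 2816 x 2112 \u{2022}</value>";

echo parseSpecs($xml, 'ISO-8859-1')['value'], "\n"; // 640 x 480 ? 2816 x 2112 ?
echo parseSpecs($xml, 'UTF-8')['value'], "\n";      // 640 x 480 • 2816 x 2112 •
```

If this is indeed the cause, setting the target encoding to UTF-8 right after creating the parser keeps the bullets intact, and no preg_replace on ? is needed. The page that prints the result would then have to be served as UTF-8 as well for the bullets to display correctly.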