Character encoding woes

Discussion in 'HTML & Website Design' started by Cobnut, Apr 7, 2009.

    I have a website that has a good chance of needing to display every language in the world and, for now, at least needs to display German (and other Western European languages) and Polish. Until now I've been storing the language variables in a UTF-8 database with general collation, encoding/decoding them on the way in and out with PHP, and using htmlentities. All of this has worked fine for the Western European languages.

    However, we've just taken on a Polish customer and it's all come down around my ears. I understand (I think!) that ideally I need to encode the pages as UTF-8, but I'm having trouble storing the Polish characters and getting them to display properly on screen. The PHP functions utf8_encode/utf8_decode only convert between ISO-8859-1 and UTF-8, and Polish letters such as ą, ł and ż aren't in ISO-8859-1, so those functions are no good for Polish.
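
    To illustrate the problem, here's a rough sketch in Python (not PHP, purely for illustration) of why any conversion that assumes ISO-8859-1 copes with German but not Polish. The helper fits_latin1 and the sample words are made up for the example.

```python
# Illustrative sketch in Python (the site itself is PHP): why a converter
# that assumes ISO-8859-1, as PHP's utf8_encode does, can't handle Polish.
german = "Größe"    # ö and ß both exist in ISO-8859-1
polish = "Zażółć"   # ż, ł and ć do not exist in ISO-8859-1


def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every character is encodable as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1(german))  # True
print(fits_latin1(polish))  # False

# UTF-8 can represent both words and round-trips them unchanged.
for word in (german, polish):
    assert word.encode("utf-8").decode("utf-8") == word

# Reading UTF-8 bytes back as if they were ISO-8859-1 garbles the text.
assert polish.encode("utf-8").decode("iso-8859-1") != polish
```

    In other words, the ISO-8859-1 step is the weak link; UTF-8 end to end avoids it.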

    What should I be doing to give this site the best chance of working with all languages? Should I be storing HTML entities (e.g. &#261;) in the db? Should I ditch the utf8_encode/decode steps?
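
    To compare the two options, here's a rough Python sketch (again not PHP), assuming the pages are served as UTF-8. Option A stores ASCII-only text with numeric entities; option B stores plain UTF-8 and escapes only HTML metacharacters at output time. The sample text is a well-known Polish pangram; the variable names are just for illustration.

```python
import html

text = "Zażółć gęślą jaźń"  # Polish pangram covering all the Polish diacritics

# Option A: store entity-encoded, ASCII-only text (e.g. &#261; for ą).
as_entities = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)

# Getting the real characters back means unescaping, e.g. before editing in a form.
assert html.unescape(as_entities) == text

# Option B: store the text as plain UTF-8 ...
stored = text.encode("utf-8")  # the bytes as they would sit in the db
# ... and escape only HTML metacharacters (<, >, &, quotes) when outputting.
shown = html.escape(stored.decode("utf-8"))
assert shown == text  # no metacharacters here, so the text is unchanged
```

    Entity-encoded text also makes searching, sorting and editing in the database more awkward, which is one argument for option B.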

    Any help or guidance greatly appreciated, bearing in mind that I already have an entire German translation using UTF-8 codes and need the customer to be able to use a form to do their own translations (i.e. the content isn't static).

    Jon
     
    Cobnut, Apr 7, 2009