Ever since I've been in Tokyo I have heard people talk about using Unicode. I have even had to take scripts and change them from Unicode to the character set for the language I want. I really doubt that Unicode will work when it comes to the internet. I'm on a Mac, so this might not be a problem for Windows users. If I'm searching on a site that uses a Unicode charset, Japanese results don't come out right. Even with the Unicode pages here there is a problem. I'll explain. Will every English-based web site change over to Unicode? No. Then most Japanese or other double-byte language pages won't either. If you have a Unicode search engine or other feature on your site that draws its data from a regular non-Unicode site, it doesn't display correctly. For example, the AdSense sandbox is in Unicode, but when I put in a Japanese URL it doesn't show the Japanese text correctly until I change my browser's character set to Japanese, which defeats the purpose of having Unicode anyway. Unicode is a good concept, but it's not used widely enough yet for me personally. If my users come from a regular Japanese site and hit my Japanese Unicode site, their browser may not switch encodings automatically, which might leave the kanji (Chinese characters) garbled. This is the drawback for me, and it alone has kept me from fully testing Unicode so far.
Unicode is the *only* thing that will work on the Internet (eventually). Check this out: these pages are served as Unicode (UTF-8, to be precise), and that is the only reason I can display Japanese and Korean text at the same time: 電子辞書<PW－９９１０＞を発表 카자흐스탄을 달군 ‘LG 전국노래자랑’ Again, eventually. This forum and Google are examples of this. It's because you can't just throw characters from one charset into a page that is displayed in another charset. You have to convert them to Unicode first. J.D.
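To make the "convert them to Unicode first" point concrete, here is a minimal Python sketch. The sample strings and the helper name `to_unicode` are made up for illustration, and it assumes you already know which charset each source page uses.

```python
# Hypothetical sample data: bytes as they might arrive from two legacy-encoded sites.
japanese_sjis = "電子辞書".encode("shift_jis")   # from a Shift-JIS page
korean_euckr = "전국노래자랑".encode("euc_kr")     # from an EUC-KR page


def to_unicode(raw: bytes, source_charset: str) -> str:
    """Decode raw bytes using the charset the source page declared."""
    return raw.decode(source_charset)


# Convert each snippet to Unicode first, then combine and serve the result as UTF-8.
combined = to_unicode(japanese_sjis, "shift_jis") + " " + to_unicode(korean_euckr, "euc_kr")
page_bytes = combined.encode("utf-8")

# Dropping the Shift-JIS bytes into a UTF-8 page without converting them does not work:
# the bytes are not valid UTF-8, so a browser would show replacement characters.
mixed = japanese_sjis.decode("utf-8", errors="replace")

print(page_bytes.decode("utf-8"))
print(mixed)
```

The same idea applies to a search feature that pulls text from non-Unicode sites: decode with the source's charset, then emit UTF-8.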
The nice thing about Unicode is that UTF-8 is backward compatible with ASCII, the part that the common Western character sets share. So English sites that stick to plain ASCII don't need to change anything when they switch to Unicode.
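A quick way to see how far that compatibility goes is to compare the bytes directly. This is only a sketch; the helper name `same_bytes_as_utf8` and the sample strings are invented for the example.

```python
def same_bytes_as_utf8(text: str, legacy_charset: str) -> bool:
    """True if the text is byte-for-byte identical in the legacy charset and in UTF-8."""
    return text.encode(legacy_charset) == text.encode("utf-8")


plain = "Plain English text, ASCII only."
accented = "café"

# Pure ASCII text is already valid UTF-8, so nothing needs converting.
ascii_unchanged = same_bytes_as_utf8(plain, "ascii")        # True

# Characters outside ASCII differ: é is one byte in Latin-1 but two bytes in UTF-8.
latin1_unchanged = same_bytes_as_utf8(accented, "latin-1")  # False

print(ascii_unchanged, latin1_unchanged)
```

So an English page with accented letters or curly quotes saved in a Western code page would still need converting, just far less of it than a Japanese page.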
That's my main point. It's really good for English-only sites, so it's sort of biased. It's a real pain to convert all the text from Japanese to Unicode. It's not as simple as just changing the character set. Sometimes the text actually has to be copied and pasted again after the page has been converted to Unicode. This is the problem. If I were just making English sites, it would be fine, but if you look at it from the Asian character set standpoint it's a bit different. I wish they had used Unicode from the start. Then it wouldn't be a problem.
Yes, it is that simple. You convert the character set first (e.g. from Shift-JIS to UTF-8) and then you serve your pages as UTF-8 (Content-Type: text/html; charset=utf-8). That's it. You can use iconv to convert files. For example, this command converts jp-shift-jis.html to jp-utf-8.html: iconv -f shift-jis -t utf-8 -o jp-utf-8.html jp-shift-jis.html J.D.
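If iconv isn't available, roughly the same conversion can be sketched in Python. The function name `convert_file` and the sample page content are assumptions for the example; the file names just mirror the ones in the command above.

```python
import tempfile
from pathlib import Path


def convert_file(src, dst, from_charset="shift_jis", to_charset="utf-8"):
    """Re-encode a text file, similar in spirit to: iconv -f FROM -t TO -o DST SRC."""
    text = Path(src).read_bytes().decode(from_charset)  # bytes -> Unicode
    Path(dst).write_bytes(text.encode(to_charset))      # Unicode -> target bytes


# Demo in a temporary directory with a made-up Shift-JIS page.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "jp-shift-jis.html"
    dst = Path(tmp) / "jp-utf-8.html"
    src.write_bytes("<p>日本語のページ</p>".encode("shift_jis"))
    convert_file(src, dst)
    converted = dst.read_bytes()

# Header the server should then send with the converted page.
CONTENT_TYPE_HEADER = "Content-Type: text/html; charset=utf-8"

print(converted.decode("utf-8"))
```

One thing to watch: if the page also carries its own charset declaration in a meta tag, it should be changed to match, otherwise the file says one thing and the server another.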
I was one of the original team of six that set up the BBC News website (http://news.bbc.co.uk/) eight years ago. BBC News is a worldwide organisation and so has to provide its content globally. The site originally used MS SQL Server 6.5; back then this version did not support UTF-8, so we had to store our UTF-8 in a binary field. When someone generates web page content or copies it from an existing HTML file, it requires no conversion if done in a Windows NT environment, as Windows NT is natively Unicode and does all the work for you. If you are not on Windows NT, there are character maps available and conversion is easy to achieve. The last stage of publishing at the BBC uses such character maps to convert the pages to character sets that browsers support, as UTF-8 wasn't a globally supported character set eight years ago. BBC News Interactive has now been running on Oracle for many years, which causes no problems publishing multiple languages to the web: Russian, Mandarin, Spanish, Arabic, Welsh, all published from a single client-server environment. Macs are nice, with leading-edge design, but in many ways they lag the market in software support, browsers being one example. Macs currently have plenty of shortcomings in browser compatibility. They support UTF-8, but switching character sets in Safari can be a problem because of caching on the same URL. UTF-8 is the future. ASCII is still common, but other character code pages are more common. The shift to UTF-8 is visible. How far away is it? If you are using NT you have it now, though you may not realise it. If you are still on 98, for how much longer? I don't really know anything about Macs!
Yeah, I am. All the Mac browsers are really bad at switching between the different character sets. I didn't realize Windows users don't have that problem.
I had to deal with OS X when it came out, and I recall having some problems with character sets. What exactly is the problem you are facing? Your initial description was too general. J.D.
I just have to chime in here. Mac OS X is probably the most language-agnostic OS out there. Safari does just fine with text encodings including UTF-8, Shift-JIS, EUC-JP, etc. (I live in Japan, so I'm most familiar with the Japanese ones). The most common cause of pages not being displayed in the correct encoding is bad HTML coding. That is, developers forget to declare the charset in the meta tags. Anyway, text-encoding issues are still a bit troublesome on any platform. I don't think they're in any way unique to Macs.
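For anyone who wants to check their own pages for a missing charset declaration, here is a rough Python sketch using the standard library's HTML parser. The names `CharsetFinder` and `declared_charset` are made up for the example, and it only looks at meta tags, not at the HTTP header the server sends.

```python
import re
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="text/html; charset=...">
            match = re.search(r"charset\s*=\s*([\w.:-]+)", attr.get("content", ""), re.I)
            if match:
                self.charset = match.group(1).lower()


def declared_charset(html_text: str):
    """Return the charset named in the page's meta tags, or None if it is missing."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"></head></html>'
print(declared_charset(page))
print(declared_charset("<html><head><title>No charset here</title></head></html>"))
```

A page where this returns None leaves the browser guessing, which is where the garbled kanji tend to come from.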