I'm having trouble converting PHP and MySQL code to read its data from a Unicode file. I want to parse a Unicode text file and use the extracted data to update a MySQL database. The ANSI version worked fine, but the new code that reads the Unicode file doesn't. I tried many variations of the code; here are the main ones.

1. The content of the file "test1.txt" is:

   UTF-8 example: àéëê

2. Here's the PHP code.

a. I call mysql_set_charset('utf8', $link_dblink); just after I connect to the database. This call does change the value returned by mysql_client_encoding() from 'latin1' to 'utf8', but it doesn't seem to have any effect on the results below.

b. First, I tried:

   $test1 = file_get_contents("test1.txt", NULL, NULL, 2);
   echo "test1 = $test1<br>";

   Results:
   IE: test1 = UTF-8 example: ����
   Firefox: test1 = U�T�F�-�8� �e�x�a�m�p�l�e�:� ���������

c. Second, I tried:

   $test1 = file_get_contents("test1.txt", NULL, NULL, 2);
   $test1 = utf8_encode($test1);
   echo "test1 = $test1<br>";

   Results:
   IE: test1 = UTF-8 example: àéëê
   Firefox: test1 = U�T�F�-�8� �e�x�a�m�p�l�e�:� ���������

   IE is correct this time, but not Firefox. According to the manual, utf8_encode "Encodes an ISO-8859-1 string to UTF-8". That seems to give a good result in IE, but the data should already be Unicode, so I don't think I should be using this function. I suspect there is already a problem at this stage.

d. I also tried the following code, because I need to escape the quotes in the string to make a valid SQL query:

   $test1 = file_get_contents("test1.txt", NULL, NULL, 2);
   $test1 = utf8_encode($test1);
   // $test1 = mysql_escape_string($test1); // Also tried, but same results as addslashes().
   $test1 = addslashes($test1);
   echo "test1 = $test1<br>";

   Results:
   IE: test1 = U\0T\0F\0-\08\0 \0e\0x\0a\0m\0p\0l\0e\0:\0 \0à\0é\0ë\0ê\0
   Firefox: U\0T\0F\0-\08\0 \0e\0x\0a\0m\0p\0l\0e\0:\0 \0à\0é\0ë\0ê\0

   I don't think this function should escape every character this way.

e. The following SQL query doesn't work when called from the PHP code:

   $query = "UPDATE imagesprop SET title_fr='Test 1' WHERE id=11002;";
   $queryID = mysql_query($query);

   It always returns FALSE. However, if I echo the query to the screen and execute the result in the SQL tab of phpMyAdmin, it works and the record is updated. That means the query syntax is good, so I looked more closely at the PHP side.

f. The collation of the field is "utf8_unicode_ci". I also tried other collations, but the update still failed.

I'm using PHP 5.3.3 with MySQL 5.0.7. After a lot of searching, the problem persists. Does anyone here have experience with PHP and Unicode?
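One possible reading of the results above: addslashes() escapes NUL bytes as \0, so a \0 after every character, together with the need to skip exactly 2 bytes at the start of the file, suggests that the file may actually be saved as UTF-16LE with a byte-order mark rather than as UTF-8. Assuming that is the case, here is a minimal sketch of converting the raw bytes to UTF-8 before echoing them or building a query. The function name bytes_to_utf8 is hypothetical, and the BOM check is only a heuristic.

```php
<?php
// Hypothetical helper (not from the original post): convert raw file bytes
// to UTF-8, using the byte-order mark (BOM) to guess the source encoding.
function bytes_to_utf8($raw)
{
    if (substr($raw, 0, 3) === "\xEF\xBB\xBF") {
        // UTF-8 BOM: strip it, no conversion needed.
        return substr($raw, 3);
    }
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        // UTF-16 little-endian BOM: strip it and convert.
        return iconv('UTF-16LE', 'UTF-8', substr($raw, 2));
    }
    if (substr($raw, 0, 2) === "\xFE\xFF") {
        // UTF-16 big-endian BOM: strip it and convert.
        return iconv('UTF-16BE', 'UTF-8', substr($raw, 2));
    }
    // No BOM: assume the bytes are already UTF-8 (an assumption, not a check).
    return $raw;
}

// Usage sketch: read the whole file (no offset) and convert once.
// $test1 = bytes_to_utf8(file_get_contents("test1.txt"));
```

If the file really is UTF-16LE, the converted string contains no NUL bytes, so the escaping function no longer has anything to turn into \0.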
I had the same problem. You need to use header() so that the browser displays the page as UTF-8, something like header("Content-Type: text/html; charset=utf-8") (please check the exact form with a search engine).
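A minimal sketch of what this reply seems to be pointing at, assuming the page is HTML and the string being echoed is already valid UTF-8. The helper name html_content_type is hypothetical. The header only tells the browser how to decode the bytes; it does not convert anything.

```php
<?php
// Hypothetical helper: build the Content-Type header line for an HTML page.
function html_content_type($charset = 'utf-8')
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers must be sent before any output, so do this before the first echo.
if (!headers_sent()) {
    header(html_content_type());
}

// "àéëê" written as explicit UTF-8 bytes so the example does not depend on
// how this source file itself is saved.
$test1 = "UTF-8 example: \xC3\xA0\xC3\xA9\xC3\xAB\xC3\xAA";
echo 'test1 = ' . htmlspecialchars($test1, ENT_QUOTES, 'UTF-8') . "<br>\n";
```

With an explicit charset, IE and Firefox should no longer guess the encoding differently for the same bytes.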
Thank you for your answers. This stuff is not well documented because support is poor. I thought it was better supported, but it looks like we will have to wait until PHP 6 for full, easy support. I don't have time to look into all of this for my current project. For now I only need English and French, and the simple workaround I found seems to do the job.

I decided to export my data files in ANSI format and load them directly in PHP. I still use Unicode for the database. However, because I use UTF-8 for the page encoding, I had to use

   $title_fr = iconv("ISO-8859-1", "UTF-8", $title_fr);

or

   $title_fr = utf8_encode($title_fr);

Both calls were equivalent. Furthermore, I had to call

   mysql_set_charset('utf8', $link_dblink);

before writing data to the DB and before reading it back.

I haven't tested any language-specific data in the web forms. For example, I don't know what would happen if someone entered a Chinese name in a registration form. I'll figure that out in a second step.
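The workaround described here can be sketched as a small conversion step, assuming the exported files really are ISO-8859-1. The helper name ansi_to_utf8 and its $from parameter are hypothetical. One caveat, stated as an assumption: files saved as "ANSI" by Windows editors are often Windows-1252, which differs from ISO-8859-1 for a few characters such as œ and €, so a different source encoding name may be needed for those files.

```php
<?php
// Hypothetical helper: convert text read from an ANSI export to UTF-8.
// $from defaults to ISO-8859-1, as in the post; iconv() returns false on failure.
function ansi_to_utf8($text, $from = 'ISO-8859-1')
{
    return iconv($from, 'UTF-8', $text);
}

// "à été" as it would be stored in an ISO-8859-1 file (one byte per character).
$title_fr = "\xE0 \xE9t\xE9";
$title_fr = ansi_to_utf8($title_fr);

// The accented letters are now two bytes each, ready for a UTF-8 page or a
// connection whose charset was set to utf8.
echo bin2hex($title_fr), "\n";
```

Because the conversion happens once, right after loading, the rest of the code can treat every string as UTF-8.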