Crazy character encoding problem!

Discussion in 'Databases' started by gandalf117, Dec 4, 2012.

  1. #1
    This problem has been bothering me for a long long time and I haven't found a solution yet.

    Needless to say I am not an expert on database character encoding and this could be something very basic for those who know what is going on.

    Here is the problem:

    When I submit foreign text through an ordinary web form to my local server (XAMPP), the text is encoded in a certain way before it is stored in the local database. For example, if I submit the following word (it is in Cyrillic): едно, it shows up in the database as: едно. Of course, when I retrieve the text from the database in the browser, it is automatically decoded back and displays normally.

    However, when I submit foreign text through the same web form to servers on the web, the same text is encoded in a totally different way before it is stored in the database on those servers. For example, that same word (it is in Cyrillic): едно shows up in their database as: едно. This also displays normally in the browser, but only on their servers. God forbid I transfer the same encoded information to my local server.

    Now I don't understand how and why the same word is encoded in two totally different ways: едно and едно. Because of this difference I cannot transfer foreign-language data from my server to other servers and vice versa. When I import data from one server into another, the receiving server fails to decode the text and shows the encoded version in the browser. I think it has to do with how those servers and my server are configured. Not all servers behave like that, by the way. There are plenty where this problem doesn't exist. I think GoDaddy is one of them.

    Any help with this problem is greatly appreciated.
    Please, help me figure this out!
     
    Last edited: Dec 4, 2012
    gandalf117, Dec 4, 2012 IP
  2. PYO

    PYO Member

    #2
    What are the character encodings on both servers?

    The solution is to export the data for transfer in UTF-8 encoding.
     
    PYO, Dec 4, 2012 IP
  3. gandalf117

    gandalf117 Active Member

    #3
    I am not 100% sure what you mean by character encoding on a server, but I think it's UTF-8 on both.

    When I export the database, it says that the character set of the file is: utf-8

    When I made the tables, I set the collation of the fields that contain foreign-language text to: utf8_general_ci

    That's what baffles me most. Unless I am missing something else, all the character encodings and collations everywhere are UTF-8. It seems that on different servers the UTF-8 encoded result for the same text is different (as I pointed out above). Is that even possible? The servers that differ from mine and that I am having problems with are not American servers. They are in countries whose official alphabet is not even Latin. Their interface is still in English, but they seem to be configured in a different way. At least that's where I think the problem stems from.
     
    gandalf117, Dec 4, 2012 IP
  4. PYO

    PYO Member

    #4
    If you are sure that both servers use UTF-8 and the export also uses it, then you have a problem with the PHP or server default encoding. Both settings can be changed on most servers.

    едно - this is UTF-8 displayed as WINDOWS-1251.

    едно - this is UTF-8 displayed as WINDOWS-1250.
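
    The two renderings above can be reproduced with a minimal Python sketch. It assumes the stored bytes are correct UTF-8 and only the viewing step picks the wrong code page. The helper name `misread_utf8` is made up for this illustration.

```python
# Hypothetical helper: shows how correct UTF-8 bytes look when a viewer
# wrongly interprets them as a single-byte code page.

WORD = "\u0435\u0434\u043d\u043e"  # the Cyrillic word "едно"


def misread_utf8(text: str, wrong_codec: str) -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


utf8_hex = WORD.encode("utf-8").hex()      # the UTF-8 bytes, as hex
as_cp1251 = misread_utf8(WORD, "cp1251")   # Windows-1251 view of those bytes
as_cp1250 = misread_utf8(WORD, "cp1250")   # Windows-1250 view of those bytes
```

    The UTF-8 bytes are the same in both cases (`d0b5d0b4d0bdd0be`); only the interpretation differs, which is why each Cyrillic letter turns into two unrelated characters.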
     
    PYO, Dec 4, 2012 IP
  5. Rukbat

    Rukbat Well-Known Member

    #5
    Look at the collation in the database that works. It has to be the same in the other databases. Alternatively, if you have no control over the other ones and they all use the same collation, set yours to match theirs.

    If they all use different collations, you have a problem.
     
    Rukbat, Dec 5, 2012 IP
  6. gandalf117

    gandalf117 Active Member

    #6
    PYO, thank you for the input!

    Now I know what these exact encodings are called. I have no idea how to change those settings on a server, though. If you know how, please share the information. It can't be too hard.

    I finally did find a workaround for my problem, thanks to your post. I searched around and found that programs like Notepad2, Notepad++, etc. can easily convert encoded text like this. All I have to do is dump the database containing the information encoded with WINDOWS-1251 and open the dump with one of those programs, then convert the encoded text in there to a readable format and import it into my database. Even though this doesn't seem like the most professional approach, it does work... yay
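
    The same conversion can be sketched in a few lines of Python instead of a text editor. This is only an illustration, assuming the garbled text is UTF-8 that was read as a single-byte code page; the function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(garbled: str, wrong_codec: str = "cp1251") -> str:
    """Undo a 'UTF-8 read as wrong_codec' mix-up.

    Re-encoding with the wrong codec recovers the original bytes,
    which are then decoded as the UTF-8 they always were.
    """
    return garbled.encode(wrong_codec).decode("utf-8")


# "едно" as it appears when viewed through Windows-1251
garbled = "\u0420\u00b5\u0420\u0491\u0420\u0405\u0420\u0455"
repaired = repair_mojibake(garbled)  # the Cyrillic word "едно"
```

    The call raises a UnicodeError if the text was not garbled in this particular way, for instance when a character has no mapping in the chosen code page, so it is worth trying on a copy of the dump first.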
     
    gandalf117, Dec 6, 2012 IP
  7. PYO

    PYO Member

    #7
    There are billions of ways to change the character encoding on a server: per locale, per service, per user, per client software, etc. I don't know which one is the solution for you.

    In general, all you need is to configure the client to use UTF-8; then no matter what encoding the server uses, you'll get UTF-8. The only issue can be that the computer where the client runs uses an encoding other than UTF-8, so the text can be re-encoded to a non-UTF-8 encoding with some data loss, or you get the situation you had: a two-byte UTF-8 character shown as two one-byte characters.
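
    The data-loss case can be illustrated with a short Python sketch. It assumes the text gets re-encoded into a single-byte code page; the code pages chosen here are examples only, not a statement about any particular server.

```python
WORD = "\u0435\u0434\u043d\u043e"  # the Cyrillic word "едно"

# Windows-1251 contains Cyrillic letters, so this conversion is lossless.
to_cp1251 = WORD.encode("cp1251")
round_trip = to_cp1251.decode("cp1251")

# Windows-1250 has no Cyrillic letters; with errors="replace" each letter
# becomes "?" and the original text cannot be recovered.
to_cp1250 = WORD.encode("cp1250", errors="replace")
```

    Here `round_trip` equals the original word, while `to_cp1250` is just `b"????"`.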
     
    PYO, Dec 6, 2012 IP