Which MySQL collation is best to use for HTML text?

Discussion in 'MySQL' started by joshm, May 23, 2008.

  1. #1
    Some of my databases store HTML that can contain all kinds of unusual characters. The process is fully automated, so I can't change the characters before they are inserted. What would be the best collation to use in my MySQL databases? The default collation is "latin1_swedish_ci" for some reason. I changed a few to "utf8_bin" as a test because the web pages are encoded as:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    I need a collation that can handle a wide range of characters. I don't know much about this, so I'm just wondering which is best for HTML. I see a lot of sites serving the same content with charset=utf-8 that handle the encoding better than mine, so I assume the collation in their databases is set differently.
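A point that may help here: in MySQL the character set decides which characters a column can store, while the collation only decides how stored strings are compared and sorted. The range of characters therefore depends on choosing a utf8 character set, not on the particular collation. The sketch below is in Python rather than PHP and uses Python's cp1252 codec as a stand-in for MySQL's latin1 (the two are close but not identical, so treat it as an approximation). `fits_charset` is a hypothetical helper name.

```python
def fits_charset(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Sample strings of the kind that scraped HTML might contain.
samples = ["café", "“curly quotes”", "日本語", "Ω ≈ 42"]
for sample in samples:
    # Columns printed: text, fits in cp1252 (approx. latin1), fits in UTF-8.
    print(sample, fits_charset(sample, "cp1252"), fits_charset(sample, "utf-8"))
```

Text that fails the cp1252 check is the kind of content likely to be mangled or replaced with "?" when stored in a latin1 column.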

    Also, when inserting HTML into a database I use mysql_real_escape_string($html_content_to_be_inserted), for example. But a newline (\n) either isn't inserted into the database or is altered in some way, so the output ends up all on one line and looks like a mess. If the newlines were stored, I could simply use PHP's nl2br function to insert a <br /> tag before every \n so that it displays correctly.
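On the newline question: mysql_real_escape_string escapes the newline only inside the SQL statement, and MySQL turns that escape back into a real newline when it stores the value, so the line breaks are most likely in the database already. Browsers collapse whitespace in HTML, which would explain the output appearing on one line. The sketch below is in Python, with a hypothetical `nl2br` helper modeled on PHP's function of the same name. It shows the idea of inserting `<br />` before each line break at output time.

```python
import re

# Line-break sequences handled by PHP's nl2br: \r\n, \n\r, \n and \r.
_LINE_BREAK = re.compile(r"\r\n|\n\r|\n|\r")


def nl2br(text: str) -> str:
    """Insert '<br />' before each line break, keeping the break itself."""
    return _LINE_BREAK.sub(lambda match: "<br />" + match.group(0), text)


# Stand-in for a value read back from the database.
stored = "first line\nsecond line\r\nthird line"
print(nl2br(stored))
```

If the stored value comes back with its newlines intact, applying nl2br when the page is rendered should be enough; nothing needs to change at insert time.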
     
    joshm, May 23, 2008
  2. cmanns

    #2
    I believe the proper HTML charset is UTF-8 (capitalized).

    Also use utf8_collation_ci
     
    cmanns, May 24, 2008
  3. joshm

    #3
    Oops, I didn't realize the UTF-8 part is case-sensitive. What happens if it's written as "utf-8"?

    Also, what's the difference between utf8_collation_ci, utf8_unicode_ci, and utf8_general_ci?
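For reference, MySQL does not appear to ship a collation named utf8_collation_ci; the utf8 collations it provides include utf8_bin, utf8_general_ci, and utf8_unicode_ci. A _bin collation compares the raw code values, so "HTML" and "html" differ. A _ci collation ignores letter case, and utf8_unicode_ci follows the Unicode collation rules more closely than the simpler utf8_general_ci. The Python sketch below only imitates the _bin versus _ci distinction. `bin_equal` and `ci_equal` are hypothetical helpers, and `casefold` is a loose analogue of MySQL's behavior, not a reimplementation of it.

```python
def bin_equal(a: str, b: str) -> bool:
    """Rough analogue of a _bin collation: compare the encoded bytes exactly."""
    return a.encode("utf-8") == b.encode("utf-8")


def ci_equal(a: str, b: str) -> bool:
    """Rough analogue of a _ci collation: compare after Unicode case folding."""
    return a.casefold() == b.casefold()


for left, right in [("HTML", "html"), ("Straße", "STRASSE")]:
    # Columns printed: left, right, binary match, case-insensitive match.
    print(left, right, bin_equal(left, right), ci_equal(left, right))
```

None of these choices changes which characters can be stored. They only affect how comparisons such as WHERE and ORDER BY treat the text.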
     
    joshm, May 24, 2008
  4. 2beers

    #4
    I use latin1_swedish_ci and it works fine. UTF-8 is good too, and it's the most common standard.
     
    2beers, May 24, 2008
  5. cmanns

    #5
    I think the utf8 collations have more character support, if I recall...
     
    cmanns, May 24, 2008