Special Characters Become ?s

Discussion in 'Apache' started by T0PS3O, Sep 19, 2005.

  1. #1
    I'm moving servers. Platforms are nearly identical in terms of Apache, PHP and MySQL versions. However, all apostrophes, pound sterling signs, quotation marks, copyright and celsius signs etc. become question marks.

    Any of you know where the character setting is done, where I need to look?

    Right now I need to look through all the content and replace every one of them with HTML character entities, which is probably better long term, but it's such a pain on 2500+ pages.
     
    T0PS3O, Sep 19, 2005 IP
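Editor's note: the bulk replacement mentioned above does not have to be done by hand. A minimal sketch, assuming the page content is already available as correctly decoded text; the function name and the sample string are made up for illustration, and this is not how the CMS itself does it:

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric character reference."""
    # The "xmlcharrefreplace" error handler turns each character that ASCII
    # cannot represent into a reference of the form &#NNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Hypothetical sample: curly apostrophe, pound, degree and copyright signs.
sample = "It\u2019s \u00a35 at 20\u00b0C \u00a9 2005"
print(to_numeric_entities(sample))
# prints: It&#8217;s &#163;5 at 20&#176;C &#169; 2005
```

Numeric references are plain ASCII, so they display the same whatever charset the server or the page declares.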
  2. SEbasic

    SEbasic Peon

    Messages:
    6,317
    Likes Received:
    318
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You need the character set to be either...

    iso-8859-1

    or UTF-8

    Try with both and see if it works.

    A search and replace is easy enough to do on that...
     
    SEbasic, Sep 19, 2005 IP
  3. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #3
    OK!

    Now.. Where do I do that? Are you talking server directives or html headers?
     
    T0PS3O, Sep 19, 2005 IP
  4. SEbasic

    #4
    Sorry, HTML headers...
     
    SEbasic, Sep 19, 2005 IP
  5. T0PS3O

    #5
    I'll try that thanks.

    BTW these are the relevant lines being generated by the CMS:

    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html dir="LTR" lang="en">
    [...]
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    
    Code (markup):
    Does the XML schema have anything to do with all this as well?

    (Trying the char set now.)
     
    T0PS3O, Sep 19, 2005 IP
  6. SEbasic

    #6
    It's this you need to change...
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

    I don't see any XML Schema (Only Doctype, HTML Tag, and Charset)
     
    SEbasic, Sep 19, 2005 IP
  7. T0PS3O

    #7
    Yeah doctype is what I meant. Cheers mate!

    I get the same ?s with:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    Code (markup):
     
    T0PS3O, Sep 19, 2005 IP
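Editor's note: one possible reason a meta tag change has no visible effect is that browsers generally give the charset in the HTTP Content-Type header precedence over the one in the meta tag. A rough sketch of that rule, with made-up helper names and a deliberately simple regular expression; real browsers apply more steps, such as checking for a byte order mark:

```python
import re
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def meta_charset(html: str) -> Optional[str]:
    """Find the charset declared in a <meta> tag (simplified pattern)."""
    match = re.search(r'<meta[^>]*charset=["\']?([\w-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type: str, html: str) -> Optional[str]:
    """The header wins when it names a charset; otherwise fall back to the meta tag."""
    return header_charset(content_type) or meta_charset(html)


page = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("text/html; charset=UTF-8", page))  # utf-8
print(effective_charset("text/html", page))                 # iso-8859-1
```

So if the server sends a charset of its own, editing the meta tag alone changes nothing.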
  8. SEbasic

    #8
    Hmmm... Have you checked the server config?
     
    SEbasic, Sep 19, 2005 IP
  9. T0PS3O

    #9
    Yes and I'm not any wiser because of it...

    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    #
    AddDefaultCharset UTF-8
    
    #
    # Commonly used filename extensions to character sets. You probably
    # want to avoid clashes with the language extensions, unless you
    # are good at carefully testing your setup after each change.
    # See http://www.iana.org/assignments/character-sets for the
    # official list of charset names and their respective RFCs
    #
    AddCharset ISO-8859-1  .iso8859-1  .latin1
    AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
    AddCharset ISO-8859-3  .iso8859-3  .latin3
    AddCharset ISO-8859-4  .iso8859-4  .latin4
    AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
    AddCharset ISO-8859-6  .iso8859-6  .latin6 .arb
    AddCharset ISO-8859-7  .iso8859-7  .latin7 .grk
    AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
    AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk
    AddCharset ISO-2022-JP .iso2022-jp .jis
    AddCharset ISO-2022-KR .iso2022-kr .kis
    AddCharset ISO-2022-CN .iso2022-cn .cis
    AddCharset Big5        .Big5       .big5
    # For russian, more than one charset is used (depends on client, mostly):
    AddCharset WINDOWS-1251 .cp-1251   .win-1251
    AddCharset CP866       .cp866
    AddCharset KOI8-r      .koi8-r .koi8-ru
    AddCharset KOI8-ru     .koi8-uk .ua
    AddCharset ISO-10646-UCS-2 .ucs2
    AddCharset ISO-10646-UCS-4 .ucs4
    AddCharset UTF-8       .utf8
    
    # The set below does not map to a specific (iso) standard
    # but works on a fairly wide range of browsers. Note that
    # capitalization actually matters (it should not, but it
    # does for some browsers).
    #
    # See http://www.iana.org/assignments/character-sets
    # for a list of sorts. But browsers support few.
    #
    AddCharset GB2312      .gb2312 .gb 
    AddCharset utf-7       .utf7
    AddCharset utf-8       .utf8
    AddCharset big5        .big5 .b5
    AddCharset EUC-TW      .euc-tw
    AddCharset EUC-JP      .euc-jp
    AddCharset EUC-KR      .euc-kr
    AddCharset shift_jis   .sjis
    
    Code (markup):
    I also see this:

    AddLanguage da .dk
    AddLanguage nl .nl
    AddLanguage en .en
    [more]
    Code (markup):
    Maybe english should be moved up? :confused:
     
    T0PS3O, Sep 19, 2005 IP
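Editor's note: the AddDefaultCharset UTF-8 line in that config matters if the pages are really ISO-8859-1 bytes. A small sketch of what a client does with such bytes when told they are UTF-8; the helper name is invented, and Python's "replace" error handling stands in here for the browser's substitute character:

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode bytes the way a client would if told the given charset."""
    return raw.decode(charset, errors="replace")


# A pound sign followed by "5", encoded as ISO-8859-1: the single byte 0xA3, then "5".
latin1_bytes = "\u00a35".encode("iso-8859-1")

as_latin1 = decode_as(latin1_bytes, "iso-8859-1")  # pound sign survives
as_utf8 = decode_as(latin1_bytes, "utf-8")         # 0xA3 alone is not valid UTF-8

print(as_latin1 == "\u00a35")  # True
print(as_utf8 == "\ufffd5")    # True: the pound sign became U+FFFD
```

U+FFFD is the replacement character, which many browsers draw as a question mark or a question mark in a diamond.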
  10. SEbasic

    #10
    Meh - Give it a go...

    I doubt it will change anything.

    Is iso-8859-1 in that list (I imagine it is).... : |
     
    SEbasic, Sep 19, 2005 IP
  11. T0PS3O

    #11
    Yes it is but ISO in caps though. I may have to try that, maybe it's anal about capitalization.

    Just stumbled on another osCommerce setting:

    @setlocale(LC_TIME, 'en_UK.ISO_8859-1');
    PHP:
    Will have to play with that as well to line it up with the charset.
     
    T0PS3O, Sep 19, 2005 IP
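Editor's note: locale names differ between systems, and PHP's setlocale() returns false when a name is unknown; with the @ in front, nothing is reported. A hedged sketch of a quick probe for which names a machine accepts, written in Python rather than PHP, with an invented function name and the candidate list taken from names mentioned in this thread:

```python
import locale
from typing import Iterable, List


def usable_locales(candidates: Iterable[str]) -> List[str]:
    """Return the candidate names that setlocale() accepts on this machine."""
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    accepted = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                continue  # this system does not know the name
            accepted.append(name)
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # put the original setting back
    return accepted


names = ["en_UK.ISO_8859-1", "en_UK", "en_GB", "en_GB.iso885915", "en_GB.utf8", "C"]
print(usable_locales(names))  # result depends on the locales installed
```

Comparing the result with the output of locale -a shows which spellings the system actually recognises.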
  12. SEbasic

    #12
    Dunno mate - PM me a link to the page...

    It could be something specific to your machine?
     
    SEbasic, Sep 19, 2005 IP
  13. T0PS3O

    #13
    setlocale on RedHat, it seems, is supposed to be en_UK without the charset after it.

    And it seems I'm a total timewasting muppet. I was changing all these things reloading the page only to now realize this:

    When my colleague entered the data, like the word "it's", back when the charset was wrong, it was stored in the DB as "it?s". So no matter how many times I refreshed the browser, it would serve up a ? nonetheless.

    Gotta do a couple more checks but seems like the locale sussed it.

    So for other people moving from FreeBSD to RedHat:

    Instead of:

    @setlocale(LC_TIME, 'en_UK.ISO_8859-1'); 
    PHP:
    use
    @setlocale(LC_TIME, 'en_UK'); 
    PHP:
     
    T0PS3O, Sep 19, 2005 IP
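Editor's note: the stored "?" described above is a lossy conversion, which is why no later charset change brings the character back. A minimal sketch of one way this can happen, assuming the apostrophe was a curly one (U+2019) and the storage path converted to ISO-8859-1; the real conversion in the thread may have happened elsewhere, for example in the database connection:

```python
def store_as_latin1(text: str) -> bytes:
    """Convert text to ISO-8859-1, substituting '?' for anything it cannot hold."""
    return text.encode("iso-8859-1", errors="replace")


stored = store_as_latin1("it\u2019s")  # curly apostrophe is not in ISO-8859-1
print(stored)                          # b'it?s'

# Reading the bytes back cannot recover the apostrophe: only "?" was saved.
print(stored.decode("iso-8859-1"))     # it?s
```

Rows saved in that state have to be re-entered or corrected in the data itself.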
  14. SEbasic

    #14
    Duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuh.
     
    SEbasic, Sep 19, 2005 IP
  15. T0PS3O

    #15
    The saga continues...

    I now also have it posted at the osCommerce forums here.

    What's left is an issue with e-mail and PDF. I can get the site to display everything just fine. The pound sign shows as a pound sign but only when it's stored in the database as "Â£" without the quotes, inc the weird A.

    Current locale is set as en_GB.ISO_8859-1 and charset as iso-8859-1.

    I tried many, many combos of the two. Typing locale -a on the command line gives me, among many others, these:

    en_GB
    en_GB.iso885915
    en_GB.utf8

    So how should I format my locale and charset if I see no underscores or hyphens on the server?
     
    T0PS3O, Sep 25, 2005 IP
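Editor's note: a "weird A" in front of a pound sign is the usual signature of UTF-8 bytes being read as ISO-8859-1. The pound sign is two bytes in UTF-8 (0xC2 0xA3), and each byte shows up as its own ISO-8859-1 character. A small sketch with invented helper names:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Show what UTF-8 encoded text looks like when decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(text: str) -> str:
    """Undo the misreading, assuming that is exactly what happened."""
    return text.encode("iso-8859-1").decode("utf-8")


garbled = misread_utf8_as_latin1("\u00a3")
print(garbled == "\u00c2\u00a3")                # True: capital A with circumflex, then pound
print(repair_mojibake(garbled) == "\u00a3")     # True: back to a single pound sign
```

If that is what is in the database, the data is UTF-8 while the rest of the chain (e-mail, PDF) treats it as ISO-8859-1, or the other way round.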
  16. SEbasic

    #16
    Just for the record - I have no idea mate :)
     
    SEbasic, Sep 26, 2005 IP
  17. T0PS3O

    #17
    Just had a reply from the host: they were forcing it in httpd.conf as UTF-8. Switched that back to ISO whatever and it works.

    Thanks for checking anyway! This has been such a waste of time and all due to this one bloody setting. Gotta hate computers :)
     
    T0PS3O, Sep 26, 2005 IP
  18. Bobscrachy

    Bobscrachy Peon

    Messages:
    3
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #18
    How would I set the autoindex to use UTF-8? I don't use HTML files. I use the default index. Some of the descriptions that are necessary for file listings have special characters.

    Thank you,
    ~Bob
     
    Bobscrachy, May 16, 2007 IP