To change "windows-1250" to "iso-8859-1" or not

Discussion in 'HTML & Website Design' started by tayiper, Mar 15, 2007.

  1. tayiper
    Hey all, I'm curious: should I replace the "windows-1250" value I use to specify the character encoding with, for example, "iso-8859-1" (or some other one), or doesn't it really matter?


    On my sites this encoding declaration appears in two different places (both in the documents' headers): in the XML declaration and as the value of the "content" attribute of the "meta" element:

    <?xml version='1.0' encoding='windows-1250' standalone='yes'?>
    <meta http-equiv="content-type" content="text/html; charset=windows-1250" />

    tayiper
     
    tayiper, Mar 15, 2007
  2. the_pm
    Ditch the Windows/Latin character sets for Unicode (UTF-8). It's the nice thing to do for your international visitors. Of course, this also means saving your pages with UTF-8 encoding and making sure all of your characters are compliant.
     
    the_pm, Mar 15, 2007
  3. tayiper
    Hmmm, thanks much, the_pm, but could you please also tell me exactly how I can do that? The only thing I know for sure is that all my documents were originally created as plain .txt files (on Windows) and later saved with an .html or .htm extension, and I have been editing them in the EditPad Lite text editor for quite some time now. I will check whether there are any related settings somewhere in EditPad.


    tayiper
     
    tayiper, Mar 15, 2007
  4. tayiper
    Oh, and also: which one in particular would you suggest I use, i.e. "iso-8859-1", "iso-8859-2", or maybe some other one?


    thanks again, tayiper
     
    tayiper, Mar 15, 2007
  5. kk5st
    The charset sent in the server's response header is authoritative. It should be utf-8, as recommended by the W3C. This is no problem for any charset except windows-xxxx, which uses invalid character entities. You can use windows charsets only if the server specifies that charset or does not specify one at all. In the latter case, the browser takes the encoding from the document's meta http-equiv charset or, for XML documents, from the XML declaration. The XML declaration should not be used for XHTML documents served as text/html because 1) such a document is not processed as XML, and 2) it throws IE6 into quirks mode, where it screws up even worse than usual.

    In your editor, choose the iso- encoding appropriate to your language, or choose utf-8, which serves all languages and charsets (except windows-xxxx). ASCII is a subset of all charsets, so that's no problem.

    See http://www.w3.org/International/O-charset.en.php

    cheers,

    gary
     
    kk5st, Mar 15, 2007
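The precedence kk5st describes (a charset in the server's header wins, otherwise the meta declaration is used) can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: it looks only at the HTTP Content-Type header and the content value of the meta http-equiv element, and it ignores byte-order marks and browser sniffing heuristics. The names charset_param and effective_charset are hypothetical.

```python
import re
from typing import Optional

# Finds the charset parameter in a Content-Type value such as
# "text/html; charset=windows-1250"; quotes around the value are optional.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)"?', re.IGNORECASE)


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter, or None if there isn't one."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type)
    return match.group(1).lower() if match else None


def effective_charset(http_header: Optional[str],
                      meta_content: Optional[str]) -> Optional[str]:
    """Simplified precedence: a charset in the server's Content-Type header
    wins; the meta http-equiv value is used only when the header has none."""
    return charset_param(http_header) or charset_param(meta_content)


if __name__ == "__main__":
    # Server says UTF-8, page says windows-1250: the header is authoritative.
    print(effective_charset("text/html; charset=UTF-8",
                            "text/html; charset=windows-1250"))
    # Server sends no charset, so the meta declaration is used.
    print(effective_charset("text/html", "text/html; charset=windows-1250"))
```

This is also why a page can look fine locally (opened from disk, no HTTP header) and break once uploaded to a server that sends its own charset.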
  6. the_pm
    Just make sure that any default save settings say UTF-8 when you save your documents. I haven't used EditPad, so I don't know how it works there, but I assume it's similar to how Notepad saves things in UTF-8: when you choose Save As... in Notepad, there's an option to select the Encoding. Set this to UTF-8. Likewise, look for the equivalent in your editing software of choice :)
     
    the_pm, Mar 16, 2007
  7. tayiper
    /EDIT:

    I see now that I should've written "utf-8" and not "iso-8859-1"?


    tayiper
     
    tayiper, Mar 18, 2007
  8. tayiper
    Well, well, I finally tested all the possible combinations, and I discovered that for my site in Slovenian (with its special letters), the windows-1250 charset (the second example below) is the only one that displays those few "special" characters as it should; all the other encodings show, for instance, the letter "č" as "?" (a question mark), each charset a bit differently...


    Here are the examples:

    Charset-test1.html (using "utf-8")

    Charset-test2.html (using "windows-1250")

    Charset-test3.html (using "windows-1252")

    Charset-test4.html (using "iso-8859-1")

    Charset-test5.html (using "iso-8859-2")


    So I'm curious: should I use the "windows-1250" charset for my Slovenian websites/documents, or what?


    P.S. Oh, and one additional, more or less "theoretical" question: could <meta http-equiv="content-type" content="text/html; charset=windows-1250" /> be written in a "long form" as <meta http-equiv="content-type" content="text/html" /><meta http-equiv="charset" content="windows-1250" /> and still be correct?


    tayiper
     
    tayiper, Apr 2, 2007
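One way to see why only the windows-1250 declaration works is to take the bytes a windows-1250 editor writes and decode them under each of the five declarations tested. A minimal Python sketch, assuming the pages contain the letters č, š and ž saved as windows-1250 (Python's codec name for it is "cp1250"). The helper shown_as is hypothetical, and Python's strict iso-8859-1 decoding differs slightly from what browsers actually do with that label.

```python
# Bytes for "č š ž" as an editor set to windows-1250 saves them.
# Python's codec name for windows-1250 is "cp1250".
RAW = "\u010d \u0161 \u017e".encode("cp1250")  # b'\xe8 \x9a \x9e'

# The five declarations from the test pages, spelled as Python codec names.
LABELS = ("utf-8", "cp1250", "cp1252", "iso-8859-1", "iso-8859-2")


def shown_as(raw: bytes, label: str) -> str:
    """Decode raw bytes as if they were in `label`. Bytes that are invalid
    there become U+FFFD, the replacement character, which browsers
    usually draw as some kind of question mark."""
    return raw.decode(label, errors="replace")


if __name__ == "__main__":
    for label in LABELS:
        print(f"{label:>10}: {shown_as(RAW, label)!r}")
```

Under these assumptions only cp1250 gets all three letters right. iso-8859-2 happens to store č at the same byte as windows-1250 but not š or ž, cp1252 and iso-8859-1 turn č into è, and utf-8 rejects the bytes outright. Only the label that matches how the file was actually saved works.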
  9. kk5st
    You may say you're using utf-8 or another iso charset, but the output of your editor is still windows-1250.

    You have to use the correct character set in your editor.

    Do this, step by step:
    • Go here
    • Download the binary and install
    • Open a file.
    • Click on file⇒encoding
    • Click utf-8
    • Fix the odd messed up characters
    • Save
    It's now in utf-8, and will match your charset declaration.

    From now on, specify utf-8 for every document.

    cheers,

    gary
     
    kk5st, Apr 2, 2007
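The editor steps above can also be scripted when there are many pages to convert. A minimal Python sketch, assuming every file really is windows-1250 (Python's "cp1250") and that the meta element is the only charset declaration to update. reencode_html and convert_file are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches the charset parameter of a meta http-equiv declaration such as
# content="text/html; charset=windows-1250" (case-insensitive).
_OLD_CHARSET = re.compile(r"charset\s*=\s*windows-1250", re.IGNORECASE)


def reencode_html(raw: bytes, source_encoding: str = "cp1250") -> bytes:
    """Decode legacy bytes, update the declared charset, return UTF-8 bytes."""
    # Strict decoding (the default) raises UnicodeDecodeError instead of
    # silently mangling a file that is not really in source_encoding.
    text = raw.decode(source_encoding)
    text = _OLD_CHARSET.sub("charset=utf-8", text)
    return text.encode("utf-8")


def convert_file(path: Path, source_encoding: str = "cp1250") -> None:
    """Convert one file in place; working on bytes leaves line endings untouched."""
    path.write_bytes(reencode_html(path.read_bytes(), source_encoding))
```

For example, for page in Path("site").glob("*.htm*"): convert_file(page) would convert every .htm and .html file in a folder (the folder name is a placeholder). The XML declaration is deliberately left alone; per kk5st's earlier advice it should not be in pages served as text/html anyway.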
  10. tayiper
    Yeah, I should have assumed that. But anyway, this complicates things so much... so many different encodings, so many potential problems (e.g. a set of HTML files with mixed encodings), etc. And by the way: is there a way to discover which encoding a particular file is written in?

    Thanks, but if you meant for me to do that in the program you linked... you see, I know Notepad2, but I much prefer EditPad Lite over any other text editor I've tried so far.


    tayiper
     
    tayiper, Apr 2, 2007
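On the question of discovering a file's encoding: the bytes alone can never prove which legacy charset was used, but UTF-8 has a strict structure that windows-1250 text rarely matches by accident. A rough Python heuristic, assuming the only candidates are plain ASCII, UTF-8, or a single-byte charset such as windows-1250. The name guess_encoding is hypothetical.

```python
def guess_encoding(raw: bytes) -> str:
    """Rough classification of file bytes as "ascii", "utf-8" or "legacy"."""
    try:
        raw.decode("ascii")
        # Pure ASCII reads the same under utf-8, iso-8859-x and windows-125x.
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so most likely a single-byte charset; the bytes
        # alone cannot tell windows-1250 from iso-8859-2 and friends.
        return "legacy"
```

A typical use is guess_encoding(open(name, "rb").read()). It is only a heuristic: "čšž" saved as windows-1250 is the three octets E8 9A 9E, which happen to form valid UTF-8 (the single CJK ideograph U+869E), so very short samples can be misclassified.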
  11. kk5st
    I specified that editor because I happen to have it and could tell you how to switch it to utf-8, and I don't think you're up to Emacs. It doesn't matter which editor you use as long as you save your work with the right charset.

    How do I know you're using windows-1250? Look at the W3C validator results. It can't validate the page because there are invalid characters. All charsets except windows-xxxx are valid with utf-8. You can override the autodetection and specify windows-1250, and the validator works. Ergo, you're using a Windows proprietary charset.

    Configure your editor for utf-8, set the meta http-equiv to utf-8 and configure your server to default to utf-8.

    gary
     
    kk5st, Apr 2, 2007
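How to make the server default to utf-8 depends on the server software, and the thread does not say which one is in use. As one illustration only, here is a minimal Python WSGI application that sends charset=utf-8 in the response header and encodes the body to match, so the header, the meta declaration and the actual bytes all agree. The names app and PAGE are hypothetical.

```python
# The page itself: its meta declaration agrees with the header sent below.
PAGE = """<!DOCTYPE html>
<html><head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Test</title>
</head><body><p>\u010d \u0161 \u017e</p></body></html>"""


def app(environ, start_response):
    """Minimal WSGI app whose HTTP header always declares utf-8."""
    body = PAGE.encode("utf-8")  # the bytes must really be what the header says
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),  # the authoritative declaration
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() serves it on port 8000. On a server such as Apache the equivalent is a configuration setting rather than code.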
  12. tayiper
    Oh, and Gary, one more thing: could you please tell me which term is the appropriate (or should I say more appropriate) one for what we are discussing in this thread, so that I use the correct one next time: "encoding" or "charset"?


    thanks again, tayiper
     
    tayiper, Apr 4, 2007
  13. kk5st
    Loosely, the character set is the list of characters, while the encoding is about how they are represented as bits. Read Lachlan Hunt's Guide to Unicode, Part 1.

    cheers,

    gary
     
    kk5st, Apr 4, 2007
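Gary's distinction can be made concrete in a few lines of Python 3.8 or later (needed for bytes.hex with a separator). The letter č is one character with one Unicode code point, U+010D, yet utf-8 and utf-16 (two encodings of the same Unicode character set) store it as different bytes, windows-1250 and iso-8859-2 store it as a single byte, and iso-8859-1 cannot store it at all. The helper byte_forms is hypothetical, written only for this illustration.

```python
from typing import Dict, Optional, Sequence


def byte_forms(ch: str, encodings: Sequence[str]) -> Dict[str, Optional[bytes]]:
    """Map each encoding name to the bytes it uses for `ch`,
    or None when that charset does not contain the character."""
    forms: Dict[str, Optional[bytes]] = {}
    for enc in encodings:
        try:
            forms[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            forms[enc] = None
    return forms


if __name__ == "__main__":
    ch = "\u010d"  # "č": one character, one Unicode code point
    print(f"code point: U+{ord(ch):04X}")
    names = ["utf-8", "utf-16-be", "cp1250", "iso-8859-2", "iso-8859-1"]
    for enc, data in byte_forms(ch, names).items():
        shown = data.hex(" ") if data is not None else "not in this charset"
        print(f"{enc:>10}: {shown}")
```

The code point stays the same on every line of output; only the bytes change with the encoding.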
  14. tayiper
    Thanks again, Gary. And sorry, I totally forgot to add this question too: why is there an "Encoding" option in IE's right-click menu if you can't change the encoding "on the fly" when a document is written in a particular encoding?


    tayiper
     
    tayiper, Apr 10, 2007
  15. kk5st
    As far as I know, and I know very little about using IE, it does. The thing is, that only affects how that one browser interprets the page. It has nothing at all to do with which char-set was used with which encoding, or which response header the server sent. It's your job as the author to get all that right in the first place.

    cheers,

    gary
     
    kk5st, Apr 10, 2007
  16. tayiper
    So if I understand correctly: a particular charset can be encoded in many different ways, and conversely, a particular encoding scheme can be "used" by many different charsets?


    tayiper
     
    tayiper, Apr 10, 2007
  17. kk5st
    Yeah, that's close enough. For example, there are a lot of char-sets encoded as iso-8859-x. Each of them is encoded in single octets, that is, each character is a single byte. That's why you might see something as simple as a right single quotation mark (&#8217;) in utf-8 rendered as ’ if the browser thinks the encoding is iso-8859-1. UTF-8 takes three octets to encode that character, but iso-8859-x treats each octet as a separate character.

    cheers,

    gary
     
    kk5st, Apr 10, 2007
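Gary's example is easy to reproduce in Python. The sketch assumes the browser uses windows-1252 (Python's "cp1252") when a page says iso-8859-1, which is what browsers do in practice; a strict iso-8859-1 decoder would instead turn the last two octets into invisible control characters. The helper mojibake is hypothetical.

```python
def mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then misread the bytes one octet per character."""
    # Python's cp1252 leaves a few bytes (e.g. 0x81, 0x8D) undefined, so
    # replace those instead of raising an error.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


if __name__ == "__main__":
    quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK, &#8217;
    octets = quote.encode("utf-8")
    print(len(octets), "octets:", octets.hex(" "))
    print("misread as windows-1252:", mojibake(quote))

    # One single-byte scheme, different char-sets: octet 0xE8 is "è" in
    # iso-8859-1 but "č" in iso-8859-2.
    print(b"\xe8".decode("iso-8859-1"), b"\xe8".decode("iso-8859-2"))
```

The last line also illustrates tayiper's second point: the same one-octet-per-character scheme is shared by many iso-8859-x char-sets, and the octet means a different letter in each.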
  18. tayiper
    Hey, I'm just letting you know that I've found the "convert" feature in EditPad Lite; see the attached image below...


    Here's a link to the full-size screenshot of the dialog window, editpadencodingec7.png; below is the attached thumbnail:

    [Image: thumbnail of the EditPad Lite encoding dialog screenshot]


    tayiper
     
    tayiper, Apr 19, 2007