Which CHARSET/DOCTYPE etc. declarations to use?

Discussion in 'HTML & Website Design' started by Mr.Dog, Dec 2, 2014.

  1. #1
    Hi,

    I'm perplexed about these - I admit I kept using these for ages with every site I manage. Not sure whether it has an effect on SEO, SERPs...

    This doctype (not the full code):

    W3C//DTD XHTML 1.0 Transitional//EN

    and

    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

    I hear it's best to use utf-8 because of HTML 5.

    I am using the "transitional" doctype, because as I switched to something else, the W3 HTML validator found tonnes of errors, with this one it works fine :)

    Also, the transitional doesn't mix up my code. I tried other doctypes and those made my site look like a mess.

    I have no idea whether this doctype thing is really that important.

    Should I also declare the English language in the meta?

    What charset and doctype should I use?

    Anything else important to declare?
     
    Mr.Dog, Dec 2, 2014 IP
  2. themes4all

    themes4all Well-Known Member

    #2
    Hello,

    A doctype declaration is simply an instruction to the web browser about what version of HTML the page is written in... You only have to be sure which markup you are using...

    If your page is built with HTML5 and you want to validate it, the doctype has to be: <!DOCTYPE html>

    If you want to validate as XHTML 1.0 Transitional, the doctype has to be:

    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    
    Concerning SEO, there is no advantage in choosing XHTML over HTML5 markup or vice versa; as long as the page is valid, Google, Bing and Yahoo will index and process your data without problems...

    For more information about doctype declarations: Link

    Good luck
     
    themes4all, Dec 2, 2014 IP
  3. deathshadow

    deathshadow Acclaimed Member

    #3
    I prefer XHTML Strict, but the XHTML vs. HTML thing is more of a personal preference -- I just like XHTML's more consistent structural rules, which make code clearer and make you less likely to make mistakes if you have good code formatting habits (which most people can't seem to be bothered with, which is why their code is quite often bug-ridden).

    IMHO HTML 5 is pointless code bloat and other than a few redundancies being shoved down our throats in their attempt to promote vendor lock-in (AUDIO, VIDEO) there is no reason to deploy it -- and even if one has to use those, I (and many others) suggest writing as HTML 4 STRICT or XHTML 1.0 STRICT first, then slapping 5's doctype on right before deployment. Most of the new "structural tags" like SECTION, ARTICLE, FOOTER and NAV are redundant to existing semantic tags (specifically numbered headings and horizontal rules), serving no legitimate purpose other than encouraging the sloppy practice of "wrapping existing tags in extra wrappers for nothing" -- see how NAV replaces the pointless DIV people slap around a menu UL, when there's little if anything you can do to a DIV that you can't do to a UL.

    ...or at least most of the new tags are redundant if you have ANY clue what semantic markup is... Though I'm souring on the term "semantic markup" as we REALLY need to start calling it what it is -- USING HTML PROPERLY... of course most people sleazing out HTML any-old-way have no clue why numbered headings have numbers and just assume the presentational meaning. (admittedly they do the same thing with B and I which is also wrong).

    As to a charset, at this point there is NO reason to not be deploying as UTF-8. It automatically opens the door to just about every modern language -- while for any text that 7-bit ASCII (character codes 32..127) can convey, it is no larger (as opposed to UTF-16, which is pointlessly large for things like US English).
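To put rough numbers on the size argument, here is a minimal Python sketch. The sample strings are made up for illustration, and it only uses the standard `str.encode` method.

```python
# Compare how many bytes the same text needs in different encodings.
ascii_text = "Hello, world!"  # made-up sample: 7-bit ASCII only
accented_text = "caf\u00e9"   # made-up sample: "cafe" with an accented e

ascii_latin1 = ascii_text.encode("iso-8859-1")
ascii_utf8 = ascii_text.encode("utf-8")
ascii_utf16 = ascii_text.encode("utf-16-le")  # the "-le" variant adds no BOM

# Pure ASCII text is byte-for-byte identical in ISO-8859-1 and UTF-8.
ascii_is_identical = ascii_latin1 == ascii_utf8

# UTF-16 spends two bytes on every one of these characters.
utf16_overhead = len(ascii_utf16) - len(ascii_utf8)

# Outside ASCII, UTF-8 needs extra bytes: the accented letter takes two, not one.
accented_latin1_size = len(accented_text.encode("iso-8859-1"))
accented_utf8_size = len(accented_text.encode("utf-8"))

print(ascii_is_identical, len(ascii_utf8), len(ascii_utf16))
print(accented_latin1_size, accented_utf8_size)
```

So for plain English text the switch costs nothing, and accented letters cost one extra byte each compared with ISO-8859-1.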

    Another advantage of UTF-8 is there are some non-language character blocks that you can use to substitute for graphics. There are various "symbol" and "graphics" type codepages that can be very VERY useful -- though you have to do some research to be sure said characters are available in your selected fonts.

    Just beware in your editors when you save as "UTF-8" that you omit what's called the BOM -- "Byte Order Mark" -- some browsers (IE) don't understand it. Basically the BOM is a header that's supposed to say "This is UTF-8 data", but certain browsers don't check for it and instead output it as the raw characters... which ends up rendering as total garbage at the top of the page.
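For the curious, this is roughly what the BOM looks like at the byte level, sketched in Python. The sample content and the `strip_bom` helper are hypothetical names for illustration; `utf-8-sig` and `codecs.BOM_UTF8` are standard library.

```python
import codecs

page = "<!DOCTYPE html>"  # made-up sample content

with_bom = page.encode("utf-8-sig")  # Python's "utf-8-sig" codec prepends the BOM
without_bom = page.encode("utf-8")   # plain "utf-8" does not

# The UTF-8 BOM is the three bytes EF BB BF.
bom_bytes = with_bom[:3]

# Software that ignores the BOM and reads those bytes as ISO-8859-1
# ends up showing three junk characters instead.
bom_as_latin1 = codecs.BOM_UTF8.decode("iso-8859-1")


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(bom_bytes.hex(), ascii(bom_as_latin1))
print(strip_bom(with_bom) == without_bom)
```

In practice you would normally just pick the "UTF-8 without BOM" option in your editor rather than strip it afterwards.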

    ISO-8859-1 is the 'old' standard Internet encoding. It is good for most Western European languages, but limited. If you are planning on only working in the "major" western languages it's fine, but really, when all you have to do is change "ISO-8859-1" to "UTF-8" and make sure the server is sending as UTF-8 (htaccess or config files will do), there is little reason to stay on it.

    Sending a different character encoding by default over HTTP requires a proper HTTP header to be included. Usually you can configure this for your static files from your server's config files or using things like .htaccess -- but when it comes to server generated data, you'll need to send those headers yourself. If you are working with PHP that's actually pretty simple as they give you a nice 'header' function. You just have to be sure to call any and all instances of 'header' BEFORE you allow anything else to be output. (Cookies are handled the same way, and must be set/sent BEFORE data as they too are passed in the headers)

    For PHP that's a simple:
    header('Content-Type: text/html; charset=utf-8');
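The same "headers first, body second" rule applies outside PHP. As a rough analogue, here is a minimal sketch of a Python WSGI application; `app`, `fake_start_response` and the sample page are hypothetical names made up for illustration, and the app is called directly instead of being run on a real server.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical WSGI app: declare the charset in the header, then send the body."""
    body = "<!DOCTYPE html><title>caf\u00e9</title>".encode("utf-8")
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)  # headers are handed over before any body bytes
    return [body]


# Call the app by hand with a stand-in for the server's callback,
# just to see which headers it would send.
captured: Dict[str, object] = {}


def fake_start_response(status: str, headers: List[Tuple[str, str]]) -> None:
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = list(app({}, fake_start_response))
print(captured["status"], captured["headers"]["Content-Type"])
```

The point is only that the declared charset and the bytes actually sent agree, and that the declaration travels in the headers ahead of the content.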
    You should be aware that whatever is said in the HTTP headers will trump anything you declare in the markup, so if the server declares ISO-8859-1 (often the default) while you are sending UTF-8 formatted data, then even saying:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    ... in the markup will STILL be treated as ISO-8859-1, meaning any UTF-8 encoded values will come across as gibberish. It's called "mismatched encoding" -- ever notice how sometimes sites suddenly have screwed up "styled quotes" showing up as junk like â€œ, or (when the mismatch runs the other way, with ISO-8859-1 data declared as UTF-8) as the "unknown character" question mark in a diamond? That's mismatched encoding in action.
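A minimal Python sketch of mismatched encoding, in both directions. The sample string is made up, and `cp1252` is used for the first case on the assumption that the reader is a browser, since browsers treat the ISO-8859-1 label as windows-1252.

```python
original = "It\u2019s"  # made-up sample with a curly apostrophe (U+2019)

# Case 1: UTF-8 bytes read under an ISO-8859-1 label.
# Each multi-byte character falls apart into several junk characters.
utf8_bytes = original.encode("utf-8")
utf8_read_as_latin1 = utf8_bytes.decode("cp1252")

# Case 2: windows-1252 bytes read as UTF-8.
# The lone byte is not valid UTF-8, so errors="replace" substitutes U+FFFD,
# the "question mark in a diamond" replacement character.
cp1252_bytes = original.encode("cp1252")
latin1_read_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# ascii() keeps the output readable on any terminal encoding.
print(ascii(utf8_read_as_latin1))
print(ascii(latin1_read_as_utf8))
```

Either way the bytes in the file are fine; the damage comes purely from reading them with the wrong label.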

    Which of course begs the question "why say it in the markup at all" -- and indeed HTML 5's lip-service charset declaration is built on that idea... the reason to declare it is as a fallback for when the header is missing/corrupted. An excellent case in which it might be missing is during local file development/access: when you open the HTML file directly off the hard drive, there are no HTTP headers.
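The precedence described above can be sketched as a small Python function. This is a simplified illustration, not how any particular browser implements it (real browsers also look at things like the BOM); the function names and the `fallback` default are assumptions made up for the example.

```python
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Pull the charset parameter out of a Content-Type value, if any."""
    if not content_type:
        return None
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip("\"' ").lower()
    return None


def effective_charset(
    http_content_type: Optional[str],
    meta_charset: Optional[str],
    fallback: str = "windows-1252",  # assumed last-resort default, for illustration
) -> str:
    """HTTP header first, then the markup declaration, then a fallback."""
    from_header = charset_from_content_type(http_content_type)
    if from_header:
        return from_header
    if meta_charset:
        return meta_charset.lower()
    return fallback


# Header present: it wins, whatever the markup says.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
# No header at all (file opened from disk): the markup declaration is used.
print(effective_charset(None, "utf-8"))
```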

    Hope this helps. It's actually a pretty complex topic, and not a lot of people take the time to ask "why" -- they just blindly copy what everyone else is doing.

    -- edit --

    Oh, and so far as SEO goes, so long as the collars match the cuffs, it shouldn't make one lick of difference one way or the other. That is to say, so long as your markup is valid, semantic (tags saying what things are NOT what they look like), and the character encoding declared matches the one deployed in the file, it doesn't matter.

    Remember, so far as your markup is concerned SEO is about your content first and foremost. That's part of why I advocate starting with content or a reasonable facsimile of future content FIRST, putting it into a text editor in a logical order -- then and only then marking it up semantically with a logical document structure, then and ONLY then worrying about building my layoutS with CSS, then and ONLY then worrying about the graphics to hang on the layout. Content FIRST means progressive enhancement so you have graceful degradation, SEO without even having to think about it, proper accessibility...

    ... and is why dicking around drawing pictures in Photoshop before doing any of that has jack *** to do with web design; no matter how many ignorant PSD jockeys claim otherwise and scam thousands of people daily. It's also why off the shelf templates are utter and complete trash too -- see the scam artist whorehouses like TemplateMonster and ThemeForest for proof of that.
     
    Last edited: Dec 3, 2014
    deathshadow, Dec 3, 2014 IP
  4. Mr.Dog

    Mr.Dog Active Member

    #4
    Thing is, I coded lots of pages with various HTML codes, not sure which version, really...
    I used basic tags mostly, basic commands... nothing like <article> or <section>, but my CSS goes as far as hover effects, rounded DIVs...

    I have no idea which DOCTYPE or which HTML version would suit my sites best. I guess, I'm left with experiments, A/B testing only.

    W3 Validator gives me various errors with the STRICT, so I kept putting back the "good-old" TRANSITIONAL. Which is frowned upon, I hear.
     
    Mr.Dog, Dec 3, 2014 IP