Likely to be parent rather than child element

Discussion in 'HTML & Website Design' started by joebert, Nov 21, 2007.

  1. #1
    Has anyone ever done a study about which elements are more likely to be parent elements rather than child elements?

    For instance, would I be more likely to see
    <pre> <*> <form>
    or would this be more likely
    <form> <*> <pre>
    but for the global set of [X]HTML elements?

    It would be nice to also take into account which orderings are likely to contain more child elements.
    For instance, suppose <form> is more likely to contain <pre> by a ratio of 1.5:1. It would still be nice to know that when it's the other way around, i.e. <pre><form>, there's an average of 4 child <form> elements per <pre>.

    The reason I'm looking for this is that I will be removing sections of [X]HTML based on their context before finally working with what's left over.

    It makes sense to remove all <pre> elements before removing <form> elements if <pre> is more likely to contain a <form>. That way only one removal is done, since the child <form> would be removed along with the <pre>.

    But if there are likely to be enough cases where a <form> contains multiple <pre> elements to outnumber the other way around, it would make sense to remove <form> first.
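
    To make the trade-off concrete, here is a minimal sketch that counts how many removal operations each ordering costs. It runs on a made-up plain-object tree rather than a real DOM, and `node`, `removeAll`, `countRemovals` and `buildTree` are all hypothetical names invented for the illustration.

```javascript
// A tiny stand-in for a DOM node: { tag: string, children: Node[] }.
// These helpers are illustrative only, not part of any browser API.
function node(tag, ...children) {
  return { tag, children };
}

// Detach every subtree rooted at `tag` and return how many detach
// operations were needed. Nested matches vanish with their ancestor.
function removeAll(root, tag) {
  let removals = 0;
  root.children = root.children.filter((child) => {
    if (child.tag === tag) {
      removals += 1;
      return false; // drop the whole subtree in one operation
    }
    removals += removeAll(child, tag);
    return true;
  });
  return removals;
}

// Total removals when tags are stripped in the given order.
function countRemovals(buildTree, order) {
  const root = buildTree(); // fresh tree per run, since removal mutates it
  return order.reduce((total, tag) => total + removeAll(root, tag), 0);
}

// Assumed sample page: one <form> holding three <pre> elements.
const buildTree = () =>
  node("body",
    node("form", node("pre"), node("pre"), node("pre")),
    node("p"));

console.log(countRemovals(buildTree, ["form", "pre"])); // 1
console.log(countRemovals(buildTree, ["pre", "form"])); // 4
```

    With this sample tree, removing the parent tag first costs one operation and removing the child tag first costs four, which is the effect described above. Real pages would need the counts measured rather than assumed.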
     
    joebert, Nov 21, 2007 IP
  2. Stomme poes

    Stomme poes Peon

    Messages:
    3,195
    Likes Received:
    136
    Best Answers:
    0
    Trophy Points:
    0
    #2
    First, can <pre> even have <form> as a valid child? Go to the W3C pages first to check what's even valid (people manage to stick <p>s inside <a>s all the time... a no-no).

    Better to know that valid forms have <fieldset>, <legend>, <label>, <input>, and often <div> inside. Better to know, too, that something like removing all <a>s is a bad idea if, for instance, many of the images are clickable.

    Mostly, I'd say it's better to remove based on the content inside, so you know which content you're removing. If you're just trying to take content out of (X)HTML code, why not just copy the page (as viewed in a browser) and paste it into a text editor? You'll only get the text then.

    I think I'm misunderstanding you.
     
    Stomme poes, Nov 21, 2007 IP
  3. joebert

    joebert Well-Known Member

    Messages:
    2,150
    Likes Received:
    88
    Best Answers:
    0
    Trophy Points:
    145
    #3
    Looks like <pre><form> can be ruled out; I checked a few DOCTYPEs on that one. Bad example on my part.

    Removing things based on the content defeats the entire purpose of [X]HTML to begin with, doesn't it?

    I'm judging whether the context of certain elements is likely to be outside the context of the parent's text.
    For instance, <blockquote> will be removed because it is likely to be in the context of what someone else has said, not the context of the author of the text around it.

    I'm calculating Flesch-Kincaid Readability/Grade Level and Gunning-Fog Index scores from the text. Including text that is outside the context of the author, such as that found in <blockquote>, will skew the scores.
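
    For reference, a rough sketch of how those scores could be computed from plain text, using the standard published formulas. The syllable counter is a crude vowel-group heuristic and the sentence and word splitting are simplistic, so treat `countSyllables` and `readability` as hypothetical helpers, not the code actually in use here.

```javascript
// Rough syllable estimate: count runs of vowels, with a floor of one.
// This is a heuristic assumption; real tools use better rules.
function countSyllables(word) {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

function readability(text) {
  // Naive splitting: sentences end at ., ! or ?; words are letter runs.
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.match(/[A-Za-z']+/g) || [];
  if (sentences.length === 0 || words.length === 0) return null;

  const syllables = words.reduce((n, w) => n + countSyllables(w), 0);
  // Gunning Fog counts "complex" words, taken here as 3+ syllables.
  const complex = words.filter((w) => countSyllables(w) >= 3).length;

  const wordsPerSentence = words.length / sentences.length;
  const syllablesPerWord = syllables / words.length;

  return {
    fleschReadingEase:
      206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord,
    fleschKincaidGrade:
      0.39 * wordsPerSentence + 11.8 * syllablesPerWord - 15.59,
    gunningFog: 0.4 * (wordsPerSentence + 100 * (complex / words.length)),
  };
}

const own = "The cat sat. The dog ran.";
const quoted = " Nevertheless, institutional considerations predominated.";
console.log(readability(own));
console.log(readability(own + quoted)); // quoted text pulls the grades up
```

    The second call shows the skew: a single quoted sentence in a heavier style is enough to move every score, which is why the quoted material has to go before scoring.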

    I could just handle it with Notepad, but then the JavaScript I'm working on wouldn't work. ;)

    By the way, I have something in place which handles what needs to be done, but the order in which it's done could become an issue with larger pages.
     
    joebert, Nov 21, 2007 IP
  4. joebert

    #4
    It's looking like the simple solution is to just do block-level elements first, then inline elements.
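
    A minimal sketch of that two-pass idea, again on a made-up plain-object tree instead of a real DOM. The tag lists are placeholders, and `el`, `txt`, `strip`, `textOf` and `authorText` are hypothetical names for the illustration.

```javascript
// Illustrative node shapes: { tag, children } for elements, { text } for text.
const el = (tag, ...children) => ({ tag, children });
const txt = (text) => ({ text });

// Assumed tag lists; the real ones depend on what counts as "not the author".
const BLOCK_TAGS = ["blockquote", "pre", "form"];
const INLINE_TAGS = ["del", "strike"];

// Detach every element whose tag is in `tags`; return the removal count.
function strip(parent, tags) {
  let removed = 0;
  parent.children = parent.children.filter((child) => {
    if (child.tag && tags.includes(child.tag)) {
      removed += 1;
      return false; // the whole subtree goes with it
    }
    if (child.children) removed += strip(child, tags);
    return true;
  });
  return removed;
}

// Concatenate the text that is left in the tree.
function textOf(n) {
  return n.text !== undefined ? n.text : n.children.map(textOf).join("");
}

// Block-level pass first, then inline, then collect the remaining text.
function authorText(root) {
  const blockRemovals = strip(root, BLOCK_TAGS);
  const inlineRemovals = strip(root, INLINE_TAGS);
  const text = textOf(root).replace(/\s+/g, " ").trim();
  return { text, blockRemovals, inlineRemovals };
}

const page = el("div",
  el("p", txt("I wrote this "), el("del", txt("badly ")), txt("myself.")),
  el("blockquote", el("p", txt("Someone else said this. "), el("del", txt("old")))));

console.log(authorText(page));
// { text: 'I wrote this myself.', blockRemovals: 1, inlineRemovals: 1 }
```

    The <del> inside the <blockquote> never needs its own removal because the block-level pass has already taken it out, so the inline pass only touches what survives.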
     
    joebert, Nov 21, 2007 IP
  5. twistedspikes

    twistedspikes Notable Member

    Messages:
    5,694
    Likes Received:
    293
    Best Answers:
    0
    Trophy Points:
    280
    #5
    Block-level elements can't be inside inline elements.

    Paragraphs can't be inside paragraphs.

    There's a bunch of rules. I've got them somewhere, I'll try to find them.
     
    twistedspikes, Nov 21, 2007 IP
  6. joebert

    #6
    I know, that's why I'm going to do block-level elements, AND THEN do inline elements. There aren't many inline elements to do, mainly things like <del>, <strike> & <acronym>.

    Do share though, I'm sure there's something to be learned. :D
     
    joebert, Nov 21, 2007 IP
  7. joebert

    #7
    Then again, I should expand <acronym> elements now that I think about that.
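
    Expanding could look something like the sketch below: each <acronym> is swapped for the text of its title attribute, so the scores see the words rather than the abbreviation. It uses the same kind of made-up plain-object tree as a stand-in for the DOM, and `expandAcronyms` and the node shape are hypothetical.

```javascript
// Illustrative node shapes: { tag, attrs, children } for elements,
// { text } for text nodes. Not a real DOM API.
const el = (tag, attrs, ...children) => ({ tag, attrs, children });
const txt = (text) => ({ text });

// Replace each <acronym title="..."> with a text node holding the title.
// Acronyms without a title are left untouched.
function expandAcronyms(parent) {
  parent.children = parent.children.map((child) => {
    if (child.tag === "acronym") {
      const title = child.attrs && child.attrs.title;
      return title ? txt(title) : child;
    }
    if (child.children) expandAcronyms(child);
    return child;
  });
  return parent;
}

// Concatenate the text in the tree.
function textOf(n) {
  return n.text !== undefined ? n.text : n.children.map(textOf).join("");
}

const para = el("p", {},
  txt("Markup written in "),
  el("acronym", { title: "HyperText Markup Language" }, txt("HTML")),
  txt(" is everywhere."));

console.log(textOf(expandAcronyms(para)));
// Markup written in HyperText Markup Language is everywhere.
```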
     
    joebert, Nov 21, 2007 IP