Valid Code Benefits Experiment

Discussion in 'Search Engine Optimization' started by walshy, Dec 5, 2005.

  1. #1
    It has been debated for a while now on DP whether having valid code helps get a site indexed more quickly and makes it more attractive to search engine bots.
    People can't seem to decide whether it helps with SEO or makes no difference, so maybe it's time for a definitive answer?

    I checked my email today and found a newsletter with one particularly interesting article . . .

    The full article is here

    http://www.marketingsherpa.com/sample.cfm?contentID=3130

    When a top-10 site like download.com sees radical results after switching to standards-based code, it makes me sit up and listen!

    What I propose is an experiment: set up three websites with marginally different content, all targeting some obscure keyword phrase, and see which does best.

    Site 1 has completely valid, clean, lightweight code with a CSS layout.

    Site 2 has tables for layout, no DOCTYPE and validation errors (i.e. most websites ;) )

    Site 3 is the control: a roughly even mix of valid code and code with errors, using both tables and CSS for layout. (Rough sketches of sites 1 and 2 follow below.)
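
    As an illustration only - a rough sketch of what sites 1 and 2 might look like. The DOCTYPE, titles, file names and copy below are placeholders I've assumed, not final choices:

        <!-- Site 1 sketch: valid, lightweight, CSS for layout -->
        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>Placeholder test phrase</title>
          <link rel="stylesheet" type="text/css" href="layout.css" />
        </head>
        <body>
          <h1>Placeholder test phrase</h1>
          <div id="content"><p>Body copy here.</p></div>
        </body>
        </html>

        <!-- Site 2 sketch: no DOCTYPE, tables for layout, and typical validation
             errors (unclosed title tag, unquoted attributes) -->
        <html>
        <head><title>Placeholder test phrase</head>
        <body>
          <table width=100% border=0>
            <tr><td><h1>Placeholder test phrase</h1></td></tr>
            <tr><td>Body copy here.</td></tr>
          </table>
        </body>
        </html>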

    I think we could get some decent data from this. If other people agree that this would be a good experiment, I don't mind paying for three domains and hosting them on my server. I will also install a PHP-based stats package to track the bot visits.

    The target results I think we could look for are . . .

    1. Which site is crawled the deepest and the fastest.
    2. Which site appears in the SERPs the fastest.
    2a. Which site is out of the sandbox first.
    3. Which site achieves the highest SERP position (by a certain deadline).
    4. Which engines show a preference for one type of code structure.

    This is an on-page experiment only, so no backlinks; just links in my DP sig to get the bots to the sites.

    I don't mind setting up all three sites, but I would prefer participation from someone who is convinced that valid, standards-based code is a complete waste of time as far as SEO and search engine bots are concerned ;) .

    I would really like advice on avoiding a duplicate content penalty, on choosing an obscure keyword phrase to target, and on making the experiment as watertight as possible.

    Good idea? Or do I need to get out more :eek:
     
    walshy, Dec 5, 2005 IP
  2. minstrel

    #2
    Bear in mind that a site redesign usually comprises a lot more than W3C validation... I'm not sure download.com can conclude that W3C validation is the reason or even a primary contributor to the change in spidering.

    The other problem is that your three suggested groups differ in too many ways to allow you to conclude that W3C validation does or does not contribute to increased spidering.

    What validation errors are included for 2 and 3? Why the deletion of the DOCTYPE declaration in 2? Why are you including the CSS vs. tables comparison to confound the study?

    Ideally, a good experiment allows one to evaluate the effects of a single variable, or the independent effects and interactions of more than one variable, while controlling ALL others.

    As described, yours falls a little short of that.
     
    minstrel, Dec 5, 2005 IP
  3. Djohn

    #3
    I would imagine, as you briefly mention, that the avoidance of duplicate content would have a serious impact. As Minstrel wrote, it kinda removes the 'all things being equal' part...
     
    Djohn, Dec 6, 2005 IP
  4. walshy

    #4
    This is what I am trying to prove, one way or the other.

    Like I said, I'm looking for advice; these were just my suggestions, and I don't pretend to be an authority on the subject.

    A set of errors that, according to certain people, has no negative bearing on a site's ability to be crawled effectively.

    Many people on these boards mention that even having no DOCTYPE doesn't matter to search bots, and point to Google themselves having no DOCTYPE declared as proof that it has no bearing. I would just like to know one way or the other.

    This goes back to the download.com example: they state that the move from a table-based layout to standards-based design helped them get crawled deeper and more frequently.

    What I would like to see from such a study is an indication of whether the bots prefer a site with valid, clean code and CSS, as opposed to a site that has no DOCTYPE, tables for layout and invalid code.

    If the clean site achieves vastly more visits and much deeper crawls than the standard (everyday) site, we can deduce with a certain amount of accuracy that the clean site is being held in higher regard by the search engine bots.

    The transitional hybrid site is intended to be a neutral baseline against which to measure positive and negative effects.

    I understand what you say, Minstrel, but this could never be a scientifically accurate study because of the variables involved; it is merely intended to give a deeper understanding of whether clean code has a clear benefit in terms of crawl depth and frequency.

    Perhaps this could be narrowed down to just a crawl test: same server environment, similar domains (perhaps one character different), the same code structure (i.e. H1 tags in the same place across all documents), and the same content on each site?
     
    walshy, Dec 6, 2005 IP
  5. Djohn

    #5
    In any case it would make for very interesting reading, even if inconclusive.
     
    Djohn, Dec 6, 2005 IP
  6. walshy

    #6
    Glad you agree, Djohn; I too think we could get some interesting insights from an experiment along these lines. I wouldn't bother doing it, though, if people would discount it as not being credible. Perhaps Minstrel can help devise a way to make it stand up to the famous DP scrutiny? ;)
     
    walshy, Dec 6, 2005 IP
  7. minstrel

    #7
    walshy, see above.

    Basically, while I know it's difficult to do well-controlled in vivo experiments, the best approach would be to take 3 or 4 pages with the same basic content, moved around just enough to avoid the duplicate content filter, and try to make everything identical except for one factor at a time -- in other words, go for a planned series of tests. If you try to answer all the questions in one, you'll hopelessly confound it, and any conclusions you try to draw will be meaningless.

    I'm not trying to be discouraging here - I'm just trying to help you do it in a way which makes it more than just an exercise in futility. I don't believe the spiders give a damn about W3C but I would love it if the next time someone trots that one out I could reply, "See walshy's experiment at www.xxx.com/test.html".
     
    minstrel, Dec 6, 2005 IP
  8. walshy

    #8
    Thanks, Minstrel, I appreciate your advice and guidance. You're probably right on the W3C front; they really do pick up on nonsense that doesn't matter a bit.

    Perhaps the standards recommendations promoted by http://www.webstandards.org/ should be referred to, rather than the W3C specifically.

    Perhaps the first experiment could be:

    Semantic markup, CSS layout and a high content-to-code ratio vs. traditional table-layout pages with invalid DOCTYPEs (as in the average website in today's SERPs).

    I would love to prove to a certain extent that the former was markedly better for achieving deeper and more frequent indexing. But I could be completely wrong of course.

    I think the key would be to aim for documents as close as possible in structure, content and size, which should remove certain variables. Any other ideas to tighten things up?

    In your opinion, could this experiment be run off subdomains? Or would two different domains be better?
     
    walshy, Dec 6, 2005 IP
  9. minstrel

    #9
    If you're asking a question about validation, keep the DOCTYPEs in there and valid. You can look at that issue in another test.

    Also suggest two pages, slight variations in title and page content, slight variation in filenames, same domain, same folder -- i.e., don't add any other variables into the experiment.
     
    minstrel, Dec 6, 2005 IP
  10. Interlogic

    #10
    What about a longer-term test that would really prove whether a redesign is worth the effort? Set up a site that is "badly" designed in the first place, with 10 pages or so. Set up 20-30 links to the site and let it sit for 3-4 months to see how it ranks, then redesign it to standards and see if the ranking improves at all. (Maybe put some sort of rotating quote of the day on the homepage to keep the spiders visiting?)
     
    Interlogic, Dec 6, 2005 IP
  11. rmccarley

    #11
    There is no good reason to have your site not validate. By writing valid code you ensure a more uniform user experience, speed up development, reduce troubleshooting and speed up page load times (OK, maybe only by a second, but every bit helps). These reasons alone make it worth doing. On the SEO front, I have 10 years of web design experience, and the sites that consistently perform are XHTML 1.0 Transitional: tight structure, with emphasizing tags like <b> to highlight keywords.

    It makes sense that bots would be programmed to understand standards-driven code first and then be patched to figure out the weird stuff. I say make the job as easy as possible for the SEs - they will thank you for it in the SERPs.
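
    Just to sketch what I mean by a tight XHTML 1.0 Transitional structure with <b> used for keywords - the phrase and copy here are only placeholders of my own:

        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>Placeholder keyword phrase</title>
        </head>
        <body>
          <h1>Placeholder keyword phrase</h1>
          <p>Short, content-heavy copy with the <b>placeholder keyword phrase</b>
             emphasized where it counts.</p>
        </body>
        </html>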
     
    rmccarley, Dec 6, 2005 IP
  12. minstrel

    #12
    Again, too many variables. I think it has to be two or more NEW single pages to eliminate issues with variations in number of internal and external links, nav structure, fluctuations over time in algorithms, etc.
     
    minstrel, Dec 6, 2005 IP
  13. walshy

    #13
    That's a very valid point and I agree with it entirely. I just want to prove it to some very sceptical people and hopefully use it as a case study for promoting standards-based design (depending on the results, of course).
     
    walshy, Dec 6, 2005 IP
  14. walshy

    #14
    My suggestion for the keyword phrase to test is

    standards vs SERP standard

    Any thoughts?

    Go with what Minstrel suggests to start with: two pages as similar as possible, same domain, same title, and slight changes in body content to avoid the dupe content filter.

    Two links in my sig, one to each page

    Results to monitor - which flavor of code gets the bots returning most often,
    and which page achieves the highest SERP position for the keyword phrase.

    A simple test to start with; then, as Minstrel recommended, build on that once the results have been analysed. That's if there is any difference to report!
     
    walshy, Dec 6, 2005 IP
  15. minstrel

    #15
    Nobody is suggesting NOT writing "valid code". But that is NOT the same as "W3C validated code".

    Again, you are confusing standards with W3C recommendations, which have remained recommendations since like 1999. Why are they still only recommendations do you think?
     
    minstrel, Dec 6, 2005 IP
  16. rmccarley

    #16
    OK, what is "valid" code if it isn't code that validates according to the W3C's DTD?
     
    rmccarley, Dec 6, 2005 IP
  17. digitalpoint

    #17
    I know you already know this, but it would be pretty silly for search engines to put any sort of weight on "validated" code considering www.google.com itself is one of the worst bits of HTML code in terms of W3C validation.

    In some cases there are perfectly good reasons not to be a W3C validated code purist. For example, speed... When you have a site that gets, say, 1,000 page views per second and you start cutting every little bit of HTML that doesn't "matter" (carriage returns, tabs, quotes around tag attributes, etc.), you start noticing a decent bandwidth saving/speed increase.
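
    A quick made-up example of the kind of trimming I mean - the same markup written twice, where the second form is smaller on every request and renders identically, but a validator will complain about it:

        <!-- Fully validating form: quoted attributes, end tags closed -->
        <table class="nav" width="100%">
          <tr>
            <td align="left"><a href="/search">Search</a></td>
          </tr>
        </table>

        <!-- Trimmed form: unquoted attributes, optional end tags and whitespace
             dropped -->
        <table class=nav width=100%><tr><td align=left><a href=/search>Search</a>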
     
    digitalpoint, Dec 6, 2005 IP
  18. minstrel

    #18
    Valid code is simply code that does not contain errors - all tags closed, quotation marks where they should be, elements spelled correctly, correct syntax.

    W3C validation includes a number of recommendations to avoid what they consider to be deprecated but which work perfectly well in all browsers and present no difficulties whatsoever to spiders. Among these are the use of tables, <b> and <i>, the use of target="_blank", etc., etc. Spiders have no problems with these. Neither do browsers. Ensuring that your code eliminates such elements or tags will do absolutely nothing for your human visitors or to help your ranking. All it will do is give you the right to display one of those little W3C buttons on the page.
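
    To make the distinction concrete, here are some made-up snippets: the first contains genuine errors, the second is the same thing written validly, and the third uses constructs the W3C frowns on but which browsers and spiders handle without blinking:

        <!-- Genuinely broken: missing quote, unclosed and misspelled tags -->
        <p>Visit our <a href="/products>product page for <stong>great deals</p>

        <!-- Valid: errors fixed, every tag closed -->
        <p>Visit our <a href="/products">product page</a> for
           <strong>great deals</strong>.</p>

        <!-- Deprecated or frowned upon, but perfectly functional -->
        <p><b>Bold keyword</b>, <i>italic note</i>,
           <a href="/help" target="_blank">opens in a new window</a></p>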
     
    minstrel, Dec 6, 2005 IP
  19. rmccarley

    #19
    Shawn - actually G's default home page does validate now and the number of errors in the rest of the site has dropped dramatically over the last year. Maybe they're moving toward something...? Also, MSN does validate.

    But I'll take your word for the second part - you would obviously know better than me! And good job for it!

    minstrel - you are referring to the difference between 'strict' and 'transitional' DTDs. But the W3C itself acknowledges these differences by providing a transitional DTD to work from! Your site is still valid under the transitional banner.
     
    rmccarley, Dec 7, 2005 IP
  20. wrmineo

    #20
    wrmineo, Dec 7, 2005 IP