Rebuilding The Web

Articles, advocacy, discussion and debate about the many problems of the Web and the challenges of rebuilding it.

Why is valid HTML important to everyone?

The Web works with valid and invalid HTML. So why is valid HTML important? And how does invalid HTML affect everyone who uses the Web?

Why is it important to write HTML to standards?

Technical standards are the bedrock of innovation, and smaller companies often lead in the creation of innovative technologies. Standards create a level playing field on which these smaller companies can compete with giants. Non-standard use of HTML on the Web therefore makes it extremely difficult for smaller companies to compete with their larger rivals, because an enormous share of their development resources must go to dealing with invalid HTML. The result is less competition, a limited choice of applications, and fewer innovative technologies.

Invalid HTML limits innovation

In 1996, I worked for a company that built a standards-compliant Web browser. It took two programmers and a small testing team about a year to build, and it had innovative features that added something new to the browsing experience. By comparison, the Internet Explorer team at the time comprised 100 people (several years later, over 1,000, at a cost of $100 million per year). While Microsoft had dedicated developers working on routines to parse and render invalid HTML, smaller companies like the one I worked for could not afford a special team of developers for processing invalid HTML. So the non-standard use of HTML created real barriers to competition for smaller companies.

Today, primarily because of the non-standard use of HTML, the cost of building a Web browser rendering engine has become so high that even giants like Apple and Google find it too difficult (i.e. too expensive) to build such an engine from scratch; both Safari and Chrome use a rendering engine derived from a third-party software library.

And do you ever wonder why so many WYSIWYG editors suck? Web browsers have it easy: all they have to do is render invalid HTML. WYSIWYG editors have to render invalid HTML and, at the same time, make enough sense of it to edit it. So how do some smaller companies manage to produce WYSIWYG editors? Most of them use third-party editing controls or libraries that significantly limit the features these editors can offer users. The company I work for managed to build an innovative WYSIWYG editor from scratch, but at the cost of not accepting invalid HTML.

In the CMS (content management system) field, there are lots of smaller vendors, so why does it appear that they are not affected by invalid HTML? The reason is that they treat HTML content as opaque binary data and are therefore unable to offer innovative features that would require them to manipulate the HTML they manage. For example, take what sounds like a very simple feature to implement: a change of class name in a CSS file managed by a CMS that is automatically propagated to thousands of HTML documents managed by the same CMS. In reality, this feature is extremely difficult to implement reliably in invalid HTML documents, because with unclosed or misnested tags and similar errors, the software has to guess at the structure of each document before it can tell which elements and attributes it is actually changing.
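To make the CMS example concrete, here is a minimal Python sketch, assuming the CMS stores its documents as well-formed XHTML fragments without a namespace declaration. The function name `rename_class` is hypothetical. It parses each document with the standard library's `xml.etree.ElementTree`, renames one class token on every element, and refuses malformed input outright instead of guessing. Note that an XML parser checks well-formedness, not full validity against a DTD, so this covers only part of what a validator would catch.

```python
import xml.etree.ElementTree as ET


def rename_class(markup: str, old: str, new: str) -> str:
    """Rename the class token `old` to `new` on every element of a
    well-formed XHTML fragment (hypothetical CMS helper).

    Raises ET.ParseError if the markup is not well-formed, instead of
    guessing at the intended structure.
    """
    root = ET.fromstring(markup)  # strict XML parse: no error recovery
    for element in root.iter():  # the root and all of its descendants
        classes = element.get("class")
        if classes is None:
            continue
        tokens = classes.split()  # class is a whitespace-separated token list
        if old in tokens:
            element.set("class", " ".join(new if t == old else t for t in tokens))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = '<div><p class="note warn">Hi</p><p class="notes">Bye</p></div>'
    print(rename_class(page, "note", "aside"))
    # <div><p class="aside warn">Hi</p><p class="notes">Bye</p></div>

    broken = '<div><p class="note">Hi</div>'  # the <p> is never closed
    try:
        rename_class(broken, "note", "aside")
    except ET.ParseError as err:
        print("rejected:", err)
```

Splitting the `class` attribute into tokens means that renaming `note` leaves a class such as `notes` untouched, which a plain text search-and-replace over the raw markup would not guarantee. The second call shows the other half of the argument: when a tag is left unclosed, the parser raises `ET.ParseError` rather than silently producing a tree that may not match what the author intended.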

Everyone who uses the Web is negatively affected by invalid HTML

Whether you use a Web browser, a WYSIWYG editor, a CMS, a screen reader, a search engine, or HTML email, your Web experience is affected by invalid HTML. Enormous amounts of development resources continue to be sapped by the need to deal with invalid HTML, resources that could have been invested in developing new features and making the Web easier to use. Likewise, the limited choice of applications available to you today is in large part a consequence of innovation being stifled in smaller companies that are unable to compete with giants, because so much of their development budget is consumed by processing invalid HTML.

When standards are not followed, everyone loses.

Public comments

1. Posted by dani
on Wednesday 2009-12-16 at 04:49:33 PST

Vlad, if valid X/HTML is important, why W3 and Validome validator still have some different results? E.g. when validating strict and application/xhtml+xml.

I know that Opera 9+ renders named entities differently in 'true XHTML' (application/xhtml+xml).

2. Posted by Vlad Alexander
on Wednesday 2009-12-16 at 09:12:50 PST

dani wrote: "if valid X/HTML is important, why W3 and Validome validator still have some different results?"

This is something that you should ask the developers of those products. Keep in mind:

  • The X/HTML spec leaves a lot to interpretation
  • Products have bugs
  • Products have varying levels of support for given features

dani, my argument is that the development costs of dealing with invalid HTML are prohibitive for small companies and discourage them from making innovations in Web technology.

3. Posted by Kevin O'Gorman
on Wednesday 2009-12-16 at 16:27:41 PST

Thanks for these brief but pertinent thoughts. Having struggled with the excessive restrictions imposed by a particular CMS used to manage and deploy a government web site over the last four years, I well understand the problems you talk about. Let's hope that someone sees sense in the long term.

4. Posted by porscha
on Wednesday 2009-12-16 at 22:04:04 PST

Good article. What I love about valid HTML is that it's easy to transfer from one place to another, especially when a company decides to change its look and feel and its CMS.

5. Posted by Dani
on Thursday 2009-12-17 at 02:03:09 PST

And the giants still do not care. I think the idea is similar to Jason Grant's post arguing that big corporations will kill the Web.

6. Posted by Vlad Alexander
on Thursday 2009-12-17 at 20:55:11 PST

Thanks for sharing the link with us Dani. Jason's article brings up a number of interesting points that I hope he will explore in more detail in subsequent posts.

7. Posted by TJ Downes
on Saturday 2009-12-19 at 17:31:17 PST

All good points. Another reason to use valid HTML is so that people who have to maintain your code after you aren't going to get violent with you if they ever meet you :)

It's really a shame that there's currently no way to enforce valid HTML on the Web. One of the biggest issues with HTML and JavaScript, in my opinion, is that they lack the strictness you get with compiled code and server-side code. It is yet another flaw in the Internet, along with inherent issues in TCP/IP, HTTP, SSL, SMTP and FTP. These technologies were developed for a very different Internet, and unfortunately they haven't been fixed to meet the stringent demands of communication, web development and web application development.

8. Posted by John Dowdell
on Sunday 2009-12-20 at 12:23:18 PST

Good point there, about how the additional complexity of varietal markup increases costs for toolmakers, reducing diversity of solutions.

Same argument holds for WHATWG's "HTML5" -- browser makers who wish to be compliant must now construct an identically-performing multimedia engine, as well as the usual hypertext engine. Instead of agreeing on a set of simple and attainable standards, they're creating a monolithic description of their current engines. Raises a barrier against future competition.


Comments are closed for this article.
