Rebuilding The Web

Articles, advocacy, discussion and debate about the many problems of the Web and the challenges of rebuilding it.

Should error messages be displayed for corrupt HTML5?

One of the primary reasons HTML is authored incorrectly today is that Web browsers do not display error messages when processing corrupt documents (documents constructed or transmitted incorrectly). Web browser vendors like to refer to these kinds of error messages as "draconian", i.e. unduly punitive. Why?

What's wrong with error messages?

Do Web users need to be protected from error messages?

Error messages are the norm in general computer usage. Below is a screen shot of an error message that a Microsoft Word user receives when trying to view a corrupted Word or RTF document:

Error message reads: Word was unable to read this document. It may be corrupt. Try one or more of the following: Open and Repair the file. Open the file with the Text Recovery converter.

Below is a screen shot of an error message a user receives when trying to view a corrupted PDF document:

Error message reads: There was an error opening this document. The file is damaged and could not be repaired.

Below is a screen shot of an error message a user receives when trying to view the contents of a corrupt Zip file:

Error message reads: Unable to unarchive 'corrupt.zip' into 'Downloads'. Error 1 - Operation not permitted.

Are Web users more sensitive and less able to deal with error messages than users of Microsoft Word?

Are documents on the Web not important?

If an important document is missing content or is displaying content incorrectly because the document is corrupt, wouldn't you want to know? By not notifying users that the document being viewed is corrupt, we are reinforcing some people's belief that the Web is just a collection of rubbish. Without error messages, are we saying that if a document renders incorrectly, that's okay, and we should be happy with the parts of the document that do render correctly?

Are Web browsers so good at auto-correction that there is no need to disturb users with error messages?

Web browsers will render the following markup without notifying users of any errors:

  <html>
  <head>
  <title>...</title>
  </head>
  <body>
  <p>...</p>


Is the markup above simply missing the closing </body> and </html> tags, or is it missing most of its content as well? Browser auto-correction simply cannot fix all types of corrupted documents.
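To make the point concrete, here is a minimal sketch in Python of how a tool could notice that the markup above ends with elements still open. It uses the standard library's html.parser with a simple stack, which is an assumption made for illustration and not the HTML5 parsing algorithm; the names UnclosedTagFinder and find_unclosed are hypothetical. HTML also allows some end tags to be omitted, so a check like this is only a heuristic, and it can report that elements were left open but not how much content went missing.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML (void elements).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Tracks open elements with a plain stack (a heuristic, not a full HTML5 parser)."""

    def __init__(self):
        super().__init__()
        self.open_elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching start tag, if there is one.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def find_unclosed(markup):
    """Return the elements that are still open when the input ends."""
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.open_elements


truncated = "<html><head><title>...</title></head><body><p>...</p>"
print(find_unclosed(truncated))  # ['html', 'body']
```

The result says that html and body were never closed; whether paragraphs were lost before the end of the file is something no parser can recover from the markup alone.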

Will people stop adding content to the Web if they have to author HTML to specification?

Authoring tools (CMS, WYSIWYG editors, etc.) can be made to generate valid markup. Only when content is authored by hand will authoring errors occur and if Web browsers displayed error messages, then the only people who would see these errors would be the people making them as they test/review their work in a Web browser. So, would error messages really discourage people from adding content to the Web?

Can error messages make people use HTML5 correctly?

Without error messages, HTML5 will be used incorrectly, just as previous versions of HTML have been. But it does not have to be so. If Web site creators want to go on using previous versions of HTML any way they want, creating Tag Soup, they can do so. But if they want to use new features (i.e. HTML5), then they need to author HTML to specification. And new content written to HTML5 specification can be authored correctly if Web browsers start to display error messages when processing corrupt HTML5 documents. So is this a fair and reasonable way to use error messages or is it draconian?

Public comments

1. Posted by Richard
on Tuesday 2009-12-08 at 01:51:08 PST

There are valid reasons to start displaying error messages, however it would cause such disruption to the usual web experience that it would not work. As an analogy, imagine if your car were to come to a graceful stop and display an error message on your screen every time you made a driving error. After about an hour of this everyone would either turn it off or return their car to the showroom.

2. Posted by Robert
on Tuesday 2009-12-08 at 02:02:30 PST

Bring it on!

I, for one, am sick to death of explaining the benefits of web-standards compliant markup to disinterested front-enders. Let an error message deal with their apathy and let me move onto issues related to this Century.

3. Posted by Jens Meiert
on Tuesday 2009-12-08 at 02:07:52 PST

What’s the advantage for users? And doesn’t real XHTML, including the XML serialization of HTML 5, already offer you what you suggest?

We might have few options other than emphasizing professionalism, which includes insisting on quality code.

4. Posted by bluesix
on Tuesday 2009-12-08 at 03:29:37 PST

Unlike Word docs and *.zip files, HTML is often written by hand, therefore prone to errors. It would not be practical to show error messages for HTML, and would disturb the UX unnecessarily - there's nothing the end user can really do about the errors.

5. Posted by Rimantas
on Tuesday 2009-12-08 at 04:03:33 PST

How about learning HTML first and writing a post about its deficiencies later?

6. Posted by Jason Grant
on Tuesday 2009-12-08 at 07:37:36 PST

HTML isn't (always) written 'by hand' - it is usually generated by s**t tools which don't care about spewing out rubbish code.

I think displaying error messages in HTML documents would rapidly improve the quality of the web!

It would force developers to create tools which are easier to integrate perhaps and which can work more consistently across browsers.

Browser vendors could then also focus on making the rendering of proper HTML work as best as possible, rather than learning how to deal with f***ed up mark up all day long.

I am all in favour of bringing in error messaging into HTML5.

7. Posted by Vlad Alexander
on Tuesday 2009-12-08 at 09:02:07 PST

Richard wrote: "... displaying error messages, however it would cause such disruption to the usual web experience ..."

What about my argument: "Authoring tools (CMS, WYSIWYG editors, etc.) can be made to generate valid markup. Only when content is authored by hand will authoring errors occur and if Web browsers displayed error messages, then the only people who would see these errors would be the people making them as they test/review their work in a Web browser."

Jens Meiert wrote: "What's the advantage for users?"

Good point, this has not been clearly articulated in the past. I will talk about that in my next post.

bluesix wrote: "HTML is often written by hand"

I don't have the number to back this up, but I bet most of the content authored on the Web today comes from authoring tools.

bluesix wrote: "...there's nothing the end user can really do about the errors."

What about my argument above "...only people who would see these errors would be the people making them ..."?

8. Posted by Divya
on Tuesday 2009-12-08 at 09:02:17 PST

The issue with the applications you have cited is data corruption. Applications *cannot proceed* if data is missing, which is why they throw the error. HTML "errors" you have cited are not caused due to missing data, but only because they do not follow a standard. Browsers accept incorrectly coded pages because of Postel’s law: Be conservative in what you do; be liberal in what you accept from others (http://en.wikipedia.org/wiki/Robustness_Principle).

9. Posted by Vlad Alexander
on Tuesday 2009-12-08 at 09:23:17 PST

Divya wrote: "HTML 'errors' you have cited are not caused due to missing data"

In my article above, I did cite an error due to missing data. Here it is again:

  <html>
  <head>
  <title>...</title>
  </head>
  <body>
  <p>...</p>

Also, the Wikipedia article you cited then goes on to say "Postel's principle is often misinterpreted as discouraging checking messages for validity."

10. Posted by Chris F.A. Johnson
on Tuesday 2009-12-08 at 09:51:58 PST

  <html>
  <head>
  <title>...</title>
  </head>
  <body>
  <p>...</p>

The only missing data in that example is the doctype.

11. Posted by Niels Matthijs
on Wednesday 2009-12-09 at 00:55:32 PST

But what about user generated content? If this screws up your html (always a possibility) then everyone visiting that page will start getting the error message.

I believe it's an issue for us front-end developers, not for our users. If we shift the problem to them, we're just running away from our responsibilities. Which would be very nice indeed, but not very professional.

12. Posted by Andy
on Wednesday 2009-12-09 at 03:48:39 PST

Notice those error dialogs only have 'OK' buttons. Faced with an error when opening a Word doc, an average user would first be horrified, then blame Word, look for a backup version of the file, then call a friend or search for help. Regardless of whether it's recoverable they can take personal actions to fix the problem.

Contrast this with opening a web site. If error dialogs appear, what actions can a user take? The website isn't theirs, and the only item they're in control of is the browser. They're basically being pestered for someone else's mistake, and their only short-term option is to switch browser or website. Neither may be an option.

The element of frustration is ten times worse if the browser merrily rendered the page after closing the error message. This is what my copy of Word does every time I open a docx file.

The second and probably more important question is what constitutes an error: is it a missing solidus? An unclosed tag? Missing entity escape? Invalid character for supplied charset? None of these are errors that actually affect users, but they were considered Errors of Death in XHTML. I'd be more concerned about errors that actually affect users: missing labels around inputs, missing H1s, useless or repeated page titles, table layouts, use of unnecessary scripting, or using divs instead of semantic tags. None of these are errors in an official sense -- they're hard to detect and almost always subjective -- but they're issues that affect real people.

It'd be much better for browsers to provide a "grading" for web sites based on heuristics that detect good separation of content & presentation, use of suitable structural elements, accessibility coverage, etc. Tag soup detection would certainly contribute negatively to the grade. Useful extras such as accesskeys could even trigger "feature" icons that appear in the statusbar to highlight websites that go the extra mile.

13. Posted by Vlad Alexander
on Wednesday 2009-12-09 at 09:26:48 PST

Niels Matthijs wrote: "But what about user generated content?"

Can you please provide examples?

Andy wrote "If error dialogs appear, what actions can a user take?"

End-users would not normally see these error messages - see my argument above. But if they should see these error messages, they can get the browser to recover the data (i.e.: process the page as Tag Soup) as shown in the screen shot below from Opera.

Error message reads: Error! XML parsing failed. XML parsing failed: syntax error (Line: 9, Character 1) Reparse document as HTML

Andy wrote: "Useful extras such as accesskeys could even trigger 'feature' icons that appear in the statusbar to highlight websites that go the extra mile."

That would be a move in the right direction!

14. Posted by mattur
on Wednesday 2009-12-09 at 12:41:29 PST

If browsers had used this error display technique from the beginning, people would not have adopted using XHTML as text/html, and XHTML would probably have died a lot sooner. So that would be one advantage :-)

15. Posted by experttease
on Wednesday 2009-12-09 at 15:54:09 PST

On the whole I agree with the idea. If it matters, I am probably a 'power' user and not your average surfer, but my perspective is of the user and not the developer.

I don't think you can directly import the same scenario found in other desktop apps that read files such as .doc and what have you, where there's an error dialogue imposed on the user at a point in a linear process: click OK or Cancel or close the program. This would not be adopted by any browser if it affected users.

Perhaps contrary to your suggestion Vlad, I rather like the idea that this could be part of any web surfer's landscape if they wish it to be (a feature to be toggled on or off, which gives the user more info or less). Why? There are plenty of times when something doesn't work, or looks odd on a page and there's no explanation to the user, not even a hint. If however your browser notified you, perhaps in the URL bar with an 'info' icon, with a mouse-over message along the lines of: "this page contains errors, you may not see the page as it was intended." (..and if it finds a relevant email address: "would you like to contact the webmaster?") then I think this could help mature the web.

Browsers should be honest and tell the users what they are browsing, this doesn't mean preventing them from browsing it at all. I think Andy's ratings suggestion could work with the idea I've given above.

As for the question of 'what the user should do about it', well I think we instead could ask 'how do we best inform the users while also maintaining a streamlined surfing environment?'

I think in this way the errors would be exposed, but not in a way that inhibits users should they see them, while also helping developers spot various problems during development (don't ask me what, I don't develop, myself).

If we ask ourselves how IE was allowed to do what it did to the quality and development of the web, I think one of the key reasons was because 99% of users were oblivious to why the content they were viewing didn't always work. They just left frustrated with no explanation. You can only introduce real standards if they have a price on their heads, so to speak, meaning that they have a value everyone can understand. (sorry if I'm just repeating the content of your article, Vlad.)

Bringing the quality of the web to the forefront of browsing will make users more knowledgeable about the environment they're in without them having to take a course in web development. Just stick to the language users need and understand and you can't go wrong.

Of course, applying these quality checks to previous standards code may not help, but for the modern web, i.e. HTML5, CSS3 and so on, it may just work to keep them at the high standard we all dream of.

16. Posted by Lachlan Hunt
on Wednesday 2009-12-09 at 21:58:16 PST

What you're suggesting is simply not going to happen. There are far too many legacy documents on the web containing errors, and introducing more DOCTYPE sniffing to try and distinguish them from those that claim to be HTML5 is a bad idea.

Browsers are free to display error messages in their error console, where they can be accessed by the developer. Developers already frequently use those consoles for JavaScript and CSS errors, so it would seem sensible for them to include HTML error messages too.

17. Posted by Vlad Alexander
on Wednesday 2009-12-09 at 23:46:07 PST

Lachlan, elsewhere you wrote "more calls for draconian error handling in HTML, but still no new or valid arguments to back it up". The arguments are not new - I searched old mailing lists and blogs to compile a list of arguments for error messages. I, however, was not able to find out why these arguments are not valid. Are you able to take each of the 4 points I mention in the article (repeated below) and provide solid arguments against them?

  • Do Web users need to be protected from error messages?
  • Are documents on the Web not important?
  • Are Web browsers so good at auto-correction that there is no need to disturb users with error messages?
  • Will people stop adding content to the Web if they have to author HTML to specification?

18. Posted by Niels Matthijs
on Thursday 2009-12-10 at 00:53:35 PST

Vlad Alexander wrote: "Can you please provide examples (of user generated content)?"

Well, consider this comment form. Imagine you going to bed, five minutes after you're resting your eyes I post a message and manage to slip in a /p ... even though it invalidates your code, nothing can be seen in the design. So would you like the idea of users getting error messages the whole night long simply because I messed up my comment?

Or think of a Digg-like service. Assume you can add online bookmarks. As a feature, it checks the meta description of the page and includes it as extra information (much like Google does). What if the meta description contained invalid html? You can't control all user-generated content, unless you strip it of all possible html, and even then.

And I still don't like the idea of bothering users with such errors. Just think of your family surfing the web ... most of them won't even be bothered if the design is completely screwed. The sites my dad shows me from time to time effectively hurt my eyes, but he finds them useful because the information on them is solid. Why bother my dad with the fact that the html of those pages is faulty ...

19. Posted by Mike
on Thursday 2009-12-10 at 06:16:46 PST

I would suggest that the main reason for displaying a message to users of a desktop application is the expectation that the user is able to do something about it - whether it be going back to a previous version, or asking the author to send you a new copy.
On the web, this is never likely to be practical, and all the user is interested in is the content not the markup, and since there are so many ways for the page display to be broken without generating an error it seems completely pointless.

20. Posted by Vlad Alexander
on Thursday 2009-12-10 at 09:50:19 PST

Vlad Alexander wrote: "Can you please provide examples (of user generated content)?"

Niels Matthijs wrote: "Well, consider this comment form. Imagine you going to bed, five minutes after you're resting your eyes I post a message and manage to slip in a /p ... even though it invalidates your code"

Neither that nor any other content entered into the comment field will invalidate the markup on this site. However, some blogs/forums do let end-users intermix content with HTML (or proprietary tags) written by hand. It is up to those applications to validate any data they accept from users.
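As an illustration of what validating user input can look like, here is a minimal sketch in Python. It assumes the simplest policy, in which comments are plain text and every markup character is escaped before the text is placed in the page; the function name render_comment and the surrounding markup are hypothetical and are not how this site is implemented.

```python
from html import escape


def render_comment(author, text):
    """Return an HTML fragment in which user input cannot introduce markup."""
    # Split the comment into paragraphs on blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # escape() replaces &, < and > (and, by default, both quote characters),
    # so a stray "</p>" typed by a visitor is shown as text, not parsed as a tag.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f'<div class="comment"><cite>{escape(author)}</cite>{body}</div>'


print(render_comment("Niels", "Oops </p> stray tag"))
```

With this policy a stray /p in a comment is displayed literally and the page stays well-formed. Applications that let users enter a subset of HTML need a stricter step than escaping, such as parsing the input and re-serializing only the allowed elements.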

Niels, my argument is that error messages should be displayed for corrupt HTML5. If developers for some reason cannot control what is rendered on their Web site, then they can continue to use previous versions of HTML as Tag Soup. But I bet that once browsers start to display error messages for corrupt HTML5, you will see new WYSIWYG editors that generate valid HTML and there will not be any need for end-users to author HTML by hand in forums and blog comments.

Mike wrote: "...all the user is interested in is the content not the markup..."

Are you saying that users don't want to know that content may be corrupt or missing? As I say in the article "... be happy with the parts of the document that do render correctly".

21. Posted by Brett Taylor
on Monday 2009-12-14 at 18:21:19 PST

* We need better software in CMSes and comment systems for processing marked-up content from users to valid HTML.
* We need browsers to display invalid markup errors for all documents loaded from local disk (i.e., file://) by default, with the option to turn this off (for the session or forever, in case they ARE a user)
* We need browsers to provide a developer's config option which will display the same errors for pages viewed on specific servers.
* We need education to tell people how to and why they should turn this option on. An idea could be to put this information with the local document error.

Browser manufacturers do not want to be the first to display developer errors to normal users: users will switch back to Internet Explorer when the internet seems to be more 'broken' when using Firefox/Safari/Opera. Browsers hide JavaScript errors from the users these days for a reason, y'know.

22. Posted by Breton
on Tuesday 2009-12-15 at 13:31:47 PST

Shall I suggest that we start with IE displaying errors? Then, everyone will switch from using IE to firefox, or some other browser!

On the other hand, they're probably more likely to just hear that the new version of IE is shit, it reports errors all the time for no reason! don't upgrade!

You're building quite an epic and misguided fantasy on this website. My only concern is that you're conning other people into believing that it's possible, or even a good idea.

By the way, I don't find argumentum ad populum (Everyone else is doing it!) very convincing to begin with. Maybe everyone else is wrong? Maybe, if I receive a slightly corrupt document, I want to be able to recover as much as I can out of it, instead of being told "It's broken. Can't be fixed. Sorry".

It's like if we printed all our documents on combustion paper, and the slightest rip, dent, or scuff would cause the whole thing to neatly disappear in a puff of flame and smoke! Given this behavior, what exactly is the advantage of storing our documents electronically again? Aren't we better off printing everything out?

23. Posted by breton
on Tuesday 2009-12-15 at 13:34:49 PST

Anyway, we already have a system whereby only developers see errors and users don't. It's called a validator, and you can get one for your favorite text editor, or just use the one at http://validator.w3.org/

24. Posted by Samuel Milton
on Friday 2009-12-18 at 05:51:14 PST

The problem as I see it is the WYSIWYG-editors. These are used by people not familiar with HTML-coding at all. Pasting MS Word content and so on.

And no, I have not seen one WYSIWYG-editor good enough. Browsers must continue to "do the best of what they get". It's a pity, but the reality.

25. Posted by lucideer
on Saturday 2009-12-19 at 18:21:32 PST

The entire "stigma" around "draconian" error messages is utter nonsense. Client-side web developers are ABSOLUTELY UNIQUE amongst programmers in working in an environment that tolerates such disgustingly poor coding standards as are seen on the web today.

Samuel Milton (above) comments that WYSIWYG editors are to blame - this is backwards. True, I have not seen one WYSIWYG editor good enough either, but the fact is, if we had had "draconian" error handling from the beginning we WOULD have WYSIWYG editors good enough as they would NEED to be.

As for those arguing that (average Joe) users simply want to see content, Google's Chrome(-ium) browser has solved this issue very effectively in the context of malformed XHTML served as application/xhtml+xml - it (unlike others, such as the example image posted by Vlad in a comment above) shows both content (rendered after falling back to the HTML renderer) AND the error. This would be perfect, as bad code would not deprive users of content, but would still demand fixing and eventually lead to much improved code quality across the web.
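The "show both the content and the error" behaviour described in this comment can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not how Chrome or any other browser implements it: the standard library's strict XML parser stands in for the XHTML parser, html.parser stands in for the lenient HTML parser, and load_document and TextCollector are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Lenient fallback: collects text content and ignores markup errors."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def load_document(markup):
    """Return (text, error). error is None when the markup is well-formed XML."""
    try:
        root = ET.fromstring(markup)  # strict: raises on the first syntax error
    except ET.ParseError as exc:
        line, column = exc.position
        collector = TextCollector()
        collector.feed(markup)
        collector.close()
        message = f"XML parsing failed (line {line}, column {column}): {exc}"
        return " ".join(collector.chunks), message
    text = " ".join(t.strip() for t in root.itertext() if t.strip())
    return text, None


text, error = load_document("<html><body><p>Hello<p>World</body></html>")
print(text)   # Hello World
print(error)  # describes where strict parsing stopped
```

The caller receives the recovered content and the error together, so an interface built on it could render the page and still surface the problem instead of choosing one or the other.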

Comments are closed for this article.
