Why do WYSIWYG editors hate HTML5?
What happens when Web browser vendors design new HTML features but don't consult with other parties such as vendors of WYSIWYG editors? We end up with a markup language that's difficult to author and prone to the same misuse as previous versions of HTML.
Background
Much of the HTML that we use today was simply created by browser vendors to support their own invented elements. If these elements became popular, the W3C incorporated them into subsequent HTML specifications. This process is repeating itself today: browser vendors are again the only ones determining the future direction of HTML, leaving authoring tool vendors out of the conversation.
When less is more
HTML 4 has too many elements. Each element, or in some cases the attribute of an element, requires a button or a list box on a WYSIWYG editor toolbar. The screen shot below shows an overcrowded authoring toolbar that does not even include all of the features of HTML 4.

Now, with HTML5 we have additional new elements that do the same thing as the <div> element, such as <section>, <nav>, <article>, <aside>, <header> and <footer>. They too will probably each require their own button on the toolbar of WYSIWYG editors. But how many more buttons can a toolbar sustain before it becomes ridiculous?
Sometimes it means this, and sometimes it means that... and only under certain conditions
WYSIWYG editors (in fact all GUI applications) need simple, invariable rules. Unfortunately, HTML5 is full of rules with exceptions, or rules too complex for authoring tools to implement. Take the <time> element as an example. The <time> element has 3 different meanings depending on how it is used.
First, if it is used without the pubdate attribute, it provides a machine-readable date. For example:
<time datetime="2009-11-16">November 16, 2009</time>
Second, if it is used with the pubdate attribute and is not inside an <article> element, it indicates the publication date of the entire document. For example:
<body>...<time datetime="2009-11-16" pubdate>November 16, 2009</time>...</body>
Third, if it is used with the pubdate attribute and is inside an <article> element, it indicates the publication date of that article only. For example:
<article>...<time datetime="2009-11-16" pubdate>November 16, 2009</time>...</article>
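The three cases above can be written down as a small decision function, which is roughly what an editor would have to evaluate before it could even label the field in its dialog. This is only a sketch in Python; the names TimeMeaning and time_meaning are hypothetical and not part of any editor or spec.

```python
from enum import Enum


class TimeMeaning(Enum):
    """The three readings of <time> described in this article."""
    MACHINE_READABLE_DATE = "machine-readable date"
    DOCUMENT_PUBLICATION_DATE = "publication date of the entire document"
    ARTICLE_PUBLICATION_DATE = "publication date of the enclosing article"


def time_meaning(has_pubdate: bool, inside_article: bool) -> TimeMeaning:
    """Return the meaning of a <time> element (hypothetical helper).

    has_pubdate:    the element carries the pubdate attribute
    inside_article: the element has an <article> ancestor
    """
    if not has_pubdate:
        # Without pubdate the position in the document does not matter.
        return TimeMeaning.MACHINE_READABLE_DATE
    if inside_article:
        return TimeMeaning.ARTICLE_PUBLICATION_DATE
    return TimeMeaning.DOCUMENT_PUBLICATION_DATE


if __name__ == "__main__":
    print(time_meaning(has_pubdate=True, inside_article=False).value)
```

Note that the same markup changes meaning when it is moved in or out of an <article>, so an editor would have to re-evaluate this on every cut, paste, or drag.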
As a result, a WYSIWYG editor must somehow create a single user-friendly interface that helps the user understand the 3 possible usages of the <time> element. And as if that were not enough, there are more rules for the <time> element that push the limits of user-interface design further.
For example, the pubdate attribute is always optional. The datetime attribute is sometimes optional and sometimes required. The datetime attribute is required when the pubdate attribute is used and when the element does not contain a string in a valid date-time format. For example:
<time datetime="2009-11-16" pubdate>Monday</time>
But the datetime attribute is optional when the <time> element does contain a valid date time format. For example:
<time>2009-11-16</time>
<time pubdate>2009-11-16</time>
The <time> element can also be empty. For example:
<time datetime="2009-11-16"></time>
<time datetime="2009-11-16" pubdate></time>
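To show what an editor would have to check before accepting a <time> element, here is a minimal Python sketch of the datetime rule. It rests on two assumptions that are mine, not the spec's: the rule is read as "datetime is needed whenever the element's content is not itself a valid date string", and "valid date-time format" is reduced to plain YYYY-MM-DD dates. All function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_valid_date_string(text: str) -> bool:
    """Simplified check: accept only dates of the form YYYY-MM-DD.

    The real HTML5 grammar allows more forms (times, time zones);
    this narrowing is an assumption made for the sketch.
    """
    try:
        datetime.strptime(text.strip(), "%Y-%m-%d")
    except ValueError:
        return False
    return True


def datetime_attr_required(content: str) -> bool:
    """datetime is needed when the content is not a valid date string."""
    return not is_valid_date_string(content)


def time_element_is_valid(content: str, datetime_attr: Optional[str]) -> bool:
    """Check one <time> element given its text content and datetime value."""
    if datetime_attr is not None:
        # With datetime present, the content may be anything, even empty.
        return is_valid_date_string(datetime_attr)
    return not datetime_attr_required(content)


if __name__ == "__main__":
    print(time_element_is_valid("Monday", "2009-11-16"))
    print(time_element_is_valid("Monday", None))
```

Even in this reduced form, the dialog for a single date needs a conditional required field, which is exactly the kind of rule that is hard to convey in a toolbar-driven interface.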
A further rule states that no more than one <time> element with the pubdate attribute is permitted directly within an <article> element, and no more than one such element is permitted outside any <article> element. It is unclear how WYSIWYG editors are supposed to let users copy and paste content without breaking this rule.
Solutions in search of problems
HTML 4 has a satisfactory way of dealing with alternate text for images. Images are used either for decoration or as content. The alt attribute is required. If the alt attribute is empty, the image is used for decoration. If the alt attribute is non-empty, it contains alternate text for the image. So a WYSIWYG editor can build an interface that asks the user whether the image is decorative and then, depending on the user's selection, either makes the alternate text field required or disables it, as shown in the screen shots below.


But HTML5 makes the alt attribute optional, and the absence of the alt attribute means that the content author does not know what alternate text should be entered, if any. This leads inevitably to the assumption by users that alternate text is not important.
So how are WYSIWYG editors supposed to create a user-friendly interface that conveys the 3 possible states of alternate text for an image? Below is a screen mock-up that comes to mind:

But even an interface such as the one above is not as simple to implement as it looks, because HTML5 makes the alt attribute optional only if one of the following conditions is met:
- if the title attribute has content
- if the <img> element is inside a <figure> element that contains a non-empty <dt> element
- if the <img> element is part of the only paragraph directly in its section, and is the only <img> element without an alt attribute in its section, and its section has an associated heading.
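The conditions listed above can be sketched as a single predicate, which shows how much document context an editor would need to collect before it could enable or disable one text field. This Python sketch is illustrative only; the ImageContext class and its field names are hypothetical, and each flag stands in for a document traversal that a real editor would have to perform.

```python
from dataclasses import dataclass


@dataclass
class ImageContext:
    """Facts about an <img> and its surroundings (hypothetical structure)."""
    title: str = ""
    in_figure_with_nonempty_dt: bool = False
    in_only_paragraph_of_section: bool = False
    only_img_without_alt_in_section: bool = False
    section_has_heading: bool = False


def alt_may_be_omitted(ctx: ImageContext) -> bool:
    """True if at least one of the three conditions in the list holds."""
    has_title = bool(ctx.title.strip())
    lone_image_in_headed_section = (
        ctx.in_only_paragraph_of_section
        and ctx.only_img_without_alt_in_section
        and ctx.section_has_heading
    )
    return (
        has_title
        or ctx.in_figure_with_nonempty_dt
        or lone_image_in_headed_section
    )


if __name__ == "__main__":
    print(alt_may_be_omitted(ImageContext(title="Sunset over the bay")))
    print(alt_may_be_omitted(ImageContext()))
```

The third condition depends on the rest of the section, so the answer for one image can change when the user edits a different paragraph or deletes a heading elsewhere.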
Going from bad to worse
Headings in HTML5 go from bad to worse. Headings are supposed to help readers navigate long documents, skip directly to sections of interest, and break up long stretches of text. The mechanism for creating headings in HTML (<h1> to <h6>) has always been bad, and practically no one uses headings correctly in HTML. Because headings are stand-alone elements, WYSIWYG editors cannot enforce their proper use. As a result, content authors often misuse headings as a way to format text or to emphasize text so that it jumps off the page.
Alas, HTML5 offers no solution. Indeed, the use of headings in HTML5 is so convoluted that after reading the spec several times and scouring third-party articles on the topic, I am not able to summarize the correct use of headings as per the HTML5 spec.
I did notice that there is a new heading element called <hgroup>, which must contain two or more <h1> to <h6> elements. I can only guess that this element is used to group two or more headings together, and then to hide one of these headings from the document outline. So you would use one heading element to create a kind of sub-heading and then use another element to hide the sub-heading. But why would you do that? Also, as a lead developer for a WYSIWYG editor, I cannot even imagine a user-friendly interface that we could build in order to let content authors apply the <hgroup> element as intended.
So how would a WYSIWYG editor vendor want to see headings implemented? Since headings don't make sense on their own, they should be part of a <section> element to form a compound construct, similar to <table>, <ol>, <ul> and <dl>. For example:
<section role="article"><heading>...</heading><content>...</content></section>
A WYSIWYG editor could implement this feature with just a single toolbar button as shown in a screen mock-up below.

And sections could be nested like this:
<section role="article">
  <heading>...</heading>
  <content>
    <section><heading>...</heading><content>...</content></section>
    <section><heading>...</heading><content>...</content></section>
  </content>
</section>
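To illustrate why the compound construct is easy for a tool to generate, here is a small Python sketch that renders a tree of sections into the proposed markup. The <heading> and <content> elements are this article's proposal and not part of HTML5, and the Section class and render function are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional


@dataclass
class Section:
    """One section of the proposed compound construct (hypothetical model)."""
    heading: str
    body: str = ""
    children: List["Section"] = field(default_factory=list)
    role: Optional[str] = None


def render(section: Section) -> str:
    """Serialize a section and its nested sections on a single line."""
    role = f' role="{escape(section.role)}"' if section.role else ""
    # Nested sections are emitted inside the parent's <content> element.
    inner = escape(section.body) + "".join(render(c) for c in section.children)
    return (
        f"<section{role}>"
        f"<heading>{escape(section.heading)}</heading>"
        f"<content>{inner}</content>"
        f"</section>"
    )


if __name__ == "__main__":
    doc = Section("Guide", "Intro", [Section("Part 1", "a"), Section("Part 2", "b")], role="article")
    print(render(doc))
```

Because a heading can exist only as part of a Section in this model, the tool cannot produce a stray heading used purely for formatting, and the nesting depth falls out of the tree rather than from the author picking a heading level.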
WYSIWYG editors can make authoring the above markup idiot proof, whereas HTML5 headings are next to impossible to author correctly.
Don't blame the WYSIWYG vendors
Building good authoring tools starts with a good markup language. Unfortunately, HTML5 makes it impractical, or next to impossible, to build WYSIWYG editors with user-friendly interfaces that comply with the rules of the language. It does not have to be so. To the Web browser vendors designing HTML5 I say: let's work together to improve the Web.