Is HTML on the web a special case?


Some random thoughts, and responses to the many blog discussions sparked by the XML2 article, where I asked “Is HTML on the Web a special case?”

By which I mean: if you go through all the effort of writing down the syntax rules used by the union of browsers you care about, and then through the pain of reaching consensus within a standards body, will the resulting document be useful beyond HTML on the Web, much as XML is useful beyond being a vehicle for XHTML?

I don’t know if Tim Bray had that same version of the question in mind, but he answers “obviously ‘yes’”.

But I don’t think so. Once you have that set of rules, wouldn’t it be useful in other areas, most notoriously RSS on the web? SVG? MathML? In fact, I’d go so far as to say that any hand-authored markup would be a candidate for XML2 syntax.

What about mobile? Anne van Kesteren responds:

in that article Micah Dubinko mentions mobile browsers living up to their premise and all that. What he says however, isn’t really true. Mobile browsers and XHTML is tag soup parsing all the way.

He links to this page, which does a rather poor job of supporting a point the author seems to have settled on before starting the experiment. Look at the specific test cases: one tests completely bizarro markup that no author or tool I can imagine would ever produce, and another checks the handling of content-type, not markup. On the other axis, the browser choices seem a bit jumbled: lists of user-agent strings, one for stock Mozilla, and a footnote indicating confusion about which browser was really in use. If anything, this page shows that the browsers tested here, with the exception of Opera Mini, are crap. Spend more than a few minutes working in mobile and you’ll discover this is a widespread trend. (And I’m working on a solution… watch this space.)

Look at this from a pragmatic viewpoint. Compare the doctype used on the Yahoo! front page with the one used on the Yahoo! mobile front page. Despite the poor browsers, XHTML adoption is still farther ahead on the mobile web than on the desktop web.

The last thing nagging at me (for now) is whether XML2 will have an infoset. Will it be possible to use XPath, XQuery, and other XML tools on XML2 content, and how well will the two map to each other? In the strict sense, no, XML2 won’t have a conforming infoset, because it will never include namespaces. But might it support a subset of the infoset? (Would that be an infosubset?) That’s a huge open question at this point. -m
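To make the infoset question a little more concrete, here is a minimal sketch in Python. It assumes nothing about what XML2 would actually specify: it uses the standard library's forgiving `html.parser` to build an `xml.etree.ElementTree` tree, then queries that tree with ElementTree's limited XPath subset. The `SoupToTree` class, the `parse` helper, the `VOID` set and the synthetic `document` root are hypothetical names for illustration, and the error recovery is a crude stand-in, not the HTML5 parsing algorithm. The resulting tree carries elements, attributes and text but no namespaces; `xmlns` survives only as an ordinary attribute.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical, incomplete list of HTML elements that never take content.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupToTree(HTMLParser):
    """Build a namespace-free ElementTree from forgiving, HTML-like markup."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.root = ET.Element("document")  # synthetic root, so fragments work
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # xmlns is kept as a plain attribute; no namespace processing happens.
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name (implicitly closing
        # anything opened inside it); ignore end tags with no match.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # ElementTree stores text in .text, or in the previous child's .tail.
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse(markup):
    parser = SoupToTree()
    parser.feed(markup)
    parser.close()
    return parser.root


if __name__ == "__main__":
    # Unclosed <b> and missing </html>: recovered without raising an error.
    doc = parse('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
                '<p>One</p><p>Two <b>bold</p></body>')
    print(len(doc.findall(".//p")))                    # 2
    print("".join(doc.findall(".//p")[1].itertext()))  # Two bold
    print(doc.find("html").get("xmlns"))               # http://www.w3.org/1999/xhtml
    # A real XML parser, by contrast, folds xmlns into the element name.
    print(ET.fromstring('<html xmlns="http://www.w3.org/1999/xhtml"/>').tag)
    # {http://www.w3.org/1999/xhtml}html
```

The design choice here is to keep only what both worlds agree on: element names, attributes and character data. Path queries like `.//p` work on the result, but anything namespace-aware (prefixed XPath, `{uri}name` lookups) has nothing to match, which is roughly what an "infosubset" would mean in practice.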


6 Replies to “Is HTML on the web a special case?”

  1. Hey Anne,

    You’re in a better position to answer that than I am. :-) But all the HTML parsing rules I’ve seen have avoided namespaces and URI baggage. At best, they allow meaningless xmlns attributes (which is a good thing in my book).

    This is *probably* a rich enough topic to merit a future article…

    Thanks! -m

  2. Hi,

    Sorry if I wasn’t clear. I wasn’t aiming for a tone of ‘surprise’. :) My goal is to get lots of folks to look at the issues here, rather than digging in behind one side or the other.

    I agree with your premise that “For a mobile developer, there is a clear reward in going with XHTML. You are much less likely to have your page break on a random phone if you stick to XHTML MP than you are if you go with HTML.”

    That’s why I disagreed with the methodology of that little mobile experiment I linked to. It’s true that many browsers fail to implement XHTML flawlessly, but despite this, XHTML is still farther ahead in mobile. In that respect, XHTML is living up to its goal of enabling smaller, simpler devices. -m

  3. Micah, ah, that’s certainly true for HTML parsers. If there’s ever an XML 2.0 with graceful error handling as proposed, though, it will most certainly be able to handle namespaces.

    This can’t really be merged with HTML parsing, though I suppose parts could be shared.

    Regarding mobiles and XHTML: what are you basing your statements on? Mobiles can handle HTML just as well. That they perhaps support only a subset of the elements is a different issue, and has little to do with the syntax you express those features in.

  4. Test cases often have “completely bizarro markup that no author or tool I can imagine would ever produce”. The purpose of test cases is to test things, not to reflect what authors or tools produce. I didn’t choose which browsers were to be tested; I asked on a forum if people could test their mobiles, so the list of browsers should reflect what people actually use. The “confusion about what browser is really in use” was due to some testers not reporting the UA strings along with their results.

    If my research didn’t convince you then I encourage you to do your own research. Henri Sivonen has a more complete set of tests at

