Elemental XHTML

Examining XHTML 2.0 attribute vs. element decisions

Editor: Micah Dubinko (mdubinko at yahoo dot com)

Last update: November 12, 2002

1. Introduction

“[The] DTD for Internet drafts puts human text in attrs. bummer, that.” -- Dan Connolly

One reason often given for keeping human-readable text out of attribute values is internationalization. The W3C XML Schema Part 2 Recommendation, under the xsd:string datatype, says:

NOTE: Many human languages have writing systems that require child elements for control of aspects such as bidirectional formating or ruby annotation.... Thus, string, as a simple type that can contain only characters but not child elements, is often not suitable for representing text. In such situations, a complex type that allows mixed content should be considered.

In other words, “don't use attributes for human-readable text”.

The “elements vs. attributes” debate has raged since about five minutes after elements and attributes both existed. Some of the element-vs-attribute decisions in HTML date back to what is essentially antiquity. This paper examines what XHTML would look like with a few possible constraints on the element-vs-attribute design. The purpose is exploration, not advocacy. The constraints are:

2. Attribute types subject to change

The following attribute types require change:

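| AttributeType | Used in XHTML2? | Human-readable (HR) | Space-separated (SS) list | Comma-separated (CS) list | Micro-parse (MP) value |
|---------------|-----------------|---------------------|---------------------------|---------------------------|------------------------|
| CDATA         | Yes             | X                   |                           |                           |                        |
| PCDATA        |                 | X                   |                           |                           |                        |
| IDREFS        | Yes             |                     | X                         |                           |                        |
| NMTOKENS      |                 |                     | X                         |                           |                        |
| Charsets      |                 |                     | X                         |                           |                        |
| Class         | Yes             |                     | X                         |                           |                        |
| ContentTypes  |                 |                     |                           | X                         |                        |
| Coordinates   | Yes             |                     |                           | X                         |                        |
| Length        | Yes             |                     |                           |                           | %                      |
| LinkTypes     | Yes             |                     | X                         |                           |                        |
| MediaDesc     | Yes             |                     |                           | X                         |                        |
| MultiLength   | Yes             |                     |                           |                           | *                      |
| MultiLengths  |                 |                     |                           | X                         | *                      |
| Text          | Yes             | X                   |                           |                           |                        |
| URIs          | Yes             |                     | X                         |                           |                        |
| URI List      | Yes             |                     |                           | X                         |                        |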

(Note that a few of these are defined but never used in XHTML 2.0, and there's some confusion in the spec among CDATA, PCDATA, and Text.)

By looking for attributes with these datatypes, a few immediate consequences can be discerned.

One large-scale change is with the 'title' attribute, which is defined on nearly every element. One way to handle it is to define a new content model, “advisory”, which would by default not be rendered. (Of course, stylesheets would remain free to do as they please with any element.) XHTML already has a <title> element, currently defined as a required element in the <head>. Allowing <title> elements throughout the document would seem to be a straightforward progression. The end of this document has several examples.

Another interesting case is the 'class' attribute, which can take a space-separated list of classes. In theory, it would be better to use markup to delimit the list. Examples of this are shown later.
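The 'title' and 'class' changes described above are mechanical enough to sketch as a small transformation. The following Python is only an illustration under stated assumptions, not part of any XHTML specification: the function name elementalize is hypothetical, only these two attributes are handled, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: only these two attributes are converted; the child element
# names mirror the <title> and <class> elements used in this paper.
HUMAN_READABLE = ("title",)    # whole value becomes a single child element
SPACE_SEPARATED = ("class",)   # each token becomes its own child element


def elementalize(root):
    """Move 'title' and 'class' attributes into leading child elements."""
    # Snapshot the tree first so newly inserted children are not revisited.
    for node in list(root.iter()):
        new_children = []
        for name in HUMAN_READABLE:
            value = node.attrib.pop(name, None)
            if value is not None:
                child = ET.Element(name)
                child.text = value
                new_children.append(child)
        for name in SPACE_SEPARATED:
            value = node.attrib.pop(name, None)
            if value is not None:
                for token in value.split():
                    child = ET.Element(name)
                    child.text = token
                    new_children.append(child)
        if new_children:
            # The node's leading text must now follow the inserted children.
            new_children[-1].tail = node.text
            node.text = None
            for index, child in enumerate(new_children):
                node.insert(index, child)
    return root


p = ET.fromstring("<p title='My favorite opener'>Call me Ishmael.</p>")
print(ET.tostring(elementalize(p), encoding="unicode"))
# <p><title>My favorite opener</title>Call me Ishmael.</p>
```

Placing the generated elements before any existing text matches the style of the examples in section 3, where advisory elements lead the content they describe.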

Other than changes related to 'class' and 'title', the following elements would also need to be adjusted:

| Element  | Reasons                                               |
|----------|-------------------------------------------------------|
| <a>      | SS ('rel' and 'rev'); CS ('coords')                   |
| <area>   | HR ('alt'); CS ('coords')                             |
| <link>   | SS ('rel' and 'rev'); CS ('media')                    |
| <object> | HR ('standby'); CS ('archive'); MP ('content-length') |
| <param>  | HR ('value')                                          |
| <style>  | CS ('media')                                          |
| <table>  | HR ('summary')                                        |
| <tr>     | HR ('abbr'); SS ('headers'); CS ('axis')              |
| <td>     | HR ('abbr'); SS ('headers'); CS ('axis')              |



Nine elements in all; this was fewer than I expected.

3. Examples of Elemental XHTML

<p><title>My favorite opener</title>Call me Ishmael.</p>

<p><title>title one</title>What does this do?<title>title two</title></p>

<p>I <em><title>emphasis added by us</title>really</em> want to go!</p>

<br>
  <title>Three days passed...</title>
  <class>separator</class>
  <class>compact</class>
</br>

<head><title>My Document</title></head>
<!-- note that currently the head element allows a title attribute.
This becomes redundant -->

<p>Further details <a href='..'><rel>Alternate</rel><rev>Index</rev>
<title>The Wumpus Website</title>here</a></p>

<object classid='http://www.observer.mars/TheEarth.py'>
  <title>The Earth as seen from space</title>
  <standby>Loading applet...</standby>
  <!-- Else, try the MPEG video -->
  <object data='TheEarth.mpeg' type='application/mpeg'>
    <standby>Loading movie...</standby>
    <!-- Else, try the GIF image -->
    <object data='TheEarth.gif' type='image/gif'>
      <!-- Else render the text -->
      The <strong>Earth</strong> as seen from space.
    </object>
  </object>
</object>

<table>
  <summary>This table charts the number of cups of coffee consumed by each senator, the type of coffee (decaf or regular), and whether taken with sugar.</summary>
  <caption>Cups of coffee consumed by each senator</caption>
  <title>Last update, Nov 1</title>
  <tbody>
    <tr>
      <th id='t1'><abbr>Name</abbr>Senator Name</th>
      <th id='t2'>Cups</th>
      <th id='t3'><abbr>Type</abbr>Type of Coffee</th>
      <th id='t4'>Sugar?</th>
    </tr>
    <tr>
      <td><headers>t1</headers><title>Dem, ND</title>T. Sexton</td>
      <td><headers>t2</headers>10</td>
      <td><headers>t3</headers>Espresso</td>
      <td><headers>t4</headers>No</td>
    </tr>
    ...
  </tbody>
</table>



4. Conclusions

Having done this, my conclusions so far are:

5. Possible Interesting Directions

Due to the exploratory nature of this paper, this section is subject to change and open to suggestions.

One possibility is to leverage mixed content to provide an element-based substitute for character entities. For example:

<div>This is copyright <char:copy/> 2002.</div>



Different predefined character sets (HTML, MathML, etc.) can be assigned different namespaces. Additionally, for environments that don't understand character elements, inner text content can be used:

<div>This is copyright <char:copy>(c)</char:copy> 2002.</div>

A processor that understands character elements would suppress the inner content and emit the character itself. A processor that doesn't understand them could safely ignore the unknown element and fall back to its inner content.
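The two processor behaviors can be sketched in a few lines of Python. This is a rough illustration only: the namespace URI, the KNOWN_CHARS table, and the render_text function are all hypothetical names invented for the example, since the paper does not define a concrete namespace or character set.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and character table, for illustration only.
CHAR_NS = "http://example.org/char"
KNOWN_CHARS = {"copy": "\u00a9"}
PREFIX = "{" + CHAR_NS + "}"


def render_text(elem, understands_char_elements):
    """Flatten elem to plain text, with or without character-element support."""
    parts = [elem.text or ""]
    for child in elem:
        local = child.tag[len(PREFIX):] if child.tag.startswith(PREFIX) else None
        if understands_char_elements and local in KNOWN_CHARS:
            # Aware processor: emit the character and suppress the fallback.
            parts.append(KNOWN_CHARS[local])
        else:
            # Unaware processor: ignore the tag itself, keep its content.
            parts.append(render_text(child, understands_char_elements))
        parts.append(child.tail or "")
    return "".join(parts)


doc = ET.fromstring(
    '<div xmlns:char="http://example.org/char">'
    "This is copyright <char:copy>(c)</char:copy> 2002.</div>"
)
aware = render_text(doc, True)     # "This is copyright \u00a9 2002."
unaware = render_text(doc, False)  # "This is copyright (c) 2002."
```

In this sketch an aware processor that meets a character element it does not recognize also falls back to the inner content, which seems the least surprising choice.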

More?