Archive for October, 2007

Monday, October 29th, 2007

Five percent rules

Many things in life are simpler when you only need to be within 5%:

  • Pi is pretty much 3
  • Water weighs pretty much 8 pounds a gallon
  • A quart is pretty much a liter (and a gallon, 4 liters)
  • A year has pretty much 360 days, and pretty much 31 million seconds
  • The speed of light is pretty much 300,000 km/s, which is pretty much one foot/nanosecond

Of course, there’s even more things that get more convenient when you have 10% or 20% to work with… -m

Thursday, October 25th, 2007

Will Leopard run OK on an 800Mhz G4?

I see from the system requirements that Leopard requires an “866” Mhz processor. Is this a hard limit, or just an advisory? My first Mac–the one I wrote the book on–is a lowly 800 Mhz. box. Is it worth trying to upgrade it? -m

Monday, October 22nd, 2007

Is there fertile ground between RDFa and GRDDL?

The more I look at RDFa, the more I like it. But still it doesn’t help with the pain-point of namespaces, specifically of unmemorable URLs all over the place and qnames (or CURIEs) in content.

Does GRDDL offer a way out? Could, for instance, the namespace name for Dublin Core metadata be assigned to the prefix “dc:” in an external file, linked via transformation to the document in question? Then it would be simpler, from a producer or consumer viewpoint, to simply use names like “dc:title” with no problems or ambiguity.

This could be especially useful not that discussions are reopening around XML in HTML.

As usual, comments welcome. -m

Saturday, October 20th, 2007

Building a tokenizer for XPath or XQuery

In researching for an XPath 2.0 implementation, I ran across this curious document from the W3C. Despite being labeled a Working Draft (as opposed to a Note), it appears to be a one-shot document with no future hope for updates or enhancements.

In short, it outlines several options for the first stage or two of an XPath 2.0 or XQuery implementation. (Despite the title, it talks about more than just a tokenizer; additionally a parser and a possible intermediate stage). Tokenizing and parsing XPath are significantly more difficult than other languages, because things like this are perfectly legitimate (if useless):

if(if) then then else else- +-++-**-* instance
of element(*)* * * **---++div- div -div

The document tries to standardize on some terminology for various approaches toward dealing with XPath. The remaining bulk of the document sketches out some lexical states that would be useful for one particular implementation approach. I guess the vibrant, thriving throngs of XPath 2.0 developers didn’t see the need for this kind of assistance.

In short, I didn’t find it terribly useful. Maybe some readers have, though. Feel free to comment below. Subsequent articles here will describe how I approached the problem. Stay sharp! -m

Tuesday, October 16th, 2007

Come to mead appreciation and brewing class

If you’re in the South Bay and like mead, you need to check this out. -m

Monday, October 15th, 2007

Stephen Colbert…

…should write more opinion columns. :-) -m

Monday, October 15th, 2007

XForms evening at XML 2007

Depending on who’s asking and who’s answering, W3C technologies take 5 to 10 years to get a strong foothold. Well, we’re now in the home stretch for the 5th anniversary of XForms Essentials, which was published in 2003. In past conferences, XForms coverage has been maybe a low-key tutorial, a few day sessions, and hallway conversation. I’m pleased to see it reach new heights this year.

XForms evening is on Monday December 3 at the XML 2007 conference, and runs from 7:30 until 9:00 plus however ERH takes on his keynote. :) The scheduled talks are shorter and punchier, and feature a lot of familiar faces, and a few new ones (at least to me). I’m looking forward to it–see you there! -m

Thursday, October 11th, 2007

Hacking Facebook

I didn’t get to do much for Yahoo Hack Day, but I did get to help a coworker a teeny bit with an implementation of Y! Search for social web sites, including Facebook. There could be some interesting repercussions from that, so I won’t say more now. But what did surprise me is how many Yahoos are active on Facebook.

Myself–I’m still a Facebook curmudgeon. But mostly I simply haven’t had the time to check it out, or figure out the value proposition of accepting an invitation. -m

Monday, October 8th, 2007

XML 2007 Schedule

As widely reported by now, the final schedule for XML 2007 this December in Boston is up. All I have to add is the suggestion of careful attention to the Tuesday program at 4:00. :) If you can’t wait, some technical details are forthcoming in this space. That is all. -m

Friday, October 5th, 2007

Playing with microformats

I’ll be doing some experimenting around here over maybe the next week or two. Specifically, setting up hAtom within these pages. Watch for falling debris and report any unusual observations. -m

Wednesday, October 3rd, 2007

XML Annoyance: do greater-than signs need to be escaped?

Let’s see how many downstream pieces of software trip over this post…

Do greater-than and less-than signs need to be escaped in XML? Conventional wisdom has it that less-than signs always do, since that character starts a fresh “tag”, but greater-than signs are safe.


There is a particular sequence, namely ]]> , not allowed to occur unescaped in XML “for compatibility“–a particular phrase the spec uses to indicate rules that only an SGML-head could love (but still strict requirements nonetheless). Does your software prevent this condition from causing an error? -m

Monday, October 1st, 2007

simple parsing of space-seprated attributes in XPath/XSLT

It’s a common need to parse space-separated attribute values from XPath/XSLT 1.0, usually @class or @rel. One common (but incorrect) technique is simple equality test, as in {@class=”vcard”}. This is wrong, since the value can still match and still have other literal values, like “foo vcard” or “vcard foo” or ” foo vcard bar “.

The proper way is to look at individual tokens in the attribute value. On first glance, this might require a call to EXSLT or some complex tokenization routine, but there’s a simpler way. I first discovered this on the microformats wiki, and only cleaned up the technique a tiny bit.

The solution involves three XPath 1.0 functions, contains(), concat() to join together string fragments, and normalize-space() to strip off leading and trailing spaces and convert any other sequences of whitespace into a single space.

In english, you

  • normalize the class attribute value, then
  • concatenate spaces front and back, then
  • test whether the resulting string contains your searched-for value with spaces concatenated front and back (e.g. ” vcard “

Or {contains(concat(‘ ‘,normalize-space(@class),’ ‘),’ vcard ‘)} A moment’s thought shows that this works well on all the different examples shown above, and is perhaps even less involved than resorting to extension functions that return nodes that require further processing/looping. It would be interesting to compare performance as well…

So next time you need to match class or rel values, give it a shot. Let me know how it works for you, or if you have any further improvements. -m