Archive for December, 2007

Monday, December 31st, 2007

Should documents self-version?

This blog page at the W3C discusses the TAG finding that a data format specification SHOULD provide for version information, specifically reconsidering that suggestion. As a few data points, XML 1.1 (with explicit version identifiers) is something of a non-starter, while Atom (without explicit version identifiers) is doing OK so far–though a significant revision to the core hasn’t happened and perhaps never will.

In a chat with Dave Orchard at XML 2007, I suggested that the evolution of browser User-Agent strings might be a useful model, since it developed in response to the actual kinds of problems that versioning needs to solve.

Indeed, the idea seemed familiar in my mind. In fact, I posted it here, in Feb 2004. The remainder of this posting republishes it with minor edits for clarity:

‘Standard practice’ of x.y.z versioning, where x is major, y is minor, and z is sub-minor (often build number) is not best practice. If you look at how systems actually evolve over time, a more ‘organic’ approach is needed.

For example, look at how browser user agent strings have evolved. Take this, for example:

Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows 98) Opera 7.02 [en]

Wow, if detection code is looking for a substring of “Mozilla” or “Mozilla/4” or “Mozilla/4.0”, or “MSIE” or “MSIE 6” or “MSIE 6.0” or “Opera” or “Opera 7” or “Opera 7.0” or “Opera 7.02” it will hit. If you look at the kind of code used to determine what version of Windows is running, or the exact make and model of processor, you will see a similar pattern.
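To make the point concrete, here is a minimal sketch in Python of naive substring sniffing against that exact string. The probe list and the function name matching_probes are made up for illustration; real detection code is usually messier.

```python
# The user agent string quoted above, verbatim.
UA = "Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows 98) Opera 7.02 [en]"

# Hypothetical probes that different detection scripts might look for.
PROBES = [
    "Mozilla", "Mozilla/4", "Mozilla/4.0",
    "MSIE", "MSIE 6", "MSIE 6.0", "MSIE 5.5",
    "Opera", "Opera 7", "Opera 7.0", "Opera 7.02",
]


def matching_probes(user_agent, probes):
    """Return every probe that occurs as a plain substring of the user agent."""
    return [probe for probe in probes if probe in user_agent]


hits = matching_probes(UA, PROBES)
print(len(hits), "of", len(PROBES), "probes matched")
```

Every probe in the list matches, which is the whole reason the string grew that way: each generation of browser had to keep satisfying the checks written for the previous one.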

Since this is the way of nature, don’t fight it with artificial, fixed-length major.minor versioning. Embrace organically growing versions.

The first version of anything should be “1.”, including the dot (letters will work in practice too). All sample code, etc. that checks versions must stop at the first dot character; anything beyond that is on a ‘needs-to-know’ basis. A check-this-version API would be extremely useful, though a basic string compare SHOULD work.

Then, whenever revisions come out, the designers need to decide if the revision is compatible or not. A completely incompatible release would then be “2.”. However, a compatible release would be “1.1.”. All version checking code would continue to look only up to the first dot, unless it has a specific reason to need more details. Then it can go up to the 2nd dot, no more.

Now, even code that is expecting version “1.1.” will work fine with “1.1.1.” or “1.1.86.” or “1.1.2.1.42.1.536.”.
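A minimal sketch of that check in Python, assuming every version string ends with a dot as described above. The function name is_compatible is hypothetical; the “basic string compare” is interpreted here as a prefix test.

```python
def is_compatible(expected: str, actual: str) -> bool:
    """Return True if `actual` is `expected` or a compatible revision of it.

    Both strings must end with a dot. The trailing dot is what keeps
    "1.1." from accidentally matching "1.10.".
    """
    if not expected.endswith(".") or not actual.endswith("."):
        raise ValueError("version strings must end with a dot")
    return actual.startswith(expected)


# Code written against "1.1." accepts any later compatible revision.
for version in ["1.1.", "1.1.1.", "1.1.86.", "1.1.2.1.42.1.536.", "1.2.", "2."]:
    print(version, is_compatible("1.1.", version))
```

Code that only cares about the major line passes “1.” as the expected value and never looks past the first dot.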

Every new release needs to decide (and explicitly encode in the version string) how compatible it is with the entire tree of earlier versions.

Now, as long as compatible revisions keep coming out, the version string gets longer and longer. This is the key benefit, and it is exactly what fixed-field version numbers cannot do, which is why they are so inflexible (and why you get silly things like Samba reporting itself as “Windows 4.9”).

One possible enhancement, purely to make version numbers look more like what folks are used to, is to allow a superfluous zero at the end. Thus the first version is 1.0, followed by 1.1.0, 1.1.1.0, (this next one made an incompatible change) 1.2.0, and so on.
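Converting between the two spellings is mechanical. A rough sketch in Python, assuming that real version components are never zero so a final “0” can only be the superfluous one; the names to_canonical and to_display are hypothetical.

```python
def to_canonical(display: str) -> str:
    """Turn a display form such as "1.1.0" into the dotted form "1.1."."""
    parts = display.rstrip(".").split(".")
    # Assumption: a trailing "0" is always the cosmetic zero, never a real component.
    if len(parts) > 1 and parts[-1] == "0":
        parts = parts[:-1]
    return ".".join(parts) + "."


def to_display(canonical: str) -> str:
    """Turn a dotted form such as "1.1." into the display form "1.1.0"."""
    return canonical + "0"


for shown in ["1.0", "1.1.0", "1.1.1.0", "1.2.0"]:
    print(shown, "->", to_canonical(shown))
```

Version checks would still run against the dotted form; the zero is only for human eyes.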

So if a document needs to self-version at all, perhaps a scheme like this should be used? -m

Monday, December 31st, 2007

XPath puzzler: solution

Thanks to all the folks who showed interest in this little XPath puzzler published here a few weeks ago. Some asked to see the dataset, but I’m not able to release it at this time (but ask me again in 3 months).

Turns out it was a combination of two bugs, one mine, one somebody else’s. Careful observers noted that I wasn’t using any namespace prefixes in the XPath, and since I did specify that it was XPath 1.0, that technically rules out XHTML as the source language. As with nearly all XML I work with these days, the first thing I did was strip off the namespaces to make it easier to work with. Bug #1 was that in a few cases, the namespaces didn’t get stripped.

Bug #2 was in the XPath engine itself. Which one? Uh, whatever one ships with the “XPath” plugin for JEdit. It’s hard to tell directly, but I think it might be an older version of Xalan-J. In the case of the expression //meta, it properly located only those elements in no namespace. But in the case of //meta/@property, it was including all the nodes that would have been selected by //*[local-name(.)='meta']/@property. Hence, a larger number of returned nodes.
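Since the dataset isn’t available, here is a small made-up document that reproduces the same discrepancy, as a sketch using Python’s standard xml.etree.ElementTree. It does not run the engine in question; it just imitates the two selections by hand, and the sample data and the helper local_name are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Invented sample: two meta elements with the namespace stripped,
# plus one that slipped through still in the XHTML namespace (bug #1).
DOC = f"""<root>
  <meta property="dc:title" content="a"/>
  <meta property="dc:creator" content="b"/>
  <html xmlns="{XHTML}">
    <meta property="dc:date" content="c"/>
  </html>
</root>"""

root = ET.fromstring(DOC)


def local_name(tag):
    """Strip ElementTree's "{namespace}" prefix from a tag name, if any."""
    return tag.rsplit("}", 1)[-1]


# What //meta should select: meta elements in no namespace.
no_ns_meta = [e for e in root.iter() if e.tag == "meta"]

# What //*[local-name(.)='meta'] selects: meta elements in any namespace.
any_ns_meta = [e for e in root.iter() if local_name(e.tag) == "meta"]

correct = [e.get("property") for e in no_ns_meta if e.get("property") is not None]
buggy = [e.get("property") for e in any_ns_meta if e.get("property") is not None]

print(len(no_ns_meta), len(correct), len(buggy))
```

The first two counts agree, while the third picks up the leftover namespaced element, which is the shape of the 762 versus 764 mismatch.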

Confusing? You bet!  -m

P.S. WebPath would not have this problem, since in the default mode it matches local-names only to begin with.

Saturday, December 29th, 2007

SCO Group, long delisted from reality; Nasdaq follows suit

The new ticker symbol is SCOXQ.PK, as in “pink sheet”. From the soaring heights of the $20s, it’s now under a dime per share, as bankruptcy proceedings move forward and their attempt to charge fees for all Linux users continues to crumble. Serves ’em right. -m

Tuesday, December 25th, 2007

Thanks, Amazon!

I visited the Amazon home page today to find this:

Amazon suggested purchase: Uranium!

Thanks, Amazon! Now sit back down, you’re scaring me. -m

Monday, December 24th, 2007

OLPC is here

I’m taking some time off from work to relax a bit. And just in time for that, my OLPC arrived. Check out the photoset on Flickr. It’s an impressive little machine, and I’m very happy to have got this instead of a Kindle. :)

-m

Friday, December 21st, 2007

XML 2007 buzz: XForms 1.1

One whole evening of the program was devoted to XForms, focused around the new 1.1 Candidate Recommendation. I admit that some of the early 1.1 drafts gave me pause, but these guys did a good job cleaning up some of the dim corners and adding the right features in the right places. This is worth a careful look. -m

Friday, December 21st, 2007

XML 2007 buzz: Hadoop

OK, the majority of the buzz came from my talk, where I strongly encouraged folks to take a look at Hadoop. This article seems to be saying much the same things. If you’re curious about the future of distributed computation and storage, it’s worth a look. -m

Sunday, December 16th, 2007

Slides from XML 2007: WebPath: Querying the Web as XML

Here are the slides from my presentation at XML 2007, dealing with an implementation of XPath 2.0 in Python. I hope to have even more news in this area soon.

WebPath (html)

WebPath (OpenDocument, 4.7 megs)

Did you notice that OpenOffice has a nice slide export that generates both graphically accurate slides and highly indexable and accessible text versions? -m

Saturday, December 15th, 2007

XPath puzzler

While I’ve got your attention, here’s an XPath (1.0) puzzler. I have an RDFa dataset compiled from various and sundry sources. It’s all wrapped up in a single XML file. I run this XPath to see how many meta elements are present: //meta and it returns a node-set of size 762. Now, I want to see how many property attributes are present, so I run the query: //meta/@property and it returns a node-set of size 764. How is it that the second node-set can be bigger than the first? -m

Saturday, December 15th, 2007

XML spell check

Surely somebody has implemented this in at least one tool.

In a text editor, I come across a misspelled close tag like </xsl:stylsheet>. My editor highlights the line as an error, which it is, not matching the start tag and all. Why can’t it go the extra step and give me the same kind of interface as I get for misspelled words, with an easy option to repair the spelling? This seems like a much simpler problem than all the hairy cases around human-language spell check…
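To show how small the problem is, here is a rough sketch in Python using the standard difflib module. It assumes the editor already tracks a stack of open element names; the function name suggest_close_tag and the 0.6 similarity cutoff are arbitrary choices, not anything a real editor is known to use.

```python
import difflib


def suggest_close_tag(open_tags, bad_close, cutoff=0.6):
    """Suggest a repair for a close tag that matches no open element.

    open_tags: names of the currently open elements, innermost last.
    Returns the most similar open name, or None if nothing is close enough.
    """
    if not open_tags:
        return None
    # The only candidates are elements that are actually open.
    candidates = list(reversed(open_tags))
    matches = difflib.get_close_matches(bad_close, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_close_tag(["xsl:stylesheet"], "xsl:stylsheet"))
```

Unlike human-language spell check, the dictionary here is just the handful of elements currently open, and strictly speaking only the innermost one is a valid answer.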

So, what tools already do this today? -m

Friday, December 7th, 2007

MST3K is back (sort of)

Here’s the best news I’ve had all day: the creators of MST3K are reuniting under a new effort, called Cinematic Titanic, the firstfruits of which are due out this coming Monday.

I’ve been a longtime fan of MST3K, and watched most of the early episodes on UHF in Minnesota. And for the record, I like Joel better than Mike. :) -m

Thursday, December 6th, 2007

Lists in RDFa?

I came away from the XML 2007 conference with lots of new ideas and inspirations. I’ll write some postings about individual technologies in the coming days.

But for now, another RDFa question. If I need to represent a list, what is the best way to do it? Does it differ between ordered and unordered lists? Let’s take some concrete examples, say a shopping list and an (ordered) todo list. How would you do it? -m

P.S. What about multi-level lists?
