Archive for the 'xml' Category
Friday, March 5th, 2010
The xml-dev mailing list has been discussing XLink 1.1, which after a long quiet period popped up as a “Proposed Recommendation”, which means that a largely procedural vote is is all that stands between the document becoming a full W3C Recommendation. (The previous two revisions of the document date to 2008 and 2006, respectively)
In 2005 I called continued development of XLink a “reanimated spectre”. But even earlier, in 2002 I wrote one of the rare fiction pieces on xml.com, A Hyperlink Offering, which using the format of a Carrollian dialog between Tortoise and Achilles, explained a few of the problems with the XLink specification. It ended with this:
What if the W3C pushed for Working Groups to use a future XLink, just not XLink 1.0?
Indeed, this version has minor improvements. In particular, “simple” links are simpler now–you can drop an xlink:href attribute where you please and it’s now legit. The spec used to REQUIRE additional xlink:type=”simple” attributes all over the place. But it’s still awkward to use for multi-ended links, and now even farther away from the mainstream hyperlinking aspects of HTML5, which for all of its faults, embodies the grossly predominant description of linking on the web.
So in many ways, my longstanding disappointment with XLink is that it only ever became a tiny sliver of what it could have been. Dashed visions of Xanadu dance through my head. -m
Permalink
Filed under annoyance, intentional web, xml
Monday, February 22nd, 2010
Are you coming? Link. It starts on May 4 (Star Wars day!) at the InterContinental Hotel in San Francisco. Guest speakers include Chris Anderson, Editor-in-Chief of Wired and Michelle Manafy, Editor-in-Chief of EContent magazine.
Early bird registration ends Feb 28. -m
Permalink
Filed under Mark Logic, XQuery, announcement, xml
Tuesday, February 16th, 2010
As heard from my friend and Mark Logic contractor Ryan Grimm. -m
Permalink
Filed under annoyance, xml
Wednesday, October 7th, 2009
Fed Thread is a front end for the newly XMLified Federal Register. Why is this a big deal? It’s a daily publication of the goings-on of the US government. It’s a primary source for all kinds of things that normally only get rehashed through news organizations. And it is bulky–nobody can read through it on a regular basis. A yearly subscription (printed) would cost nearly $1000 and fill over 80,000 pages.
Having it in XML enables all kinds of searching, syndication, and annotation via flexible front ends like this one. Yay for transparency. -m
Permalink
Filed under IPR, announcement, search, xml
Tuesday, August 18th, 2009
All the input/output/port stuff in XProc seemed incomprehensible to me until I recognized something simple. Every time you see a <pipe> element, read it as “comes from”. For example
<p:output port="result">
<p:pipe step="validated" port="result"/>
</p:output>
reads as ‘output to the “result” port comes from the port “result” on step “validated”‘ and
<p:input port="source">
<p:pipe step="included" port="result"/>
</p:input>
reads as ‘input for the “source” port comes from the port “result” on step “included”‘. If you keep this in mind it all makes much more sense.
More collected Geek Thoughts at http://geekthoughts.info.
Permalink
Filed under geekthoughts, standards, xml
Wednesday, August 5th, 2009
On this comic’s panel 9 describes XHTML 1.1 conformance as:
the added unrealistic demand that documents must be served with an XML mime-type
I can understand this viewpoint. XHTML 1.1 is a massively misunderstood spec, particularly around the modularization angle. But because of IE, it’s pretty rare to see the XHTML media-type in use on the open web. Later, panel 23 or thereabouts:
If you want, you can even serve your documents as application/xhtml+xml, instantly transforming them from HTML 5 to XHTML 5.
Why the shift in tone? What makes serving the XML media type more realistic in the HTML 5 case? IE? Nope, still doesn’t work. I’ve observed this same shift in perspective from multiple people involved in the HTML5 work, and it baffles me. In XHTML 1.1 it’s a ridiculous demand showing how out of touch the authors were with reality. In HTML5 the exact same requirement is a brilliant solution, wink, wink, nudge, nudge.
As it stands now, the (X)HTML5 situation demotes XHTML to the backwaters of the web. Which is pretty far from “Long Live XHTML…”, as the comic concludes. Remember when X stood for Extensible?
-m
Permalink
Filed under annoyance, browsers, xml
Friday, July 24th, 2009
I’m noodling around with requirements and exploring existing work toward a solution for “decentralized extensability” on xml-dev, particularly for HTML. The notion of “Java-style” syntax, with reverse dns names and all, has come up many times in the context of these kinds of discussions, but AFAICT never been fully fleshed out. This is ongoing, slowly, in available time–which as been a post or two per week. (In case there is any doubt, this is a spare-time effort not connected with my employer)
Check it out and add your knowledge to the thread. -m
Permalink
Filed under announcement, intentional web, xml
Saturday, July 11th, 2009
Several folks have been pointing to this article which has some choice quotes along the lines of
If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in most any market I can think of.
My employer is specifically mentioned:
Even in XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors
And it’s true, but don’t take my word for it. :-) The DBMS world has lots of inertia, but don’t let that blind you to seeing another way to solve problems. Particularly if that extra 50x matters. -m
Permalink
Filed under Mark Logic, commercialism, xml
Tuesday, July 7th, 2009
Come join me at the Demo Jam at Balisage this year. August 11 at 6:30 pm. There will be lots of cool demos, judged by audience participation. I’d love to see you there. -m
Permalink
Filed under Mark Logic, announcement, xml
Wednesday, June 3rd, 2009
Balisage, formerly Extreme Markup, is the kind of conference I’ve always wanted to attend.
Historically my employers have been not quite enough involved in the deep kinds of topics at this conference (or too cash-strapped, but let’s not go there) to justify spending a week on the road. So I’m glad that’s no longer the case: Mark Logic is sponsoring the conference this year. I’m looking forward to the show, and since I’m not speaking, I might be able to relax a little and soak in some of the knowledge.
See you there! -m
Permalink
Filed under Mark Logic, announcement, xml
Tuesday, March 24th, 2009
An interesting proposal from Liam Quin, relating to the need for huge rafts of namespace declarations on mixed namespace documents.
In practice, though, almost all elements [in the given example] are going to be unambiguous if you take their ancestors into account, and attributes too.
Amen. I’ve been saying things like this for five years now. Look at any introductory text on XML, and the example used to show the need for namespaces will be embarrassingly contrived. That’s not a dig against authors, it’s a dig against over-engineered solutions to non-problems.
-m
Permalink
Filed under annoyance, standards, xml
Wednesday, February 25th, 2009
This is fantastic. Brian May (yes THAT Brian May) not only blogs, but talks about all kinds of challenging subjects. Like how and why space and time are linked. Worth a read. -m
Permalink
Filed under everythingismiscellaneous, writing, xml
Thursday, December 11th, 2008
Greg Watson, IT Specialist, Defense Intelligence Agency Missile and Space Intelligence Center (apparently it IS rocket science). I installed eXist last night to follow along with the talk.
“If you have a larger dataset, eXist may not be the best choice.” Recommended reading: XQuery by Priscilla Walmsley, XQuery wikibook.
Download and install. Needs a full JDK (Mac includes this already in /Library/Java/Home), a mere JRE is insufficient. Start up with bin/startup.sh.
eXist-specific useful functions: request:get-parameter() from the URI query string. transform:transform() function invokes XSLT from within XQuery.
Example uses doc() to fetch an external URL of RSS, check individual items with contains(). Every example is a fully-formed, click-on-a-link-to-run program.
XQuery and PHP: reallly basic integration with simplexml_load_file($myXQueryURL).
Loading scripts into eXist: He uses XML Spy. eXist has a Jaa Web Start admin client.
Q&A: How bit is too big? Maybe 10-20-30 thousand docs. Generate indexes? Yes.
-m
(Production note: somehow, this one didn’t get published live. Now it is.)
Permalink
Filed under XQuery, xml
Tuesday, December 9th, 2008
Wendell Piez, Mulberry Technologies
Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases.
Same architecture as XSLT. (Schematron specifies, does not perform)
<schema xmlns="http://purl.cclc.org/dsdl/schematron">
<title>Check sections 12/07</title>
<pattern id="section-check">
<rule context="section">
<assert test="title">This section has no title</assert>
<report test="p">This section has paragraphs</report>
...
Demo. OxygenXML has support. Assert vs. Report – essentially opposites. Assert means “tell me if this if false”. Report means “tell me if this is true”.
“Almost as if Schematron is a harness for XPath testing.”
More examples:
<rule context="note">
<report test="ancestor::note">A note appears in a note. OK?</report>
</rule>
Binding: Default is XSLT 1, but flexible enough to allow other query langauges via attribute @queryBinding at the top. Many processors allow mix-and-match between XSLT and Schematron. Examples showing just that.
Some tests can be very useful:
test=”every $line in tokenize(., $newline) satisfies string-length($line) le 72″
Q: What if the destination is not a human, but another part of a pipeline? Varies by implementation, but SVRL is standardized as an annex in the ISO spec, part of DSDL.
Use as little or as much as you want, at different times in the document lifecycle. “Schematron is a feather duster that reaches areas other schema languages cannot.” – Rick Jelliffe
As time permits section of the talk:
Other top-level elements: title, pattern, ns, let, p, include, phase, diagnostics.
-m
Permalink
Filed under languages, xml, xpath
Tuesday, December 9th, 2008
Bob DuCharme, Innodata Isogen
Content analysis: why? You’ve “inherited” content. Need to save time or effort.
Handy tool 1: “sort”. As in the Unix command line tool. (Even Windows)
Handy tool 2: “uniq -c” (flag -c means include counts)
Elsevier contest: interface for reading journals. Download a bunch of articles, and see what’s all in there.
Handy tool 3: Trang. Schema language converter. But can infer a schema from one or more input documents. Concat all sample documents under one root, and infer–this gives a list of all doctypes in use.
trang article.dtd article.rng
trang issueContents.xml issueContents.rng
saxon article.rng compareElsRNG.xsl | sort > compareElsRNG.out
compareElsRNG.xsl has text mode output, ignores input text nodes, and checks whether the RNG has references to each element, outputing “Yes: elementname” or “No: elemenname”. (which gets sorted in step 3)
Helps ferret out places where the schema says 40 different child elements are possible but in practice only 4 are used.
Handy tool 4: James Clark’s sx, converts SGML to XML.
Another stylesheet counts elements producing a histogram. [Ed. I would do this in XQuery in CQ.] Again, can help prioritize parts of the XML to use first. Similar logic for parent/child counts; where @id gets used; find all values for a particular attribute.
Another stylesheet goes through multiple converted-to-rng schemas, looking for common substructure. Lists generated this way can be pulled into a stylesheet.
Analyze a SGML DTD? dtd2html -> tidy -> XSLT. Clients like reports (especially spreadsheets). The is more like lego bricks.
-m
Permalink
Filed under everythingismiscellaneous, patternalia, standards, xml
Tuesday, December 9th, 2008
Mark Birbeck, Web Backplane.
Problem statement: You shouldn’t have to “scrape” government sites.
Solution: RDFa
<div typeof="arg:Vacancy">
Job title: <span property="dc:title">Assistant Officer</span>
Description: <span property="dc:description">To analyse... </span>
</div>
This resolves to two full RDF triples. No separate feeds, uses existing publishing systems. Two of the most ambitious RDFa projects are taking place in the UK. Flexible arrangements possible.
Steps: 1. Create vocabulary. 2. Create demo. 3. Evangelize.
Vocabulary under Google Code: Argot Hub. Reuse terms (dc:title, foaf:name) where possible, developed in public.
Demos: Yahoo! SearchMonkey, (good for helping not-so-technical people to “get it”) then a Drupal hosted one (a little more control).
Next level, a new server that aggregates specific info (like all job openeings for Electricians), incuding geocoding. Ubiquity RDFa helps here.
Evangelizing: Detailed tutorials. Drupal code will go open source. More opportunities with companies currently screen-scrapting. More info @ rdfa.info.
Q&A: Asking about predicate overloading (dc:title). A general SemWeb issue. Context helps. Is RDFa tied to HTML? No, SearchMonkey itself uses RDFa–it’s just attributes.
-m
Permalink
Filed under everythingismiscellaneous, intentional web, metadata, xml
Tuesday, December 9th, 2008
Priscilla Walmsley, Datypic.
“I feel like crying every time I have to go back to 1.0.” Normally this is a full-day course. Familiarity with XSLT 1.0 assumed here. Venn diagram… Much of what people think of as “XQuery” is actually XPath 2.0.
XPath differences: root node -> “document node”. Namespace nodes, axis are deprecated. More atomic types, based on XML Schema. Node-set -> sequence. Path steps can be expressions, like product/(if (desc) then desc else name). Last step can return an atomic value, like sum(//item/(@price * @qty)).
Comparison operators apply to strings, dates, times. (Backwards compatibility note: comparing strings now is done by Unicode code point, not by conversion to number() as in XPath 1.0). Arithmetic possible on dates, durations. Missing value returns empty sequence rather than NaN.
(a,b) to concat sequences. New operators: idiv, union, intersect, except (latter 3 for nodes only)
<xsl:for-each select="1 to $count"> is handy. Operators << and >> test ‘precedes’ and ‘follows’ based on document order. Operator ‘is’ tests node identity.
Statement if/then/else is a more compact xsl:choose. Simplified FLWOR (only one for, no let or where).
Useful functions: ends-with(), string-join(), current-date(), distinct-values(), deep-equal().
From XPath to XSLT: <xsl:for-each-group> with current-group() and current-grouping-key(). Useful for turning a flat document (like HTML with h1, h2, etc. into nested structure. group-starting-with=”html:h1″, etc. The instruction <xsl:function> allows defining a new function. Major benefits in reuse, clarity, and handling recursion. Custom functions can be called from more places, like @select, @group-by, @match, but have the same expressive power of a named template.
Regular expressions: some XPath functions matches(), tokenize(), replace() (including subexpressions). <xsl:analyze-string> splits a string into matching and non-matching parts, handled separately in <xsl:matching-substring> and <xsl:non-matching-substring> child elements and regex-group().
I/O: Instruction <xsl:result-document> allows multiple output files. unparsed-text() allows input of non-XML documents (particularly in conjunction with regex).
Do I have to pay attention to types? “Usually, no.” BUT schemas can help catch errors, improve performance, and open new avenues of processing (like matching a template based on a schema-type).
Odds and ends: tunneling parameters (don’t have to repeat all the params for named templates), multiple modes, @select in more places, @separator attribute on xsl:attribute and xsl:value-of.
Brief Q&A: No test suite available. Probably better for new users to jump straight into 2.0. But going back to 1.0 is still painful. -m
Permalink
Filed under xml, xpath
Monday, December 8th, 2008
Overheard at XML 2008: “Wow, it’s a good thing Mark Logic sponosred, otherwise nobody would be here.” (there were only five tables in the expo area.)
Overseen on the XML 2008 schedule: only one mention of XQuery, and that’s in relation to eXist, not the aforementioned sponsor.
This conference does have a different feel to it. Is XML at the ASCII-tipping-point, where it becomes so obvious that conferences aren’t needed? -m
Permalink
Filed under Mark Logic, xml
Monday, December 8th, 2008
I was on the panel with Bob DuCharme, Frank Miller, and Evan Lenz discussing content authoring, from DITA to DocBook with some WordML sprinkled in for good measure. It was a good discussion, nothing earth-shaking. This session was laptopless, so I don’t have any significant notes. -m
Permalink
Filed under infopath, software, writing, xml
Monday, December 8th, 2008
Roy Amodeo, Stilo.
Only 4 people in attendance when the talk starts. Quick overview of DITA. Transclusion (conref), topic-level maps, specialization, metadata-based filtering. XML and SGML flavors available. Open Toolkit has been a big part of DITA’s success. Replacable components (XSLT and FO). Many editing environments and CMS’s include this.
Topic-based publishing. Works best with many small, fairly independent topics. How well does the Open Toolkit work when pushing the boundaries? DITA stress test. Raising file size increases processing time faster than linear. Average file size 300k crashed. For overall number of files, roughly linear progression, but still blows up at large volumes.
Enter the OmniMark DITA Accelerator. Behavior modeled after toolkit, but minus the limits (streaming). Uses referents (placeholders left in place, filled in later; 2-pass algorithm). Base speed improvement 4X. Works well past where the Toolkit runs out of memory. Because DITA is standardized, the accelerated implementation can be easily plugged in.
Usability: XSLT exists somewhat uneasily with DITA. DITA Accelerator augments OmniMark with DITA-specific rules.
Conclusion: Standards are about choice of tools. (But how many OmniMark implementations are there?) Still, this makes me think I should check out the OmniMark language. I remain skeptical on DITA.
-m
Permalink
Filed under xml
Monday, December 8th, 2008
Delivered by Pradeep Jain, Ictect Inc. He has a handout available: “Intelligent Content Plug-In for Microsoft Word”, though it’s not obvious from the program that Word is involved.
What is content modeling? “Getting inside of” content, semantics, from there syntax and XML tagging.
Challenges: art vs. science, tacit vs. written documentation, future-proofing, technical vs. business communication, flexibility vs. stability. Getting knowledge workers to participate. Correctness (an emphasis of Ictect).
What is correctness of a model? More than valid XML. Litmus test: SME says “yep, I think you got it!”. But some machine-generated tests are possible.
Shows a Word doc with different kinds of bibliographic references (articles vs. books). Shows Schema code not visible from the back of the room. Word plug-in displays sidebar with a “convert” function, with several possible Schemas available to work against. Automatically detected sections in the document and added <section> elements. Progressively more complex examples of generated markup.
It seems like this is actually a pretty clever application, though it is hard to tell from this talk. -m
Permalink
Filed under xml
Monday, December 8th, 2008
I will talk about one or more sessions from XML 2008 here.
Mark Birbeck of Web Backplane talking about Ubiquity XForms.
Browsers are slow to adopt new standards. Ajax libraries have attempted to work around this. Lots of experimentation which is both good and bad, but at least has legitimzed extensions to browsers. JavaScript is the assembly language of the web.
Ubiquity XForms is part of a library, which wil also include RDFa and SMIL. Initially based on YUI, but in theory sould be adaptable to other libraries like jQuery.
Declarative: tools for creation and validation. Easier to read. Ajax libraries are approaching the level of being their own language anyway, so might as well take advantage of a standard.
Example: setting the “inner value” of a span: <span value="now()"></span>.
Script can do this easily: onclick="this.innerHTML = Date().toLocaleString();" But crosses the line from semantics to specific behavior. The previous one is exactly how xforms:output works.
Another exapmple: tooltips. Breaks down to onmouseover, onmouseout event handlers, show and hide. A jQuery-like approach can search the document for all tooltip elements and add the needed handlers, avoiding explicit behavioral code. This is the essence of Ubiquity XForms (and in fact XForms itself).
Patterns like these compose under XForms. A button (xf:trigger) or any form control can easily have a tooltip (xf:hint). These are all regular elements, stylable with CSS, accesible via DOM, and so forth. Specific events (like xforms-hint) fire for specific events, and a spreadsheet-like engine can update interdependencies.
Question: Is this client-side? A: Yes, all running within Firefox. The entire presentation is one XForms document.
Demo: a range control with class=”geolocation” that displays as a map w/ Google Maps integration. The Ubiquity XForms library contains many such extensibility points.
Summary: Why? Simple, declarative. Not a programming language. Speeds up development. Validatable. Link: ubiquity.googlecode.com.
Q&A: Rich text? Not yet, but not hard (especially with YUI). Formally XForms compliant? Very nearly 1.1 conforming.
-m
Permalink
Filed under XForms, browsers, intentional web, xml
Thursday, July 10th, 2008
Traffic ain’t what it used to be there. But since I’m at a core xml technology company, it makes sense to participate again. Now, are there any topics left that haven’t been hashed to death? (hint: yes) -m
Permalink
Filed under xml
Wednesday, July 9th, 2008
Today Google announced Protocol Buffers, described as “think XML, but smaller, faster, and simpler“. Language bindings for C++, Java, and Python. Oddly not even a whisper about JSON, which is a much more apt comparison. And along with that, no JavaScript implementation. So why the omission?
My guess is that it wouldn’t compare that favorably with JSON. The extra needed compile step is a hassle, and doesn’t give enough of a relative benefit for Ajax applications. But perhaps this will unleash a torrent of people asking for ‘binary JSON’. OK, maybe not… -m
Permalink
Filed under google, xml
Wednesday, May 21st, 2008
If you are used to XSLT 1.0 and XForms, you see { $book/bk:title } and think nothing of it. XSLT 1.0 calls the curly-brace construct an Attribute Value Template, which is pretty descriptive of where it’s used. Always in an attribute, always converted into a string, even if you are actually pointing to an element.
In XQuery, though, the curly-brace construct can be used in many different places. Depending on the context, the above code might well insert a bk:title element into your output. The proper thing to do, of course, is { $book/bk:title/text() }. Many XSLT and XForms authors would omit the extra text() selector as superfluous, but in XQuery it matters.
What’s worse, depending on your browser, you might not see any output on the page within a <bk:title> element (or a title element of any namespace). Caveat browser! -m
-m
Permalink
Filed under Mark Logic, XForms, XQuery, annoyance, browsers, firefox, xml
Friday, January 25th, 2008
I’ve taken this opportunity to ditch CVS on all my existing Sourceforge projects (pyxmlwiki, xfv) while setting up my newest project. Here’s the browable subversion source. Have at it.
Where should you start with this code? Step zero, if you haven’t already, is to look through my XML 2007 slides on my site. First thing is to grab a copy of PLY, which is a dependency. Then with all these files in your current directory, run python with no parameters. At the interpreter prompt type import demo then demo.demo1(), demo.demo2(), and so on. This will give you a feel for how the system works. Look at the source of demo.py to see how it works at the high level.
To actually get into the code, I suggest opening webpath.py and scrolling down to the end, where a large series of unit tests begins. Tracing through these will be (I hope!) instructive on how the various details of the engine are put together.
There are many missing pieces (a few intentionally so). So have a look around the code and start thinking about what you could do with it. One thing I would love to have happen soon is getting rid of minidom, replacing it with something more robust.
If you want developer access on Sourceforge, drop me a note with your sf username. -m
Permalink
Filed under announcement, xml, xpath
Thursday, January 24th, 2008

WebPath, my experimental XPath 2.0 engine in Python is now an open source project with a liberal BSD license. I originally developed this during a Yahoo! Hack Day, and now I get to announce it during another Hack Day. Seems appropriate.
The focus of WebPath was rapid development and providing an experimental platform. There remains tons of potential work left to do on it…watch this space for continued discussion. I’d like to call out special thanks to the Yahoo! management for supporting me on this, and to Douglas Crockford for turning me on to Top Down Operator Precedence parsers. Have a look at the code. You might be pleasantly surprised at how small and simple a basic XPath 2 engine can be. So, who’s up for some XPath hacking?
Code download. (Coming to SourceForge with CVS, etc., in however many days it takes them to approve a new project) I hope this inspires more developers to work on similar projects, or better yet, on this one! -m
Permalink
Filed under announcement, intentional web, python, xml, xpath, yahoo
Monday, December 31st, 2007
This blog page at the W3C discusses the TAG finding that a data format specification SHOULD provide for version information, specifically reconsidering that suggestion. As a few data points, XML 1.1 (with explicit version identifiers) is something of a non-starter, while Atom (without explicit version identifiers) is doing OK so far–though a significant revision to the core hasn’t happened and perhaps never will.
In a chat with Dave Orchard at XML 2007, I suggested that the evolution of browser User-Agent strings might be a useful model, since it developed in response to the actual kinds of problems that versioning needs to solve.
Indeed, the idea seemed familiar in my mind. In fact, I posted it here, in Feb 2004. The remainder of this posting republishes it with minor edits for clarity:
‘Standard practice’ of x.y.z versioning, where x is major, y is minor, and z is sub-minor (often build number) is not best practice. If you look at how systems actually evolve over time, a more ‘organic’ approach is needed.
For example, look at how browser user agent strings have evolved. Take this, for example:
Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows 98) Opera 7.02 [en]
Wow, if detection code is looking for a substring of “Mozilla” or “Mozilla/4″ or “Mozilla/4.0″, or “MSIE” or “MSIE 6″ or “MSIE 6.0″ or “Opera” or “Opera 7″ or “Opera 7.0″ or “Opera 7.0.2″ it will hit. If you look at the kind of code to determine what version of Windows is running, or the exact make and model of processor, you will see a similar pattern.
Since this is the way of nature, don’t fight it with artificial, fixed-length major.minor versioning. Embrace organically growing versions.
The first version of anything should be “1.” including the dot. (letters will work in practice too) All sample code, etc. that checks versions must stop at the first dot character; anything beyond that is on a ‘needs-to-know’ basis. A check-this-version API would be extremely useful, though a basic string compare SHOULD work.
Then, whenever revisions come out, the designers need to decide if the revision is compatible or not. A completely incompatible release would then be “2.”. However, a compatible release would be “1.1.”. All version checking code would continue to look only up to the first dot, unless it has a specific reason to need more details. Then it can go up to the 2nd dot, no more.
Now, even code that is expecting version “1.1.” will work fine with “1.1.1.” or 1.1.86.” or “1.1.2.1.42.1.536.”.
Every new release needs to decide (and explicitly encode in the version string) how compatible it is with the entire tree of earlier versions.
Now, as long as compatible revisions keep coming out, the version string gets longer and longer. This is the key benefit, and why fixed-field version numbers are so inflexible. (and why you get silly things like Samba reporting itself as “Windows 4.9″).
One possible enhancement, purely to make version numbers look more like what folks are used to, is to allow a superfluous zero at the end. This the first version is 1.0, followed by 1.1.0, 1.1.1.0, (this next one made an incompatible change) 1.2.0, and so on.
So if a document needs to self-version at all, perhaps a scheme like this should be used? -m
Permalink
Filed under annoyance, patternalia, standards, xml
Monday, December 31st, 2007
Thanks to all the folks who showed interest in this little XPath puzzler published here a few weeks ago. Some asked to see the dataset, but I’m not able to release it at this time (but ask me again in 3 months).
Turns out it was a combination of two bugs, one mine, one somebody else’s. Careful observers noted that I wasn’t using any namespace prefixes in the XPath, and since I did specify that it was XPath 1.0, that technically rules out XHTML as the source language. Like nearly all XML I work with these days, the first thing I do is strip off the namespaces to make it easier to work with. Bug #1 was that in a few cases, the namespaces didn’t get stripped.
Bug #2 was in the XPath engine itself. Which one? Uh, whatever one ships with the “XPath” plugin for JEdit. It’s hard to tell directly, but I think it might be an older version of Xalan-J. In the case of the expression //meta, it properly located only those elements part of no namespace. But in the case of //meta/@property, it was including all the nodes that would have been selected by //*[local-name(.)='meta']/@property. Hence, a larger number of returned nodes.
Confusing? You bet! -m
P.S. WebPath would not have this problem, since in the default mode it matches local-names only to begin with.
Permalink
Filed under annoyance, xml, xpath, yahoo
Friday, December 21st, 2007
One whole evening of the program was devoted to XForms, focused around the new 1.1 Candidate Recommendation. I admit that some of the early 1.1 drafts gave me pause, but these guys did a good job cleaning up some of the dim corners and adding the right features in the right places. This is worth a careful look. -m
Permalink
Filed under XForms, browsers, software, standards, trends, xml