Archive for the 'standards' Category

Tuesday, May 12th, 2009

Google Rich Snippets powered by RDFa

The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800-pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m

Friday, May 8th, 2009

HTML: The Markup Language marks a new beginning

If you haven’t already, check out HTML: The Markup Language. Besides being a cool new recursive acronym for HTML, it is a reasonably sane document. Also worth a look: Differences between HTML4 and HTML5. Many of the ideas from XHTML 2 (of which I was an editor at one point) are there.

I think it’s time for the W3C to show some tough love and force the two (X)HTML Working Groups together.

A while ago, I argued that having both Flickr and Yahoo! Photos was an effective two-pronged strategy. Look how that worked out: Y! Photos is permanently shuttered. While there were benefits, including a broader potential reach, in aggregate they didn’t outweigh the immense cost of running two parallel efforts. Same here. -m

Sunday, April 26th, 2009

XForms validator: disabling Google ads, no more blank pages

Thanks to those who wrote in with bug reports about the XForms Validator: something changed recently and made the inserted Google Ads script confuse browsers, resulting in a blank page where you’d expect results. I’ve turned off the response-page ads, which were only getting in the way, and the problem seems to have vanished. Carry on. :-) -m

Friday, April 24th, 2009

EXPath.org

I’ve always thought that the EXSLT model of developing community specifications worked well. Now a critical mass of folks has come together on a similar effort, aimed at providing extensions usable in XPath 2.0, XSLT 2.0, XQuery, and other XPath-based languages like XProc. Maybe even XForms.

Check it out, subscribe to the mailing list, and help out if you can. -m

Tuesday, April 7th, 2009

GPL’s Cloudy Future

I enjoyed this post, from Jeremy Allison as it turns out. It talks about how GPL software is “the new BSD” when it comes to cloud computing: since the software is never redistributed, the relevant clauses of the GPL are never triggered. Any old company can use, re-use, and modify the software without sharing the code in the original spirit of the license. The community’s response, something I need to keep a closer eye on, is the AGPL, or Affero General Public License. It works like the GPL, but is triggered by remote use of the software over a network, not just by distribution, preserving the work’s copyleft even in cloud computing situations. -m

Tuesday, March 24th, 2009

XIN: Implicit namespaces

An interesting proposal from Liam Quin, relating to the need for huge rafts of namespace declarations on mixed namespace documents.

In practice, though, almost all elements [in the given example] are going to be unambiguous if you take their ancestors into account, and attributes too.

Amen. I’ve been saying things like this for five years now. Look at any introductory text on XML, and the example used to show the need for namespaces will be embarrassingly contrived. That’s not a dig against authors, it’s a dig against over-engineered solutions to non-problems.
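To make the ancestor argument concrete, here is a toy Python sketch. It is not Liam’s proposal, whose mechanics aren’t described here. The two vocabularies and the rule (use the parent’s namespace if it defines the local name, otherwise the first vocabulary that does) are assumptions chosen for illustration. Notice that the ambiguous name title resolves differently depending on where it sits.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical vocabularies: which local names each namespace defines.
VOCABULARIES = {
    XHTML: {"html", "head", "title", "body", "p", "a"},
    SVG: {"svg", "title", "circle", "rect"},
}

def infer_namespaces(root, root_ns):
    """Assign a namespace to every unprefixed element, in document order,
    preferring the nearest ancestor's namespace when it defines the name."""
    result = []

    def walk(elem, inherited_ns):
        name = elem.tag
        if name in VOCABULARIES[inherited_ns]:
            ns = inherited_ns
        else:
            # Fall back to the first vocabulary that defines this name.
            candidates = [uri for uri, names in VOCABULARIES.items() if name in names]
            ns = candidates[0] if candidates else None
        result.append((name, ns))
        for child in elem:
            walk(child, ns if ns else inherited_ns)

    walk(root, root_ns)
    return result
```

With "<html><head><title>T</title></head><body><svg><title>t</title><circle/></svg></body></html>", the first title lands in XHTML and the second in SVG. No declarations are needed because the ancestors disambiguate.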

-m

Wednesday, February 25th, 2009

Brian May explains relativity

This is fantastic. Brian May (yes THAT Brian May) not only blogs, but talks about all kinds of challenging subjects. Like how and why space and time are linked. Worth a read. -m

Monday, February 16th, 2009

Crane Softwrights adds XQuery training

According to the company home page, renowned XSLT trainer and friend G. Ken Holman has expanded his offerings to include XQuery training. The first such session is March 16-20, alongside XML Prague.

I’ve always thought there is great power in having both XSLT and XQuery tools at one’s disposal. I’ve seen people tend to polarize into one camp or the other, but in truth there is a lot of common ground, as well as cases where the right technology makes for a much more elegant solution. So learning both is easier than it seems, and more useful than it seems.

If you will be around the conference, take a look at the syllabus. I’m curious to see others’ reactions toward the combined XSLT + XQuery toolset. -m

Wednesday, February 4th, 2009

XSLTForms beta

XSLTForms, the cross-browser XForms engine (written about previously) that makes ingenious use of built-in XSLT processing, reached an important milestone today, with a beta release. Tons of bug fixes and additional support for CSS and Schema.

If you’re thinking about getting involved with XForms and are looking for something small and approachable, give it a look. -m

Monday, December 22nd, 2008

XForms for HTML

I’ve heard not a peep about this before, but here it is: XForms for HTML. Let’s read this together. Feel free to drop any comments or observations below. -m

Friday, December 19th, 2008

XSLTForms looks promising

Implementing client-side forms libraries is, and has been, all the rage. I’ve seen Mozquito Factory do amazing things in Netscape 4, Technical Pursuit’s TIBET on the perpetual verge of release, UGO, and others. More recently, Ubiquity XForms impresses me and many others, and it has the right combination of funding and willing developers.

From a comment on my recent posting about Ubiquity XForms, I was pleased to learn about XSLTForms, a rebirth of AjaxForms, which I thought well of two years ago until its developer mysteriously left the project. But Software Libre lives on, and a new developer has taken over, this time using client-side XSLT instead of server-side Java to do the first pass of processing. Given the strong foundation, the project has come a long way in a short time, and already runs against a wide array of non-trivial examples. Check it out.

I’d like to hear what others think about this project. -m

Thursday, December 11th, 2008

XML 2008 liveblog: Introduction to eXist and XQuery

Greg Watson, IT Specialist, Defense Intelligence Agency Missile and Space Intelligence Center (apparently it IS rocket science). I installed eXist last night to follow along with the talk.

“If you have a larger dataset, eXist may not be the best choice.” Recommended reading: XQuery by Priscilla Walmsley, XQuery wikibook.

Download and install. Needs a full JDK (Mac includes this already in /Library/Java/Home); a mere JRE is insufficient. Start up with bin/startup.sh.

eXist-specific useful functions: request:get-parameter() reads a parameter from the URI query string; transform:transform() invokes XSLT from within XQuery.

Example uses doc() to fetch an external URL of RSS, check individual items with contains(). Every example is a fully-formed, click-on-a-link-to-run program.

XQuery and PHP: really basic integration with simplexml_load_file($myXQueryURL).

Loading scripts into eXist: He uses XML Spy. eXist has a Java Web Start admin client.

Q&A: How big is too big? Maybe 10-20-30 thousand docs. Generate indexes? Yes.

-m

(Production note: somehow, this one didn’t get published live. Now it is.)

Tuesday, December 9th, 2008

XML 2008 liveblog: Introduction to Schematron

Wendell Piez, Mulberry Technologies

Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases.

Same architecture as XSLT. (Schematron specifies, does not perform)

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Check sections 12/07</title>
  <pattern id="section-check">
    <rule context="section">
      <assert test="title">This section has no title</assert>
      <report test="p">This section has paragraphs</report>
      <!-- ... -->
    </rule>
  </pattern>
</schema>

Demo. OxygenXML has support. Assert vs. Report: essentially opposites. Assert means “tell me if this is false”. Report means “tell me if this is true”.
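The assert/report distinction can be mimicked in a few lines of Python with ElementTree. This is a rough sketch of the semantics of the section-check rule above, not a Schematron processor. The function name is hypothetical, and find() looks only at direct children, which matches what the relative XPath tests title and p select.

```python
import xml.etree.ElementTree as ET

def check_sections(doc):
    """Apply the two section-check rules to every <section>."""
    messages = []
    for section in doc.iter("section"):   # rule context="section"
        # assert test="title": fires only when the test is FALSE (no title child)
        if section.find("title") is None:
            messages.append("This section has no title")
        # report test="p": fires only when the test is TRUE (a p child exists)
        if section.find("p") is not None:
            messages.append("This section has paragraphs")
    return messages
```

For a document with one titled section and one untitled section containing a paragraph, only the second section produces messages, and it produces both.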

“Almost as if Schematron is a harness for XPath testing.”

More examples:

<rule context="note">
  <report test="ancestor::note">A note appears in a note. OK?</report>
</rule>
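ElementTree has no parent pointers, so an ancestor::note test has to be tracked during the walk. The Python sketch below does that by passing down an “inside a note” flag. It returns the elements where the report above would fire. The function name is illustrative only.

```python
import xml.etree.ElementTree as ET

def nested_notes(root):
    """Return every <note> that has a <note> ancestor."""
    hits = []

    def walk(elem, inside_note):
        if elem.tag == "note" and inside_note:
            hits.append(elem)
        for child in elem:
            # Children are inside a note if we already were, or if elem is one.
            walk(child, inside_note or elem.tag == "note")

    walk(root, False)
    return hits
```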

Binding: Default is XSLT 1, but flexible enough to allow other query languages via the @queryBinding attribute at the top. Many processors allow mix-and-match between XSLT and Schematron. Examples showing just that.

Some tests can be very useful:

test="every $line in tokenize(., $newline) satisfies string-length($line) le 72"
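The same check in Python, as a point of comparison. The test above relies on XPath 2.0 (tokenize() and quantified expressions), so it needs a query binding such as xslt2. $newline is presumably bound elsewhere in the schema; here it is assumed to be a literal "\n". Note that tokenize() takes a regex, while str.split() below splits on a literal string.

```python
def lines_within_limit(text, limit=72, newline="\n"):
    """True if every line of text is at most `limit` characters long."""
    # len() counts code points, as string-length() does in XPath 2.0.
    return all(len(line) <= limit for line in text.split(newline))
```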

Q: What if the destination is not a human, but another part of a pipeline? Varies by implementation, but SVRL is standardized as an annex in the ISO spec, part of DSDL.

Use as little or as much as you want, at different times in the document lifecycle. “Schematron is a feather duster that reaches areas other schema languages cannot.” – Rick Jelliffe

From the “as time permits” section of the talk:

Other top-level elements: title, pattern, ns, let, p, include, phase, diagnostics.

-m

Tuesday, December 9th, 2008

XML 2008 liveblog: Automating Content Analysis with Trang and Simple XSLT Scripts

Bob DuCharme, Innodata Isogen

Content analysis: why? You’ve “inherited” content. Need to save time or effort.

Handy tool 1: “sort”. As in the Unix command line tool. (Even Windows)

Handy tool 2: “uniq -c”  (flag -c means include counts)

Elsevier contest: interface for reading journals. Download a bunch of articles, and see what’s all in there.

Handy tool 3: Trang. Schema language converter, but it can also infer a schema from one or more input documents. Concatenate all sample documents under one root and infer a schema; this gives a list of all element types in use.

trang article.dtd article.rng
trang issueContents.xml issueContents.rng
saxon article.rng compareElsRNG.xsl | sort > compareElsRNG.out

compareElsRNG.xsl has text mode output, ignores input text nodes, and checks whether the RNG has references to each element, outputting “Yes: elementname” or “No: elementname” (which gets sorted in step 3).
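The stylesheet itself wasn’t shown, so the Python sketch below is only a guess at the idea. It assumes the comparison runs between the full schema (converted from the DTD) and the schema Trang inferred from the samples. It also assumes element names appear as name attributes on RELAX NG element patterns, which is how Trang writes them; the name-class child form isn’t handled.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def element_names(rng_root):
    """Names declared with <element name="..."> in a RELAX NG schema."""
    return {e.get("name") for e in rng_root.iter(RNG + "element") if e.get("name")}

def compare(full_schema, inferred_schema):
    """For each element the full schema allows, report whether the samples use it.
    Sorting at the end plays the role of piping through `sort`."""
    used = element_names(inferred_schema)
    return sorted(("Yes: " if name in used else "No: ") + name
                  for name in element_names(full_schema))
```

Sorting puts all the “No:” lines first, which is handy: those are the elements the schema allows but nobody actually uses.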

Helps ferret out places where the schema says 40 different child elements are possible but in practice only 4 are used.

Handy tool 4: James Clark’s sx, converts SGML to XML.

Another stylesheet counts elements producing a histogram. [Ed. I would do this in XQuery in CQ.] Again, can help prioritize parts of the XML to use first. Similar logic for parent/child counts; where @id gets used; find all values for a particular attribute.
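For comparison, the histogram, parent/child, and attribute-value counts sketch out in Python with collections.Counter. Counting each parsed sample and summing gives the same totals as concatenating everything under one synthetic root. Function names are illustrative, and namespaced tags would show up in ElementTree’s {uri}local form.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def element_histogram(xml_strings):
    """Count element names and (parent, child) pairs across all samples."""
    elements, pairs = Counter(), Counter()
    for s in xml_strings:
        for parent in ET.fromstring(s).iter():   # iter() includes the root
            elements[parent.tag] += 1
            for child in parent:
                pairs[(parent.tag, child.tag)] += 1
    return elements, pairs

def attribute_values(xml_strings, attr):
    """Count every value a given attribute takes, e.g. where @id gets used."""
    values = Counter()
    for s in xml_strings:
        for e in ET.fromstring(s).iter():
            if attr in e.attrib:
                values[e.attrib[attr]] += 1
    return values
```

Calling most_common() on the first Counter gives the histogram in descending order, ready for prioritizing.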

Another stylesheet goes through multiple converted-to-rng schemas, looking for common substructure. Lists generated this way can be pulled into a stylesheet.

Analyze an SGML DTD? dtd2html -> tidy -> XSLT. Clients like reports (especially spreadsheets). This is more like Lego bricks.

-m

Tuesday, December 9th, 2008

XML 2008 liveblog: Using RDFa for Government Information

Mark Birbeck, Web Backplane.

Problem statement: You shouldn’t have to “scrape” government sites.

Solution: RDFa

<div typeof="arg:Vacancy">
  Job title: <span property="dc:title">Assistant Officer</span>
  Description: <span property="dc:description">To analyse... </span>
</div>

This resolves to full RDF triples: one typing the vacancy as arg:Vacancy, plus one each for the title and the description. No separate feeds; uses existing publishing systems. Two of the most ambitious RDFa projects are taking place in the UK. Flexible arrangements possible.
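A toy Python extractor shows where the triples come from. It handles only @typeof and @property, leaves prefixes such as dc: unexpanded, and gives each @typeof a fresh blank node. That matches RDFa’s behavior when no @about is present, but real processors also handle @about, @rel, @href, @content, datatypes, and prefix mappings. The function name is hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

def simple_rdfa_triples(fragment):
    """Extract (subject, predicate, object) triples from @typeof/@property only."""
    triples = []
    bnodes = itertools.count(1)

    def walk(elem, subject):
        if elem.get("typeof"):
            # @typeof without @about: a new blank node, typed via rdf:type
            subject = "_:b%d" % next(bnodes)
            triples.append((subject, "rdf:type", elem.get("typeof")))
        if elem.get("property") and subject is not None:
            # @property: plain literal taken from the element's text content
            triples.append((subject, elem.get("property"), "".join(elem.itertext())))
        for child in elem:
            walk(child, subject)

    walk(ET.fromstring(fragment), None)
    return triples
```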

Steps: 1. Create vocabulary. 2. Create demo. 3. Evangelize.

Vocabulary under Google Code: Argot Hub. Reuse terms (dc:title, foaf:name) where possible, developed in public.

Demos: Yahoo! SearchMonkey, (good for helping not-so-technical people to “get it”) then a Drupal hosted one (a little more control).

Next level: a new server that aggregates specific info (like all job openings for electricians), including geocoding. Ubiquity RDFa helps here.

Evangelizing: Detailed tutorials. Drupal code will go open source. More opportunities with companies currently screen-scraping. More info @ rdfa.info.

Q&A: Asking about predicate overloading (dc:title). A general SemWeb issue. Context helps. Is RDFa tied to HTML? No, SearchMonkey itself uses RDFa–it’s just attributes.

-m

Tuesday, December 9th, 2008

XML 2008 liveblog: Exploring the New Features of XSLT 2.0

Priscilla Walmsley, Datypic.

“I feel like crying every time I have to go back to 1.0.” Normally this is a full-day course. Familiarity with XSLT 1.0 assumed here. Venn diagram… Much of what people think of as “XQuery” is actually XPath 2.0.

XPath differences: root node -> “document node”. The namespace axis is deprecated. More atomic types, based on XML Schema. Node-set -> sequence. Path steps can be expressions, like product/(if (desc) then desc else name). Last step can return an atomic value, like sum(//item/(@price * @qty)).
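As a rough Python equivalent of sum(//item/(@price * @qty)), the sketch below computes one number per item and sums them. Attribute values are converted to doubles, as untyped attributes are in XPath 2.0 arithmetic. An item missing either attribute is skipped, mirroring how the XPath multiplication yields an empty sequence that sum() ignores. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

def order_total(xml):
    """Sum price * qty over every <item>, wherever it appears (like //item)."""
    total = 0.0
    for item in ET.fromstring(xml).iter("item"):
        price, qty = item.get("price"), item.get("qty")
        if price is None or qty is None:
            continue  # in XPath 2.0 the product is (), which sum() ignores
        total += float(price) * float(qty)
    return total
```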

Comparison operators apply to strings, dates, times. (Backwards compatibility note: comparing strings now is done by Unicode code point, not by conversion to number() as in XPath 1.0). Arithmetic possible on dates, durations. Missing value returns empty sequence rather than NaN.

(a,b) to concat sequences. New operators: idiv, union, intersect, except (latter 3 for nodes only)

<xsl:for-each select="1 to $count"> is handy. Operators << and >> test ‘precedes’ and ‘follows’ based on document order. Operator ‘is’ tests node identity.

The if/then/else expression is a more compact xsl:choose. Simplified FLWOR (only one for, no let or where).

Useful functions: ends-with(), string-join(), current-date(), distinct-values(), deep-equal().

From XPath to XSLT: <xsl:for-each-group> with current-group() and current-grouping-key(). Useful for turning a flat document (like HTML with h1, h2, etc.) into a nested structure: group-starting-with="html:h1", etc. The instruction <xsl:function> allows defining a new function. Major benefits in reuse, clarity, and handling recursion. Custom functions can be called from more places, like @select, @group-by, @match, but have the same expressive power as a named template.
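To show what group-starting-with does, here is a Python sketch that nests a flat body into sections at each h1. It follows the XSLT rule that the first item always starts a group, even if it doesn’t match. Unlike the example above, it assumes un-namespaced h1 elements. The function names and the section wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

def group_starting_with(items, is_start):
    """Split a flat list into groups, each beginning at an item where is_start is true."""
    groups = []
    for item in items:
        if is_start(item) or not groups:
            groups.append([item])   # new group (the first item always starts one)
        else:
            groups[-1].append(item)
    return groups

def nest_by_h1(body):
    """Wrap each h1-initiated run of siblings in a <section>."""
    out = ET.Element("body")
    for group in group_starting_with(list(body), lambda e: e.tag == "h1"):
        ET.SubElement(out, "section").extend(group)
    return out
```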

Regular expressions: new XPath functions matches(), tokenize(), and replace() (including subexpressions). <xsl:analyze-string> splits a string into matching and non-matching parts, handled separately in <xsl:matching-substring> and <xsl:non-matching-substring> child elements, with regex-group() giving access to captured subexpressions.
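The analyze-string model sketches out in Python with re.finditer(). Each match carries its captured groups (the counterpart of regex-group()), and the text between matches becomes non-matching parts. Two caveats: Python’s regex dialect differs from the XML Schema-based one XSLT uses, and XSLT raises an error for a pattern that can match an empty string, which this sketch does not check. The function name is illustrative.

```python
import re

def analyze_string(text, pattern):
    """Split text into ("match", text, groups) and ("non-match", text, ()) parts."""
    parts = []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() > pos:
            parts.append(("non-match", text[pos:m.start()], ()))
        parts.append(("match", m.group(0), m.groups()))
        pos = m.end()
    if pos < len(text):
        parts.append(("non-match", text[pos:], ()))
    return parts
```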

I/O: Instruction <xsl:result-document> allows multiple output files. unparsed-text() allows input of non-XML documents (particularly in conjunction with regex).

Do I have to pay attention to types? “Usually, no.” BUT schemas can help catch errors, improve performance, and open new avenues of processing (like matching a template based on a schema-type).

Odds and ends: tunneling parameters (don’t have to repeat all the params for named templates), multiple modes, @select in more places, @separator attribute on xsl:attribute and xsl:value-of.

Brief Q&A: No test suite available. Probably better for new users to jump straight into 2.0. But going back to 1.0 is still painful. -m

Monday, December 8th, 2008

Overheard and overseen

Overheard at XML 2008: “Wow, it’s a good thing Mark Logic sponsored, otherwise nobody would be here.” (There were only five tables in the expo area.)

Overseen on the XML 2008 schedule: only one mention of XQuery, and that’s in relation to eXist, not the aforementioned sponsor.

This conference does have a different feel to it. Is XML at the ASCII-tipping-point, where it becomes so obvious that conferences aren’t needed? -m

Monday, December 8th, 2008

XML 2008 non-liveblog: Content Authoring Schemas

I was on the panel with Bob DuCharme, Frank Miller, and Evan Lenz discussing content authoring, from DITA to DocBook with some WordML sprinkled in for good measure. It was a good discussion, nothing earth-shaking. This session was laptopless, so I don’t have any significant notes. -m

Monday, December 8th, 2008

XML 2008 liveblog: Accelerated DITA Publishing

Roy Amodeo, Stilo.

Only 4 people in attendance when the talk starts. Quick overview of DITA. Transclusion (conref), topic-level maps, specialization, metadata-based filtering. XML and SGML flavors available. Open Toolkit has been a big part of DITA’s success. Replaceable components (XSLT and FO). Many editing environments and CMSes include this.

Topic-based publishing. Works best with many small, fairly independent topics. How well does the Open Toolkit work when pushing the boundaries? DITA stress test. Raising file size increases processing time more than linearly; at an average file size of 300 KB, it crashed. For overall number of files, roughly linear progression, but it still blows up at large volumes.

Enter the OmniMark DITA Accelerator. Behavior modeled after toolkit, but minus the limits (streaming). Uses referents (placeholders left in place, filled in later; 2-pass algorithm). Base speed improvement 4X. Works well past where the Toolkit runs out of memory. Because DITA is standardized, the accelerated implementation can be easily plugged in.
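The referent idea (leave a placeholder now, fill it in later) can be sketched generically in Python. This is not OmniMark’s referents API. The event format and the Referent class are hypothetical, and the output is buffered in memory for simplicity. The point is that the input is read once, in order, and nothing has to look ahead for a value that is defined later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Referent:
    """Placeholder for a value that isn't known yet (hypothetical name)."""
    key: str

def render_with_referents(events):
    """events: ("text", s), ("ref", key), or ("define", key, value), in input order."""
    output, values = [], {}
    # Pass 1: stream through the input, emitting a placeholder for each reference.
    for event in events:
        if event[0] == "text":
            output.append(event[1])
        elif event[0] == "ref":
            output.append(Referent(event[1]))
        elif event[0] == "define":
            values[event[1]] = event[2]
    # Pass 2: resolve placeholders now that every definition has been seen.
    return "".join(values.get(p.key, "??") if isinstance(p, Referent) else p
                   for p in output)
```

Here, a cross-reference to “fig1” can appear before the figure that defines it.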

Usability: XSLT coexists somewhat uneasily with DITA. DITA Accelerator augments OmniMark with DITA-specific rules.

Conclusion: Standards are about choice of tools. (But how many OmniMark implementations are there?) Still, this makes me think I should check out the OmniMark language. I remain skeptical on DITA.

-m

Monday, December 8th, 2008

XML 2008 liveblog: Content Modeling with XSD Schema

Delivered by Pradeep Jain, Ictect Inc. He has a handout available: “Intelligent Content Plug-In for Microsoft Word”, though it’s not obvious from the program that Word is involved.

What is content modeling? “Getting inside of” content, semantics, from there syntax and XML tagging.

Challenges: art vs. science, tacit vs. written documentation, future-proofing, technical vs. business communication, flexibility vs. stability. Getting knowledge workers to participate. Correctness (an emphasis of Ictect).

What is correctness of a model? More than valid XML. Litmus test: SME says “yep, I think you got it!”. But some machine-generated tests are possible.

Shows a Word doc with different kinds of bibliographic references (articles vs. books). Shows Schema code not visible from the back of the room. Word plug-in displays sidebar with a “convert” function, with several possible Schemas available to work against. Automatically detected sections in the document and added <section> elements. Progressively more complex examples of generated markup.

It seems like this is actually a pretty clever application, though it is hard to tell from this talk. -m

Monday, December 8th, 2008

XML 2008 liveblog: Ubiquity XForms

I will talk about one or more sessions from XML 2008 here.

Mark Birbeck of Web Backplane talking about Ubiquity XForms.

Browsers are slow to adopt new standards. Ajax libraries have attempted to work around this. Lots of experimentation, which is both good and bad, but at least it has legitimized extensions to browsers. JavaScript is the assembly language of the web.

Ubiquity XForms is part of a library, which will also include RDFa and SMIL. Initially based on YUI, but in theory should be adaptable to other libraries like jQuery.

Declarative: tools for creation and validation. Easier to read. Ajax libraries are approaching the level of being their own language anyway, so might as well take advantage of a standard.

Example: setting the “inner value” of a span: <span value="now()"></span>.

Script can do this easily: onclick="this.innerHTML = Date().toLocaleString();" But crosses the line from semantics to specific behavior. The previous one is exactly how xforms:output works.

Another example: tooltips. Breaks down to onmouseover, onmouseout event handlers, show and hide. A jQuery-like approach can search the document for all tooltip elements and add the needed handlers, avoiding explicit behavioral code. This is the essence of Ubiquity XForms (and in fact XForms itself).

Patterns like these compose under XForms. A button (xf:trigger) or any form control can easily have a tooltip (xf:hint). These are all regular elements, stylable with CSS, accessible via DOM, and so forth. Specific events (like xforms-hint) fire in specific situations, and a spreadsheet-like engine can update interdependencies.
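A spreadsheet-like engine boils down to recomputing each calculated value after the values it depends on. The Python sketch below does this with graphlib.TopologicalSorter (Python 3.9+). It is a generic illustration, not Ubiquity’s engine; XForms itself describes recalculation in terms of a dependency graph. The cell names and formulas are made up.

```python
from graphlib import TopologicalSorter

def recalculate(values, formulas):
    """values: name -> number for plain cells.
    formulas: name -> (dependency names, function of those values).
    Evaluates formulas in dependency order; a cycle raises graphlib.CycleError."""
    graph = {name: deps for name, (deps, _) in formulas.items()}
    result = dict(values)
    for name in TopologicalSorter(graph).static_order():
        if name in formulas:
            deps, fn = formulas[name]
            result[name] = fn(*(result[d] for d in deps))
    return result
```

For example, recalculate({"a": 3, "b": 4}, {"total": (("a", "b"), lambda a, b: a + b)}) yields a total of 7.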

Question: Is this client-side? A: Yes, all running within Firefox. The entire presentation is one XForms document.

Demo: a range control with class="geolocation" that displays as a map w/ Google Maps integration. The Ubiquity XForms library contains many such extensibility points.

Summary: Why? Simple, declarative. Not a programming language. Speeds up development. Validatable. Link: ubiquity.googlecode.com.

Q&A: Rich text? Not yet, but not hard (especially with YUI). Formally XForms compliant? Very nearly 1.1 conforming.

-m

Friday, November 28th, 2008

Fun with xdmp:value()

Lately I’ve been playing with some more advanced XQuery. One thing nearly every XQuery engine supports is some kind of eval() function. MarkLogic has several, but my favorite is xdmp:value. It’s lightweight because it reuses the entire calling context, so for instance you can write let $v := 5 return xdmp:value("$v"). Not too useful, but if the expression passed in comes from a variable, it gets interesting.

Now, quite a few standards based on XPath depend on the context node being set to some particular node. This turns out to be easy too, using the path operator: $context/xdmp:value($expr). According to the definition of the XPath path operator, the expression to the right is evaluated with the results of the expression on the left setting the context node.

OK, how about setting the context size and position? More difficult, but one could use a sequence on the left-hand side of the path operator, with the desired $context node somewhere in the middle. Then last() will return the length of the sequence, and position() will return, well, the position of $context in the sequence. But it’s kind of hacky to manufacture a bunch of temporary nodes, only to throw them away in the next step of the path.
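To model the padding trick outside MarkLogic, here is a Python sketch. The right-hand step is called once per item with the item, its 1-based position, and the sequence length, standing in for the context item, position(), and last(). The function names and the None fillers are hypothetical. As in the XQuery version, the step also runs for the filler items, and those results are discarded.

```python
def path_step(sequence, step):
    """Call step(item, position, size) for each item, like the right side of E1/E2."""
    size = len(sequence)
    return [step(item, pos, size) for pos, item in enumerate(sequence, start=1)]

def evaluate_with_focus(context, position, size, step):
    """Pad with filler items so `context` sits at `position` in a sequence of
    length `size`, then keep only the result computed for `context`."""
    padded = [None] * (position - 1) + [context] + [None] * (size - position)
    return path_step(padded, step)[position - 1]
```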

I’m curious if anyone else has done something similar. Comments? -m

Tuesday, November 4th, 2008

XiX: Details about XForms in XQuery

I was asked offline for more details about what I have in mind around XiX.

Take a simple piece of XML, like this: <root><a>3</a><b>4</b><total/></root>.

An XForms Model can be applied, in an out-of-line fashion, to that instance. This is done through a bind element, with XPath to identify the nodes in question, plus other “model item properties” to annotate the instance. The calculate property is a good one: <bind nodeset="total" calculate="../a + ../b"/>. When called upon to refresh the instance, as you would expect, the result contains the node <total>7</total>.

Like lots of algorithms, though, XForms is defined in a thoroughly procedural manner. Functional programming has a stricture against assignment operators, like setting the value “7” into the calculated node above. So the challenge is coming up with an implementation that works within these bounds. For example, perhaps a function that takes an original instance as input, and returns a newly-created updated instance. Simple enough for the example here, but in more complex cases with different and interacting model item properties, regenerating the entire instance frequently has performance penalties.
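Here is the “take an instance, return a new instance” approach as a Python sketch. It hard-codes the single bind from the example (nodeset="total" calculate="../a + ../b") rather than interpreting bind elements, and the function name is hypothetical. The deepcopy makes the cost mentioned above visible: every refresh rebuilds the whole instance even though only one node changes.

```python
import copy
import xml.etree.ElementTree as ET

def apply_calculate(instance):
    """Return a new instance with <total> = a + b; the input is left untouched."""
    updated = copy.deepcopy(instance)   # whole-tree copy, even for one changed node
    a = float(updated.findtext("a"))
    b = float(updated.findtext("b"))
    # "%g" drops the trailing ".0"; fine for small numbers in this sketch
    updated.find("total").text = "%g" % (a + b)
    return updated
```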

So, I’m trying to find the right expression of the XForms Model in a functional guise. (As with RDFa). I’m curious about what anyone else has come up with in this area. -m

Thursday, October 30th, 2008

XiX (XForms in XQuery)

I’m pondering implementing the computational parts of the XForms Model in XQuery. Doing so in a largely functional environment poses some challenges, though. Has anybody tackled this before? How about in any functional language, including ML, Haskell, Scheme, XSLT, or careful Python?

I borrowed the book Purely Functional Data Structures from a friend–this looks to be a good start. What else is out there? Comment below. -m

Thursday, October 23rd, 2008

RDFa is a Recommendation

Haven’t mentioned here that RDFa is a W3C Recommendation. I’m thrilled that something that I’ve been thinking about for a while is ready for prime time.

Also, as of this writing the first page of results at Google still prominently links to a terribly outdated draft of the spec. The first page of results at Yahoo! nails it. Just sayin’.

-m

Friday, October 10th, 2008

More mobile XForms goodness

I haven’t tried this, but these guys claim to have a solution where

The form definitions are saved and exchanged as XForms, and the data as XForm[s] models. The data can be exchanged over http (if the phone users can afford GPRS and have a data connection) or over compressed SMS messages.

Sounds like they have the right idea… -m

Thursday, October 2nd, 2008

XForms spambots on the loose

A determined spambot has been submitting the XForms contact form on XForms Institute. OK, so it’s probably more Flash-aware than XForms-aware, but still. -m

Wednesday, September 17th, 2008

The case for native higher-order functions in XQuery

The XQuery Working Group is debating the need for higher-order functions in the language. I’m working on honing my description of why this is an important feature. Does this work? What would work better?

Imagine you are writing a smallish widget app, in an environment without a standard library. When you need to sort your widgets, you’d write a simple function with a signature like sort(sequence-of-widgets). That’s great.

Now imagine you find your app to be steadily growing. An accumulation of smaller one-off solutions won’t work anymore, you need a general solution. What you’ll end up with is something like qsort in C, which takes a pointer to a comparator function. By providing different comparators, you can sort anything any way you like, all through only a single sort function. C and C++ have something like this, as do PHP, Python, Java, JavaScript, and even assembly language. XSLT has it, as proven by Dimitre.
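For readers who haven’t used qsort, here is the same pattern in Python: one general sort, parameterized by a comparator that returns a negative, zero, or positive number. Idiomatic Python usually passes a key= function instead; functools.cmp_to_key is used here because it mirrors the C comparator style. The widgets are plain dicts invented for the example.

```python
from functools import cmp_to_key

def sort_widgets(widgets, comparator):
    """Sort any sequence any way you like, via the comparator passed in."""
    return sorted(widgets, key=cmp_to_key(comparator))

def by_price(x, y):
    return x["price"] - y["price"]

def by_name(x, y):
    return (x["name"] > y["name"]) - (x["name"] < y["name"])
```

The same sort_widgets() call with by_price or by_name gives two different orderings, and adding a third ordering only requires a new comparator.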

XQuery doesn’t. It should, because people are now using it for more than short queries. People are writing programs in it. -m

P. S. Comment please.

Monday, August 4th, 2008

Implementing RDFa in XQuery

Through the weekend I put most of the final touches on an implementation of RDFa in XQuery. The implementation is based on the functional specification of RDFa, an offshoot of the excellent work coming out of the W3C task force.

The spec contains a procedural description of the parsing algorithm, and several people have successfully followed it to arrive at a conforming implementation. But you would have a tough time explaining RDFa to someone that way. The functional description sort of fell out of the way I described RDFa to people.

“When you see an element with XXXX, you generate a triple, using SSSS as the subject, PPPP as the predicate, and OOOO as the object.”

Which arguably is the more natural way to express the algorithm for functional languages like XQuery or XSLT. Fill in the right blanks and you pretty much have it. In practice, it’s somewhat more complicated, but not nearly so much as with other W3C specs.
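The “fill in the blanks” style translates into a table of rules, each pairing a match test with functions that compute the subject, predicate, and object. This Python sketch only illustrates that shape; it is not the XQuery implementation described here. The two rules are a simplified subset of RDFa: they ignore subject inheritance from ancestors and prefix expansion, and default a missing @about to "" (the current document).

```python
import xml.etree.ElementTree as ET

# Each rule: (matches?, subject, predicate, object), all functions of one element.
RULES = [
    # <a about="S" rel="P" href="O">: resource object
    (lambda e: e.get("rel") and e.get("href"),
     lambda e: e.get("about", ""), lambda e: e.get("rel"), lambda e: e.get("href")),
    # <span about="S" property="P">O</span>: literal object
    (lambda e: e.get("property"),
     lambda e: e.get("about", ""), lambda e: e.get("property"), lambda e: "".join(e.itertext())),
]

def triples(root):
    """Apply every rule to every element, in document order."""
    return [(s(e), p(e), o(e))
            for e in root.iter()
            for matches, s, p, o in RULES
            if matches(e)]
```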

I hope to make the code available soon. You’ll hear about it first here.

I’ll write more when I’m not exhausted. :-) -m

Friday, July 25th, 2008

Complete this sequence…

In C, if you find yourself writing large switch statements (or rafts of if statements), you should consider using pointers to functions instead.

In C++, if you find yourself writing large switch statements (or rafts of if statements), you should consider using objects and polymorphism instead.

In XQuery, if you find yourself writing large typeswitch statements (or rafts of if statements), you should consider using _______________ instead.

Comment here. -m