Archive for the 'standards' Category

Friday, March 5th, 2010

A Hyperlink Offering revisited

The xml-dev mailing list has been discussing XLink 1.1, which after a long quiet period popped up as a “Proposed Recommendation”, which means that a largely procedural vote is is all that stands between the document becoming a full W3C Recommendation. (The previous two revisions of the document date to 2008 and 2006, respectively)

In 2005 I called continued development of XLink a “reanimated spectre”. But even earlier, in 2002 I wrote one of the rare fiction pieces on xml.com, A Hyperlink Offering, which using the format of a Carrollian dialog between Tortoise and Achilles, explained a few of the problems with the XLink specification. It ended with this:

What if the W3C pushed for Working Groups to use a future XLink, just not XLink 1.0?

Indeed, this version has minor improvements. In particular, “simple” links are simpler now–you can drop an xlink:href attribute where you please and it’s now legit. The spec used to REQUIRE additional xlink:type=”simple” attributes all over the place. But it’s still awkward to use for multi-ended links, and now even farther away from the mainstream hyperlinking aspects of HTML5, which for all of its faults, embodies the grossly predominant description of linking on the web.

So in many ways, my longstanding disappointment with XLink is that it only ever became a tiny sliver of what it could have been. Dashed visions of Xanadu dance through my head. -m

Monday, February 22nd, 2010

Mark Logic User Conference 2010

Are you coming? Link. It starts on May 4 (Star Wars day!) at the InterContinental Hotel in San Francisco. Guest speakers include Chris Anderson, Editor-in-Chief of Wired and Michelle Manafy, Editor-in-Chief of EContent magazine.

Early bird registration ends Feb 28. -m

Tuesday, February 16th, 2010

There is no honor in namespaces

As heard from my friend and Mark Logic contractor Ryan Grimm. -m

Wednesday, January 20th, 2010

XForms: binding to an optional element

I asked this on the XSLTForms list, but it’s worth casting a wider net.

Say I have an XML instance like this:
<root>
<format>xml</format>
</root>

Possible values for the format element are “xml”, “text”, “binary”, or a default setting, indicated by a complete absence of the <format> element. I’d like this to attach to a select1 with four choices:

<xf:select1 ref=”format”>…

* xml
* text
* binary
* default

Where the first three set the value of the <format> element, and the fourth omits the element altogether. In XForms 1.0, the difficulty comes from binding to a nonexistent element, which causes it to become non-relevant. Are there features from 1.1, 1.2, or extensions that make this case easier? Anyone have an example?

-m

Sunday, November 29th, 2009

The Model Endpoint Template (MET) organizational pattern for XRX apps

One of the lead bullets describing why XForms is cool always mentions that it is based on a Model View Controller framework. When building a full XRX app, though, MVC might not be the best choice to organize things overall. Why not?

Consider a typical XRX app, like MarkLogic Application Builder. (You can download a your copy of MarkLogic, including Application Builder, under the community license at the developer site.) For each page, the cycle goes like this:

  1. The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
  2. The page loads, including client-side XForms via JavaScript
  3. XForms requests the project state as XML from a designated endpoint; this becomes the XForms Instance Data
  4. Stuff happens on the page that changes the client-side state
  5. Just before leaving the page, XML representing the updated state is HTTP PUT back to the endpoint

The benefit of this approach is that you are dealing with XML all the way through, no impedance mismatches like you might find on an app that awkwardly transitions from (say) relational data to Java objects to urlencoded name/value pairs embedded in HTML syntax.

So why not do this in straight MVC? Honestly, MVC isn’t a bad choice, but it can get unwieldy. If an endpoint consists of a separate model+view+controller files, and each individual page consists of separate model+view+controller files, it adds up to a lot of stuff to keep track of. In truly huge apps, this much attention to organization might be worth it, but most apps aren’t that big. Thus the MET pattern.

Model: It still makes sense to keep the code that deals with particular models (closely aligned with Schemas) as a separate thing. All of Application Builder, for example, has only one model.

Endpoint: The job of an endpoint is to GET and PUT (and possibly POST and DELETE) XML, or other equivalent resource bundles depending on how many media types you want to deal with. It combines an aspect of controllers by being activated by a particular URL and views by providing the data in a consistent format.

Template: Since XForms documents already contain MVC mechanics, it not a high-payoff situation to further use MVC to construct the XForms and XHTML wrapper themselves. The important stuff happens within XForms, and then you need various templating mechanisms for example to provide consistent headers, footers, and other pieces across multiple pages. For this, an ordinary templating mechanism suffices. I can imagine dynamic assembly scenarios where this wouldn’t be the case, but again, many apps don’t need this kind of flexibility, and the complexity that comes along with it.

What about separation of concerns? Oh yeah, what about it? :-) Technically both Endpoints and Templates violate classical SOC. In an XRX app, this typically doesn’t lead to the kinds of spaghetti situations that it might otherwise. Endpoints are self contained, and can focus on doing just one thing well; with limited scope comes limited ability to get into trouble. For those times when you need to dig into the XQuery code of an endpoint, it’s actually helpful to see both the controller and view pieces laid out in one file.

As for Templates, simplicity wins. With the specifics of models and endpoints peeled away, the remaining challenge in developing individual pages is getting the XForms right, and again, it’s helpful to minimize the numbers of files one XForms page are split across. YAGNI applies to what’s left, at least in the stuff I’ve built.

So, I’ve been careful in the title to call this an “organizational pattern”, not a “design pattern” or an (ugh) “architectural pattern”. Nothing too profound here. I’d be happy to start seeing XRX apps laid out with directory names like “models”, “endpoints”, and “templates”.

What do you think? Comments welcome.

-m

Wednesday, October 21st, 2009

Application Builder behind-the-scenes

I’ll be speaking next Tuesday (Oct 27) at the Northern Virginia MarkLogic User Group (NOVAMUG). Here’s what I’ll be talking about.

Application Builder consists of two main parts: Search API to enable Google-style search string processing, and the actual UI wizard that steps users through building a complete search app. It uses a number of technologies that have not (at least not up until now!) been widely associated with MarkLogic. Why some technologies that seem like a perfect fit for XML apps are less used in the Mark Logic ecosystem is anyone’s guess, but one thing App Builder can contribute to the environment is some fresh DNA. Maybe your apps can benefit from these as well.

XForms and XRX. Clicking through the screens of App Builder is really a fancy way of editing XML. Upon first arriving on a page, the client makes a GET request to an “Application XML Endpoint” (axe.xqy) to get the current state of the project, which is rendered in the user interface. Interacting with the page edits the in-memory XML. Afterwards, the updated state is PUT back to the same endpoint upon clicking ‘Save’ or prior to navigating away. This is a classic XRX architecture. MarkLogic ships with a copy of the XSLTForms engine, which makes use of client-side XSLT to transform XForms Markup into divs, spans, classes, and JavaScript that can be processed entirely in the browser. Thus XForms works on all supported browsers all the way back to IE6. The apps built by the App Builder don’t use any XForms (yet!) but as App Builder itself demonstrates, it is a great platform for application development.

To be honest, many XForms apps have fallen short on the polished UI department. Not so with App Builder, IMHO. An early, and in hindsight somewhat misdirected, thrust of XForms advocacy pushed the angle of building apps with zero script needed. But one advantage of using a JavaScript implementation of XForms is that it frees you to use script as needed. So in many places, large amounts of UI, all mapped to XML, are able to be hidden away with CSS, and selectively revealed (or mapped to yet other HTML form controls) in small, self-contained overlays triggered via script. While it doesn’t fulfill the unrealistic promise of completely eliminating script, it’s a useful technique, one I predict we’ll see more of in the future.

Relax NG. XML Schema has its roots deep into the XML infrastructure. The type system of XQuery and XSLT 2.0 is based on it. Even XForms has ties to it. But for its wide reach, XML Schema 1.0 has some maddening limitations, and “takes some getting used to” before one can sight read it. In the appendices of many recent W3C specifications use the highly-readable compact syntax to describe content models is a way equally human and machine-readable.

What are these limitations I speak of? XML Schema 1.1 goes a long way toward resolving these, but isn’t yet widely in use. Take this example, the content model of the <options> element from Search API:

start = Options | Response

# Root element
OptionsType = (
 AdditionalQuery? &
 Annotation* &
 ConcurrencyLevel? &
 Constraint* &
 Debug? &
 DefaultSuggestionSource? &
 Forest* &
 Grammar? &
 Operator* &
 PageLength? &
 QualityWeight? &
 ReturnConstraints? &
 ReturnFacets? &
 ReturnMetrics? &
 ReturnQtext? &
 ReturnQuery? &
 ReturnResults? &
 ReturnSimilar? &
 SearchOption* &
 SearchableExpression? &
 SortOrder* &
 SuggestionSource* &
 Term? &
 TransformResults?
)

The start line indicates that, within this namespace, there are two possible root elements, either <options> or <response> (not shown here). An instance with a root of, say search:annotation is by definition not valid. Try representing that in XML Schema.

The definition of OptionsType allows a wide variety of child elements, some zeroOrMore times, other optional (zero or one occurrence), with no ordering restrictions at all between anything. XML Schema can’t represent this either. James Clark’s trang tool converts Relax NG into XML Schema, and has to approximate this as an xsd:choice with maxOccurs=”unbounded”, thus the elements that can only occur once are not schema-enforced. Thus the Relax NG description of the content model, besides being more readable, actually contains more information than the closest XML Schema. So particularly for XML vocabularies that are optimized for human use, Relax NG is a good choice for schema development.

Out of line validation. So if XML Schema doesn’t fully describe the <options> node, how can authors be sure they have constructed one correctly? Take a step back: even if XML Schema could fully represent the content model, for performance reasons you wouldn’t want to repeatedly validate the node on every query. The options node tends to change infrequently, mainly during a development cycle. Both of these problems can be solved with out-of-line validation: a separate function call search:check-options().

Inside this function you’ll find a validate expression that will make as much use of the Schema as it can, but also much more. The full power of XQuery can be leveraged against the proposed <options> node to check for errors or inconsistencies, and provide helpful feedback to the developer. Since it happens out-of-line, these checks can take substantially longer than actually handing the query based on them. The code can go as in-depth as it needs to without performance worries. This is a useful technique in many situations. One potential shortfall is that people might forget to call your validation function, but in practice this hasn’t been too much trouble.

Higher-order functions. The predecessor to Search API had a problem that it was so popular that users would modify it to suit their unique requirements, which lead to dozens of minor variations floating around in the wild. Different users have different needs and expectations for the library, and making a one-size-fits-all solution is perhaps not possible. One way to relieve this kind of feature pressure is to provide enough extension hotspots to allow all the kinds of tweaks that users will want, preserving the mainline code. This involves prediction, which is difficult (especially about the future). But a good design makes this possible.

Look inside the built-app and you will find a number of function pointers, implemented as a new datatype xdmp:function. XQuery 1.1 will have a more robust mechanism for this, but it might be a while before this is widespread. By modifying one file, changing a pointer to different code, nearly every aspect of the application can be adjusted.

Similarly, a few hotspots in the Search API can be customized, to hook in different kinds of parsers or snippet-generators. This powerful technique can take your own apps to the next level.

-m

Wednesday, October 21st, 2009

XForms 1.1 is out

XForms 1.1 is now a full W3C Recommendation. Compared to version 1.0, which went live a bit more than 6 years ago, version 1.1 offers lots of road-tested tools that make development easier and more powerful, including new datatypes and XPath functions, a significantly more powerful submission subsystem, and a more flexible event model.

And XSLTForms already supports almost all of the new goodies. -m

Wednesday, October 7th, 2009

US Federal Register in XML

Fed Thread is a front end for the newly XMLified Federal Register. Why is this a big deal? It’s a daily publication of the goings-on of the US government. It’s a primary source for all kinds of things that normally only get rehashed through news organizations. And it is bulky–nobody can read through it on a regular basis. A yearly subscription (printed) would cost nearly $1000 and fill over 80,000 pages.

Having it in XML enables all kinds of searching, syndication, and annotation via flexible front ends like this one. Yay for transparency. -m

Tuesday, September 22nd, 2009

XForms Developer Zone

Another XForms site launched this week. This one seems pretty close to what I would like XForms Institute to become, if I had an extra 10 hours per week. -m

Tuesday, August 18th, 2009

Geek Thoughts: reading XProc code

All the input/output/port stuff in XProc seemed incomprehensible to me until I recognized something simple. Every time you see a <pipe> element, read it as “comes from”. For example

  <p:output port="result">
    <p:pipe step="validated" port="result"/>
  </p:output>

reads as ‘output to the “result” port comes from the port “result” on step “validated”‘ and

  <p:input port="source">
    <p:pipe step="included" port="result"/>
  </p:input>

reads as ‘input for the “source” port comes from the port “result” on step “included”‘. If you keep this in mind it all makes much more sense.

More collected Geek Thoughts at http://geekthoughts.info.

Wednesday, August 5th, 2009

Misunderstanding Markup

On this comic’s panel 9 describes XHTML 1.1 conformance as:

the added unrealistic demand that documents must be served with an XML mime-type

I can understand this viewpoint. XHTML 1.1 is a massively misunderstood spec, particularly around the modularization angle. But because of IE, it’s pretty rare to see the XHTML media-type in use on the open web. Later, panel 23 or thereabouts:

If you want, you can even serve your documents as application/xhtml+xml, instantly transforming them from HTML 5 to XHTML 5.

Why the shift in tone? What makes serving the XML media type more realistic in the HTML 5 case? IE? Nope, still doesn’t work. I’ve observed this same shift in perspective from multiple people involved in the HTML5 work, and it baffles me. In XHTML 1.1 it’s a ridiculous demand showing how out of touch the authors were with reality. In HTML5 the exact same requirement is a brilliant solution, wink, wink, nudge, nudge.

As it stands now, the (X)HTML5 situation demotes XHTML to the backwaters of the web. Which is pretty far from “Long Live XHTML…”, as the comic concludes. Remember when X stood for Extensible?

-m

Friday, July 24th, 2009

Java-style namespaces for markup

I’m noodling around with requirements and exploring existing work toward a solution for “decentralized extensability” on xml-dev, particularly for HTML. The notion of “Java-style” syntax, with reverse dns names and all, has come up many times in the context of these kinds of discussions, but AFAICT never been fully fleshed out. This is ongoing, slowly, in available time–which as been a post or two per week.  (In case there is any doubt, this is a spare-time effort not connected with my employer)

Check it out and add your knowledge to the thread. -m

Saturday, July 11th, 2009

The decline of the DBMS era

Several folks have been pointing to this article which has some choice quotes along the lines of

If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in most any market I can think of.

My employer is specifically mentioned:

Even in XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors

And it’s true, but don’t take my word for it. :-) The DBMS world has lots of inertia, but don’t let that blind you to seeing another way to solve problems. Particularly if that extra 50x matters. -m

Tuesday, July 7th, 2009

Demo Jam at Balisage 2009

Come join me at the Demo Jam at Balisage this year. August 11 at 6:30 pm. There will be lots of cool demos, judged by audience participation. I’d love to see you there. -m

Thursday, July 2nd, 2009

And then there were one…

On May 8 I wrote:

it’s time for the W3C to show some tough love and force the two (X)HTML Working Groups together.

On July 2, the W3C wrote:

Today the Director announces that when the XHTML 2 Working Group charter expires as scheduled at the end of 2009, the charter will not be renewed. By doing so, and by increasing resources in the Working Group, W3C hopes to accelerate the progress of HTML 5 and clarify W3C’s position regarding the future of HTML.

The real test is whether the single HTML Working Group can be held to the standard of other Working Groups, and be able to recruit some much-needed editorial help from some of the displaced XHTML 2 gang.  -m

Wednesday, June 3rd, 2009

See you at Balisage

Balisage, formerly Extreme Markup, is the kind of conference I’ve always wanted to attend.

Historically my employers have been not quite enough involved in the deep kinds of topics at this conference (or too cash-strapped, but let’s not go there) to justify spending a week on the road. So I’m glad that’s no longer the case: Mark Logic is sponsoring the conference this year. I’m looking forward to the show, and since I’m not speaking, I might be able to relax a little and soak in some of the knowledge.

See you there! -m

Saturday, May 30th, 2009

XForms Institute moved to SVN

About a week ago I moved XForms Institute over to Subversion. Now the entire site is under version control, with a local copy I can edit. Publishing is as easy as logging in and running the command ’svn up’. Honestly, I should have done this long ago. And any future sites I work on will use this approach too–it’s fantastic.

If you notice any glitches, let me know. -m

Tuesday, May 12th, 2009

Google Rich Snippets powered by RDFa

The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800 pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m

Friday, May 8th, 2009

HTML: The Markup Language marks a new beginning

If you haven’t already, check out HTML: The Markup Langauge. Besides being a cool new recursive acronym for HTML, it is a reasonably-sane document. Also worth a look: Differences between HTML4 and HTML5. Many of the ideas from XHTML 2 (of which I was an editor at one point) are there.

I think it’s time for the W3C to show some tough love and force the two (X)HTML Working Groups together.

A while ago, I argued that the existence of both Flickr and Yahoo! Photos as an effective two-pronged strategy. Look how that worked out–Y! Photos is permanently shuttered. While there were benefits including a broader potential reach, in aggregate the benefits didn’t amount to more than the immense cost of having two parallel efforts. Same here. -m

Sunday, April 26th, 2009

XForms validator: disabling Google ads, no more blank pages

Thanks to those who wrote in with bug reports about the XForms Validator: something changed recently and made the inserted Google Ads script confuse browsers, resulting in a blank page where you’d expect results. I’ve turned off the response-page ads, which were only getting in the way, and the problem seems to have vanished. Carry on. :-) -m

Friday, April 24th, 2009

EXPath.org

I’ve always thought that the EXSLT model of developing community specifications worked well. Now a critical mass of folks has come together on a similar effort, aimed at providing extensions usable in XPath 2.0, XSLT 2.0, XQuery, and other XPath-based languages like XProc. Maybe even XForms.

Check it out, subscribe to the mailing list, and help out if you can. -m

Tuesday, April 7th, 2009

GPL’s Cloudy Future

I enjoyed this post, from Jeremy Allison as it turns out. It talks about how GPL software is “the new BSD” when it comes to cloud computing, since redistribuion of the software doesn’t happen, and thus doesn’t trigger the relevant clauses of the GPL. Any old company can use, re-use, and modify the software without sharing the code in the original spirit of the license. The community’s response–something I need to keep a closer eye on–is the AGPL, or Affero license. It works similarly to the GPL, but is triggered by remote use of the software, not just distribution, preserving the work’s copylefedness even in cloud computing situations. -m

Tuesday, March 24th, 2009

XIN: Implicit namespaces

An interesting proposal from Liam Quin, relating to the need for huge rafts of namespace declarations on mixed namespace documents.

In practice, though, almost all elements [in the given example] are going to be unambiguous if you take their ancestors into account, and attributes too.

Amen. I’ve been saying things like this for five years now. Look at any introductory text on XML, and the example used to show the need for namespaces will be embarrassingly contrived. That’s not a dig against authors, it’s a dig against over-engineered solutions to non-problems.

-m

Wednesday, February 25th, 2009

Brian May explains relativity

This is fantastic. Brian May (yes THAT Brian May) not only blogs, but talks about all kinds of challenging subjects. Like how and why space and time are linked. Worth a read. -m

Monday, February 16th, 2009

Crane Softwrights adds XQuery training

From the company home page, reknown XSLT trainer and friend G. Ken Holman has expanded his offerings to include XQuery training. The first such session is March 16-20, alongside XML Prague.

I’ve always thought there is great power in having both XSLT and XQuery tools at one’s disposal. I’ve seen people tend to polarize into one camp or the other, but in truth there is a lot of common ground, as well as cases where the right technology makes for a much more elegant solution. So learning both is easier than it seems, and more useful than it seems.

If you will be around the conference, take a look at the syllabus. I’m curious to see others’ reactions toward the combined XSLT + XQuery toolset. -m

Wednesday, February 4th, 2009

XSLTForms beta

XSLTForms, the cross-browser XForms engine (written about previously) that makes ingenious use of built-in XSLT processing, reached an important milestone today, with a beta release. Tons of bug fixes and additional support for CSS and Schema.

If you’re thinking about getting involved with XForms and are looking for something small and approachable, give it a look. -m

Monday, December 22nd, 2008

XForms for HTML

I’ve heard not a peep about this before, but here it is: XForms for HTML. Let’s read this together. Feel free to drop any comments or observations below. -m

Friday, December 19th, 2008

XSLTForms looks promising

Implementing client-side forms libraries is, and has been, all the rage. I’ve seen Mozquito Factory do amazing things in Netscape 4, Technical Pursuits TIBET on the perpetual verge of release, UGO, and others. In a more recent time scale, Ubiquity XForms impresses me and many others, and it has the right combination of funding and willing developers.

From a comment on my recent posting about Ubiquity XForms, I was pleased to learn about XSLTforms, a rebirth of AjaxForms, which I thought well of two years ago until its developer mysteriously left the project. But Software Libre lives on, and a new developer has taken over, this time using client-side XSLT instead of server-side Java to do the first pass of processing. Given the strong foundation, the project has come a long way in a short time, and already runs against a wide array of non-trivial examples. Check it out.

I’d like to hear what others think about this project. -m

Thursday, December 11th, 2008

XML 2008 liveblog: Introduction to eXist and XQuery

Greg Watson, IT Specialist, Defense Intelligence Agency Missile and Space Intelligence Center (apparently it IS rocket science). I installed eXist last night to follow along with the talk.

“If you have a larger dataset, eXist may not be the best choice.” Recommended reading: XQuery by Priscilla Walmsley, XQuery wikibook.

Download and install. Needs a full JDK (Mac includes this already in /Library/Java/Home), a mere JRE is insufficient. Start up with bin/startup.sh.

eXist-specific useful functions: request:get-parameter() from the URI query string. transform:transform() function invokes XSLT from within XQuery.

Example uses doc() to fetch an external URL of RSS, check individual items with contains(). Every example is a fully-formed, click-on-a-link-to-run program.

XQuery and PHP: reallly basic integration with simplexml_load_file($myXQueryURL).

Loading scripts into eXist: He uses XML Spy. eXist has a Jaa Web Start admin client.

Q&A: How bit is too big? Maybe 10-20-30 thousand docs. Generate indexes? Yes.

-m

(Production note: somehow, this one didn’t get published live. Now it is.)

Tuesday, December 9th, 2008

XML 2008 liveblog: Introduction to Schematron

Wendell Piez, Mulberry Technologies

Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases.

Same architecture as XSLT. (Schematron specifies, does not perform)

<schema xmlns="http://purl.cclc.org/dsdl/schematron">
  <title>Check sections 12/07</title>
  <pattern id="section-check">
    <rule context="section">
      <assert test="title">This section has no title</assert>
      <report test="p">This section has paragraphs</report>
      ...

Demo. OxygenXML has support. Assert vs. Report – essentially opposites. Assert means “tell me if this if false”. Report means “tell me if this is true”.

“Almost as if Schematron is a harness for XPath testing.”

More examples:

<rule context="note">
  <report test="ancestor::note">A note appears in a note. OK?</report>
</rule>

Binding: Default is XSLT 1, but flexible enough to allow other query langauges via attribute @queryBinding at the top. Many processors allow mix-and-match between XSLT and Schematron. Examples showing just that.

Some tests can be very useful:

test=”every $line in tokenize(., $newline) satisfies string-length($line) le 72″

Q: What if the destination is not a human, but another part of a pipeline? Varies by implementation, but SVRL is standardized as an annex in the ISO spec, part of DSDL.

Use as little or as much as you want, at different times in the document lifecycle. “Schematron is a feather duster that reaches areas other schema languages cannot.” – Rick Jelliffe

As time permits section of the talk:

Other top-level elements: title, pattern, ns, let, p, include, phase, diagnostics.

-m