I’ll be speaking next Tuesday (Oct 27) at the Northern Virginia MarkLogic User Group (NOVAMUG). Here’s what I’ll be talking about.
Application Builder consists of two main parts: Search API to enable Google-style search string processing, and the actual UI wizard that steps users through building a complete search app. It uses a number of technologies that have not (at least not up until now!) been widely associated with MarkLogic. Why some technologies that seem like a perfect fit for XML apps are less used in the Mark Logic ecosystem is anyone’s guess, but one thing App Builder can contribute to the environment is some fresh DNA. Maybe your apps can benefit from these as well.
Relax NG. XML Schema has its roots deep into the XML infrastructure. The type system of XQuery and XSLT 2.0 is based on it. Even XForms has ties to it. But for its wide reach, XML Schema 1.0 has some maddening limitations, and “takes some getting used to” before one can sight read it. In the appendices of many recent W3C specifications use the highly-readable compact syntax to describe content models is a way equally human and machine-readable.
What are these limitations I speak of? XML Schema 1.1 goes a long way toward resolving these, but isn’t yet widely in use. Take this example, the content model of the <options> element from Search API:
start = Options | Response
# Root element
OptionsType = (
The start line indicates that, within this namespace, there are two possible root elements, either <options> or <response> (not shown here). An instance with a root of, say search:annotation is by definition not valid. Try representing that in XML Schema.
The definition of OptionsType allows a wide variety of child elements, some zeroOrMore times, other optional (zero or one occurrence), with no ordering restrictions at all between anything. XML Schema can’t represent this either. James Clark’s trang tool converts Relax NG into XML Schema, and has to approximate this as an xsd:choice with maxOccurs=”unbounded”, thus the elements that can only occur once are not schema-enforced. Thus the Relax NG description of the content model, besides being more readable, actually contains more information than the closest XML Schema. So particularly for XML vocabularies that are optimized for human use, Relax NG is a good choice for schema development.
Out of line validation. So if XML Schema doesn’t fully describe the <options> node, how can authors be sure they have constructed one correctly? Take a step back: even if XML Schema could fully represent the content model, for performance reasons you wouldn’t want to repeatedly validate the node on every query. The options node tends to change infrequently, mainly during a development cycle. Both of these problems can be solved with out-of-line validation: a separate function call search:check-options().
Inside this function you’ll find a validate expression that will make as much use of the Schema as it can, but also much more. The full power of XQuery can be leveraged against the proposed <options> node to check for errors or inconsistencies, and provide helpful feedback to the developer. Since it happens out-of-line, these checks can take substantially longer than actually handing the query based on them. The code can go as in-depth as it needs to without performance worries. This is a useful technique in many situations. One potential shortfall is that people might forget to call your validation function, but in practice this hasn’t been too much trouble.
Higher-order functions. The predecessor to Search API had a problem that it was so popular that users would modify it to suit their unique requirements, which lead to dozens of minor variations floating around in the wild. Different users have different needs and expectations for the library, and making a one-size-fits-all solution is perhaps not possible. One way to relieve this kind of feature pressure is to provide enough extension hotspots to allow all the kinds of tweaks that users will want, preserving the mainline code. This involves prediction, which is difficult (especially about the future). But a good design makes this possible.
Look inside the built-app and you will find a number of function pointers, implemented as a new datatype xdmp:function. XQuery 1.1 will have a more robust mechanism for this, but it might be a while before this is widespread. By modifying one file, changing a pointer to different code, nearly every aspect of the application can be adjusted.
Similarly, a few hotspots in the Search API can be customized, to hook in different kinds of parsers or snippet-generators. This powerful technique can take your own apps to the next level.