Archive for the 'Mark Logic' Category

Thursday, December 8th, 2011

Resurgence of MVC in XQuery

There’s been an increasing amount of talk about MVC in XQuery, notably David Cassel’s great discussion and to an extent Kurt Cagle’s platform discussion that touched on forms interfaces. Lots of Smart People are thinking in this area, and that’s a good thing.

A while back I recorded my thoughts on what I called MET, or the Model Endpoint Template organizational pattern, as used in MarkLogic Application Builder. One difference between 2009 and now, though, is that browsers have distanced themselves even farther from XML, which tends to undercut the eliminate-the-impedance-mismatch argument. In particular, the forms model in HTML5 continues to prefer flat data, which to me indicates that models still play an important role in XQuery web apps.

So I envision the app lifecycle like this:

  1. The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
  2. An HTML page loads.
  3. Client-side script requests the project state from a designated endpoint, the server transforms the XML into a flat list, and delivers it as JSON (as an optimization, the server can package the initial data into the page delivered in the prior step)
  4. Standard form interaction and client-side scripting happens, including manipulation of repeating structures mediated by JavaScript
  5. A standard form submit happens (possibly via script), sending a flat list back to the server, which performs an update to the stored XML.
It’s pretty easy to envision data-mapping tools and libraries that help automate the construction of the transforms mentioned in steps 3 and 5.
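
To make the step 3 transform a bit more concrete, here is a minimal sketch in plain XQuery that flattens a small XML tree into name=value pairs. The project structure, the element names, and the dotted, bracket-indexed naming scheme are all assumptions made up for illustration, not what Application Builder actually does; serializing the pairs as JSON is left out.

```xquery
xquery version "1.0";

(: Hypothetical project state; the element names are invented for this sketch. :)
declare variable $state :=
  <project>
    <name>demo</name>
    <sort>
      <option><field>date</field><direction>descending</direction></option>
      <option><field>title</field><direction>ascending</direction></option>
    </sort>
  </project>;

(: Build a flat key such as project.sort.option[2].field for an element.
   An index is added only when same-named siblings exist. :)
declare function local:path($e as element()) as xs:string {
  string-join(
    for $a in $e/ancestor-or-self::*
    let $same := $a/../*[node-name(.) eq node-name($a)]
    let $pos := count($a/preceding-sibling::*[node-name(.) eq node-name($a)]) + 1
    return
      if (count($same) gt 1)
      then concat(local-name($a), "[", $pos, "]")
      else local-name($a),
    ".")
};

(: One name=value string per leaf element, in document order. :)
for $leaf in $state//*[not(*)]
return concat(local:path($leaf), "=", string($leaf))
```

The reverse transform in step 5 would walk the submitted keys and rebuild the tree, which is where a data-mapping library would earn its keep.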

Another thing that’s changed is the emergence of XQuery plugin technology in MarkLogic. There’s a rapidly-growing library of reusable components, initially centered around Information Studio but soon to cover more ground. This is going to have a major impact on XQuery app designs as components of the app (think visualization widgets) can be seamlessly added to apps.

Endpoints still make a ton of sense for XQuery apps, and provide the additional advantage that you now have a testable, concern-separated data layer for your app. Other apps have a clean way to interop, and even command-line operation is possible with off-the-shelf tools like wget.

Lastly, Templates. Even if you use plugins for the functional core of your app, there’s still a lot of boilerplate stuff you’d not want to repeat. Something like Mustache.xq is a good fit for this.

Which is all good–but is it MVC? This organizational pattern (let’s call it MET 2.0) is a lot closer to it. Does MET need a controller? Probably. (MarkLogic now ships a pretty good one called rest:rewrite) Like MVC, MET separates the important essences of your application. XQuery will never be Ruby or Java, and its frameworks will never be Rails or Spring, but rather something uniquely poised to capture the expressive power of the language to build apps on top of unstructured and big data. -m

Tuesday, November 1st, 2011

5 things to know about MarkLogic 5

MarkLogic 5 is out today. Here are five things beyond the official announcement that developers should know about it:

  1. If you found the CQ sample useful, you’ll love Query Console, which does everything CQ does and more (syntax highlighting!)
  2. Better Search API support for metadata: MarkLogic has always had support for storing metadata separately from documents. With new Search API support, it’s easy to set up, and it works great with databases of binary documents.
  3. The Hadoop connector, while not officially supported in this configuration, works on Mac. I know a lot of developers use Mac hardware. Once you get Hadoop itself set up (following rules like these), everything works great in my experience.
  4. “Fields” have gotten more general and more powerful. If you haven’t set aside named portions of your documents or metadata for special indexing and access, you should look into this feature–it will rock your world.
  5. To better understand what your system is doing at any point in time, you can now use the built-in Monitoring Dashboard, which runs in-browser.
And let’s not leave out the Express license, which makes it easier to get started. Check it out.
-m

Thursday, February 17th, 2011

MarkLogic in the news

What’s that on your TV screen? Why, it’s MarkLogic, again.

Why President Obama Picked the Bay Area

And it’s true, we’re hiring big time. Maybe your resume should be in that pile… -m

Wednesday, June 9th, 2010

“Google syntax” for semantic queries?

Thought experiment: are there any commonly-expressed semantic queries–the kind of queries you’d run over a triple store, or perhaps a SearchMonkey-annotated web site–expressible in common type-in-a-searchbox query grammar?

As a refresher, here are some things that Google and other search engines can handle. The square brackets represent the search box into which the queries are typed, not part of the queries themselves.

[term]

[term -butnotthis]

[term1 OR term2]

["phrase term"]

[term1 OR term2 -"but not this" site:dubinko.info filetype:html]

So what kind of semantic queries would be usefully expressed in a similar way, avoiding SPARQL and the like? For example, maybe [by:"Micah Dubinko"] could map to a document containing a triple like <this document> <dc:author> “Micah Dubinko”. What other kinds of graph queries are interesting, common, and simple to express like this? Comments welcome.

-m

Sunday, May 30th, 2010

Balisage contest: solving the wikiml problem

I wish I could say I had something to do with the planning of this: part of Balisage 2010 is a contest to “encourage markup experts to review and to research the current state of wiki markup languages and to generate a proposal that serves to de-babelize the current state of affairs for the long haul.”  To enter, you must propose a set of concrete steps (organizational, social, and/or technological) that will enable wiki content interchange, a real WYSIWYG editor, and/or wiki syntax standardization.

This pushes all of my buttons. It’s got structured documents, Web, parser geekery, writing, engineering, and standards. There’s a bunch of open source prior art, including PyXMLWiki, which I adapted from some fantastic earlier work from Rick Jelliffe.

Sadly, MarkLogic employees aren’t eligible to enter. Get your write-up done by July 15 and sent to balisage-2010-contest at marklogic dot com. The winner will be announced at Balisage and will take home some serious prize winnings, and also will be strongly encouraged (but not required) to give a brief summary (~10 minutes) of their winning entry.

Can’t wait to see what comes out of this. -m

Tuesday, May 11th, 2010

XProc is ready

Brief note: The W3C XProc specification, edited by my partner-in-crime Norm Walsh, has advanced to Recommendation status. Now go use it. -m

Thursday, April 29th, 2010

DMC = developer.marklogic.com

The new MarkLogic developer site is up, cleaner, better organized, and more social. Even cooler, it’s an XSLT-heavy application running on a pre-release version of MarkLogic. The new blog gives some of the details of the new site and transition.

So, if you’re already a MarkLogic developer, this is a great resource. And if you’re not, the site itself shows how fast and simple it is to put together a XSLT and XQuery-powered app. -m

Friday, April 2nd, 2010

Recalibrating expectations of XML performance

Working at MarkLogic has forced me to recalibrate my expectations around XML-related performance issues. Not to brag or anything, but it’s screaming fast. Conventional wisdom of avoiding // in paths doesn’t apply, since that’s the sort of thing the indexes are made to do, and that’s just the start. Single milliseconds are now a noteworthy amount of time for something showing up in the profiler.

This is what XML was supposed to be like. Now that XML has fallen off the hype cycle, we’re getting some serious work done. -m

Monday, February 22nd, 2010

Mark Logic User Conference 2010

Are you coming? Link. It starts on May 4 (Star Wars day!) at the InterContinental Hotel in San Francisco. Guest speakers include Chris Anderson, Editor-in-Chief of Wired and Michelle Manafy, Editor-in-Chief of EContent magazine.

Early bird registration ends Feb 28. -m

Friday, January 15th, 2010

Economic indicators: recruiting picking up again

I got a personal email pitch from recruiters at both Facebook and Google, oddly enough both messages within a 3-minute window on a Monday morning. Hiring is on the uptick again, it seems. My team is still looking for the right front end engineer–someone who knows the JavaScript language in depth, how to use semantic HTML and CSS, AND all about browser quirks. Email me. -m

Monday, December 21st, 2009

Failure as the secret to success

Excellent article in Wired, perhaps a good explanation of my career. :-)

Dunbar observed that the skeptical (and sometimes heated) questions asked during a group session frequently triggered breakthroughs, as the scientists were forced to reconsider data they’d previously ignored.

Which sounds like a fairly typical spec review at Mark Logic. Hint: we’re hiring–email me.

-m

Friday, December 18th, 2009

Mark Logic Careers

Check out the updated careers page, including a quote from YT. If you’re looking for an amazing place to work, get in touch with me. In particular I’m looking for top-notch JavaScript/FE/UI people. -m

Sunday, November 29th, 2009

The Model Endpoint Template (MET) organizational pattern for XRX apps

One of the lead bullets describing why XForms is cool always mentions that it is based on a Model View Controller framework. When building a full XRX app, though, MVC might not be the best choice to organize things overall. Why not?

Consider a typical XRX app, like MarkLogic Application Builder. (You can download your copy of MarkLogic, including Application Builder, under the community license at the developer site.) For each page, the cycle goes like this:

  1. The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
  2. The page loads, including client-side XForms via JavaScript
  3. XForms requests the project state as XML from a designated endpoint; this becomes the XForms Instance Data
  4. Stuff happens on the page that changes the client-side state
  5. Just before leaving the page, XML representing the updated state is HTTP PUT back to the endpoint

The benefit of this approach is that you are dealing with XML all the way through, no impedance mismatches like you might find on an app that awkwardly transitions from (say) relational data to Java objects to urlencoded name/value pairs embedded in HTML syntax.

So why not do this in straight MVC? Honestly, MVC isn’t a bad choice, but it can get unwieldy. If an endpoint consists of a separate model+view+controller files, and each individual page consists of separate model+view+controller files, it adds up to a lot of stuff to keep track of. In truly huge apps, this much attention to organization might be worth it, but most apps aren’t that big. Thus the MET pattern.

Model: It still makes sense to keep the code that deals with particular models (closely aligned with Schemas) as a separate thing. All of Application Builder, for example, has only one model.

Endpoint: The job of an endpoint is to GET and PUT (and possibly POST and DELETE) XML, or other equivalent resource bundles depending on how many media types you want to deal with. It combines an aspect of controllers by being activated by a particular URL and views by providing the data in a consistent format.
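
As a rough illustration of how small an endpoint can be, here is a minimal sketch of one as a single MarkLogic main module. The document URI is hypothetical, and there is no validation, access control, or error handling; a real endpoint such as the one in Application Builder will certainly do more.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI under which the project state is stored. :)
declare variable $uri := "/projects/demo/state.xml";

let $method := xdmp:get-request-method()
return
  if ($method eq "GET") then (
    (: The "view" part: hand back the stored XML as-is. :)
    xdmp:set-response-content-type("application/xml"),
    fn:doc($uri)
  )
  else if ($method eq "PUT") then (
    (: The "controller" part: replace the stored state with the request body. :)
    xdmp:document-insert($uri, xdmp:get-request-body("xml")),
    xdmp:set-response-code(204, "No Content")
  )
  else
    xdmp:set-response-code(405, "Method Not Allowed")
```

Both the controller-ish dispatch and the view-ish output sit in one file, which is exactly the trade-off the pattern makes.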

Template: Since XForms documents already contain MVC mechanics, it is not a high-payoff situation to further use MVC to construct the XForms and XHTML wrapper themselves. The important stuff happens within XForms, and then you need various templating mechanisms, for example, to provide consistent headers, footers, and other pieces across multiple pages. For this, an ordinary templating mechanism suffices. I can imagine dynamic assembly scenarios where this wouldn’t be the case, but again, many apps don’t need this kind of flexibility, and the complexity that comes along with it.

What about separation of concerns? Oh yeah, what about it? :-) Technically both Endpoints and Templates violate classical SOC. In an XRX app, this typically doesn’t lead to the kinds of spaghetti situations that it might otherwise. Endpoints are self contained, and can focus on doing just one thing well; with limited scope comes limited ability to get into trouble. For those times when you need to dig into the XQuery code of an endpoint, it’s actually helpful to see both the controller and view pieces laid out in one file.

As for Templates, simplicity wins. With the specifics of models and endpoints peeled away, the remaining challenge in developing individual pages is getting the XForms right, and again, it’s helpful to minimize the number of files one XForms page is split across. YAGNI applies to what’s left, at least in the stuff I’ve built.

So, I’ve been careful in the title to call this an “organizational pattern”, not a “design pattern” or an (ugh) “architectural pattern”. Nothing too profound here. I’d be happy to start seeing XRX apps laid out with directory names like “models”, “endpoints”, and “templates”.

What do you think? Comments welcome.

-m

Wednesday, October 21st, 2009

Application Builder behind-the-scenes

I’ll be speaking next Tuesday (Oct 27) at the Northern Virginia MarkLogic User Group (NOVAMUG). Here’s what I’ll be talking about.

Application Builder consists of two main parts: Search API to enable Google-style search string processing, and the actual UI wizard that steps users through building a complete search app. It uses a number of technologies that have not (at least not up until now!) been widely associated with MarkLogic. Why some technologies that seem like a perfect fit for XML apps are less used in the Mark Logic ecosystem is anyone’s guess, but one thing App Builder can contribute to the environment is some fresh DNA. Maybe your apps can benefit from these as well.

XForms and XRX. Clicking through the screens of App Builder is really a fancy way of editing XML. Upon first arriving on a page, the client makes a GET request to an “Application XML Endpoint” (axe.xqy) to get the current state of the project, which is rendered in the user interface. Interacting with the page edits the in-memory XML. Afterwards, the updated state is PUT back to the same endpoint upon clicking ‘Save’ or prior to navigating away. This is a classic XRX architecture. MarkLogic ships with a copy of the XSLTForms engine, which makes use of client-side XSLT to transform XForms Markup into divs, spans, classes, and JavaScript that can be processed entirely in the browser. Thus XForms works on all supported browsers all the way back to IE6. The apps built by the App Builder don’t use any XForms (yet!) but as App Builder itself demonstrates, it is a great platform for application development.

To be honest, many XForms apps have fallen short on the polished UI department. Not so with App Builder, IMHO. An early, and in hindsight somewhat misdirected, thrust of XForms advocacy pushed the angle of building apps with zero script needed. But one advantage of using a JavaScript implementation of XForms is that it frees you to use script as needed. So in many places, large amounts of UI, all mapped to XML, are able to be hidden away with CSS, and selectively revealed (or mapped to yet other HTML form controls) in small, self-contained overlays triggered via script. While it doesn’t fulfill the unrealistic promise of completely eliminating script, it’s a useful technique, one I predict we’ll see more of in the future.

Relax NG. XML Schema has its roots deep in the XML infrastructure. The type system of XQuery and XSLT 2.0 is based on it. Even XForms has ties to it. But for all its wide reach, XML Schema 1.0 has some maddening limitations, and “takes some getting used to” before one can sight read it. By contrast, the appendices of many recent W3C specifications use the highly readable Relax NG compact syntax to describe content models in a way that is equally human- and machine-readable.

What are these limitations I speak of? XML Schema 1.1 goes a long way toward resolving these, but isn’t yet widely in use. Take this example, the content model of the <options> element from Search API:

start = Options | Response

# Root element
OptionsType = (
 AdditionalQuery? &
 Annotation* &
 ConcurrencyLevel? &
 Constraint* &
 Debug? &
 DefaultSuggestionSource? &
 Forest* &
 Grammar? &
 Operator* &
 PageLength? &
 QualityWeight? &
 ReturnConstraints? &
 ReturnFacets? &
 ReturnMetrics? &
 ReturnQtext? &
 ReturnQuery? &
 ReturnResults? &
 ReturnSimilar? &
 SearchOption* &
 SearchableExpression? &
 SortOrder* &
 SuggestionSource* &
 Term? &
 TransformResults?
)

The start line indicates that, within this namespace, there are two possible root elements, either <options> or <response> (not shown here). An instance with a root of, say, search:annotation is by definition not valid. Try representing that in XML Schema.

The definition of OptionsType allows a wide variety of child elements, some zeroOrMore times, others optional (zero or one occurrence), with no ordering restrictions at all between anything. XML Schema can’t represent this either. James Clark’s trang tool converts Relax NG into XML Schema, and has to approximate this as an xsd:choice with maxOccurs=”unbounded”, thus the elements that can only occur once are not schema-enforced. Thus the Relax NG description of the content model, besides being more readable, actually contains more information than the closest XML Schema. So particularly for XML vocabularies that are optimized for human use, Relax NG is a good choice for schema development.

Out of line validation. So if XML Schema doesn’t fully describe the <options> node, how can authors be sure they have constructed one correctly? Take a step back: even if XML Schema could fully represent the content model, for performance reasons you wouldn’t want to repeatedly validate the node on every query. The options node tends to change infrequently, mainly during a development cycle. Both of these problems can be solved with out-of-line validation: a separate function call search:check-options().

Inside this function you’ll find a validate expression that will make as much use of the Schema as it can, but also much more. The full power of XQuery can be leveraged against the proposed <options> node to check for errors or inconsistencies, and provide helpful feedback to the developer. Since it happens out-of-line, these checks can take substantially longer than actually handling the query based on them. The code can go as in-depth as it needs to without performance worries. This is a useful technique in many situations. One potential shortfall is that people might forget to call your validation function, but in practice this hasn’t been too much trouble.
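
To give a flavor of the kind of check that can live in such a function, here is a minimal sketch in plain XQuery that catches the one thing the trang-generated schema cannot: a once-only child appearing more than once. The function name local:check-once and the short list of once-only element names are assumptions for illustration; this is not the code inside search:check-options().

```xquery
xquery version "1.0";
declare namespace search = "http://marklogic.com/appservices/search";

(: Hypothetical helper: returns one message per once-only child that repeats. :)
declare function local:check-once($options as element(search:options)) as xs:string* {
  (: A small, illustrative subset of the children marked "?" in the content model. :)
  let $once-only := ("page-length", "grammar", "term", "debug")
  for $name in distinct-values(
    for $c in $options/search:* return local-name($c))
  where $name = $once-only
    and count($options/search:*[local-name(.) eq $name]) gt 1
  return concat("<", $name, "> may appear at most once")
};

(: Example input: page-length repeats (an error); constraint repeats (allowed). :)
local:check-once(
  <search:options>
    <search:page-length>10</search:page-length>
    <search:page-length>20</search:page-length>
    <search:constraint name="a"/>
    <search:constraint name="b"/>
  </search:options>)
```

Because the check returns human-readable messages rather than a bare pass/fail, it can be as chatty and as slow as it likes without touching query-time performance.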

Higher-order functions. The predecessor to Search API had a problem that it was so popular that users would modify it to suit their unique requirements, which led to dozens of minor variations floating around in the wild. Different users have different needs and expectations for the library, and making a one-size-fits-all solution is perhaps not possible. One way to relieve this kind of feature pressure is to provide enough extension hotspots to allow all the kinds of tweaks that users will want, preserving the mainline code. This involves prediction, which is difficult (especially about the future). But a good design makes this possible.

Look inside the built app and you will find a number of function pointers, implemented as a new datatype xdmp:function. XQuery 1.1 will have a more robust mechanism for this, but it might be a while before this is widespread. By modifying one file, changing a pointer to different code, nearly every aspect of the application can be adjusted.
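
Here is a minimal sketch of the function-pointer idea using xdmp:function and xdmp:apply. The two rendering functions and the $render-title variable are hypothetical names invented for this example; the built app's actual hotspots are organized differently.

```xquery
xquery version "1.0-ml";

(: Two interchangeable implementations; both names are invented for this sketch. :)
declare function local:plain-title($doc as node()) as xs:string {
  fn:string($doc//title)
};

declare function local:shouting-title($doc as node()) as xs:string {
  fn:upper-case(fn:string($doc//title))
};

(: The single "pointer" to edit when swapping behavior. :)
declare variable $render-title :=
  xdmp:function(xs:QName("local:shouting-title"));

(: The mainline code only ever calls through the pointer. :)
xdmp:apply($render-title, <doc><title>hello</title></doc>)
```

Pointing $render-title at local:plain-title instead changes the behavior without touching the calling code, which is the whole appeal.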

Similarly, a few hotspots in the Search API can be customized, to hook in different kinds of parsers or snippet-generators. This powerful technique can take your own apps to the next level.

-m

Monday, October 12th, 2009

Speaking at Northern Virginia Mark Logic User Group Oct 27

Come learn more about Mark Logic and get a behind-the-scenes look at the new Application Builder. I’ll be speaking at the NOVA MUG (Northern Virginia Mark Logic User Group) on October 27. This turns out to be pretty close to the big Semantic Web conference, so I’ll stick my head in there too. Stop by and look me up!

Details at the developer site.

-m

Saturday, July 11th, 2009

The decline of the DBMS era

Several folks have been pointing to this article which has some choice quotes along the lines of

If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in most any market I can think of.

My employer is specifically mentioned:

Even in XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors

And it’s true, but don’t take my word for it. :-) The DBMS world has lots of inertia, but don’t let that blind you to seeing another way to solve problems. Particularly if that extra 50x matters. -m

Tuesday, July 7th, 2009

Demo Jam at Balisage 2009

Come join me at the Demo Jam at Balisage this year. August 11 at 6:30 pm. There will be lots of cool demos, judged by audience participation. I’d love to see you there. -m

Thursday, June 25th, 2009

MarkLogic Server 4.1, App Services released

I’m thrilled to announce that MarkLogic 4.1, and with it my project App Services, is here. Top-of-the-post props go out to Colleen, David, and Ryan who made it happen.

You might already know that MarkLogic Server is a super-powerful database slash search engine powering projects like MarkMail. (But did you know there’s a free-as-in-beer edition?) The next step is to make it easier to use and build your own apps on top of the server.

The first big piece is the Search API, which lets you do “Google-style” searches over your content like this:

search:search("MP3 OR iPod AND color:black -Zune")

The built-in grammar includes AND, OR, parens for grouping, - for negation, quotations for phrases, and easy ways to define facets like date:today or author:"Bill Shakespeare" or GPA:3.95. By passing in additional options, you can redefine the grammar and control all aspects of the search and how the results are returned. Numerous grass-roots efforts at doing something like this had begun to spring up, so the time was right to come out with an officially-sanctioned API. For those developers who haven’t seen the light yet and don’t fancy XQuery, an API like this is a huge benefit.
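
As a sketch of what "passing in additional options" can look like, the call below defines the color: prefix from the example query as a value constraint. It assumes the content contains a no-namespace <color> element, which is purely an assumption for illustration; adjust the element name and namespace to match real content.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search(
  "MP3 OR iPod AND color:black -Zune",
  (: Options node: maps the color: prefix onto a hypothetical <color> element. :)
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="color">
      <value>
        <element ns="" name="color"/>
      </value>
    </constraint>
  </options>)
```

The same options node is where grammar changes, paging, and result formatting would go.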

The next piece builds on the Search API to offer a graphical App Builder tool that produces a simplified MarkMail-type app around your content. It looks like this:

App Builder screen shot, Search page

The App Builder itself is based on XForms via the excellent XSLTForms library and REST, making it a full-blown XRX application.

Lots more info, videos, screencasts, articles, and more are coming soon.

You can start playing with this now by visiting the download page. Under the Community License, you can put 10 gigs of content into it for noncommercial production free-as-in-beer.

Enjoy! I’ll be catching my breath for the next two months*. -m

* Not really

Wednesday, June 3rd, 2009

See you at Balisage

Balisage, formerly Extreme Markup, is the kind of conference I’ve always wanted to attend.

Historically my employers have been not quite enough involved in the deep kinds of topics at this conference (or too cash-strapped, but let’s not go there) to justify spending a week on the road. So I’m glad that’s no longer the case: Mark Logic is sponsoring the conference this year. I’m looking forward to the show, and since I’m not speaking, I might be able to relax a little and soak in some of the knowledge.

See you there! -m

Thursday, May 21st, 2009

One year at Mark Logic

Another anniversary this week, one year at Mark Logic. Much of it in stealth mode, but more details of what I’ve been up to are forthcoming. -m

Friday, April 24th, 2009

EXPath.org

I’ve always thought that the EXSLT model of developing community specifications worked well. Now a critical mass of folks has come together on a similar effort, aimed at providing extensions usable in XPath 2.0, XSLT 2.0, XQuery, and other XPath-based languages like XProc. Maybe even XForms.

Check it out, subscribe to the mailing list, and help out if you can. -m

Sunday, March 8th, 2009

Wolfram Alpha

The remarkable (and prolific) Stephen Wolfram has an idea called Wolfram Alpha. People used to assume the “Star Trek” model of computers:

that one would be able to ask a computer any factual question, and have it compute the answer.

Which has proved to be quite distant from reality. Instead

But armed with Mathematica and NKS [A New Kind of Science] I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.

It’s not easy to do this. Every different kind of method and model—and data—has its own special features and character. But with a mixture of Mathematica and NKS automation, and a lot of human experts, I’m happy to say that we’ve gotten a very long way.

I’m still a SearchMonkey guy at heart, so I wonder how much Wolfram’s team is familiar with existing Semantic Web research and practice–because at a high level this seems very much like RDF with suitable queries thereupon. If that’s a good characterization, that’s A Good Thing, since practical application has been one of SemWeb’s weak spots.

-m

Monday, February 16th, 2009

Crane Softwrights adds XQuery training

According to the company home page, renowned XSLT trainer and friend G. Ken Holman has expanded his offerings to include XQuery training. The first such session is March 16-20, alongside XML Prague.

I’ve always thought there is great power in having both XSLT and XQuery tools at one’s disposal. I’ve seen people tend to polarize into one camp or the other, but in truth there is a lot of common ground, as well as cases where the right technology makes for a much more elegant solution. So learning both is easier than it seems, and more useful than it seems.

If you will be around the conference, take a look at the syllabus. I’m curious to see others’ reactions toward the combined XSLT + XQuery toolset. -m

Tuesday, January 27th, 2009

Call for speakers: MarkLogic user conference

This year’s Mark Logic User Conference is May 12-14, in beautiful San Francisco. Attend the conference at no charge as a speaker! Submit a proposal for a breakout session on business applications, technical implementation, or best practices. Deadline is February 13th. Thanks! -m

Monday, January 26th, 2009

MarkMail 2.0 launches

If you’ve seen MarkMail before, you may be pleased to know that a new version launched last week, including new features (like saved search sets) for power users. If you haven’t seen MarkMail before, what are you waiting for? -m

P.S. If you could use something like this behind your firewall, ping me.

Monday, January 12th, 2009

Conferencing

Busy week ahead. Minimal posting. -m

Tuesday, December 30th, 2008

RDFa parser in XQuery now open source

After a delay, the code to my RDFa parser in XQuery is now available under an Apache license. Go get it. This is some of the earliest XQuery code I ever wrote, so go easy on me. It follows the earlier work on a functional definition of RDFa. And feel free to send in patches. -m

Monday, December 8th, 2008

Overheard and overseen

Overheard at XML 2008: “Wow, it’s a good thing Mark Logic sponsored, otherwise nobody would be here.” (there were only five tables in the expo area.)

Overseen on the XML 2008 schedule: only one mention of XQuery, and that’s in relation to eXist, not the aforementioned sponsor.

This conference does have a different feel to it. Is XML at the ASCII-tipping-point, where it becomes so obvious that conferences aren’t needed? -m

Friday, November 28th, 2008

Fun with xdmp:value()

Lately I’ve been playing with some more advanced XQuery. One thing nearly every XQuery engine supports is some kind of eval() function. MarkLogic has several, but my favorite is xdmp:value. It’s lightweight because it reuses the entire calling context, so for instance you can write let $v := 5 return xdmp:value("$v"). Not too useful, but if the expression passed in comes from a variable, it gets interesting.

Now, quite a few standards based on XPath depend on the context node being set to some particular node. This turns out to be easy too, using the path operator: $context/xdmp:value($expr). According to the definition of the XPath path operator, the expression to the right is evaluated with the results of the expression on the left setting the context node.

OK, how about setting the context size and position? More difficult, but one could use a sequence on the left-hand side of the path operator, with the desired $context node somewhere in the middle. Then last() will return the length of the sequence, and position() will return, well, the position of $context in the sequence. But it’s kind of hacky to manufacture a bunch of temporary nodes, only to throw them away in the next step of the path.
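
A minimal sketch of both tricks, assuming a made-up <order> document. The first expression sets only the context node; the second relies on the path operator to supply position and size as well, here using existing sibling nodes rather than manufactured temporary ones. This is a sketch of the behavior described above, so treat it as something to try rather than a guarantee.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample data. :)
let $doc := <order><item>apple</item><item>pear</item></order>

(: Expressions arriving as strings, as they would from another spec or a config file. :)
let $count-expr := "fn:count(item)"
let $pos-expr := "fn:concat(fn:position(), ' of ', fn:last(), ': ', .)"

return (
  (: Context node is $doc, so the relative path "item" resolves against it. :)
  $doc/xdmp:value($count-expr),

  (: Each item becomes the context node in turn, with position and size
     taken from the sequence on the left of the path operator. :)
  $doc/item/xdmp:value($pos-expr)
)
```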

I’m curious if anyone else has done something similar. Comments? -m

Tuesday, November 25th, 2008

MarkLogic 4.0 review

Kurt Cagle has a thorough review of MarkLogic 4.0, worth a read itself. But check out the comments: one poster says he interviewed with the company and didn’t get reimbursed. The MarkLogic CEO responds personally with an offer to make it right. Why can’t more companies be like this? -m