Archive for the 'languages' Category

Thursday, September 2nd, 2010

Is XForms really MVC?

This epic posting on MVC helped me better understand the pattern, and all the variants that have flowed outward from the original design. One interesting observation is that the earlier designs used Views primarily as output-only, and Controllers primarily as input-only, and as a consequence the Controller was the one true path for getting data into the Model.

But with browser forms, input and output are tightly intermingled. The View takes care of input and output. Something else has primary responsibility for mediating the data flow to and from the model–and that something has been called a Presenter. This yields the MVP pattern.

The terminology gets confusing quickly, but roughly

XForms Instance == MVP Model

XForms Model == MVP Presenter

XForms User Interface == MVP View

It’s not wrong to associate XForms with MVC–the term has become so blurry that it’s easy to lump variants like MVP into the same bucket. But to the extent that it makes sense to talk about more specific patterns, maybe we should be calling the XForms design pattern MVP instead of MVC. Comments? Criticism? Fire away below. -m

Sunday, May 30th, 2010

Balisage contest: solving the wikiml problem

I wish I could say I had something to do with the planning of this: part of Balisage 2010 is a contest to “encourage markup experts to review and to research the current state of wiki markup languages and to generate a proposal that serves to de-babelize the current state of affairs for the long haul.”  To enter, you must propose a set of concrete steps (organizational, social, and/or technological) that will enable wiki content interchange, a real WYSIWYG editor, and/or wiki syntax standardization.

This pushes all of my buttons. It’s got structured documents, Web, parser geekery, writing, engineering, and standards. There’s a bunch of open source prior art, including PyXMLWiki, which I adapted from some fantastic earlier work from Rick Jelliffe.

Sadly, MarkLogic employees aren’t eligible to enter. Get your write-up done by July 15 and sent to balisage-2010-contest at marklogic dot com. The winner will be announced at Balisage and will take home some serious prize winnings, and also will be strongly encouraged (but not required) to give a brief summary (~10 minutes) of their winning entry.

Can’t wait to see what comes out of this. -m

Friday, May 15th, 2009

A nugget from _A Canticle for Leibowitz_

This brilliant bit is almost a throwaway paragraph on page 304, near the end.

[Two men in a satirical dialog] managed only to demonstrate that the mathematical limit of an infinite sequence of “doubting the certainty with which something doubted is known to be unknowable when the ’something doubted’ is still a preceding statement ‘unknowability’ of something doubted,” that the limit of this process at infinity can only be equivalent to a statement of absolute certainty, even though phrased as an infinite series of negations of certainty.

It’s not like the whole book is like this…far from it. But it is chock full of little gems.

-m

Tuesday, May 12th, 2009

Google Rich Snippets powered by RDFa

The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800 pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m

Wednesday, March 11th, 2009

Geek Thoughts: omito

Omito:

(Spanish) First-person singular (yo) present indicative form of omitir (to omit).

(Proto-English) Shortened word form of an error of omission, e.g. in written.

More collected Geek Thoughts at http://geekthoughts.info.

Sunday, March 8th, 2009

Wolfram Alpha

The remarkable (and prolific) Stephen Wolfram has an idea called Wolfram Alpha. People used to assume the “Star Trek” model of computers:

that one would be able to ask a computer any factual question, and have it compute the answer.

Which has proved to be quite distant from reality. Instead

But armed with Mathematica and NKS [A New Kind of Science] I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.

It’s not easy to do this. Every different kind of method and model—and data—has its own special features and character. But with a mixture of Mathematica and NKS automation, and a lot of human experts, I’m happy to say that we’ve gotten a very long way.

I’m still a SearchMonkey guy at heart, so I wonder how much Wolfram’s team is familiar with existing Semantic Web research and practice–because at a high level this seems very much like RDF with suitable queries thereupon. If that’s a good characterization, that’s A Good Thing, since practical application has been one of SemWeb’s weak spots.

-m

Monday, February 16th, 2009

Crane Softwrights adds XQuery training

From the company home page, renowned XSLT trainer and friend G. Ken Holman has expanded his offerings to include XQuery training. The first such session is March 16-20, alongside XML Prague.

I’ve always thought there is great power in having both XSLT and XQuery tools at one’s disposal. I’ve seen people tend to polarize into one camp or the other, but in truth there is a lot of common ground, as well as cases where the right technology makes for a much more elegant solution. So learning both is easier than it seems, and more useful than it seems.

If you will be around the conference, take a look at the syllabus. I’m curious to see others’ reactions toward the combined XSLT + XQuery toolset. -m

Wednesday, January 7th, 2009

On porting WebPath to Python 3k

I’ve started looking into porting the WebPath code (and eventually XForms Validator) over to Python 3. The first step is external libraries, of which there is only one. WebPath uses the lex.py module from PLY. I had got it into my head that Python 2.x and 3.x were thoroughly incompatible, but leave it to the remarkable David Beazley to blow that assumption out of the water: the latest version of lex.py from SVN works in both 2.x and 3.x.

From there the included 2to3 tool was easy enough to run. (Relatively more difficult was getting 2.6 and 3.0 versions of Python frameworks installed on Mac, but even that wasn’t too bad.) The tool made some moderate changes, and I can run the unit tests, and a few even pass!

The primary remaining problem stems from code where the documentation is a little unclear, and my inexperience is severe. The part of the code in platonicweb.py that reads nasty, grotty HTML via Tidy and produces a clean DOM throws an exception every time. Seems to be a mismatch between String and Byte (encoded string) types, but manifested as a failed XML parse. Sans exception handling, the code looks like:

    page = urllib.request.urlopen(fullurl)
    markup = page.read()
    dom = xml.dom.minidom.parseString(markup)

urlopen() returns a file-like object, but the docs didn’t seem clear on whether it’s like a file opened in byte or string mode. In any case, I’m almost certainly doing it wrong. Suggestions?
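For what it's worth, in Python 3 the object returned by urlopen() behaves like a file opened in binary mode, so read() hands back bytes rather than str. Here is a minimal sketch of making that explicit before parsing. The helper name parse_markup and the UTF-8 default are assumptions for illustration only; a real page would need its charset taken from the HTTP headers, and grotty HTML would still need the Tidy pass first.

```python
import xml.dom.minidom


def parse_markup(raw, encoding="utf-8"):
    """Parse markup that may arrive as bytes (e.g. from page.read()) or as str.

    parse_markup and the utf-8 default are illustrative assumptions,
    not part of WebPath.
    """
    if isinstance(raw, bytes):
        # Decode explicitly so any encoding mismatch surfaces here,
        # not as a confusing failure inside the XML parser.
        raw = raw.decode(encoding)
    return xml.dom.minidom.parseString(raw)


# In-memory stand-in for the bytes that page.read() would return.
dom = parse_markup(b"<html><body><p>hi</p></body></html>")
print(dom.documentElement.tagName)
```

If the decode step raises UnicodeDecodeError, the problem is the assumed encoding rather than the markup itself, which narrows things down.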

-m

Tuesday, December 9th, 2008

XML 2008 liveblog: Introduction to Schematron

Wendell Piez, Mulberry Technologies

Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases.

Same architecture as XSLT. (Schematron specifies, does not perform)

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Check sections 12/07</title>
  <pattern id="section-check">
    <rule context="section">
      <assert test="title">This section has no title</assert>
      <report test="p">This section has paragraphs</report>
      ...

Demo. OxygenXML has support. Assert vs. Report – essentially opposites. Assert means “tell me if this is false”. Report means “tell me if this is true”.
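To make the assert/report distinction concrete, here is a rough Python imitation of the two rules in the section-check pattern above. This is not a Schematron processor, just hand-written logic under the assumption that "section", "title", and "p" are plain child elements; the function name check_sections is made up for the sketch.

```python
import xml.etree.ElementTree as ET


def check_sections(xml_text):
    """Mimic the section-check pattern by hand (illustrative only)."""
    messages = []
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        # assert: the message fires when the test is FALSE
        if section.find("title") is None:
            messages.append("assert failed: This section has no title")
        # report: the message fires when the test is TRUE
        if section.find("p") is not None:
            messages.append("report: This section has paragraphs")
    return messages


sample = (
    "<doc>"
    "<section><p>text</p></section>"
    "<section><title>Intro</title></section>"
    "</doc>"
)
for message in check_sections(sample):
    print(message)
```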

“Almost as if Schematron is a harness for XPath testing.”

More examples:

<rule context="note">
  <report test="ancestor::note">A note appears in a note. OK?</report>
</rule>

Binding: Default is XSLT 1, but flexible enough to allow other query languages via attribute @queryBinding at the top. Many processors allow mix-and-match between XSLT and Schematron. Examples showing just that.

Some tests can be very useful:

test="every $line in tokenize(., $newline) satisfies string-length($line) le 72"

Q: What if the destination is not a human, but another part of a pipeline? Varies by implementation, but SVRL is standardized as an annex in the ISO spec, part of DSDL.

Use as little or as much as you want, at different times in the document lifecycle. “Schematron is a feather duster that reaches areas other schema languages cannot.” – Rick Jelliffe

As time permits section of the talk:

Other top-level elements: title, pattern, ns, let, p, include, phase, diagnostics.

-m

Friday, November 28th, 2008

Fun with xdmp:value()

Lately I’ve been playing with some more advanced XQuery. One thing nearly every XQuery engine supports is some kind of eval() function. MarkLogic has several, but my favorite is xdmp:value. It’s lightweight because it reuses the entire calling context, so for instance you can write let $v := 5 return xdmp:value("$v"). Not too useful, but if the expression passed in comes from a variable, it gets interesting.

Now, quite a few standards based on XPath depend on the context node being set to some particular node. This turns out to be easy too, using the path operator: $context/xdmp:value($expr). According to the definition of the XPath path operator, the expression to the right is evaluated with the results of the expression on the left setting the context node.

OK, how about setting the context size and position? More difficult, but one could use a sequence on the left-hand side of the path operator, with the desired $context node somewhere in the middle. Then last() will return the length of the sequence, and position() will return, well, the position of $context in the sequence. But it’s kind of hacky to manufacture a bunch of temporary nodes, only to throw them away in the next step of the path.

I’m curious if anyone else has done something similar. Comments? -m

Friday, October 24th, 2008

Online etymology database

I’ve been playing lately with this site, and it’s a fantastic resource. The word carboy probably comes from Persian qarabah “large flagon.” Who knew? -m

Wednesday, September 17th, 2008

The case for native higher-order functions in XQuery

The XQuery Working Group is debating the need for higher-order functions in the language. I’m working on honing my description of why this is an important feature. Does this work? What would work better?

Imagine you are writing a smallish widget app, in an environment without a standard library. When you need to sort your widgets, you’d write a simple function with a signature like sort(sequence-of-widgets). That’s great.

Now imagine you find your app to be steadily growing. An accumulation of smaller one-off solutions won’t work anymore; you need a general solution. What you’ll end up with is something like qsort in C, which takes a pointer to a comparator function. By providing different comparators, you can sort anything any way you like, all through only a single sort function. C and C++ have something like this, as do PHP, Python, Java, JavaScript, and even assembly language. XSLT has it, as proven by Dimitre.
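As a small illustration of the qsort idea in one of those languages, here is a Python sketch: one sort routine, with the ordering supplied as a function argument. The widget data and comparator names are invented for the example.

```python
from functools import cmp_to_key


def by_weight(a, b):
    """qsort-style comparator: negative, zero, or positive."""
    return a["weight"] - b["weight"]


def by_name_desc(a, b):
    """Reverse alphabetical: positive when a should sort after b."""
    return (a["name"] < b["name"]) - (a["name"] > b["name"])


# Hypothetical widgets, purely for illustration.
widgets = [
    {"name": "sprocket", "weight": 3},
    {"name": "cog", "weight": 7},
    {"name": "gear", "weight": 1},
]

# The same sort function, two different behaviors.
lightest_first = sorted(widgets, key=cmp_to_key(by_weight))
reverse_alpha = sorted(widgets, key=cmp_to_key(by_name_desc))

print([w["name"] for w in lightest_first])
print([w["name"] for w in reverse_alpha])
```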

XQuery doesn’t. It should, because people are now using it for more than short queries. People are writing programs in it. -m

P. S. Comment please.

Wednesday, September 10th, 2008

Geek Thoughts: English is funny, part 1

Has there ever been a case of mitigated gall?

More collected Geek Thoughts at http://geekthoughts.info/.

Friday, August 8th, 2008

It would be awesome if somebody…

It would be awesome if someone made a site that catalogued all the common mis-encodings. Even in 2008, I see these things all over the web–mangled quotation marks, apostrophes, em-dashes. I’d love to see a pictorial guide.

curly apostrophe looks like ?’ – original encoding=_________ mislabeled as __________ .

That sort of thing. Surely somebody has done this already, right? -m
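Entries for such a catalogue could be generated mechanically. Here is a minimal Python sketch that encodes a character one way and decodes it another; the function name mojibake and the choice of UTF-8 read as Windows-1252 are assumptions picked because that pairing is a common culprit, not a claim about any particular page.

```python
def mojibake(text, written_as="utf-8", read_as="cp1252"):
    """Show how text looks when its bytes are decoded with the wrong encoding."""
    return text.encode(written_as).decode(read_as)


# A few usual suspects, named by code point to keep this file ASCII-safe.
samples = {
    "curly apostrophe": "\u2019",
    "left double quote": "\u201c",
    "em dash": "\u2014",
}

for name, char in samples.items():
    print(f"{name} looks like {mojibake(char)!r} "
          f"(original encoding=utf-8, mislabeled as cp1252)")
```

Some byte values have no mapping in a given code page, in which case decode() raises an error instead of producing garbage; a fuller catalogue would have to handle that case.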

Wednesday, May 28th, 2008

XForms Validator on Google App Engine?

I registered ‘xfv’ on Google App Engine. Too bad there don’t appear to be any significant XML libraries supported. I have XPath covered by my pure-python WebPath, but what about Relax NG? Anyone know of anything in pure python? -m

Tuesday, May 20th, 2008

The two-line CV

In my about page, I’ve written my CV in two lines. Why don’t you try it, then link back to here?

I’ve been known to use this as an interview question, and it’s quite a bit harder than it looks. A clever candidate will turn the paper sideways giving themselves more room to write “two lines”, but that’s not the point. This exercise forces one to really think about their qualifications, skills, and experience; one’s “unique selling proposition”.

Writing short, as opposed to rambling on, is notoriously difficult. Someone who can do that with their own CV is off to a good start in my book. -m

P. S. Mark Logic is looking for some high-caliber XML and web folks. Contact me offline if you know anyone looking…

Thursday, May 8th, 2008

14 ways…

When making hash browns xkcd style, there are at least 14 ways it could go badly.

  1. That’s not a potato, it’s a misshapen rock.
  2. Unexpectedly flammable tennis racket.
  3. Sparks landing on gas can.
  4. Food poisoning via undercooked hash browns due to limited flame contact time.
  5. Broken plate fragments.
  6. Dripping, flaming gasoline.
  7. Swing and a miss; balance lost.
  8. Flaming potato fragments in the eye socket.
  9. Diving catch ends badly.
  10. Spontaneous combustion.
  11. Tennis elbow.
  12. Repetitive stress injury.
  13. Fork misfire.
  14. Heat death of the universe.

(17 if that fork is a dangerous crossbreed) -m

Monday, March 10th, 2008

Dear readers…

You are awesome. Just sayin’. -m

Friday, November 30th, 2007

4 things I’ve learned writing (mostly) 4 novels

If you want to get anything done, give it to a busy person…

In my life, I’ve started four novels, completed my goals on three, gotten to “The End” on two, and completely flamed out on one.

The first was in 2001. I hadn’t written much since high school. Something clicked in my head that made me realize that writing wasn’t some kind of black art (as one particular teacher had drilled into his credulous students). It was doable. You take pencil and paper and write one word after another. Voilà. I was so taken with this simple idea that every single thing I ever learned about writing went out the window. I had Swifties, danglers, tell-vs-show, you name it. There’s enough material in there for several Bulwer-Lytton contests. By the time I had 70 hand-written pages, the thing collapsed under its own weight and the story reached an abrupt, borderline-surrealistic “ending” to abuse the term. I have evidence that I even typed it all in and pressed on for a 2nd draft.

By 2003 my non-fiction book was published–my writing career was under way! Part of the elaborate book proposal dance involved me writing some online articles, including one piece of fiction that was well-received in the tiny circle that was its intended audience. At this stage I adopted electronic writing, and ditched my crashy Windows laptop for a Mac, a vast improvement.

In 2005 I discovered NaNoWriMo, and though I thought it would be a lost cause, I signed up. No way it could be as bad as the previous attempt. I had a new job, and was able to skip a few lunches to write, not to mention intense evenings and weekends. The end goal is 50,000 words during the 30 days of November; that’s 1,666 and two-thirds words per day. All of the prior month I spent outlining, making maps, creating my universe. I used the simplest of tools, my text editor and one file per chapter. I learned that the command wc *.txt could easily give me a combined word count. To my surprise, it worked. I reemerged into daylight having completed a full story arc loosely based on the earlier story, and ended up with just over 50,000 words. The text itself was very rough, but I read the whole thing out loud in a podcast to edit it. In terms of improvement, it was huge, but still far from publishable.

2006 and another NaNoWriMo rolled around, and I took off on a more ambitious storyline with far fewer notes going into it. The story itself involved the same general characters of the previous two episodes, but with a deeper, more mature feeling to it. In short, I finally wrote a piece of fiction to be proud about afterwards, though when I hit 50,000 words I felt really burned out; hit “save” and left the story arc unfinished.

The pull to dig in to an intensive 2nd draft of the story was immense, but just too many things were going on, including a new arrival in the family and a new set of job responsibilities. I never got more than a few dozen pages into the rewrite. When NaNoWriMo 2007 came upon me, I had a tough choice…do I write something fresh, or try to rework the previous novel? Fresh. A completely new story line, new characters, new setting, new everything. As of a few days ago, I finished the draft, compressing parts of the story as needed to meet both the 50 kiloword goal and the complete story arc. In preparation, I read a number of books, but as far as written outlines, maps, etc. go, almost nothing happened before November 1. I saved enough of the “fun stuff” that a second revision of this story will be a joy. Overall, another improvement year-over-year.

There’s only one kink to the “if you want to get something done…” idea: my slides for the XML Conference talk I have in a few days are still unfinished… -m

Monday, November 19th, 2007

Kindle my disappointment

Where’s Project Gutenberg? One difficulty in launching an ebook platform is the lack of available titles. I keep hearing about 80,000+ titles, but expressed as a percentage of Amazon’s book catalog, it’s minuscule. There should be all kinds of public domain titles ready to go on day one. And where are the Creative Commons books?

There are some public domain books to be found, but none are free. Take, for example, A Connecticut Yankee in King Arthur’s Court, a book (in paper form) sitting just out of arm’s reach as I write this, waiting to be read. If I had it on a device, particularly one with a good screen, I’d be more inclined to keep it, and dozens of others, on hand in my backpack and be ready to read at a moment’s notice. But no.

The problem is the “we take care of the wireless delivery” part, called Whispernet(tm). It’s not really free, nor bundled in the service price. It’s bundled in to the cost of every media access. Is it fair to pay $9.99 for a New York Times bestseller? Sure. But it sucks to pay $1 for an A-list blog that’s free everywhere else, or to get literally nickeled and dimed for the privilege of “converting” and delivering your own content to your own device.

By the way, who gets the money paid for accessing, say, a Creative Commons non-commercial licensed blog via the Kindle? Somebody should look into that.

I applaud Amazon for pushing to innovate in a space that badly needs it, but the financial model behind the wireless access encourages the wrong kind of things. Exceptions, like unlimited Wikipedia access (be still my heart!) still need to be hand approved by the gatekeeper. Information wants to be free, it doesn’t want to be a service, though that’s hard to see when the dollar signs get in your eyes.

Many folks are comparing this to the original iPod launch–remember, the huge klunky one with a tiny capacity, black and white screen, and a mechanical click-wheel? There are some strong points of similarity, but stronger differences. For one, anyone with an iPod can easily rip their existing CDs, not to mention obtain MP3s from other methods (so I hear). There’s nothing like that yet for books.

Where’s the documentation for the new, proprietary ebook format? I don’t care about the DRM crap. I care about being able to create new content, or repackage existing content for which I have the rights, and for that, I’m having trouble coming up with a rationale for an entire new format. I would love to do some cool things with this platform. Perhaps I will some day, though my enthusiasm is somewhat lessened by the difficulties I would face getting anything cool onto the devices. -m

Sunday, November 18th, 2007

Gettysburg Address PowerPoint

As one who, in the all-too-near future, will be hammering out the visuals to go with my talk at XML 2007, this made my day. (be sure to check out the deeper pages too) -m

Saturday, November 10th, 2007

RDFa question

What is the difference between placing instanceof="prefix:val" vs. rel="prefix:val" on something? How do I decide between the two?

In the example of hEvent data, why is it better/more accurate to use instanceof="cal:Vevent" instead of a blank node via rel="cal:Vevent"?

-m

Monday, October 22nd, 2007

Is there fertile ground between RDFa and GRDDL?

The more I look at RDFa, the more I like it. But still it doesn’t help with the pain-point of namespaces, specifically of unmemorable URLs all over the place and qnames (or CURIEs) in content.

Does GRDDL offer a way out? Could, for instance, the namespace name for Dublin Core metadata be assigned to the prefix “dc:” in an external file, linked via transformation to the document in question? Then it would be simpler, from a producer or consumer viewpoint, to simply use names like “dc:title” with no problems or ambiguity.

This could be especially useful now that discussions are reopening around XML in HTML.

As usual, comments welcome. -m

Monday, October 1st, 2007

simple parsing of space-separated attributes in XPath/XSLT

It’s a common need to parse space-separated attribute values from XPath/XSLT 1.0, usually @class or @rel. One common (but incorrect) technique is a simple equality test, as in {@class="vcard"}. This is wrong, since the attribute can contain the token alongside other values, like "foo vcard" or "vcard foo" or " foo vcard bar ", and the equality test matches none of them.

The proper way is to look at individual tokens in the attribute value. On first glance, this might require a call to EXSLT or some complex tokenization routine, but there’s a simpler way. I first discovered this on the microformats wiki, and only cleaned up the technique a tiny bit.

The solution involves three XPath 1.0 functions: contains(), concat() to join together string fragments, and normalize-space() to strip off leading and trailing spaces and convert any other sequences of whitespace into a single space.

In English, you

  • normalize the class attribute value, then
  • concatenate spaces front and back, then
  • test whether the resulting string contains your searched-for value with spaces concatenated front and back (e.g. " vcard ")

Or {contains(concat(' ',normalize-space(@class),' '),' vcard ')} A moment’s thought shows that this works well on all the different examples shown above, and is perhaps even less involved than resorting to extension functions that return nodes that require further processing/looping. It would be interesting to compare performance as well…
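For anyone who wants to convince themselves of the logic outside an XPath engine, the same three steps can be mirrored in a few lines of Python. This is only a sketch of the string logic; the function name has_token is made up, and str.split() treats a few more characters as whitespace than normalize-space() does.

```python
def has_token(attr_value, token):
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    # Step 1: normalize. Trim the ends and collapse runs of whitespace.
    # (Approximation: str.split() also splits on Unicode whitespace.)
    normalized = " ".join(attr_value.split())
    # Steps 2 and 3: pad both strings with spaces, then do a substring test.
    return f" {token} " in f" {normalized} "


for value in ["vcard", "foo vcard", "vcard foo", " foo vcard bar ", "vcardx"]:
    print(repr(value), has_token(value, "vcard"))
```

The padding is what keeps "vcardx" from matching: the search string " vcard " only appears when the token is bounded by spaces on both sides.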

So next time you need to match class or rel values, give it a shot. Let me know how it works for you, or if you have any further improvements. -m

Sunday, September 16th, 2007

Evaluating fiction vs. evaluating libation

My Copious Free Time(tm) has been filled lately by two different evaluation projects. One is the 2nd Annual Writing Show Best First Chapter of a Novel Contest, for which the first round of judging is just winding up. The main benefit for contest entrants is that every submission gets a professional critique of at least 750 words. But additionally, each submission gets a score on a 50-point scale, based on:

  • 10 points for Story. Is it a compelling read with a great hook? Are we engaged?
  • 10 points for Style. Is the writing smooth and tight, without awkward constructions, extraneous verbiage, and redundancies?
  • 10 points for Dialog. Is the dialog natural and does it move the story along?
  • 10 points for Character. Are the characters interesting? Do we care about them?
  • 10 points for Mechanics. Are grammar, spelling, and punctuation correct?

I’m also attending some classes aiming toward becoming a Certified Beer Judge (details on Meadblog). This isn’t as fun as it sounds. (Well, OK, maybe it is…). The idea is to build up better sensory perception so that my personal brewing and cooking projects can benefit. But the upcoming test is 70% written essay questions like “Identify three distinctly different top-fermenting beer styles with a starting gravity of 1.070 or higher, and describe the similarities and differences between the styles”. 30% of the test is based on actual tasting and filling out a tasting sheet. Of interest, the scoring here is also based on a 50-point scale:

  • 12 points for Aroma.
  • 3 points for Appearance.
  • 20 points for Taste.
  • 5 points for Mouthfeel.
  • 10 points for Overall Impression.

The interesting part is that there are similarities between the two tasks. For both, I need to work off of physical paper, not in my head or on a computer screen. For both, I first “skim”, building an overall impression, then dig down into individual categories to assign a score for each one. Then I step back and look at my numbers, and check whether everything makes sense and accurately records my impressions. When I’m satisfied, I add everything up and am done.

Most day-to-day problems aren’t so well structured or normalized, but nonetheless, I find myself tackling all kinds of problems with a similar approach. There you have it. Writing and drinking beer make you a better person. :) -m

Wednesday, August 22nd, 2007

What I’m reading

Yeah, they’re related. -m

Wednesday, August 8th, 2007

New W3C Validator

Go check it out. It even has a Tidy option to clean up the markup. But they missed an important feature: it should include an option to run Tidy on the markup first then validate. This is becoming the de facto bar for web page validity anyway… -m

Monday, July 16th, 2007

Beautiful Code

If it’s been quiet on this front it’s because I’ve been engrossed in my continuing education. Andy Oram sent me a copy of Beautiful Code, a thoroughly enjoyable work from O’Reilly. If you like stretching your brain by reading code-intense essays from top-tier coders, I recommend this volume. In particular, I’ve been digging into Douglas Crockford’s Top Down Operator Precedence chapter.

Other than that, some interesting BJCP classes, but I’m keeping that non-tech stuff over on meadblog. -m

Wednesday, January 24th, 2007

Histogram of top 10 words used in the 2007 State of the Union address:

I’ve always had a thing for text analysis.

  • the 352
  • and 250
  • to 225
  • of 188
  • in 118
  • a 108
  • we 100
  • is 76
  • our 75
  • that 72
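The post doesn't say how these counts were produced, so the following is just one plausible way to get a list like this in Python. The tokenizing rule (lowercase runs of letters and apostrophes) and the function name top_words are assumptions, and a different rule would shift the numbers.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    # Assumed tokenizer: lowercase, keep letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Tiny stand-in text; the real input would be the speech transcript.
sample = "The cat and the hat and the bat"
for word, count in top_words(sample, 3):
    print(word, count)
```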

Source. -m

Tuesday, January 23rd, 2007

On language design…

A semi-random thought that occurred to me.

One marker of a well-designed markup language is that it looks to the future. This doesn’t mean it’s an amorphous blob of abstract indirections mapped to tags. It can (and arguably should) be concrete and solid, but designed in such a way that keeps bigger things in mind.

HTML and XHTML are, I suppose, canonical examples of this, giving birth to microformats and many other uses outside of a browser. -m