Archive for the 'languages' Category
Friday, May 15th, 2009
This brilliant bit is almost a throwaway paragraph on page 304, near the end.
[Two men in a satirical dialog] managed only to demonstrate that the mathematical limit of an infinite sequence of “doubting the certainty with which something doubted is known to be unknowable when the ’something doubted’ is still a preceding statement ‘unknowability’ of something doubted,” that the limit of this process at infinity can only be equivalent to a statement of absolute certainty, even though phrased as an infinite series of negations of certainty.
It’s not like the whole book is like this…far from it. But it is chock full of little gems.
-m
Filed under everythingismiscellaneous, languages, math, metadata, writing
Tuesday, May 12th, 2009
The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800 pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m
Filed under commercialism, google, intentional web, languages, metadata, microformats, search, yahoo
Wednesday, March 11th, 2009
Omito:
(Spanish) First-person singular (yo) present indicative form of omitir (to omit).
(Proto-English) Shortened word form of an error of omission, e.g. in written.
More collected Geek Thoughts at http://geekthoughts.info.
Filed under geekthoughts, languages
Sunday, March 8th, 2009
The remarkable (and prolific) Stephen Wolfram has an idea called Wolfram Alpha. People used to assume the “Star Trek” model of computers:
that one would be able to ask a computer any factual question, and have it compute the answer.
Which has proved to be quite distant from reality. Instead
But armed with Mathematica and NKS [A New Kind of Science] I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.
It’s not easy to do this. Every different kind of method and model—and data—has its own special features and character. But with a mixture of Mathematica and NKS automation, and a lot of human experts, I’m happy to say that we’ve gotten a very long way.
I’m still a SearchMonkey guy at heart, so I wonder how familiar Wolfram’s team is with existing Semantic Web research and practice–because at a high level this seems very much like RDF with suitable queries thereupon. If that’s a good characterization, that’s A Good Thing, since practical application has been one of SemWeb’s weak spots.
-m
Filed under AI, Mark Logic, aswemaythink, commercialism, intentional web, languages, math, metadata, software, yahoo
Monday, February 16th, 2009
From the company home page: renowned XSLT trainer and friend G. Ken Holman has expanded his offerings to include XQuery training. The first such session is March 16-20, alongside XML Prague.
I’ve always thought there is great power in having both XSLT and XQuery tools at one’s disposal. I’ve seen people tend to polarize into one camp or the other, but in truth there is a lot of common ground, as well as cases where the right technology makes for a much more elegant solution. So learning both is easier than it seems, and more useful than it seems.
If you will be around the conference, take a look at the syllabus. I’m curious to see others’ reactions toward the combined XSLT + XQuery toolset. -m
Filed under Mark Logic, XQuery, announcement, languages
Wednesday, January 7th, 2009
I’ve started looking into porting the WebPath code (and eventually XForms Validator) over to Python 3. The first step is external libraries, of which there is only one. WebPath uses the lex.py module from PLY. I had got it into my head that Python 2.x and 3.x were thoroughly incompatible, but leave it to the remarkable David Beazley to blow that assumption out of the water: the latest version of lex.py from SVN works in both 2.x and 3.x.
From there the included 2to3 tool was easy enough to run. (Relatively more difficult was getting 2.6 and 3.0 versions of Python frameworks installed on Mac, but even that wasn’t too bad.) The tool made some moderate changes, and I can run the unit tests, and a few even pass!
The primary remaining problem stems from code where the documentation is a little unclear, and my inexperience is severe. The part of the code in platonicweb.py that reads nasty, grotty HTML via Tidy and produces a clean DOM throws an exception every time. Seems to be a mismatch between String and Byte (encoded string) types, but manifested as a failed XML parse. Sans exception handling, the code looks like:
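import urllib.request
import xml.dom.minidom
page = urllib.request.urlopen(fullurl)  # fullurl is built elsewhere in the module
markup = page.read()  # raw response body
dom = xml.dom.minidom.parseString(markup)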
urlopen() returns a file-like object, but the docs didn’t seem clear on whether it’s like a file opened in byte or string mode. In any case, I’m almost certainly doing it wrong. Suggestions?
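For what it’s worth, here is a minimal sketch of one way to sidestep the mismatch. In Python 3, read() on the object returned by urlopen() hands back bytes rather than str, and xml.dom.minidom.parseString() accepts either. The helper name parse_markup and its charset parameter are made up for illustration, and the Tidy step in platonicweb.py is not reproduced here, so this may or may not be the actual cause of the failure.

```python
import xml.dom.minidom


def parse_markup(raw, charset=None):
    """Parse markup held as bytes (as returned by read()) or as str.

    parse_markup and charset are illustrative names, not part of WebPath.
    """
    if isinstance(raw, bytes) and charset:
        # Decode explicitly when the charset is known, for example from
        # the HTTP headers via page.headers.get_content_charset().
        raw = raw.decode(charset)
    # With no charset given, pass the data through unchanged and let the
    # XML parser apply its own encoding detection.
    return xml.dom.minidom.parseString(raw)


# Stand-in for page.read(): UTF-8 bytes, no network access needed.
dom = parse_markup(b"<html><body>caf\xc3\xa9</body></html>", "utf-8")
body_text = dom.getElementsByTagName("body")[0].firstChild.data
```

If the parse still fails after this, the markup itself is probably not well-formed XML, which is a separate problem from the bytes-versus-string question.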
-m
Filed under languages, python
Tuesday, December 9th, 2008
Wendell Piez, Mulberry Technologies
Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases.
Same architecture as XSLT. (Schematron specifies, does not perform)
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
<title>Check sections 12/07</title>
<pattern id="section-check">
<rule context="section">
<assert test="title">This section has no title</assert>
<report test="p">This section has paragraphs</report>
...
Demo. OxygenXML has support. Assert vs. Report – essentially opposites. Assert means “tell me if this is false”. Report means “tell me if this is true”.
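To make the assert/report distinction concrete, here is a rough Python sketch that mimics the two rules from the schema fragment above. It is not a Schematron processor: the function name check_sections is invented, and the two tests are hard-coded with ElementTree lookups instead of being read from a schema.

```python
import xml.etree.ElementTree as ET


def check_sections(xml_text):
    """Collect messages the way the section-check pattern would (hand-coded)."""
    messages = []
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        # <assert test="title">: the message fires when the test is FALSE.
        if section.find("title") is None:
            messages.append("This section has no title")
        # <report test="p">: the message fires when the test is TRUE.
        if section.find("p") is not None:
            messages.append("This section has paragraphs")
    return messages


doc = "<doc><section><p>x</p></section><section><title>t</title></section></doc>"
messages = check_sections(doc)
```

The first section trips both rules (no title, has a paragraph); the second trips neither.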
“Almost as if Schematron is a harness for XPath testing.”
More examples:
<rule context="note">
<report test="ancestor::note">A note appears in a note. OK?</report>
</rule>
Binding: Default is XSLT 1, but flexible enough to allow other query languages via attribute @queryBinding at the top. Many processors allow mix-and-match between XSLT and Schematron. Examples showing just that.
Some tests can be very useful:
test="every $line in tokenize(., $newline) satisfies string-length($line) le 72"
Q: What if the destination is not a human, but another part of a pipeline? Varies by implementation, but SVRL is standardized as an annex in the ISO spec, part of DSDL.
Use as little or as much as you want, at different times in the document lifecycle. “Schematron is a feather duster that reaches areas other schema languages cannot.” – Rick Jelliffe
“As time permits” section of the talk:
Other top-level elements: title, pattern, ns, let, p, include, phase, diagnostics.
-m
Filed under languages, xml, xpath
Friday, November 28th, 2008
Lately I’ve been playing with some more advanced XQuery. One thing nearly every XQuery engine supports is some kind of eval() function. MarkLogic has several, but my favorite is xdmp:value. It’s lightweight because it reuses the entire calling context, so for instance you can write let $v := 5 return xdmp:value("$v"). Not too useful, but if the expression passed in comes from a variable, it gets interesting.
Now, quite a few standards based on XPath depend on the context node being set to some particular node. This turns out to be easy too, using the path operator: $context/xdmp:value($expr). According to the definition of the XPath path operator, the expression to the right is evaluated with the results of the expression on the left setting the context node.
OK, how about setting the context size and position? More difficult, but one could use a sequence on the left-hand side of the path operator, with the desired $context node somewhere in the middle. Then last() will return the length of the sequence, and position() will return, well, the position of $context in the sequence. But it’s kind of hacky to manufacture a bunch of temporary nodes, only to throw them away in the next step of the path.
I’m curious if anyone else has done something similar. Comments? -m
Filed under Mark Logic, XQuery, languages, standards
Friday, October 24th, 2008
I’ve been playing lately with this site, and it’s a fantastic resource. The word carboy probably comes from Persian qarabah “large flagon.” Who knew? -m
Filed under everythingismiscellaneous, languages, metadata, search
Wednesday, September 17th, 2008
The XQuery Working Group is debating the need for higher-order functions in the language. I’m working on honing my description of why this is an important feature. Does this work? What would work better?
Imagine you are writing a smallish widget app, in an environment without a standard library. When you need to sort your widgets, you’d write a simple function with a signature like sort(sequence-of-widgets). That’s great.
Now imagine you find your app to be steadily growing. An accumulation of smaller one-off solutions won’t work anymore; you need a general solution. What you’ll end up with is something like qsort in C, which takes a pointer to a comparator function. By providing different comparators, you can sort anything any way you like, all through only a single sort function. C and C++ have something like this, as do PHP, Python, Java, JavaScript, and even assembly language. XSLT has it, as proven by Dimitre.
XQuery doesn’t. It should, because people are now using it for more than short queries. People are writing programs in it. -m
P. S. Comment please.
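As a small illustration of the qsort-style argument, here is a sketch in Python, one of the languages listed above as having the feature. The names sort_with and widgets are invented for the example; the point is only that one sort function serves every ordering once the comparator is a parameter.

```python
from functools import cmp_to_key


def sort_with(items, compare):
    """Sort any sequence using a caller-supplied comparator, qsort-style.

    compare(a, b) returns a negative number, zero, or a positive number.
    """
    return sorted(items, key=cmp_to_key(compare))


# Hypothetical widgets, modeled as (name, weight) tuples for illustration.
widgets = [("gear", 3), ("axle", 7), ("bolt", 1)]

# Same function, two different comparators.
by_name = sort_with(widgets, lambda a, b: (a[0] > b[0]) - (a[0] < b[0]))
by_weight_desc = sort_with(widgets, lambda a, b: b[1] - a[1])
```

Without functions as values, each of those orderings would need its own hand-written sort.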
Filed under XQuery, languages, python, standards
Wednesday, September 10th, 2008
Has there ever been a case of mitigated gall?
More collected Geek Thoughts at http://geekthoughts.info/.
Filed under geekthoughts, languages
Friday, August 8th, 2008
It would be awesome if someone made a site that catalogued all the common mis-encodings. Even in 2008, I see these things all over the web–mangled quotation marks, apostrophes, em-dashes. I’d love to see a pictorial guide.
curly apostrophe looks like ?’ – original encoding=_________ mislabeled as __________ .
That sort of thing. Surely somebody has done this already, right? -m
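A few entries of such a catalog can be generated mechanically. The sketch below assumes the most common culprit, UTF-8 bytes mislabeled as Windows-1252; the function name mojibake and the three sample characters are chosen just for illustration, and other encoding pairs can be passed in the same way.

```python
def mojibake(text, actual="utf-8", assumed="cp1252"):
    """Show how text encoded as `actual` looks when decoded as `assumed`."""
    return text.encode(actual).decode(assumed, errors="replace")


# A few characters that commonly get mangled (illustrative selection).
samples = {
    "curly apostrophe": "\u2019",
    "left double quote": "\u201c",
    "em dash": "\u2014",
}

# Maps each name to its mangled form, e.g. the curly apostrophe
# becomes the three characters "â€™".
table = {name: mojibake(ch) for name, ch in samples.items()}
```

Looping over more encoding pairs would fill in the “original encoding / mislabeled as” columns of the guide.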
Filed under browsers, languages, metadata
Wednesday, May 28th, 2008
I registered ‘xfv’ on Google App Engine. Too bad there don’t appear to be any significant XML libraries supported. I have XPath covered by my pure-python WebPath, but what about Relax NG? Anyone know of anything in pure python? -m
Filed under XForms, announcement, google, languages, python, xpath
Tuesday, May 20th, 2008
In my about page, I’ve written my CV in two lines. Why don’t you try it, then link back to here?
I’ve been known to use this as an interview question, and it’s quite a bit harder than it looks. A clever candidate will turn the paper sideways giving themselves more room to write “two lines”, but that’s not the point. This exercise forces one to really think about their qualifications, skills, and experience; one’s “unique selling proposition”.
Writing short, as opposed to rambling on, is notoriously difficult. Someone who can do that with their own CV is off to a good start in my book. -m
P. S. Mark Logic is looking for some high-caliber XML and web folks. Contact me offline if you know anyone looking…
Filed under Mark Logic, announcement, languages, writing
Thursday, May 8th, 2008
When making hash browns xkcd style, there are at least 14 ways it could go badly.
- That’s not a potato, it’s a misshapen rock.
- Unexpectedly flammable tennis racket.
- Sparks landing on gas can.
- Food poisoning via undercooked hash browns due to limited flame contact time.
- Broken plate fragments.
- Dripping, flaming gasoline.
- Swing and a miss; balance lost.
- Flaming potato fragments in the eye socket.
- Diving catch ends badly.
- Spontaneous combustion.
- Tennis elbow.
- Repetitive stress injury.
- Fork misfire.
- Heat death of the universe.
(17 if that fork is a dangerous crossbreed) -m
Filed under everythingismiscellaneous, languages, writing
Friday, November 30th, 2007
If you want to get anything done, give it to a busy person…
In my life, I’ve started four novels, completed my goals on three, gotten to “The End” on two, and completely flamed out on one.
The first was in 2001. I hadn’t written much since high school. Something clicked in my head that made me realize that writing wasn’t some kind of black art (as one particular teacher had drilled into his credulous students). It was doable. You take pencil and paper and write one word after another. Voilà. I was so taken with this simple idea that every single thing I ever learned about writing went out the window. I had Swifties, danglers, tell-vs-show, you name it. There’s enough material in there for several Bulwer-Lytton contests. By the time I had 70 hand-written pages, the thing collapsed under its own weight and the story reached an abrupt, borderline-surrealistic “ending” to abuse the term. I have evidence that I even typed it all in and pressed on for a 2nd draft.
By 2003 my non-fiction book was published–my writing career was under way! Part of the elaborate book proposal dance involved me writing some online articles, including one piece of fiction that was well-received in the tiny circle that was its intended audience. At this stage I adopted electronic writing, and ditched my crashy Windows laptop for a Mac, a vast improvement.
In 2005 I discovered NaNoWriMo, and though I thought it would be a lost cause, I signed up. No way it could be as bad as the previous attempt. I had a new job, and was able to skip a few lunches to write, not to mention intense evenings and weekends. The end goal is 50,000 words during the 30 days of November; that’s 1,666 and two-thirds words per day. All of the prior month I spent outlining, making maps, creating my universe. I used the simplest of tools, my text editor and one file per chapter. I learned that the command wc *.txt could easily give me a combined word count. To my surprise, it worked. I reemerged into daylight with a completed full story arc loosely based on the earlier story, and ended up with just over 50,000 words. The text itself was very rough, but I read the whole thing out loud in a podcast to edit it. In terms of improvement, it was huge, but still far from publishable.
2006 and another NaNoWriMo rolled around, and I took off on a more ambitious storyline with far fewer notes going into it. The story itself involved the same general characters of the previous two episodes, but with a deeper, more mature feeling to it. In short, I finally wrote a piece of fiction to be proud about afterwards, though when I hit 50,000 words I felt really burned out; hit “save” and left the story arc unfinished.
The pull to dig in to an intensive 2nd draft of the story was immense, but just too many things were going on, including a new arrival in the family and a new set of job responsibilities. I never got more than a few dozen pages into the rewrite. When NaNoWriMo 2007 came upon me, I had a tough choice…do I write something fresh, or try to rework the previous novel? Fresh. A completely new story line, new characters, new setting, new everything. As of a few days ago, I finished the draft, compressing parts of the story as needed to meet both the 50 kiloword goal and the complete story arc. In preparation, I read a number of books, but as far as written outlines, maps, etc. go, almost nothing happened before November 1. I saved enough of the “fun stuff” that a second revision of this story will be a joy. Overall, another improvement year-over-year.
There’s only one kink to the “if you want to get something done…” idea: my slides for the XML Conference talk I have in a few days are still unfinished… -m
Filed under everythingismiscellaneous, languages, stuff, writing
Monday, November 19th, 2007
Where’s Project Gutenberg? One difficulty in launching an ebook platform is the lack of available titles. I keep hearing about 80,000+ titles, but expressed as a percentage of Amazon’s book catalog, it’s minuscule. There should be all kinds of public domain titles ready to go on day one. And where are the Creative Commons books?
There are some public domain books to be found, but none are free. Take, for example, A Connecticut Yankee in King Arthur’s Court, a book (in paper form) sitting just out of arm’s reach as I write this, waiting to be read. If I had it on a device, particularly one with a good screen, I’d be more inclined to keep it, and dozens of others, on hand in my backpack and be ready to read at a moment’s notice. But no.
The problem is the “we take care of the wireless delivery” part, called Whispernet(tm). It’s not really free, nor bundled in the service price. It’s bundled in to the cost of every media access. Is it fair to pay $9.99 for a New York Times bestseller? Sure. But it sucks to pay $1 for an A-list blog that’s free everywhere else, or to get literally nickeled and dimed for the privilege of “converting” and delivering your own content to your own device.
By the way, who gets the money paid for accessing, say, a CreativeCommons non-commercial licensed blog via the Kindle? Somebody should look into that.
I applaud Amazon for pushing to innovate in a space that badly needs it, but the financial model behind the wireless access encourages the wrong kind of things. Exceptions, like unlimited Wikipedia access (be still my heart!) still need to be hand approved by the gatekeeper. Information wants to be free, it doesn’t want to be a service, though that’s hard to see when the dollar signs get in your eyes.
Many folks are comparing this to the original iPod launch–remember, the huge klunky one with a tiny capacity, black and white screen, and a mechanical click-wheel? There are some strong points of similarity, but stronger differences. For one, anyone with an iPod can easily rip their existing CDs, not to mention obtain MP3s from other methods (so I hear). There’s nothing like that yet for books.
Where’s the documentation for the new, proprietary ebook format? I don’t care about the DRM crap. I care about being able to create new content, or repackage existing content for which I have the rights, and for that, I’m having trouble coming up with a rationale for an entire new format. I would love to do some cool things with this platform. Perhaps I will some day, though my enthusiasm is somewhat lessened by the difficulties I would face getting anything cool onto the devices. -m
Filed under IPR, browsers, everythingismiscellaneous, hardware, languages, mobile, trends
Sunday, November 18th, 2007
As one who, in the all-too-near future, will be hammering out the visuals to go with my talk at XML 2007, this made my day. (be sure to check out the deeper pages too) -m
Filed under everythingismiscellaneous, languages, writing
Saturday, November 10th, 2007
What is the difference between placing instanceof="prefix:val" vs. rel="prefix:val" on something? How do I decide between the two?
In the example of hEvent data, why is it better/more accurate to use instanceof="cal:Vevent" instead of a blank node via rel="cal:Vevent"?
-m
Filed under everythingismiscellaneous, languages, standards, xml
Monday, October 22nd, 2007
The more I look at RDFa, the more I like it. But still it doesn’t help with the pain-point of namespaces, specifically of unmemorable URLs all over the place and qnames (or CURIEs) in content.
Does GRDDL offer a way out? Could, for instance, the namespace name for Dublin Core metadata be assigned to the prefix “dc:” in an external file, linked via transformation to the document in question? Then it would be simpler, from a producer or consumer viewpoint, to simply use names like “dc:title” with no problems or ambiguity.
This could be especially useful now that discussions are reopening around XML in HTML.
As usual, comments welcome. -m
Filed under annoyance, intentional web, languages, microformats, standards, stuff, xml
Monday, October 1st, 2007
It’s a common need to parse space-separated attribute values from XPath/XSLT 1.0, usually @class or @rel. One common (but incorrect) technique is a simple equality test, as in {@class="vcard"}. This is wrong, since the attribute can contain the token you want alongside other values, like "foo vcard" or "vcard foo" or " foo vcard bar ", and the equality test misses all of those.
The proper way is to look at individual tokens in the attribute value. On first glance, this might require a call to EXSLT or some complex tokenization routine, but there’s a simpler way. I first discovered this on the microformats wiki, and only cleaned up the technique a tiny bit.
The solution involves three XPath 1.0 functions: contains() to test for a substring, concat() to join together string fragments, and normalize-space() to strip off leading and trailing spaces and convert any other sequences of whitespace into a single space.
In English, you
- normalize the class attribute value, then
- concatenate spaces front and back, then
- test whether the resulting string contains your searched-for value with spaces concatenated front and back (e.g. " vcard ").
Or {contains(concat(' ',normalize-space(@class),' '),' vcard ')} A moment’s thought shows that this works well on all the different examples shown above, and is perhaps even less involved than resorting to extension functions that return nodes that require further processing/looping. It would be interesting to compare performance as well…
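For anyone who wants to convince themselves without firing up an XSLT processor, the same three steps can be mimicked in a few lines of Python. This is only an emulation of the XPath expression: normalize_space and has_token are names made up for the sketch, and Python’s split() treats a few more characters as whitespace than XPath does.

```python
def normalize_space(value):
    """Approximate XPath normalize-space(): trim, then collapse whitespace runs.

    Note: str.split() also splits on Unicode whitespace that XPath would
    leave alone; for plain spaces, tabs, and newlines the result matches.
    """
    return " ".join(value.split())


def has_token(attr_value, token):
    """Mimic contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    return f" {token} " in f" {normalize_space(attr_value)} "


# The attribute values discussed above, plus a near-miss.
examples = ["vcard", "foo vcard", "vcard foo", " foo vcard bar ", "vcards"]
results = {value: has_token(value, "vcard") for value in examples}
```

Every value containing the vcard token matches, while "vcards" does not, which is exactly what the padding spaces are for.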
So next time you need to match class or rel values, give it a shot. Let me know how it works for you, or if you have any further improvements. -m
Filed under XForms, browsers, languages, software, web20, xml, xpath
Sunday, September 16th, 2007
My Copious Free Time(tm) has been filled lately by two different evaluation projects. One is the 2nd Annual Writing Show Best First Chapter of a Novel Contest, for which the first round of judging is just winding up. The main benefit for contest entrants is that every submission gets a professional critique of at least 750 words. But additionally, each submission gets a score on a 50-point scale, based on:
- 10 points for Story. Is it a compelling read with a great hook? Are we engaged?
- 10 points for Style. Is the writing smooth and tight, without awkward constructions, extraneous verbiage, and redundancies?
- 10 points for Dialog. Is the dialog natural and does it move the story along?
- 10 points for Character. Are the characters interesting? Do we care about them?
- 10 points for Mechanics. Are grammar, spelling, and punctuation correct?
I’m also attending some classes aiming toward becoming a Certified Beer Judge (details on Meadblog). This isn’t as fun as it sounds. (Well, OK, maybe it is…). The idea is to build up better sensory perception so that my personal brewing and cooking projects can benefit. But the upcoming test is 70% written essay questions like “Identify three distinctly different top-fermenting beer styles with a starting gravity of 1.070 or higher, and describe the similarities and differences between the styles”. 30% of the test is based on actual tasting and filling out a tasting sheet. Of interest, the scoring here is also based on a 50-point scale:
- 12 points for Aroma.
- 3 points for Appearance.
- 20 points for Taste.
- 5 points for Mouthfeel.
- 10 points for Overall Impression.
The interesting part is that there are similarities between the two tasks. For both, I need to work off of physical paper, not in my head or on a computer screen. For both, I first “skim”, building an overall impression, then dig down into individual categories to assign a score for each one. Then I step back and look at my numbers, and check whether everything makes sense and accurately records my impressions. When I’m satisfied, I add everything up and am done.
Most day-to-day problems aren’t so well structured or normalized, but nonetheless, I find myself tackling all kinds of problems with a similar approach. There you have it. Writing and drinking beer make you a better person. :) -m
Filed under everythingismiscellaneous, languages, patternalia, stuff, writing
Wednesday, August 22nd, 2007
Yeah, they’re related. -m
Filed under announcement, everythingismiscellaneous, languages, stuff
Wednesday, August 8th, 2007
Go check it out. It even has a Tidy option to clean up the markup. But they missed an important feature: it should include an option to run Tidy on the markup first, then validate. This is becoming the de facto bar for web page validity anyway… -m
Filed under browsers, intentional web, languages, software, standards, trends, xml
Monday, July 16th, 2007
If it’s been quiet on this front it’s because I’ve been engrossed in my continuing education. Andy Oram sent me a copy of Beautiful Code, a thoroughly enjoyable work from O’Reilly. If you like stretching your brain by reading code-intense essays from top-tier coders, I recommend this volume. In particular, I’ve been digging into Douglas Crockford’s Top Down Operator Precedence chapter.
Other than that, some interesting BJCP classes, but I’m keeping that non-tech stuff over on meadblog. -m
Filed under languages, patternalia, software, stuff
Wednesday, January 24th, 2007
I’ve always had a thing for text analysis.
- the 352
- and 250
- to 225
- of 188
- in 118
- a 108
- we 100
- is 76
- our 75
- that 72
Source. -m
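A count like the one above takes only a few lines of Python. This is a sketch, not the script actually used for the list: the function name top_words, the tokenizing regex, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Lowercase first so "The" and "the" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "The cat and the hat and the bat"
counts = top_words(sample, 3)
```

Feeding it the full text of a speech or article in place of the sample string gives a ranked list in the same shape as the one above.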
Filed under languages, python, stuff
Tuesday, January 23rd, 2007
A semi-random thought that occurred to me.
One marker of a well-designed markup language is that it looks to the future. This doesn’t mean it’s an amorphous blob of abstract indirections mapped to tags. It can (and arguably should) be concrete and solid, but designed in such a way that keeps bigger things in mind.
HTML and XHTML are, I suppose, canonical examples of this, giving birth to microformats and many other uses outside of a browser. -m
Filed under intentional web, languages, microformats
Wednesday, November 8th, 2006
but there has never been a successful Java implementation of a commercial-grade web browser. (right?)
There exist lots of huge applications including IDEs, and editors of all sorts, but nobody’s been able to nail the whole XHTML+CSS+JavaScript thing in Java. (right?)
Take it a step further–no need to pick on Java–nobody has done this in any VM-based language (right?)
Coincidence or sign of greater forces in the universe? Feel free to post counter-examples in the comments. -m
Filed under browsers, languages, standards, xml
Thursday, October 5th, 2006
On a PHP 4 project, I need to use XSLT, but the interfaces seem far more complicated than they should be. Check out this declaration for the function to run an XSLT:
mixed xslt_process ( resource xh, string xmlcontainer, string xslcontainer [, string resultcontainer [, array arguments [, array parameters]]] )
Wow, it returns a “mixed”, that’s some helpful documentation worth looking up. And all I have to do is pass in an “xmlcontainer” and “xslcontainer”. In practice, you end up with hard-to-read code like this:
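// Hand the XML and XSL strings to the processor as named argument buffers
$arguments = array('/_xml' => $xml, '/_xsl' => $xsl);
$xslproc = xslt_create();
// 'arg:/_xml' and 'arg:/_xsl' refer to the buffers above; NULL returns the result as a string
$result = xslt_process($xslproc, 'arg:/_xml', 'arg:/_xsl', NULL, $arguments);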
Too bad I have already-parsed objects. And then there’s this:
Warning: As of PHP 4.0.6, this function no longer takes XML strings in xmlcontainer or xslcontainer. Passing a string containing XML to either of these parameters will result in a segmentation fault in Sablotron versions up to and including version 0.95.
Whatever. In fairness, this falls mainly to the XSLT engine and not the language and the PHP 5 interfaces are much different and much improved. But still.. -m
Filed under languages, stuff