Archive for the 'standards' Category

Thursday, November 29th, 2007

XPath 2.0 implementation details

Well, my plans for a series of postings about details of implementing XPath 2.0 fell rather short, so let’s skip straight to the good stuff.

An article by Mike Kay giving the details of the Saxon architecture. On the surface it’s about performance, but it also has an excellent section on internals. Worth a look. This has been quite influential for me, and maybe you too. -m

Tuesday, November 13th, 2007

Representing structured data on a web page

OK, let me take a step back from specific technologies like RDFa and go through a really simple example.

On a certain web page, I refer to a book. That book has a price of 21.86 US dollars. The page is intended as primarily human-readable, but I want to include machine-readable data too, for a global audience.
What would you do? What specific markup choices would you make? What specific markup would you use? -m

Saturday, November 10th, 2007

RDFa question

What is the difference between placing instanceof="prefix:val" vs. rel="prefix:val" on something? How do I decide between the two?

In the example of hEvent data, why is it better/more accurate to use instanceof="cal:Vevent" instead of a blank node via rel="cal:Vevent"?

-m

Monday, November 5th, 2007

A better name for CURIEs (?)

“Compact Clark Notation”. (Inspired by reading this) -m
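Clark notation writes a namespace-qualified name as {namespace-uri}local-name, while a CURIE hides the namespace URI behind a prefix. Here's a minimal Python sketch of the resemblance, expanding the same prefixed name both ways. The prefix table is an assumption for illustration (the Dublin Core URI is the real one), and the CURIE handling is deliberately simplified: no default prefix, no [safe] CURIEs, no blank nodes.

```python
# Hypothetical prefix bindings for illustration; the Dublin Core URI is real.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def split_curie(curie, prefixes):
    """Split 'prefix:reference' and look up the namespace URI bound to prefix."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed CURIE: {curie!r}")
    if prefix not in prefixes:
        raise ValueError(f"unbound prefix: {prefix!r}")
    return prefixes[prefix], reference


def curie_to_iri(curie, prefixes):
    """CURIE-style expansion: namespace URI concatenated with the reference."""
    namespace, reference = split_curie(curie, prefixes)
    return namespace + reference


def curie_to_clark(curie, prefixes):
    """Clark notation: {namespace-uri}local-name (the form ElementTree uses for tags)."""
    namespace, reference = split_curie(curie, prefixes)
    return "{%s}%s" % (namespace, reference)


if __name__ == "__main__":
    print(curie_to_iri("dc:title", PREFIXES))
    print(curie_to_clark("dc:title", PREFIXES))
```

With these bindings, "dc:title" becomes http://purl.org/dc/elements/1.1/title as a CURIE expansion and {http://purl.org/dc/elements/1.1/}title in Clark notation. Both carry the same information; Clark notation just keeps the namespace and local part visibly separated.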

Monday, October 29th, 2007

Five percent rules

Many things in life are simpler when you only need to be within 5%:

  • Pi is pretty much 3
  • Water weighs pretty much 8 pounds a gallon
  • A quart is pretty much a liter (and a gallon, 4 liters)
  • A year has pretty much 360 days, and pretty much 31 million seconds
  • The speed of light is pretty much 300,000 km/s, which is pretty much one foot/nanosecond
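For the curious, here's a quick Python check of each rule, measuring relative error as |approx - exact| / exact. Assumptions: US customary units, water at exactly 1 kg per liter, and a 365.25-day year; the rule labels are my own shorthand. By this yardstick the quart/liter and gallon/4-liter rules come in at about 5.7%, a hair outside the line, while everything else fits comfortably.

```python
import math

# Reference values (US customary units assumed).
KG_PER_LB = 0.45359237                 # exact, by definition of the pound
LITERS_PER_US_GALLON = 3.785411784     # exact, by definition
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year
C_KM_PER_S = 299_792.458               # exact, by definition of the meter
METERS_PER_FOOT = 0.3048               # exact


def relative_error(approx, exact):
    return abs(approx - exact) / exact


# (rule, approximate value, reference value), each pair in the same unit.
RULES = [
    ("pi is 3", 3, math.pi),
    # Assumes water at 1 kg per liter, so a gallon of it weighs ~8.35 lb.
    ("water is 8 lb/gal", 8, LITERS_PER_US_GALLON * 1.0 / KG_PER_LB),
    ("a quart is a liter", 1.0, LITERS_PER_US_GALLON / 4),   # in liters
    ("a gallon is 4 liters", 4.0, LITERS_PER_US_GALLON),
    ("a year is 360 days", 360, 365.25),
    ("a year is 31 million s", 31e6, SECONDS_PER_YEAR),
    ("c is 300,000 km/s", 300_000, C_KM_PER_S),
    ("c is 1 ft/ns", METERS_PER_FOOT, C_KM_PER_S * 1000 / 1e9),  # meters per ns
]

RESULTS = {rule: relative_error(approx, exact) for rule, approx, exact in RULES}

if __name__ == "__main__":
    for rule, err in RESULTS.items():
        verdict = "within 5%" if err <= 0.05 else "outside 5%"
        print(f"{rule:<25} {err:6.2%}  {verdict}")
```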

Of course, there are even more things that get more convenient when you have 10% or 20% to work with… -m

Monday, October 22nd, 2007

Is there fertile ground between RDFa and GRDDL?

The more I look at RDFa, the more I like it. But still it doesn’t help with the pain-point of namespaces, specifically of unmemorable URLs all over the place and qnames (or CURIEs) in content.

Does GRDDL offer a way out? Could, for instance, the namespace name for Dublin Core metadata be assigned to the prefix “dc:” in an external file, linked via transformation to the document in question? Then it would be simpler, from a producer or consumer viewpoint, to simply use names like “dc:title” with no problems or ambiguity.

This could be especially useful now that discussions are reopening around XML in HTML.

As usual, comments welcome. -m

Saturday, October 20th, 2007

Building a tokenizer for XPath or XQuery

In researching for an XPath 2.0 implementation, I ran across this curious document from the W3C. Despite being labeled a Working Draft (as opposed to a Note), it appears to be a one-shot document with no future hope for updates or enhancements.

In short, it outlines several options for the first stage or two of an XPath 2.0 or XQuery implementation. (Despite the title, it covers more than just a tokenizer: also a parser and a possible intermediate stage.) Tokenizing and parsing XPath are significantly more difficult than for other languages, because things like this are perfectly legitimate (if useless):

if(if) then then else else- +-++-**-* instance
of element(*)* * * **---++div- div -div
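To see why even plain XPath 1.0 needs context-sensitive lexing, here's a minimal Python sketch of the disambiguation rule from section 3.7 of the XPath 1.0 spec: if there is a preceding token and it isn't @, ::, (, [, a comma, or an operator, then * must be read as multiplication and a name must be an operator name (and, or, mod, div). This is an XPath 1.0 illustration only, not the approach in the W3C document; XPath 2.0 piles on contextual keywords like if, then, else, and instance of, which is where its lexical states come in. The token regex is simplified (ASCII names, no namespace prefixes).

```python
import re

# Simplified XPath 1.0 lexer: ASCII-only names, no namespace prefixes.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<name>[A-Za-z_][A-Za-z0-9_.\-]*)
      | (?P<variable>\$[A-Za-z_][A-Za-z0-9_.\-]*)
      | (?P<literal>"[^"]*"|'[^']*')
      | (?P<number>\d+(?:\.\d*)?|\.\d+)
      | (?P<symbol>//|::|!=|<=|>=|\.\.|[/()\[\]@,|+\-=<>*.])
    )""", re.VERBOSE)

OPERATOR_NAMES = {"and", "or", "mod", "div"}
OPERATOR_SYMBOLS = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}
# After one of these tokens (or any operator), '*' and names keep their usual meaning.
NON_OPERATOR_CONTEXT = {"@", "::", "(", "[", ","}


def tokenize(expr):
    """Split an XPath 1.0 expression into (kind, text) pairs."""
    tokens = []
    expr = expr.rstrip()
    pos = 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {expr[pos:]!r}")
        pos = m.end()
        kind, text = m.lastgroup, m.group(m.lastgroup)
        prev = tokens[-1] if tokens else None
        # XPath 1.0 section 3.7: after any token other than @, ::, (, [, ','
        # or an operator, '*' is multiplication and a name is an operator name.
        operator_expected = (
            prev is not None
            and prev[0] != "operator"
            and prev[1] not in NON_OPERATOR_CONTEXT
        )
        if kind == "name":
            following = expr[pos:].lstrip()
            if operator_expected:
                if text not in OPERATOR_NAMES:
                    raise ValueError(f"expected an operator, found {text!r}")
                kind = "operator"
            elif following.startswith("("):
                kind = "function-name"  # node types such as text() land here too
            elif following.startswith("::"):
                kind = "axis-name"
            else:
                kind = "name-test"
        elif kind == "symbol":
            if text == "*":
                kind = "operator" if operator_expected else "name-test"
            elif text in OPERATOR_SYMBOLS:
                kind = "operator"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for expr in ["div div div", "* * *", "div- div -div"]:
        print(expr, "->", tokenize(expr))
```

The tail of the example above shows the effect: tokenize("div- div -div") yields [("name-test", "div-"), ("operator", "div"), ("operator", "-"), ("name-test", "div")]. The trailing hyphen belongs to the first name, because - is a legal name character and the lexer always takes the longest possible token.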

The document tries to standardize on some terminology for various approaches toward dealing with XPath. The remaining bulk of the document sketches out some lexical states that would be useful for one particular implementation approach. I guess the vibrant, thriving throngs of XPath 2.0 developers didn’t see the need for this kind of assistance.

In short, I didn’t find it terribly useful. Maybe some readers have, though. Feel free to comment below. Subsequent articles here will describe how I approached the problem. Stay sharp! -m

Monday, October 15th, 2007

XForms evening at XML 2007

Depending on who’s asking and who’s answering, W3C technologies take 5 to 10 years to get a strong foothold. Well, we’re now in the home stretch for the 5th anniversary of XForms Essentials, which was published in 2003. In past conferences, XForms coverage has been maybe a low-key tutorial, a few day sessions, and hallway conversation. I’m pleased to see it reach new heights this year.

XForms evening is on Monday December 3 at the XML 2007 conference, and runs from 7:30 until 9:00 plus however long ERH takes on his keynote. :) The scheduled talks are shorter and punchier, and feature a lot of familiar faces, and a few new ones (at least to me). I’m looking forward to it–see you there! -m

Monday, October 8th, 2007

XML 2007 Schedule

As widely reported by now, the final schedule for XML 2007 this December in Boston is up. All I have to add is the suggestion of careful attention to the Tuesday program at 4:00. :) If you can’t wait, some technical details are forthcoming in this space. That is all. -m

Friday, October 5th, 2007

Playing with microformats

I’ll be doing some experimenting around here over maybe the next week or two. Specifically, setting up hAtom within these pages. Watch for falling debris and report any unusual observations. -m

Wednesday, October 3rd, 2007

XML Annoyance: do greater-than signs need to be escaped?

Let’s see how many downstream pieces of software trip over this post…

Do greater-than and less-than signs need to be escaped in XML? Conventional wisdom has it that less-than signs always do, since that character starts a fresh “tag”, but greater-than signs are safe.

Wrong.

There is one particular sequence, ]]>, that is not allowed to occur unescaped in XML content “for compatibility”, a phrase the spec uses to mark rules that only an SGML-head could love (but which are strict requirements nonetheless). Does your software prevent this condition from causing an error? -m
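A minimal Python sketch of the rule (escape_text and is_well_formed are my own names): escape & and < as usual, and additionally escape > whenever it ends a ]]> sequence. The standard library's xml.sax.saxutils.escape takes the blunter route of escaping every >. Python's ElementTree parser, built on expat, shows that the raw sequence is rejected while the escaped form round-trips.

```python
import xml.etree.ElementTree as ET


def escape_text(text):
    """Escape a string for use as XML element content (attribute values need more)."""
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # A lone '>' is fine in content, but not as the end of the sequence ']]>'.
    return text.replace("]]>", "]]&gt;")


def is_well_formed(document):
    """True if the expat-based ElementTree parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for sample in ["a > b", "if (a[b[0]]>1)"]:
        raw = "<p>" + sample + "</p>"
        escaped = "<p>" + escape_text(sample) + "</p>"
        print(repr(sample), "raw:", is_well_formed(raw),
              "escaped:", is_well_formed(escaped))
```

The second sample is the kind of thing that shows up when posting code: an innocent array index like a[b[0]]>1 contains ]]> and makes the raw document malformed.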

Monday, October 1st, 2007

simple parsing of space-separated attributes in XPath/XSLT

It’s a common need to parse space-separated attribute values from XPath/XSLT 1.0, usually @class or @rel. One common (but incorrect) technique is a simple equality test, as in {@class="vcard"}. This is wrong, since the attribute can hold vcard alongside other values, like "foo vcard" or "vcard foo" or " foo vcard bar ", and the equality test misses every one of them.

The proper way is to look at individual tokens in the attribute value. On first glance, this might require a call to EXSLT or some complex tokenization routine, but there’s a simpler way. I first discovered this on the microformats wiki, and only cleaned up the technique a tiny bit.

The solution involves three XPath 1.0 functions: contains() to test for a substring, concat() to join together string fragments, and normalize-space() to strip off leading and trailing whitespace and convert any other sequences of whitespace into a single space.

In English, you

  • normalize the class attribute value, then
  • concatenate spaces front and back, then
  • test whether the resulting string contains your searched-for value with spaces concatenated front and back (e.g. " vcard ").

Or, in XPath: {contains(concat(' ', normalize-space(@class), ' '), ' vcard ')}. A moment’s thought shows that this works on all the different examples shown above, and is perhaps even less involved than resorting to extension functions that return nodes needing further processing or looping. It would be interesting to compare performance as well…
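To sanity-check the trick outside an XSLT processor, here's a Python transliteration of the same expression, run against the examples above. normalize_space and has_token are my own helpers; the former mimics XPath's normalize-space(), which only treats space, tab, CR, and LF as whitespace. Like the XPath original, this assumes the searched-for token contains no whitespace itself.

```python
import re


def normalize_space(value):
    """Like XPath 1.0 normalize-space(): trim, then collapse runs of XML whitespace."""
    return " ".join(re.findall(r"[^ \t\r\n]+", value))


def has_token(value, token):
    """Transliteration of contains(concat(' ', normalize-space(@class), ' '), ' vcard ')."""
    return (" " + token + " ") in (" " + normalize_space(value) + " ")


if __name__ == "__main__":
    examples = ["vcard", "foo vcard", "vcard foo", " foo vcard bar ", "vcards", "my-vcard"]
    for value in examples:
        print(repr(value), "equality:", value == "vcard",
              "token test:", has_token(value, "vcard"))
```

The padding with spaces is what keeps near-misses like "vcards" or "my-vcard" from matching, which a bare contains(@class, 'vcard') would get wrong.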

So next time you need to match class or rel values, give it a shot. Let me know how it works for you, or if you have any further improvements. -m

Thursday, September 27th, 2007

XForms Editors?

What are some good tools, with a strong preference for open source, for editing XForms these days? Comment below… -m

Friday, September 21st, 2007

Come see me at XML 2007

Watch this space for details. I’ll be speaking about something related to Python and XPath 2.0. Watch this blog for tidbits on the subject. :) -m

Saturday, September 8th, 2007

Steven Pemberton and Michael(tm) Smith on (X)HTML, XForms, mobile, etc.

Video from XTech, worth a look. -m

Friday, September 7th, 2007

Time to update the XForms Validator (XFV)?

In the last couple of days, I’ve had three completely separate instances of people freshly interested in XForms coming to ask me about Stuff.

A declarative model is pretty much irresistible compared to the alternatives. But nobody can directly use an abstract declarative sculpture–somebody needs to put some solid vocabulary and processing meat on the skeleton. And, of course, a good example of that is XForms.

Around the time the book came out, I put together a modest XForms Validator, modeled after the W3C validator of the time. It later went open source, and is available online. But compared to the latest in online validator technology, it feels more than a little dated.

Hypothetically speaking, if I actually had free time, would it make sense to update the XForms Validator? What would you use it for? Would you be willing to help?

Comments below. Thanks, -m

Wednesday, August 29th, 2007

2 years at Yahoo!

Today is my 2nd anniversary at Yahoo!. Looking back, it’s been a great time. Since I don’t know how long ago, I’ve fantasized about being involved in research. Check. Since sitting across from the mobile guys for 5 years in W3C meetings, I’ve fantasized about working in mobile. Check. And since I wrote Web search, without the web (demo), I’ve fantasized about working on web-scale search.

Check.

What will the next two years bring? I don’t know, but I’m certain they will be even better than the previous two. -m

Thursday, August 16th, 2007

XForms Essentials at…Target?

Yeah, it’s for real. You save 27%! Sure, it’s powered by Amazon, but it’s still a little weird to see this come up in search results… -m

Wednesday, August 8th, 2007

New W3C Validator

Go check it out. It even has a Tidy option to clean up the markup. But they missed an important feature: it should include an option to run Tidy on the markup first, then validate. This is becoming the de facto bar for web page validity anyway… -m

Sunday, June 24th, 2007

At that moment, I knew my business was Machine Ready

I fell asleep one night while reading Ray Kurzweil, and had this crazy dream where the internet called me up (over VOIP, naturally) to complain that none of my web pages made sense. Par for the course, I thought at first. But then I told the internet a few things, to let me worry about my own domain of concern; he/she/it grappled with a response when a loud noise awoke me–my chirping alarm clock. I reached over to pound the Snooze button, but I stopped when my eyes focused on the display, which read in segmented LED letters: I rtFm. -m

Monday, April 30th, 2007

Why does ‘rich client’ equal ‘bad separation of presentation from content’?

I started writing this post back when I was tech editing the “Rich Client Alternatives” chapter of Web 2.0, the book. Now, with Apollo getting some attention, it’s worth revisiting.

What do XUL, Yahoo! Widgets, OpenLaszlo, Silverlight, and Apollo have in common? All of them mix content with presentation to some degree. Years of experience on the web have shown that a properly-done CSS layout gives you:

  • smaller, faster pages
  • better accessibility and user control of rendering
  • better adaptation to different screen resolutions
  • easier repurposing of data, including microformats
  • better mobile compatibility

Initial HTML browsers didn’t have these advantages, and gave in to early pressure to implement things like blink and font tags. Today, most webfolks would admit that these presentational tags were a mistake, and contemporary web design avoids them.
So what is it about “rich” clients that’s different? Are developers missing out on the hard lessons learned on the web? Or is there something inherent in the definition of “rich clients” that changes the balance? Your comments are welcome. -m

Wednesday, April 11th, 2007

James Clark blog: do you read the web or feed version?

James Clark is blogging. A few zillion people have already mentioned this.

A slightly tangential observation: I had trouble reading through an entire article in web form, but had no problems returning later to the atom feed. At first I chalked it up to early morning grogginess, but it seems to be a repeatable phenomenon at all hours, at least for me.

So a double thanks to James for publishing a full feed.

How about you: do you have an easier time reading long-form articles in a feed reader vs. a browser? Do you prefer feed reader vs. browser for this blog? Comment below. -m

Sunday, April 1st, 2007

HTTPoetry

I can’t talk on the phone right now. Can you follow up on email?
Consider it placed on my todo list.
Let me give you my new address.
Hmm, I don’t have it.
What are you talking about?

(If you get the pattern, post below…) -m

Thursday, March 15th, 2007

Feed readers get namespaces wrong

Big surprise, huh? More evidence that the XML namespaces spec is out of touch with the reality of developers ‘on the street’, a.k.a. it has cracks in the foundation.

I disagree that aggregator developers are “bozonic”, as the title of the first cited article indicates. Why should any developer need to keep all that extra complexity bouncing around in their head? Optimize for people first, machines second. -m
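What “getting namespaces wrong” looks like in practice, sketched here as an assumption since the cited articles aren't reproduced: one common form is matching element names as literal text, which breaks as soon as a feed binds the Atom namespace to a prefix. ElementTree compares the namespace URI instead (via Clark notation), so both spellings of the same feed work. The two tiny feeds and the function names are made up for illustration; the Atom namespace URI is the real one.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Two serializations of the same made-up feed: default namespace vs. a prefix.
FEED_DEFAULT = f'<feed xmlns="{ATOM}"><title>Example</title></feed>'
FEED_PREFIXED = f'<a:feed xmlns:a="{ATOM}"><a:title>Example</a:title></a:feed>'


def title_by_text(xml_text):
    """Fragile: looks for a literal <title> tag, ignoring namespaces entirely."""
    match = re.search(r"<title>(.*?)</title>", xml_text)
    return match.group(1) if match else None


def title_by_namespace(xml_text):
    """Robust: asks for the title element by namespace URI plus local name."""
    title = ET.fromstring(xml_text).find(f"{{{ATOM}}}title")
    return title.text if title is not None else None


if __name__ == "__main__":
    for feed in (FEED_DEFAULT, FEED_PREFIXED):
        print(title_by_text(feed), title_by_namespace(feed))
```

Which rather supports the point above: the correct version requires the developer to carry the namespace URI around, while the "bozonic" version is the one that reads naturally.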

Friday, March 9th, 2007

HTTP question

OK, RESTafarians and HTTP experts, here’s a question. Is it kosher to send a Location: header back with an ordinary, say 200, response?

Scenario: the server knows better than the client what the client needs. ‘I realize you asked for http://foo.com/x, but instead I’m sending you http://foo.com/y — ready or not, here it comes..’
-m

Tuesday, February 13th, 2007

Windows Live Search for Mobile

Spotted under the headline Windows Live Search for Mobile Goes Final, Still Great (like they were expecting it to suddenly plummet in quality?) on Gizmodo. It’s a 114k jar file that runs on my SLVR, where Yahoo! Go isn’t available yet, so points for that. Search suggestions show as you type, hugely useful on a clunky 9-key entry situation. They use an interesting UI to hold search results, densely packed–6 down the screen–with a status bar on top, and each search result marquee-scrolling back-and-forth as needed. A detail page can zap you into map mode or set up a call.

My standard test search–a little offbeat but still plausible–for mead near Sunnyvale produced disappointing results. The meadery within walking distance didn’t show, and of the top 6, two were duplicates. Scrolling down to the 10th result, though, did show an interesting, useful result, albeit 60.15 miles away: Knowne World Meads. I wanted to visit the web site, but here lies another problem: there’s no web integration. None of the search results include a URL or clickable link.

For all the hassle, I’ll stick with Opera Mini and my favorite search engine, thank you. -m

Tuesday, February 13th, 2007

changes the architecture of the house, not just the color of the paint

ERH’s comments on XForms, as part of his predictions for 2007. Worth a read. -m

Thursday, February 8th, 2007

The internet is a series of pipes

Check it out. -m

Thursday, February 1st, 2007

Is HTML on the web a special case?

Some random thoughts and responses to lots of blog discussion sparked by the XML2 article, where I asked “Is HTML on the Web a special case?”

By which, I mean, if you go through all the effort of writing down all the syntax rules used by the union of browsers that you care about, then go through the pain of getting consensus within a standards body, will the resulting document be useful beyond HTML on the Web, much like how XML is useful beyond being a vehicle for XHTML?

I don’t know if Tim Bray had that same version of the question in mind, but he answers “obviously ‘yes’”.

But I don’t think so. Once you have that set of rules, wouldn’t it be useful in other areas, say, the notorious case of RSS on the web? SVG? MathML? In fact, I’d go as far as saying that any hand-authored markup would be a candidate for XML2 syntax.

What about mobile? Anne van Kesteren responds:

in that article Micah Dubinko mentions mobile browsers living up to their premise and all that. What he says however, isn’t really true. Mobile browsers and XHTML is tag soup parsing all the way.

He links to this page, which does a rather poor job of making a point the author seems to have decided upon before starting the experiment. If you look at the specific test cases, one tests completely bizarro markup that no author or tool I can imagine would ever produce. Another test checks the handling of content-type, not markup. On the other axis, the choices there seem a bit jumbled: lists of user-agent strings, one for stock Mozilla, and a footnote indicating confusion about what browser is really in use. If anything, this page shows that the browsers tested here, with the exception of Opera Mini, are crap. If you spend more than a few minutes in mobile, you’ll discover this widespread trend. (And I’m working on a solution…watch this space).

Look at this from a pragmatic viewpoint. Check the doctype used on the Yahoo! front page vs. the mobile front page. Despite the poor browsers, XHTML adoption is still farther ahead on the mobile web than on the desktop web.

The last thing nagging at me (for now) is whether XML2 will have an infoset. Will it be possible to use XPath, XQuery, and XML tools on XML2 content? How well will these map to each other? In the strict sense, no, XML2 won’t have a conforming infoset because it will never include namespaces. But might it support a subset of the infoset? (Would that be an infosubset?) That’s a huge open question at this point. -m

Friday, January 26th, 2007

UBL Swinger

An easy-to-use UBL Editor. Has anyone tried it? -m