Archive for the 'intentional web' Category

Monday, July 21st, 2008

Review: Web 2.0: A Strategy Guide

Actually, instead of a review, let me quote the opening testimonial from the inside-front cover.

Competing globally with dynamic capabilities is the top priority of multinational executives and managers everywhere. Rethinking strategy in a highly networked world is the big challenge. How can your company navigate successfully in this turbulent, highly networked and socially connected environment? …

If this does it for you, I couldn’t recommend this book more highly. -m

Thursday, July 17th, 2008

Website Optimization is on the shelves

Andy King’s Website Optimization is now in print from O’Reilly. This book covers it all: performance, SEO, conversion rates, analytics, you name it. If you run a web site, you’ll find this useful. I tech edited and contributed a small portion, about the growing trend of metadata as site advantage. Go check it out. -m

Thursday, July 3rd, 2008

Yahoo! now indexes RDFa

I haven’t seen an announcement about this, but try the following query on Yahoo Search: [] (link). It shows documents containing RDFa, with Digg at the top. Since this is a SearchMonkey ID, it’s also usable in SearchMonkey to actually extract the metadata and use it to customize search results.

Does your site use RDFa yet? -m

Wednesday, May 14th, 2008

Reminder: SearchMonkey developer launch party Thursday

Reminder: Thursday evening at Yahoo! Sunnyvale headquarters is the launch party for the developer-facing side of SearchMonkey. In case you haven’t been paying attention, SearchMonkey is a new platform that lets developers craft their own awesomized search results. If you’re interested in SEO or general lowercase semantic web tools, you’ll love it. Meet me there. Upcoming link. Party starts at 5:30. -m

Update: The developer tool is live. Rasmus has a nice walkthrough.

Wednesday, May 7th, 2008

Quote of the day

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them…

The prescient Vannevar Bush, who foresaw (among other things) the importance of hyperlinks. -m

Friday, May 2nd, 2008

SearchMonkey dev party

If you have webdev skillz, you might be interested in the SearchMonkey launch party on May 15. Good food, good drink, good coding. Space is limited, but I have a few invites to share. Comment here or contact me offline if interested. -m

Thursday, May 1st, 2008


Today happens to mark the 6th anniversary of my blog. To celebrate going into year seven I’m refocusing it, including a new name: Micahpedia.

Blogging is an important skill, a subset of the overall skill of managing your online persona, so it’s worth devoting some attention to. The ego-burst doesn’t hurt either. My concrete goal is to get in the top 10 search results for the query [Micah], though I face some stiff competition including the prophet.

From an SEO perspective, “Push Button Paradise” wasn’t the greatest choice of name. It suffers from the common SEO mistake of being an excessively clever and/or cute reflection of what I happened to be working on at the moment, namely XForms. If you see the old name standalone, or in a blogroll, or in an RSS reader, you still don’t have much of an idea what it’s about or who’s behind it. True, I get pretty good ranking on the exact phrase, but nobody searches for that…

I will continue SEO tweaks on this site as time goes on and welcome any advice from any of my 7 readers.

In short, Micahpedia is about what I’m reading, writing, thinking about, and working on. I have plenty to say about these things. :-) The best is yet to come. -m

Monday, April 28th, 2008

SearchMonkey in private beta

I haven’t mentioned it yet, but SearchMonkey (now an official name, not just a project name) is in external limited beta. Keep an eye on ysearchblog, lots more technical content is on the way. -m

Thursday, March 13th, 2008

The (lowercase) semantic web goes mainstream

So today Yahoo! announced a major facet of what I’ve been working on lately: making the web more meaningful. Lots of fantastic coverage, including TechCrunch and ReadWriteWeb (and others, please link in the comments), and supportive responses and blog posts across the board. It’s been a while since I’ve felt this good about being a Yahoo.

So what exactly is it?

A few months ago I went through the pages on this very blog and added hAtom markup. As a result of this change…well, nothing happened. I had a good experience learning about exactly what is involved in retrofitting an existing site with microformats, but I didn’t get any tangible benefit. With the “SearchMonkey” platform, any site using microformats, or RDFa or eRDF, is exposed to developers who can enhance search results. An enhanced result won’t directly make my site rank higher in search, but it will most certainly make it prone to more clicks, and ultimately more readership, more inlinks, and better organic ranking.

How about some questions and answers:

Q: Is this Tim Berners-Lee‘s vision of the Semantic Web finally getting fulfilled?

A: No.

Q: Does this presuppose everybody rushing to change their sites to include microformats, RDF, etc?

A: No. After all, there is a developer platform. Naturally, developers will have an easier time with sites that use official and community standards for structuring data, but there is no obligation for any site to make changes in order to participate and benefit.

Q: Why would a site want to expose all its precious data in an easily-extractable way?

A: Because within a healthy ecosystem it results in a measurable increase in traffic and customer satisfaction. Data on the public web is already extractable, given enough eyeballs. An openness strategy pays off (of which SearchMonkey is an existence proof).

Q: What about metacrap? We can never trust sites to provide honest metadata.

A: The system does have significant spam deterrents built in, of which I won’t say more. But perhaps more importantly, the plugin nature of the platform uses the power of the community to shape itself. A spammy plugin won’t get installed by users. A site that mixes in fraudulent RDFa metadata with real content will get exposed as fraudulent, and users will abandon ship.

Q: Didn’t prove that having a better user interface doesn’t help gain search market share?

A: Perhaps. But this isn’t about user interface–it’s about data (which enables a much better interface.)

Q: Won’t (Google|Microsoft|some startup) just immediately clone this idea and take advantage of all the new metadata out there?

A: I’m sure these guys will have some kind of response, and it’s true that a rising tide lifts all boats. But I don’t see anyone else cloning this exactly. The way it’s implemented has a distinctly Yahoo! appeal to it. Nobody has cloned Yahoo! Answers yet, either. In some ways, this is a return to roots, since Yahoo! started off as a human-guided directory. SearchMonkey is similar, except a much broader group of people can now participate. And there are some specific human, technical and financial reasons why as well, but I suggest inviting me out for beers if you want specifics. :-)

Disclaimer: as always, I’m not speaking for my employer. See the standard disclaimer. -m

Update: more Q and A

Q: How is SearchMonkey related to the recently announced Yahoo! Microsearch?

A: In brief, Microsearch is a research project (and a very cool one) with far-reaching goals, while SearchMonkey is targeted as imminently shipping software. I frequently talk to and compare notes with Peter Mika, the lead researcher for Microsearch.

Monday, March 3rd, 2008

WebPath and Wikipedia

The WebPath bug reports continue to roll in. For one, queries against *.wikipedia.* don’t seem to work. You get something back, but it has no resemblance to the page you were looking for. The problem comes from the W3C tidy service that I use, specifically that the (understandably overworked and understaffed) admins at the Wikimedia Foundation seem to have blocked it. It seems like more than a simple IP or user-agent-based block. I’ve emailed them about it but haven’t heard back yet.

So, this highlights the limitation of having a single-source converter in the Platonic Web module of WebPath. So I turn to my readers: do you know of any other tidy servers? Or converters of a non-tidy origin? For any of these to work, they need to return clean XML corresponding to the original page (as opposed to, say, returning something with big headers/footers or ampersand-encoded). This seems like an outstanding need for the open source community.

Please comment below with ideas. Thanks! -m

UPDATE: heard back from the Wikipedia admins, and although professional and helpful-as-can-be-expected, they won’t be changing anything on their end. Still looking for more open source options.

Thursday, January 24th, 2008

WebPath wants to be free (BSD licensed, specifically)

WebPath, my experimental XPath 2.0 engine in Python, is now an open source project with a liberal BSD license. I originally developed this during a Yahoo! Hack Day, and now I get to announce it during another Hack Day. Seems appropriate.

The focus of WebPath was rapid development and providing an experimental platform. There remains tons of potential work left to do on it…watch this space for continued discussion. I’d like to call out special thanks to the Yahoo! management for supporting me on this, and to Douglas Crockford for turning me on to Top Down Operator Precedence parsers. Have a look at the code. You might be pleasantly surprised at how small and simple a basic XPath 2 engine can be. So, who’s up for some XPath hacking?
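For readers who haven’t met Top Down Operator Precedence parsing, here is a minimal sketch of the idea in Python. To be clear, this is not WebPath’s code: it only handles numbers, parentheses, unary minus, and the XPath arithmetic operators +, -, * and div, and names like `evaluate` and `BINDING_POWER` are made up for illustration. Each operator gets a binding power, and a single short loop uses it to decide how far an expression extends.

```python
import re

# One token per match: a number, or one of the operators/parentheses handled here.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(div|[-+*()]))")

# Left binding powers: a higher number binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "div": 20}
UNARY_MINUS_POWER = 30


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        number, op = match.groups()
        tokens.append(("num", float(number)) if number is not None else ("op", op))
        pos = match.end()
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def left_binding_power(self, token):
        kind, value = token
        return BINDING_POWER.get(value, 0) if kind == "op" else 0

    def expression(self, right_binding_power=0):
        # The heart of the technique: parse a prefix, then keep absorbing
        # infix operators for as long as they bind tighter than our caller.
        left = self.prefix(self.advance())
        while self.left_binding_power(self.peek()) > right_binding_power:
            left = self.infix(self.advance(), left)
        return left

    def prefix(self, token):
        kind, value = token
        if kind == "num":
            return value
        if token == ("op", "-"):
            return -self.expression(UNARY_MINUS_POWER)
        if token == ("op", "("):
            inner = self.expression(0)
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token: {value!r}")

    def infix(self, token, left):
        op = token[1]
        right = self.expression(BINDING_POWER[op])
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
        return left / right  # "div"


def evaluate(text):
    parser = Parser(text)
    result = parser.expression()
    if parser.peek()[0] != "end":
        raise SyntaxError("unexpected trailing input")
    return result


print(evaluate("1 + 2 * 3"))
print(evaluate("(1 + 2) * 3"))
print(evaluate("10 div 4 - 1"))
```

A real XPath engine needs many more token types (paths, predicates, function calls), but in this style each one is just another prefix or infix case hanging off the same loop.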

Code download. (Coming to SourceForge with CVS, etc., in however many days it takes them to approve a new project) I hope this inspires more developers to work on similar projects, or better yet, on this one! -m

Wednesday, January 23rd, 2008

Machine tags

Take a look at this URL, and the page behind it. This is a list of all the Flickr photos with the tag “xmlns:dc=“. Although these have been around for a while, I hadn’t been aware of this kind of tagging until recently.

Why “xml” in the namespace declaration? This doesn’t have much to do with XML. How many tags are there in the world that start with “dc:” and are not referring to Dublin Core? At least the tag declaring the namespace provides a good hook for finding things with machine tags. It’s only a small step up to RDFa from here, which is good! -m
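To make the “hook” idea concrete, here is a rough Python sketch of how a consumer might use the namespace-declaring tag to resolve the others. It assumes machine tags have the shape namespace:predicate=value; the function names and the sample tags are made up for illustration, and the URI is the Dublin Core element namespace, used here only as an example value.

```python
# Dublin Core element set namespace, used only as an illustrative value.
DC_NS = "http://purl.org/dc/elements/1.1/"


def parse_machine_tag(tag):
    """Split 'namespace:predicate=value' into its three parts, or return None."""
    head, equals, value = tag.partition("=")
    if not equals:
        return None  # an ordinary tag, not a machine tag
    namespace, colon, predicate = head.partition(":")
    if not colon or not namespace or not predicate:
        return None
    return namespace, predicate, value


def expand(tags):
    """Resolve prefixed machine tags against the xmlns:* tags found alongside them."""
    parsed = [p for p in map(parse_machine_tag, tags) if p]
    prefixes = {pred: value for ns, pred, value in parsed if ns == "xmlns"}
    return {
        prefixes[ns] + pred: value
        for ns, pred, value in parsed
        if ns != "xmlns" and ns in prefixes
    }


# Hypothetical tags on a single photo.
tags = ["xmlns:dc=" + DC_NS, "dc:title=Sunset", "sunset", "geo:lat=37.4"]
print(expand(tags))
```

Tags whose prefix was never declared (the `geo:` one here) are simply skipped in this sketch; a more forgiving consumer might fall back to well-known prefixes instead.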

Monday, January 7th, 2008

Yahoo! introduces mobile XForms

Admittedly, their marketing folks wouldn’t describe it that way, but essentially that’s what was announced today. (documentation in PDF format, closely related to what-used-to-be Konfabulator tech; here’s the interesting part in HTML) The press release talks about reaching “billions” of mobile consumers; even if you don’t put too much emphasis on press releases (you shouldn’t) it’s still talking about serious use of and commitment to XForms technology.

Shameless plug: Isn’t it time to refresh your memory, or even find out for the first time about XForms? There is this excellent book available in printed format from Amazon, as well as online for free under an open content license. If you guys express enough interest, good things might even happen, like a refresh to the content. Let’s make it happen.

From a consumer standpoint, this feels like a welcome play against Android, too. Yahoo! looks like it’s placing a bet on working with more devices while making development easier at the same time. I’ll bet an Android port will be available, at least in beta, before the end of the year.

Disclaimer: I have been out of Yahoo! mobile for several months now, and can’t claim any credit for or inside knowledge of these developments. -m

P. S. Don’t forget the book.

Sunday, December 16th, 2007

Slides from XML 2007: WebPath: Querying the Web as XML

Here are the slides from my presentation at XML 2007, dealing with an implementation of XPath 2.0 in Python. I hope to have even more news in this area soon.

WebPath (html)

WebPath (OpenDocument, 4.7 megs)

Did you notice that OpenOffice has a nice slide export that generates both graphically-accurate slides and highly indexable and accessible text versions? -m

Thursday, December 6th, 2007

Lists in RDFa?

I came away from the XML 2007 conference with lots of new ideas and inspirations. I’ll write some postings about individual technologies in the coming days.

But for now, another RDFa question. If I need to represent a list, what is the best way to do it? Does it differ between ordered and unordered lists? Let’s take some concrete examples, say a shopping list and an (ordered) todo list. How would you do it? -m

P.S. What about multi-level lists?

Tuesday, November 13th, 2007

Representing structured data on a web page

OK, let me take a step back from specific technologies like RDFa and go through a really simple example.

On a certain web page, I refer to a book. That book has a price of 21.86 US dollars. The page is intended as primarily human-readable, but I want to include machine-readable data too, for a global audience.

What would you do? What specific markup would you use? -m

Monday, October 22nd, 2007

Is there fertile ground between RDFa and GRDDL?

The more I look at RDFa, the more I like it. But still it doesn’t help with the pain-point of namespaces, specifically of unmemorable URLs all over the place and qnames (or CURIEs) in content.

Does GRDDL offer a way out? Could, for instance, the namespace name for Dublin Core metadata be assigned to the prefix “dc:” in an external file, linked via transformation to the document in question? Then it would be simpler, from a producer or consumer viewpoint, to simply use names like “dc:title” with no problems or ambiguity.

This could be especially useful now that discussions are reopening around XML in HTML.

As usual, comments welcome. -m

Monday, October 15th, 2007

XForms evening at XML 2007

Depending on who’s asking and who’s answering, W3C technologies take 5 to 10 years to get a strong foothold. Well, we’re now in the home stretch for the 5th anniversary of XForms Essentials, which was published in 2003. In past conferences, XForms coverage has been maybe a low-key tutorial, a few day sessions, and hallway conversation. I’m pleased to see it reach new heights this year.

XForms evening is on Monday December 3 at the XML 2007 conference, and runs from 7:30 until 9:00 plus however long ERH takes on his keynote. :) The scheduled talks are shorter and punchier, and feature a lot of familiar faces, and a few new ones (at least to me). I’m looking forward to it–see you there! -m

Monday, October 1st, 2007

simple parsing of space-separated attributes in XPath/XSLT

It’s a common need to parse space-separated attribute values from XPath/XSLT 1.0, usually @class or @rel. One common (but incorrect) technique is a simple equality test, as in {@class="vcard"}. This is wrong, since the attribute can carry the value you want alongside other values, like "foo vcard" or "vcard foo" or " foo vcard bar ", and the equality test misses all of those.

The proper way is to look at individual tokens in the attribute value. On first glance, this might require a call to EXSLT or some complex tokenization routine, but there’s a simpler way. I first discovered this on the microformats wiki, and only cleaned up the technique a tiny bit.

The solution involves three XPath 1.0 functions, contains(), concat() to join together string fragments, and normalize-space() to strip off leading and trailing spaces and convert any other sequences of whitespace into a single space.

In English, you

  • normalize the class attribute value, then
  • concatenate spaces front and back, then
  • test whether the resulting string contains your searched-for value with spaces concatenated front and back (e.g. " vcard ")

Or {contains(concat(' ',normalize-space(@class),' '),' vcard ')} A moment’s thought shows that this works well on all the different examples shown above, and is perhaps even less involved than resorting to extension functions that return nodes that require further processing/looping. It would be interesting to compare performance as well…
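If you’d like to convince yourself without firing up an XSLT processor, the same logic is easy to mimic in a few lines of Python. This is only a sketch of the string handling, not an XPath engine: `normalize_space` and `has_token` are names I made up, and the whitespace set (space, tab, CR, LF) is what XPath 1.0’s normalize-space() works with.

```python
import re

XPATH_WS = " \t\r\n"  # the characters XPath 1.0 treats as whitespace


def normalize_space(value):
    """Mimic normalize-space(): trim both ends, collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", value.strip(XPATH_WS))


def has_token(value, token):
    """Mimic contains(concat(' ', normalize-space(value), ' '), ' token ')."""
    return f" {token} " in f" {normalize_space(value)} "


# Compare the naive equality test with the token test on a few class values.
for value in ["vcard", "foo vcard", "vcard foo", " foo vcard bar ", "vcards", "foo-vcard"]:
    print(repr(value), value == "vcard", has_token(value, "vcard"))
```

The padding spaces are what keep near misses like "vcards" or "foo-vcard" from matching, which a bare contains(@class, 'vcard') would get wrong.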

So next time you need to match class or rel values, give it a shot. Let me know how it works for you, or if you have any further improvements. -m

Wednesday, September 26th, 2007

Recruitment picking up?

In the last few weeks, I’ve been getting more recruitment pitches, including from the well known person ________ who is now at _______, for a think-tank position with _______, multiple LinkedIn requests from Web 2.0 company ________ and even ________.

So, is this a sign that the general industry is picking up? -m

P.S. I’m not looking. :)

Saturday, September 8th, 2007

Steven Pemberton and Michael(tm) Smith on (X)HTML, XForms, mobile, etc.

Video from XTech, worth a look. -m

Thursday, August 16th, 2007

HOWTO: find the VTA light rail schedule

There’s a tram that goes by Yahoo, very convenient. Here’s a simple, step-by-step guide to finding schedule information through the official website.

  • Navigate to
  • Click on “Schedules, Maps and Fares” in the left sidebar.
  • Click on “Route Schedules and Route Maps”.
  • Scan down to the bottom third of the page. Click “Light Rail Schedules”.
  • Click on “Mountain View to Winchester (902)” (You just have to know this, even if your journey takes you nowhere near either of those places).
  • Scroll below the fold. Try to figure out whether you need “Northbound” or “Southbound” service, even though it runs predominantly east-west in this neighborhood.
  • If you guessed wrong, click ‘back’ and try again.
  • Scroll down the huge table until you find the information you need.

See, it’s easy. Now, if you manage a web site, is yours this simple and usable? -m

Wednesday, August 8th, 2007

New W3C Validator

Go check it out. It even has a Tidy option to clean up the markup. But they missed an important feature: it should include an option to run Tidy on the markup first, then validate. This is becoming the de facto bar for web page validity anyway… -m

Sunday, May 27th, 2007

Everything is Miscellaneous recap

Everyone gets so much information all day long that they lose their common sense. –Gertrude Stein

…the solution to the overabundance of information is more information. –David Weinberger in Everything is Miscellaneous.

Weinberger’s book is a great read, taking you to lots of different places–from a prototype Staples store to the underground Bettmann Archive, and meeting a variety of different folks from Linnaeus to Dewey. It’s the kind of book that attaches itself to a particular idea and riffs on it at length, covering lots of details and implications that show great insight, but yet seem obvious after reading.

The book posits three “orders of order”. The first I’d call the problem of atoms. Arranging physical things, whether a library or your silverware drawer, is primarily limited by the physical realm. If you have only one copy of a particular book, it has to go on a shelf somewhere. If you have two copies, you need to decide whether to keep them together or separately, possibly increasing the chances of losing track of one.

The second order I’d call solving the problem of atoms with more atoms, the prototypical example being a library card catalog. It’s still physically limited: a card can only hold so much metadata, but at least it’s easy to have multiple cards for a given book, making it easier to find something based on your choice of title, author, subject, or something else. Notably, many of the early online efforts have been straight translations of the second order into the digital realm.

But the third order is something else altogether, solving the problem of bits with more bits, as the leading quote indicates. I get the impression that some sites, like Amazon and, are getting closer to Weinberger’s third order, but none have fully achieved it yet. The third order fully blows the doors off of the constraints of atoms, which we’ve spent the last few thousand years developing and getting used to.

If you’re geeky enough to be reading this here, you’ll be familiar with many of the lines of thought found in this book: Wikipedia vs. Britannica; implied vs. concrete hierarchies or ontologies; centralized vs. decentralized control; Semantic Web vs. “smushiness”.

My complaint is that for a book that talks about the three orders of order, the text itself is firmly tied to first order atoms. Sure, there’s an ongoing blog with tagging, comments, etc. but to actually read the text, you need to find your way through a book store, card catalog, or online bookstore to get it. Sure, there’s no third order of economics (yet), but still I would have liked to see something more.

So, what does this all mean for a company like Yahoo!? That’s the question I’m working on now. Stay tuned. -m

Monday, April 30th, 2007

Why does ‘rich client’ equal ‘bad separation of presentation from content’?

I started writing this post back when I was tech editing the “Rich Client Alternatives” chapter of Web 2.0, the book. Now, with Apollo getting some attention, it’s worth revisiting.

What do XUL, Yahoo! Widgets, OpenLaszlo, Silverlight, and Apollo have in common? All of them mix content with presentation to some degree. Years of experience on the web have shown that a properly-done CSS layout gives you:

  • smaller, faster pages
  • better accessibility and user control of rendering
  • better adaptation to different screen resolutions
  • easier repurposing of data, including microformats
  • better mobile compatibility

Initial HTML browsers didn’t have these advantages, and gave in to early pressure to implement things like blink and font tags. Today, most webfolks would admit that these presentational tags were a mistake, and contemporary web design avoids them.

So what is it about “rich” clients that’s different? Are developers missing out on the hard lessons learned on the web? Or is there something inherent in the definition of “rich clients” that changes the balance? Your comments are welcome. -m

Wednesday, April 11th, 2007

James Clark blog: do you read the web or feed version?

James Clark is blogging. A few zillion people have already mentioned this.

A slightly tangential observation: I had trouble reading through an entire article in web form, but had no problems returning later to the atom feed. At first I chalked it up to early morning grogginess, but it seems to be a repeatable phenomenon at all hours, at least for me.

So a double thanks to James for publishing a full feed.

How about you: do you have an easier time reading long form articles in a feed reader vs. a browser?  Do you prefer feed reader vs. browser for this blog? Comment below. -m

Sunday, April 1st, 2007


I can’t talk on the phone right now. Can you follow up on email?
Consider it placed on my todo list.
Let me give you my new address.
Hmm, I don’t have it.
What are you talking about?

(If you get the pattern, post below…) -m

Friday, March 9th, 2007

HTTP question

OK, RESTafarians and HTTP experts, here’s a question. Is it kosher to send a Location: header back with an ordinary, say 200, response?

Scenario: the server knows better than the client what the client needs. ‘I realize you asked for, but instead I’m sending you — ready or not, here it comes..’
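To make the scenario concrete, here is a small sketch using Python’s standard http.server. It doesn’t settle whether this is kosher; it only shows what such a response looks like on the wire and what a plain client gets back. The paths are hypothetical, made up for illustration, as are the handler and function names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, invented for this sketch.
ASKED_FOR = "/what-you-asked-for"
BETTER_PATH = "/what-you-really-need"


class KnowsBetterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ready or not, here it comes\n"
        self.send_response(200)                    # an ordinary success...
        self.send_header("Location", BETTER_PATH)  # ...that also names another resource
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path=ASKED_FOR):
    """Serve exactly one request on a free local port; return (status, Location, body)."""
    server = HTTPServer(("127.0.0.1", 0), KnowsBetterHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", path)
    response = conn.getresponse()
    result = (response.status, response.getheader("Location"), response.read())
    conn.close()
    worker.join()
    server.server_close()
    return result


status, location, body = fetch_once()
print(status, location, body)
```

Note that http.client just hands the header back alongside the body; whether any given client does anything with a Location header on a 200 is exactly the open question.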

Tuesday, February 13th, 2007

changes the architecture of the house, not just the color of the paint

ERH’s comments on XForms, as part of his predictions for 2007. Worth a read. -m

Thursday, February 1st, 2007

Is HTML on the web a special case?

Some random thoughts and responses to lots of blog discussion sparked by the XML2 article, where I asked “Is HTML on the Web a special case?”

By which, I mean, if you go through all the effort of writing down all the syntax rules used by the union of browsers that you care about, then go through the pain of getting consensus within a standards body, will the resulting document be useful beyond HTML on the Web, much like how XML is useful beyond being a vehicle for XHTML?

I don’t know if Tim Bray had that same version of the question in mind, but he answers “obviously ‘yes'”.

But I don’t think so. Once you have that set of rules, wouldn’t it be useful in other areas, say, notoriously RSS on the web? SVG? MathML? In fact, I’d go as far as saying that any hand-authored markup would be a candidate for XML2 syntax.

What about mobile? Anne van Kesteren responds:

in that article Micah Dubinko mentions mobile browsers living up to their premise and all that. What he says however, isn’t really true. Mobile browsers and XHTML is tag soup parsing all the way.

He links to this page, which does a rather poor job of making a point the author seems to have decided upon before starting the experiment. If you look at the specific test cases, one tests completely bizarro markup that no author or tool I can imagine would ever produce. Another test checks the handling of content-type, not markup. On the other axis, the choices there seem a bit jumbled: lists of user-agent strings, one for stock Mozilla, and a footnote indicating confusion about what browser is really in use. If anything, this page shows that the browsers tested here, with the exception of Opera Mini, are crap. If you spend more than a few minutes in mobile, you’ll discover this widespread trend. (And I’m working on a solution…watch this space).

Look at this from a pragmatic viewpoint. Check the doctype used on the Yahoo! front page vs the mobile front page. Despite the poor browsers, XHTML adoption is still farther ahead on the mobile web than the desktop web.

The last thing nagging at me (for now) is whether XML2 will have an infoset. Will it be possible to use XPath, XQuery, and XML tools on XML2 content? How well will these map to each other? In the strict sense, no, XML2 won’t have a conforming infoset because it will never include namespaces. But might it support a subset of the infoset? (Would that be an infosubset?) That’s a huge open question at this point. -m