Lightning Search, a Web Application

Take one part lightning-fast full-text search, one part XML, and one part rich client. Mix thoroughly... It's strange to have a search form with no submit button, but wonderful.

If you've ever spent any time on Orkut or Gmail, you'll have noticed that Google has perfected the art of eliminating needless round-trip refreshes.

The indexing is provided by David Mertz's gnosis code, as seen in Text Processing in Python. It's plenty fast on a TiBook, but searches on complete words only. It might be useful to have a stemming search, though that would probably require tries or some other data structure in order to be fast enough. Link: .
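The gnosis indexer's actual API isn't reproduced here. As a rough illustration of the two ideas above, here is a hypothetical minimal sketch (the names `Index`, `tokenize`, and `TrieNode` are invented for it): a whole-word inverted index mapping each word to the documents containing it, plus a trie so that a query term can also match longer words beginning with it. Prefix matching is only a cheap stand-in for real stemming.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into whole words."""
    return WORD_RE.findall(text.lower())


class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if an indexed word ends at this node


class Index:
    """Hypothetical whole-word inverted index, plus a trie for prefix lookups."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.trie = TrieNode()

    def add(self, doc_id, text):
        for word in set(tokenize(text)):
            self.postings[word].add(doc_id)
            node = self.trie
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def _words_with_prefix(self, prefix):
        """Walk down to the prefix node, then collect every word beneath it."""
        node = self.trie
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        words, stack = [], [(node, prefix)]
        while stack:
            node, so_far = stack.pop()
            if node.is_word:
                words.append(so_far)
            for ch, child in node.children.items():
                stack.append((child, so_far + ch))
        return words

    def search(self, query, prefix=False):
        """Ids of documents containing every query term (AND semantics).

        With prefix=True, a term also matches longer words that start with it,
        a cheap approximation of stemming rather than the real thing."""
        result = None
        for term in tokenize(query):
            if prefix:
                hits = set()
                for word in self._words_with_prefix(term):
                    hits |= self.postings[word]
            else:
                hits = set(self.postings.get(term, ()))
            result = hits if result is None else result & hits
        return sorted(result or ())


idx = Index()
idx.add(1, "Lightning-fast full-text search")
idx.add(2, "Searching XML with Python")
print(idx.search("search"))               # [1]
print(idx.search("search", prefix=True))  # [1, 2]
```

The whole-word lookup is a single dictionary hit per term, which is why complete-word search stays fast; the trie walk costs time proportional to the prefix length plus the number of words beneath it.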

The XML is provided by a few dozen lines of Python, which implement an HTTP server, including GET access to the data, and an interface to the indexing/searching code.
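The server code itself isn't shown in the post. A hypothetical sketch of that shape, using only Python's standard `http.server` and `xml.etree.ElementTree` modules, might look like the following; the `/search` path, the `q` parameter, the `<results>`/`<hit>` element names, and the toy `DOCUMENTS` store are all assumptions standing in for the real data and indexer.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real data store.
DOCUMENTS = {
    1: "Lightning-fast full-text search",
    2: "Searching XML with Python",
}


def search(query):
    """Naive whole-word AND search (placeholder for the real indexer)."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in sorted(DOCUMENTS.items())
            if terms and all(t in text.lower().split() for t in terms)]


def results_to_xml(query, doc_ids):
    """Serialize search results as a small UTF-8 XML document (bytes)."""
    root = ET.Element("results", query=query, count=str(len(doc_ids)))
    for doc_id in doc_ids:
        hit = ET.SubElement(root, "hit", id=str(doc_id))
        hit.text = DOCUMENTS[doc_id]
    return ET.tostring(root, encoding="utf-8")


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(url.query).get("q", [""])[0]
        body = results_to_xml(query, search(query))
        self.send_response(200)
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve until interrupted; try GET http://localhost:8000/search?q=xml"""
    HTTPServer(("", port), SearchHandler).serve_forever()
```

Calling `run()` starts the server; the client then only has to fetch `/search?q=...` and render whatever `<hit>` elements come back.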

The client piece uses XMLHTTP, the superglue of the Web, to submit a new lightning query every time the actual text in the query field changes, then dynamically display the results. It certainly feels faster to not have a full-page roundtrip, and the feeling of immediacy helps you experiment with different queries, which has already led to one interesting discovery within my own data.

Hey, what about XForms? As it is, this works 'out of the box' with Firefox or Safari. With a minor tweak, it would work in IE as well, though I don't anticipate ever accessing this via IE. XForms could probably do it without a bit of script on the client, and would be easier to maintain, but as it stands today, it wouldn't work 'out of the box'.

It's interesting--XForms is kind of subversive in that it can be (or in some cases, already has been) implemented on top of almost any web app platform, like IE+HTC+XMLHTTP, or Mozilla+XUL+WebExtras, or Flash, or soon, XAML/Avalon. -m

Dubinko on the net

Other than my family, I don't personally know any other Dubinkos. But the Internet is great for research...

First, there's a page that appears to be some kind of hotel or house for rent. Anyone fluent in German care to translate?

Next, we have Laura Dubinko, apparently in Oklahoma.

From MSNBC we learn of a space museum tour guide, Yelena Dubinko.

Something sports-related: Tatjana Dubinko.

A fellow author Svetlana Dubinko writes books to help Russian-speakers get up to speed on English.

Looking for love is Olga Dubinko.

On Amazon full-text search, you can find a reference to a "Dubinko, G A", who wrote a 1966 research paper on DNA synthesis, referenced from a $344 book.

Then, there's Vladimir Dubinko, who I have briefly corresponded with.

Have you ever met another Dubinko? Can you translate any of these links? Let me know. -m

Elliotte Rusty Harold lauds XForms

ERH likes what he sees about XForms at WWW2004. Some quotes:

"...going to be a key part of development in the very near future, with an exponential growth rate for the next couple of years."

"Unlike the semantic web, it does not require learning completely new and unfamiliar areas of technology..."

"...much better designed than HTML forms ever were."

"...XForms is a compelling enough story to displace IE."



A short story.

Movie run. The friends gather to make the all-important decision about which movie to see. They banter about, as friends do, and finally manage rough consensus, though a few stragglers in the back keep unusually quiet.

The group settles into the darkened theater, just in time for the previews. The opening credits roll. Suddenly, the stragglers stand up. "Hey, we don't like this movie, let's go somewhere else." -m

Web Applications and Compound Documents

Must-read: the Mozilla/Opera position paper. And naturally my two pence.

First: as with nearly all real-world requirements, there are mutually contradictory aspects, so it will be interesting to see how things get resolved. -m

InfoWorld Innovators Award

I am thrilled to have won the InfoWorld Innovators 2004 award for my work on XForms.

Of this year's other awardees, the only one I've met is Miguel de Icaza. Congrats! -m


What's up with the CNet RSS feeds? As viewed in NetNewsWire, they have just a headline. No link. No summary. Is it just me, or does this miss the point of having RSS in the first place? -m

iTunes GIGO

A surprising 7% of all artists listed in my iTunes library are duplicates with minor variations.

There's "B B King" vs. "B.B. King" vs. "B. B. King", of course, but the most annoying thing is a preponderance of backticks instead of normal apostrophe characters. I see these things everywhere, but most notably in CDDB. Once the bad data gets into a system like that, it's pretty hard to get it out. -m
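Spotting these variants is mostly a matter of collapsing trivial differences before comparing. Below is a deliberately crude, hypothetical sketch; the lookalike-apostrophe list and the choice to treat periods as spaces are assumptions, and getting the artist names out of iTunes is left out (iTunes also writes an XML copy of its library as a property list, which Python's `plistlib` can read).

```python
from collections import defaultdict

# Characters often found where a plain apostrophe was meant (an assumed list):
# backtick, left/right single quotation marks, and the acute accent.
APOSTROPHE_LOOKALIKES = "`\u2018\u2019\u00b4"


def artist_key(name):
    """Collapse trivial spelling variations into a single comparison key."""
    for ch in APOSTROPHE_LOOKALIKES:
        name = name.replace(ch, "'")
    name = name.lower().replace(".", " ")  # "B.B." and "B. B." both become "b b"
    return " ".join(name.split())          # collapse runs of whitespace


def find_duplicates(artists):
    """Group artist names sharing a key; keep only groups with 2+ spellings."""
    groups = defaultdict(set)
    for artist in artists:
        groups[artist_key(artist)].add(artist)
    return [sorted(names) for names in groups.values() if len(names) > 1]


library = ["B B King", "B.B. King", "B. B. King",
           "Guns N` Roses", "Guns N' Roses", "Miles Davis"]
for group in find_duplicates(library):
    print(group)
```

Cleaning the data locally is the easy part; as noted above, once the bad spellings live in a shared source like CDDB they keep coming back.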

XML, Hack Thyself

Today XMLhack went on hiatus, a sign of the general health of the XML world, and an occasion meriting a rant.

XML is in tough shape. In short, it's not fun to work with anymore. Now, "fun" may sound juvenile, but it's important--things that aren't fun get studiously avoided whenever possible, and engineers are incredibly good at redefining what's possible. XMLhack was run by volunteers, including me. When a volunteer effort is no longer fun, it's better for it to gracefully go dark than linger on in digital malnourishment.

What happened? In the last 6 years, Moore's law has given us about 16 times the CPU power we had in 1998, when XML--that sleek, SGML-bloat-avoiding tiger--was unleashed. One person could easily keep all of XML in her head, even including the basics of related technologies like CSS. Now you'd need a team of about 16 to keep on top of things, and it's getting worse.
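The factor of 16 follows from the popular rule of thumb, assumed here, that computing power doubles roughly every 18 months (with a two-year doubling period the factor would be closer to 8):

```latex
\frac{6\ \text{years} \times 12\ \text{months/year}}{18\ \text{months per doubling}} = 4\ \text{doublings},
\qquad 2^{4} = 16
```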

What happened? For one, XML Namespaces set a terrible (and much-imitated) precedent of exposing things in markup--making machine-processing easier and human-processing harder. That's the wrong way--it's the computers, not the humans, that are getting more powerful practically by the second. XML hit the need-to-refactor point much earlier on the curve than SGML ever did.

What's happening? More WS-horribleKluge "specifications" than I can keep up with, even well enough to know what the acronyms mean. DTD-dependence that lingers on, zombie-like. Popular tools that fail to meet even basic conformance. Multiple, fragmented, incompatible versions of RSS.

Is anyone interested enough in reviving XML? Is anyone probing techniques like alternate namespace approaches (Java and Python, for example, seem to have good namespace handling) enough to come up with a cogent proposal for work on 'XML-next'? Sounds like a promising research project to me.

In the final message on XMLhack, Edd Dumbill provides some light at the end of the tunnel: XML coverage lives on at, all the "cool" URIs on the site will remain active, and finally, never say never. -m

iTunes backup hint

When copying your iTunes library off to another server for backup purposes, first do this: in the search box for the entire "Library", search for " (a double-quote character). If you see any matches, rename them to remove those characters. This will prevent pesky file errors when accessing that portion of the music tree. -m
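To double-check the files on disk afterwards, a hypothetical helper like the one below lists every file or folder whose name still contains a double quote. It only reports; renaming files behind iTunes's back would likely break its library references, so the renaming itself is best done inside iTunes as described above. The `~/Music/iTunes` location in the usage line is an assumption about where the library lives.

```python
from pathlib import Path


def paths_with_quotes(root):
    """Every file or folder under root whose own name contains a double quote."""
    return sorted(p for p in Path(root).rglob("*") if '"' in p.name)


def cleaned_name(name):
    """The name with double quotes deleted, as a suggested replacement."""
    return name.replace('"', "")


def report(root):
    """Print each offending path next to its suggested cleaned-up name."""
    for path in paths_with_quotes(root):
        print(f"{path}  ->  {cleaned_name(path.name)}")
```

Usage: `report(Path.home() / "Music" / "iTunes")` before starting the copy; an empty report means the backup shouldn't trip over quote characters.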

The Upgrade Treadmill

It started out as a simple hard drive upgrade...

But my ancient BIOS just looked at 200 gigs, and said 'huh?'. So, I could either fiddle with getting a new PCI card to run under Linux, or just get a whole new motherboard. Well, I needed one anyway. Oh, and the motherboard needs a new processor and new memory. Fine. Except the thing doesn't start--not even a flicker of a POST on the screen. Turns out the new board doesn't like the old power supply. And it hardly makes sense to buy just a power supply without a new case. So, all said and done, I have an entirely new computer, except for the CD drive and monitor. Damage: $450 and four round trips to various computer stores.

On the positive side, a major thumbs-up to trusty Red Hat 8, which auto-detected all the hardware changes without a hitch, and kept on plugging along. I could use a better video driver for the ABit VA-10 onboard video, but since this thing is a server with the screen off 99% of the time, I can't even complain about that. Still, the new SuSE 9.1 beckons... -m

Life, the Universe, and Everything

File this away for later: From the Vatican Observatory. Interesting patterns of the intersection between religion and science. -m

What I'm Reading

You can tell a lot about someone by what they read. I recently got my pick of any 5 O'Reilly books, and it goes even beyond the 'What I'm Reading' links on the web site and RSS feed...

I picked:

Cascading Style Sheets, 2nd ed

Lex & Yacc

Hardcore Java

Java Cryptography

Security Warrior

A bit Java-heavy, I'll admit. I chalk that up mostly to already having most other O'Reilly books in areas of interest! The CSS book is as good as you'd expect. The Lex book is a follow-on to Text Processing in Python and the Dragon book. And Security Warrior just looked interesting, something new to try with little risk. -m

With great power comes ...

Help me fill in the blank. Send email to my listed contact address. Bonus points for humor. -m

Update: Luther says "Microsoft stock options". Any more?

Ferengi Rules of Acquisition

Some food for thought: -m

XML Hacks

Includes a hack by Y.T. -m

Market-driven TLDs

There are some discussions on www-tag about how (and if) to handle new top-level domains, like .mobi. Here's my idea...

Do we really need more TLDs? Maybe we need fewer. Like one. Then everyone would be free to choose and use second and third-level names that have meaning within the purpose for which they're used.

OK, so that's a bit over the top. But that scenario isn't much different than just allowing unlimited top-level domains. So, what would happen if the existing TLDs were managed more-or-less as-is, but new ones could be had by anyone, for say US$1 million to a trust fund for advancement of the Internet? Each TLD would then be administered by the organization that owns it, including managing subdomains.

Example: the Coalition for Youth Safety could buy .kids, then hand out free or low-cost subdomains, and set up a Terms of Use agreement that states what kind of content is or is not allowed under that domain. Per the agreed-upon terms, violators get their name yanked until they comply. Now, if someone comes along and thinks this is censorship, they are free to buy their own TLD, and run it however they wish.

If someone wants to try to set up .mobi, they are free to go for it. I imagine they'd run into technical issues defining any kind of restriction for what kinds of devices can and can't access the content, but the market will deal with that kind of shortsightedness appropriately. Bad ideas will die out on their own, and the entry fee will discourage some of the most bone-headed ones.

Does this devalue existing domains? Yes and no. It does break down some of the artificial limitations in the current system. On the other hand, it opens up all kinds of new market possibilities. anyone? -m

The Grand List of Overused Science Fiction Clichés -m

Rebuttal to Hixie

Ian Hickson is a sharp guy, but he swings and misses with this post...

In a posting on comp.mozilla.devel.layout (no link, google it if you want), he writes:

"XForms does not in any way alleviate the need for server-side validations since the server can never trust the client and therefore has to do all the validation anyway."

OK, so instead of the classic approach of writing an unmaintainable pile of JavaScript on the client and an unmaintainable pile of Perl on the server, now you can write one unmaintainable pile of XForms and run it in both places. Seems like a win to me. :P

"Note that XForms' leveraging of so many standards is one of its main downfalls as far as implementations go (you have to implement god only knows how many specs before XForms is even on the radar)."

"god only knows"?? That sounds like an emotionally-driven argument rather than a factual one. Here's some non-divine facts: there are 16 normative references in XForms. This compares to 26 for HTML 4.01. And the 16 includes references made purely for terminology and background, like the requirements document, RFC 2119, and XHTML Modularization.

It basically comes down to a few obvious things: XML, XMLNS (yep, it's a W3C spec), and RFC 2388 for multipart/form-data. Then the real work: XPath, XML Events, RFC 2387 for multipart/related, and a sliver of WXS datatypes, avoiding the gnarly bits for XForms Basic. That will get you much more than 'on the radar'.

"XForms has zero synergy with HTML..."

XForms is part of XHTML 2.0.

There's plenty more that could be said, if it were worth debating every little point.

XForms isn't perfect. Like all recent W3C work, it's heavily influenced by namespaces, and not for the better.

I don't think XForms is a perfect fit in the browser core, at least not right away. What we need is better plug-in systems so that upon encountering a web page with xforms:model, or svg:rect, or whatever, the proper component can be downloaded and installed. Now, there's a competitive feature that would put IE to shame. -m
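The dispatch step of such a plug-in system is simple in principle: scan the document for the namespaces its elements use, and look each one up in a registry of components. Here is a hypothetical sketch with Python's `xml.etree.ElementTree`, which reports element names in `{namespace-uri}local` form; the `COMPONENTS` registry and its component names are invented for illustration, and actually downloading and installing anything is out of scope.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace URI -> component that knows how to handle it.
COMPONENTS = {
    "http://www.w3.org/2002/xforms": "xforms-processor",
    "http://www.w3.org/2000/svg": "svg-renderer",
}


def namespaces_used(document):
    """The set of namespace URIs that appear on elements in the document."""
    root = ET.fromstring(document)
    uris = set()
    for elem in root.iter():
        if elem.tag.startswith("{"):  # ElementTree writes "{uri}local"
            uris.add(elem.tag[1:].split("}", 1)[0])
    return uris


def components_needed(document, installed=()):
    """Registered components the page needs that aren't installed yet."""
    needed = {COMPONENTS[uri] for uri in namespaces_used(document) if uri in COMPONENTS}
    return sorted(needed - set(installed))


page = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xforms="http://www.w3.org/2002/xforms"
      xmlns:svg="http://www.w3.org/2000/svg">
  <head><xforms:model/></head>
  <body><svg:svg><svg:rect width="10" height="10"/></svg:svg></body>
</html>"""
print(components_needed(page, installed=["svg-renderer"]))  # ['xforms-processor']
```

Keying on the namespace URI rather than the prefix matters here: `xforms:model` could just as well be written `xf:model`, and it should still trigger the same component.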


An example from Antoine Quint: -m

Sun Policy on Public Discourse

Tim Bray:

Things like this pile up in background browser windows, until I am able to "do something" with them. Recording them in a readily searchable place is a good choice. World-readable turns out to be not so bad either. -m

Wicked Problems

Another for the record: -m

Dictionary of Algorithms and Data Structures

Picked this up from Mark Baker's feed, but I'll need it here when I search for it later. -m

Cross Platform Hedge

I've worked with some amazing software to store all my stuff. In roughly chronological order: Zoot, Microsoft Office OneNote, NoteTaker, and StickyBrain. As great as all these are, they run on only a single platform (the first two on Windows, the last two on OS X). Even though I'm currently happy with OS X, I won't settle for a single-platform solution.

Data, the stuff I care about here, is too important to limit to a single platform. It's still all about the data. -m

It's the data, stupid

Welcome to the first new entry written from my own personal content management system, code-named "It's the data, stupid". Unlike most other CMSes, the content remains fully accessible, even when none of the software is actually running. This is accomplished by storing all the important data as plain text, UTF-8 files in directories.

Technically, it's pretty bare metal. It's largely based on David Mertz's Text Processing in Python, which includes several plain-text to XHTML converters. Add in some XSLT templates to produce a fully formatted XHTML page, as well as the RSS, and you're off.
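The Mertz converters and the XSLT templates aren't reproduced here. As a hypothetical sketch of the first half of the pipeline, the code below assumes each entry is a UTF-8 `.txt` file whose first line is the title, followed by blank-line-separated paragraphs, and turns it into an XHTML fragment. The XSLT step is left out because Python's standard library has no XSLT processor; a third-party library such as lxml would be needed for that.

```python
import html
from pathlib import Path


def text_to_xhtml(text):
    """Convert a plain-text entry (title line, then paragraphs) to XHTML."""
    title, _, body = text.strip().partition("\n")
    # Paragraphs are separated by blank lines; rewrap each onto a single line.
    paragraphs = [" ".join(p.split()) for p in body.split("\n\n") if p.strip()]
    parts = ['<div class="entry">', f"<h2>{html.escape(title.strip())}</h2>"]
    parts += [f"<p>{html.escape(p)}</p>" for p in paragraphs]
    parts.append("</div>")
    return "\n".join(parts)


def render_directory(root):
    """Render every *.txt file under root, in path order, as (path, xhtml) pairs."""
    return [(path, text_to_xhtml(path.read_text(encoding="utf-8")))
            for path in sorted(Path(root).rglob("*.txt"))]
```

Because the source files stay plain text, nothing is lost if this code goes away; the rendered XHTML is just a disposable view of the data.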

Why do this? Basically, it gives me a platform I have full control over, down to the last line of source, as well as local storage of my information in a format of my choosing. It's part of a broader effort on my part to get all of my notes, writings, journals, news clippings, and anything else textual under a common umbrella that can last me the rest of my life. Side projects include a full-text indexer and local access to all information and actions via a REST interface.

Along with this goes a tiny site redesign and a new location for the RSS feed (which is now 1.0--and should be redirecting.) If you notice any problems, send me email at the address on the web page. -m

(repost) Write IE browser extension in XForms

The indefatigable Mark Birbeck pointed me to this: --a toolkit to write IE sidebars in pure XForms. Included are Amazon and Google search. This is a sign of changes to come in the development of Internet Apps. -m


Terms of use

For external use only. I doubt the enforceability of click-through licenses anyway. Copyright 2004 Micah Dubinko. All rights reserved.

