Archive for the 'commercialism' Category

Thursday, April 26th, 2012

MarkLogic World 2012

I’m getting ready to leave for MarkLogic World, May 1-3 in Washington, DC, and it’s shaping up to be one fabulous conference. I’ve always enjoyed the vibe at these events–it has a, well, cool-in-a-data-geeky-way thing going on (like the XML conference in the early 2000s where I got to have lunch with James Clark, but that’s a different story). Lots of people with big data problems will be there, and I always enjoy talking to these kinds of people.

I’m speaking on Wednesday at 3:30 with Product Manager extraordinaire Justin Makeig about big data visualization. If you’ll be at the conference, come look me up. And if you won’t, well, forgive me if I need a few extra days to get back to any email you send this way.

Follow me on Twitter and look for the #MLW12 tag for live coverage.

-m

Sunday, April 15th, 2012

Actually using big data

I’ve been thinking a lot about big data, and two recent items nicely capture a slice of the discussion.

1) Alex Milowski recounting working with Big Weather Data. He concludes that ‘naive’ (as-is) data loading is a “doomed” approach. Even small amounts of friction add up at scale, so you should plan on doing some in-situ cleanup. He came up with a slick solution in MarkLogic–go read his post for details.

2) Chris Dixon on Making Large Datasets Useful. Typical approaches like machine learning only solve 80-90% of the problem. So you need to either live with errorful data, or invoke manual clean-up processes.

Both worth a read. There’s more to say, but I’m not ready to tip my hand on a paper I’m working on…

-m

Wednesday, February 1st, 2012

Googlebot submitting Flash forms

I’m sure this is old news by now, but here’s one more data point.

As it turns out, XForms Institute uses an old skool XForms engine written in Flash, dating approximately back to the era when Flash was necessary to do XForms-ey things in the browser. The feedback form for the site is, quite naturally, implemented in XForms. Submissions there ultimately make it into my inbox. Here’s what I see:

Tue Jan 31 12:19:22 2012 66.249.68.249 Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

An iPhone running Flash? I doubt it. That’s quite an agent string! Organic versioning in the wild. -m

Sunday, January 15th, 2012

Five iOS keyboard tips you probably didn’t know

Check out these tips. The article talks about iPad, but they work on iPhone too, even an old 3G.

On one hand, it shows the intense amount of careful thought Apple puts into the user experience. But on the other hand, it highlights the discovery problem. I know people who have been using iOS since before it was called iOS, and still didn’t know about these. How do you put these kinds of finishing touches into a product and make sure the target audience can find out about them? -m

Thursday, December 8th, 2011

Resurgence of MVC in XQuery

There’s been an increasing amount of talk about MVC in XQuery, notably David Cassel’s great discussion and to an extent Kurt Cagle’s platform discussion that touched on forms interfaces. Lots of Smart People are thinking in this area, and that’s a good thing.

A while back I recorded my thoughts on what I called MET, or the Model Endpoint Template organizational pattern, as used in MarkLogic Application Builder. One difference between 2009 and now, though, is that browsers have distanced themselves even farther from XML, which tends to undercut the eliminate-the-impedance-mismatch argument. In particular, the forms model in HTML5 continues to prefer flat data, which to me indicates that models still play an important role in XQuery web apps.

So I envision the app lifecycle like this:

  1. The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
  2. An HTML page loads.
  3. Client-side script requests the project state from a designated endpoint, the server transforms the XML into a flat list, and delivers it as JSON (as an optimization, the server can package the initial data into the page delivered in the prior step)
  4. Standard form interaction and client-side scripting happens, including manipulation of repeating structures mediated by JavaScript
  5. A standard form submit happens (possibly via script), sending a flat list back to the server, which performs an update to the stored XML.
It’s pretty easy to envision data-mapping tools and libraries that help automate the construction of the transforms mentioned in steps 3 and 5.
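
To make steps 3 and 5 concrete, here is a minimal sketch of that round trip in Python, standing in for the server-side XQuery. The path-style key format (`sort/option[2]/label`), the function names, and the sample document are all assumptions made for illustration, not anything Application Builder actually does. Attributes and mixed content are ignored.

```python
import json
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Turn an element tree into a flat {path: text} mapping.

    Repeated sibling names get a 1-based index, e.g. sort/option[2]/label.
    """
    counts = {}
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    flat = {}
    seen = {}
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        step = child.tag
        if counts[child.tag] > 1:
            step += "[%d]" % seen[child.tag]
        path = prefix + "/" + step if prefix else step
        if len(child):
            # Element has children: recurse and keep only leaf values.
            flat.update(flatten(child, path))
        else:
            flat[path] = child.text or ""
    return flat


def unflatten(root_tag, flat):
    """Rebuild an element tree from the flat mapping produced by flatten().

    Relies on the keys arriving in document order, which dicts and
    json.loads both preserve.
    """
    root = ET.Element(root_tag)
    nodes = {"": root}  # path -> element already created for it
    for path, value in flat.items():
        parent_path = ""
        for step in path.split("/"):
            here = parent_path + "/" + step if parent_path else step
            if here not in nodes:
                tag = step.split("[")[0]  # drop the [n] index
                nodes[here] = ET.SubElement(nodes[parent_path], tag)
            parent_path = here
        nodes[path].text = value
    return root


# Hypothetical project state for a "configure sorting" page.
project = ET.fromstring(
    "<project><title>Demo</title><sort>"
    "<option><label>Date</label></option>"
    "<option><label>Title</label></option>"
    "</sort></project>"
)
as_json = json.dumps(flatten(project))              # step 3: XML -> flat JSON
rebuilt = unflatten("project", json.loads(as_json))  # step 5: flat list -> XML
```

The flat keys double as form field names, so the client-side script never has to know the XML shape beyond the paths it was handed.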

Another thing that’s changed is the emergence of XQuery plugin technology in MarkLogic. There’s a rapidly-growing library of reusable components, initially centered around Information Studio but soon to cover more ground. This is going to have a major impact on XQuery app designs as components of the app (think visualization widgets) can be seamlessly added to apps.

Endpoints still make a ton of sense for XQuery apps, and provide the additional advantage that you now have a testable, concern-separated data layer for your app. Other apps have a clean way to interop, and even command-line operation is possible with off-the-shelf-tools like wget.

Lastly, Templates. Even if you use plugins for the functional core of your app, there’s still a lot of boilerplate stuff you’d not want to repeat. Something like Mustache.xq is a good fit for this.

Which is all good–but is it MVC? This organizational pattern (let’s call it MET 2.0) is a lot closer to it. Does MET need a controller? Probably. (MarkLogic now ships a pretty good one called rest:rewrite) Like MVC, MET separates the important essences of your application. XQuery will never be Ruby or Java, and its frameworks will never be Rails or Spring, but rather something uniquely poised to capture the expressive power of the language to build apps on top of unstructured and big data. -m

Tuesday, November 1st, 2011

5 things to know about MarkLogic 5

MarkLogic 5 is out today. Here’s five things beyond the official announcement that developers should know about it:

  1. If you found the CQ sample useful, you’ll love Query Console, which does everything CQ does and more (syntax highlighting!)
  2. Better Search API support for metadata: MarkLogic has always had support for storing metadata separately from documents. With new Search API support, it’s easy to set up, and it works great with databases of binary documents.
  3. The Hadoop connector, while not officially supported in this configuration, works on Mac. I know a lot of developers use Mac hardware. Once you get Hadoop itself set up (following rules like these), everything works great in my experience.
  4. “Fields” have gotten more general and more powerful. If you haven’t set aside named portions of your documents or metadata for special indexing and access, you should look into this feature–it will rock your world.
  5. To better understand what your system is doing at any point in time, you can now use the built-in Monitoring Dashboard, which runs in-browser.
And let’s not leave out the Express license, which makes it easier to get started. Check it out.
-m

Monday, May 30th, 2011

Good to Great

One book that Ken Bado, the MarkLogic President and CEO, likes to talk about is Good to Great (subtitled why some companies make the leap… and others don’t), a result of many man-years of meticulous research.

There’s plenty to think about in this book. It talks about the qualities of a “level 5” executive: the best have a paradoxical mixture of personal humility and iron will. It talks about getting the right people on the bus, and only then deciding where the bus is going. It talks about a culture where brutal facts surfacing is the normal and expected behavior, resulting in a culture of both discipline and faith in the future. Perhaps the key point of the book is the venn diagram that depicts “great” companies as focusing on the intersection of passion, what they can be the best at in the world, and what drives their economic engine.

The structure of the book is based on 11 key companies that passed several rigorous metrics, including an at-least-15-year period of good financial performance, followed by a turning point and an at-least-15-year period of greatness, that is, returns well above the general and industry markets. (Perhaps unfairly, companies that were in the ‘great’ bucket continuously, with no periods of merely ‘good’ performance, were excluded).

Two of the companies in the list, Fannie Mae and Wells Fargo, raised the eyebrows of this fresh reader. Both of them have been prominently in the headlines in the last few years, and not in a good way. In particular the depictions of Wells Fargo struggling with deregulation in the 80s seem galling to read with the hindsight of going through the Great Recession. Circuit City, another of the good-to-great companies, declared bankruptcy in 2009. The book itself cautions about tough times at Gillette and Nucor in the Epilogue section.

I bring this out not to be negative, but to emphasize that this is a soft discipline, not science. If there are companies that have consistently beat the market from the 80s until today with no serious hiccups, that would be truly remarkable. But there’s lots of hidden variables, the system is chaotic, and mere financial numbers are too shallow a measure by which to measure greatness. A company that can truly follow these principles will almost certainly do better than one that doesn’t. Just look at Yahoo for a negative example.

In particular, I’m thinking the three circles are a good way to approach life, though I sincerely hope an individual’s third circle isn’t about optimizing finances. What can you be the best in the world at, have passion for, and drive your personal satisfaction engine? Maybe that would be a good area to focus your limited resources on. -m

Thursday, February 17th, 2011

MarkLogic in the news

What’s that on your TV screen? Why, it’s MarkLogic, again.

Why President Obama Picked the Bay Area

And it’s true, we’re hiring big time. Maybe your resume should be in that pile… -m

Wednesday, January 5th, 2011

Why I am abandoning Yahoo! Mail (and why you should too)

This is a non-technical description of why Yahoo! Mail is unsafe to use in a public setting, and indeed at all. I will be pointing people at this page as I go through the long process of changing an address I’ve had for more than a decade.

What’s wrong with Yahoo Mail?

A lot of web addresses start with http://–that’s a signal that the “scheme” used to deliver the page to your browser is something called HTTP, which is a technical specification that turns out is a really good way to move around web pages. As the page flows to the browser, it’s susceptible to eavesdropping, particularly over a wi-fi connection, and much more so in public, including the usual hotspots like coffee shops, but also workplaces and many home environments. It’s the virtual equivalent of a postcard. When you’re reading the news or checking traffic, it’s not a big deal if someone can sneak a glance at your page.

Some addresses start with https://–notice the extra ‘s’ which stands for “secure”. This means two things 1) that the web page being sent over is encrypted, and thus unavailable to eavesdroppers, and 2) that the people running the site had to obtain a certificate, which is a form of proof of their identity as an organization (that they’re not, say, Ukrainian phishers). Many years ago, serving pages over https was considered quite expensive in that servers needed much beefier processors to run all that encryption. Today, while it still requires extra computation, it’s not as big of a deal. Most off-the-shelf servers have plenty of extra power. To be fair, for a truly ginormous application with millions of users like Yahoo Mail, it is not a trivial thing to roll out. But it’s critically important.

First, to dispel a point of confusion, these days nearly every site, including Yahoo Mail, uses https for the login screen. This is the most critical time when encryption is needed, because otherwise you’d be sending your password on a postcard for anyone with even modest technical skills to peek at. So that’s good, but it’s no longer enough. Because sites are written so that you don’t have to reenter your password on every single new page, they use a tiny bit of information called a “cookie” in your browser to stay logged in. Cookies themselves are neither good nor bad, but if an eavesdropper gets a hold of one, they can control most of your account–everything that doesn’t require re-entering a password. In Yahoo Mail this includes reading any of your messages, sending mail on your behalf, or even deleting messages. Are you comfortable allowing strangers to do this?

As I mentioned earlier, new, more powerful tools have been out for months that automate the process of taking over accounts this way. Zero technical prowess is needed, only the ability to install a browser plug-in. If there are any web companies dealing in personal information for which this wasn’t an all-hands-on-deck security wake-up, they are grossly negligent. Indeed, other sites like Gmail work with https all-the-time. But still, in 2011, Yahoo Mail doesn’t. I have a soft spot for Yahoo as a former employer, and I want to keep liking them. Too bad they make it so difficult.

The deeper issue at stake is that if this serious of an issue goes unfixed for months, how many lesser issues lurk in the site and have been around for months or years? The issue is trust, my friend, and Yahoo just overdrew their account. I’m leaving.

FAQ

Q: So what do you want Yahoo to do about this?  A: Well, they should fix their site for their millions of remaining users.

Q: What if they fix it tomorrow? Will you delete this message?  A: No. Since I no longer trust the site, I am leaving, even though it takes time to notify all the people who still send me mail, and no matter what other developments unfold in the meantime. This page will explain my actions.

Q: Do you really want everyone else to leave Yahoo Mail?  A: No, only those who care about their privacy.

Q: What’s your new email address?  A: I have a couple, but <my first name> @ <this domain> is a good general-purpose one.

I will continue to update this page as more information becomes available. -m

Saturday, December 4th, 2010

Yahoo Mail’s inexplicable, inexcusable lack of https support

Dear Yahoo,

What’s the deal? Shortly after FireSheep was announced on Oct 24, 2010, you should have had an emergency security all-hands meeting. You should have had an edict passed down from the “Paranoids” group to get secure or else. Maybe these things happened–I have no way of knowing.

But it is clear that it’s been 6 weeks and security hasn’t changed. It’s simply not possible to read Yahoo mail over https–try it and you get redirected straight back to an insecure channel. As such, anyone accessing Yahoo mail on a public network, say a coffee shop or a workplace, is vulnerable to having their private information read, forwarded, compromised, or deleted.

Wait, did I say 6 weeks?–SSL had apparently been rolled out for mail more than 2 years ago, but pulled back due to problems. Talk about failure to execute.

I feel like I missed an announcement. What’s the deal, Y? Show me that you care about your users. No excuses.

Sincerely,

-m

Sunday, August 22nd, 2010

Eulogy for SearchMonkey

This is indeed a sad day for all of us, for on October 1, a great app will be gone. Though we hardly had enough time during his short life to get to know him, like the grass that withers and fades, this monkey will finish his earthly course.

Updated SearchMonkey logo

Photo by Micah

I know he left many things undone, for example only enhancing 60% of the delivered result pages. He never got a chance to finish his life’s ambition of promoting RDFa and microformats to the masses or to be the killer app of the (lower-case) semantic web. You could say he will live on as “some of this structured data processing will be supported natively by the Microsoft platform”. Part of the monkey we loved will live on as enhanced results continue to flow forth from the Yahoo/Bing alliance.

The SearchMonkey Alumni group on LinkedIn is filled with wonderful mourners. Micah Alpern wrote there

I miss the team, the songs, and the aspiration to solve a hard problem. Everything else is just code.

Isaac Asimov was reported to have said “If my doctor told me I had only six minutes to live, I wouldn’t brood. I’d type a little faster.” Today we can identify with that sentiment. Keep typing.

-m

Monday, July 26th, 2010

Microsoft’s new slogan

I wanted to say something snarky about Microsoft’s new slogan, but the comments on the linked article did a pretty good job already. Ahh snark, the unthinking-man’s eloquence. -m

Wednesday, July 7th, 2010

Grokking Selenium

As the world of web apps gets more framework-y, I need to get up to speed on contemporary automation testing tools. One of the most popular ones right now is the open source Selenium project. From the look of it, that project is going through an awkward adolescent phase. For example:

  • Selenium IDE lets you record tests in a number of languages, but only HTML ones can be played back. For someone using only Selenium IDE, it’s a confusing array of choices for no apparent reason.
  • Selenium RC has bindings for lots of different languages but not for the HTML tests that are most useful in Selenium IDE. (Why not include the ability to simply play through an entire recorded script in one call, instead of fine grained commands like selenium.key_press(input_id, 110), etc.?)
  • The list of projects prominently mentions Selenium Core (a JavaScript implementation), but when you click through to the documentation, it’s not mentioned. Elsewhere on the site it’s spoken of in deprecating terms.
  • If you look at the developer wiki, all the recent attention is on Web Drivers, a new architecture for remote-controlling browsers, but those aren’t mentioned in the docs (yet) either.

So yeah, right now it’s awkward and confusing. The underlying architecture of the project is undergoing a tectonic shift, something that would never see public light of day in a proprietary project. In the end it will come out leaner and meaner. What the project needs in the short term is more help from fresh outsiders who can visualize the desirable end state and help the ramped and productive developers on the project get there.

By the way, if this kind of problem seems interesting to you, let me know. We’re hiring. If you have any tips for getting up to speed in Selenium, comment below.

-m

Wednesday, June 9th, 2010

“Google syntax” for semantic queries?

Thought experiment: are there any commonly-expressed semantic queries–the kind of queries you’d run over a triple store, or perhaps a SearchMonkey-annotated web site–expressible in common type-in-a-searchbox query grammar?

As a refresher, here’s some things that Google and other search engines can handle. The square brackets represent the search box into which the queries are typed, not part of the queries themselves.

[term]

[term -butnotthis]

[term1 OR term2]

["phrase term"]

[term1 OR term2 -"but not this" site:dubinko.info filetype:html]

So what kind of semantic queries would be usefully expressed in a similar way, avoiding SPARQL and the like? For example, maybe [by:"Micah Dubinko"] could map to a document containing a triple like <this document> <dc:author> “Micah Dubinko”. What other kinds of graph queries are interesting, common, and simple to express like this? Comments welcome.
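
To give the thought experiment something to push against, here is a rough Python sketch of how a search-box query might be split into triple patterns plus ordinary terms. The `by` to `dc:author` mapping comes from the example above; the `?doc` variable name, the lookup table, and the function itself are invented for illustration. It only understands `prefix:value` and `prefix:"quoted value"`, so negated phrases and OR are passed through untouched or mangled.

```python
import re

# Assumed lookup table: search-box prefix -> predicate.
PREFIX_TO_PREDICATE = {"by": "dc:author"}

# An optional "prefix:" followed by either a quoted phrase or a bare word.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse(query):
    """Split a query into (triple_patterns, plain_terms)."""
    triples, terms = [], []
    for prefix, quoted, bare in TOKEN.findall(query):
        value = quoted or bare
        predicate = PREFIX_TO_PREDICATE.get(prefix)
        if predicate:
            # Known prefix: becomes a pattern over the matching document.
            triples.append(("?doc", predicate, value))
        elif prefix:
            # Unknown prefix (site:, filetype:) stays an ordinary term.
            terms.append(prefix + ":" + value)
        else:
            terms.append(value)
    return triples, terms


triples, terms = parse('by:"Micah Dubinko" xforms')
```

Anything not in the lookup table falls through to the regular keyword search, which keeps existing operators like site: working as before.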

-m

Sunday, May 30th, 2010

Balisage contest: solving the wikiml problem

I wish I could say I had something to do with the planning of this: part of Balisage 2010 is a contest to “encourage markup experts to review and to research the current state of wiki markup languages and to generate a proposal that serves to de-babelize the current state of affairs for the long haul.”  To enter, you must propose a set of concrete steps (organizational, social, and/or technological) that will enable wiki content interchange, a real WYSIWYG editor, and/or wiki syntax standardization.

This pushes all of my buttons. It’s got structured documents, Web, parser geekery, writing, engineering, and standards. There’s a bunch of open source prior art, including PyXMLWiki, which I adapted from some fantastic earlier work from Rick Jelliffe.

Sadly, MarkLogic employees aren’t eligible to enter. Get your write-up done by July 15 and sent to balisage-2010-contest at marklogic dot com. The winner will be announced at Balisage and will take home some serious prize winnings, and also will be strongly encouraged (but not required) to give a brief summary (~10 minutes) of their winning entry.

Can’t wait to see what comes out of this. -m

Friday, May 14th, 2010

Geek Thoughts: verbing facebook

Facebook (v): to deliberately create an impenetrable computer user interface for purposes of manipulating users.

More collected Geek Thoughts at http://geekthoughts.info.

Tuesday, May 11th, 2010

XProc is ready

Brief note: The W3C XProc specification, edited by my partner-in-crime Norm Walsh, has advanced to Recommendation status. Now go use it. -m

Thursday, April 29th, 2010

DMC = developer.marklogic.com

The new MarkLogic developer site is up, cleaner, better organized, and more social. Even cooler, it’s an XSLT-heavy application running on a pre-release version of MarkLogic. The new blog gives some of the details of the new site and transition.

So, if you’re already a MarkLogic developer, this is a great resource. And if you’re not, the site itself shows how fast and simple it is to put together a XSLT and XQuery-powered app. -m

Friday, April 2nd, 2010

Recalibrating expectations of XML performance

Working at MarkLogic has forced me to recalibrate my expectations around XML-related performance issues. Not to brag or anything, but it’s screaming fast. Conventional wisdom of avoiding // in paths doesn’t apply, since that’s the sort of thing the indexes are made to do, and that’s just the start. Single milliseconds are now a noteworthy amount of time for something showing up in the profiler.

This is what XML was supposed to be like. Now that XML has fallen off the hype cycle, we’re getting some serious work done. -m

Thursday, March 18th, 2010

Kindle for Mac scores low on usability

Here’s my first experience with Amazon’s new Kindle client for Mac: After digging up my password and logging in, I was presented with a bunch of books. I picked the last one I’d been reading. It downloaded slowly, without a progress bar, then dumped me on some page in the middle. Apparently my farthest-read location, but I honestly don’t remember.

A cute little graphic on the screen said I could use my scroll wheel. I’m on a laptop, so I tried the two-finger drag–the equivalent gesture sans mouse… and flipped some dozens of pages in half a second. Now, hopelessly lost I searched for a ‘back’ button to no avail.  Perversely, there is a prominent ‘back’ button, but disabled. Mocking me.

This feels rushed. I wonder what could be pushing Amazon to release something so unfinished? -m

Monday, March 1st, 2010

Newsweek should never have been free

Andrew Zolli argues in Newsweek that online content should never have been free. I’m probably not the first one to make this profound observation–but if it were not for the free online edition of Newsweek (and link aggregator sites like Digg) I wouldn’t have read a single word of Newsweek in years, nor would I be linking to it as my previous sentence does… Maybe Newsweek is OK with that. -m

Monday, February 22nd, 2010

Mark Logic User Conference 2010

Are you coming? Link. It starts on May 4 (Star Wars day!) at the InterContinental Hotel in San Francisco. Guest speakers include Chris Anderson, Editor-in-Chief of Wired and Michelle Manafy, Editor-in-Chief of EContent magazine.

Early bird registration ends Feb 28. -m

Friday, January 15th, 2010

Economic indicators: recruiting picking up again

I got a personal email pitch from recruiters at both Facebook and Google, oddly enough both messages within a 3-minute window on a Monday morning. Hiring is on the uptick again, it seems. My team is still looking for the right front end engineer–someone who knows the JavaScript language in depth, how to use semantic HTML and CSS, AND all about browser quirks. Email me. -m

Sunday, January 3rd, 2010

Geek Thoughts: the ultimate real-time strategy game

Games like Farmville and the iPhone knock-off iFarm throw in a unique twist in the realm of strategy gaming: crops that get planted mature in “real time”. If a crop takes 24 hours to grow, then you need to literally wait the full 24 hours. Great for making an app “sticky” and getting users to repeatedly log in. Side fact: Farmville sells more virtual tractors in a day than real tractors sold in the US in a year.

Game producers keep upping the ante in terms of real-time strategy games interacting with the real world. Take the latest for instance, a free iPhone app called Lose It!. Everything in this game runs in real-time–a game day is always a full 24 hours. Instead of conventional points, it uses “calories”, which are gained by the actual foods you physically eat, and subtracted via actual exercise. The app includes a massive database of food items and exercises to help you keep an accurate record, apparently on the honor system. The goal: to set a calorie target for each day and come in under it. A secondary scoring system is based on your own weight, though you will need an accurate scale (not included with the app) to measure it.

So far I’ve done pretty well at the game. I’ve averaged better than 1000 calories under my goal for the last several weeks, and have done well on the weight number too. And it’s pretty interesting to have a log of everything I’ve eaten. What will they think of next?

More collected Geek Thoughts at http://geekthoughts.info.

Monday, December 21st, 2009

Failure as the secret to success

Excellent article in Wired, perhaps a good explanation of my career. :-)

Dunbar observed that the skeptical (and sometimes heated) questions asked during a group session frequently triggered breakthroughs, as the scientists were forced to reconsider data they’d previously ignored.

Which sounds like a fairly typical spec review at Mark Logic. Hint: we’re hiring–email me.

-m

Friday, December 18th, 2009

Mark Logic Careers

Check out the updated careers page, including a quote from YT. If you’re looking for an amazing place to work, get in touch with me. In particular I’m looking for top-notch JavaScript/FE/UI people. -m

Monday, November 30th, 2009

The best thing you can do…

The best thing a user can do to advance the Web is to help move people off IE 6

– Ryan Servatius, senior product manager for Internet Explorer.

Source. -m

Sunday, November 29th, 2009

The Model Endpoint Template (MET) organizational pattern for XRX apps

One of the lead bullets describing why XForms is cool always mentions that it is based on a Model View Controller framework. When building a full XRX app, though, MVC might not be the best choice to organize things overall. Why not?

Consider a typical XRX app, like MarkLogic Application Builder. (You can download your copy of MarkLogic, including Application Builder, under the community license at the developer site.) For each page, the cycle goes like this:

  1. The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
  2. The page loads, including client-side XForms via JavaScript
  3. XForms requests the project state as XML from a designated endpoint; this becomes the XForms Instance Data
  4. Stuff happens on the page that changes the client-side state
  5. Just before leaving the page, XML representing the updated state is HTTP PUT back to the endpoint

The benefit of this approach is that you are dealing with XML all the way through, no impedance mismatches like you might find on an app that awkwardly transitions from (say) relational data to Java objects to urlencoded name/value pairs embedded in HTML syntax.

So why not do this in straight MVC? Honestly, MVC isn’t a bad choice, but it can get unwieldy. If an endpoint consists of a separate model+view+controller files, and each individual page consists of separate model+view+controller files, it adds up to a lot of stuff to keep track of. In truly huge apps, this much attention to organization might be worth it, but most apps aren’t that big. Thus the MET pattern.

Model: It still makes sense to keep the code that deals with particular models (closely aligned with Schemas) as a separate thing. All of Application Builder, for example, has only one model.

Endpoint: The job of an endpoint is to GET and PUT (and possibly POST and DELETE) XML, or other equivalent resource bundles depending on how many media types you want to deal with. It combines an aspect of controllers by being activated by a particular URL and views by providing the data in a consistent format.
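
As a rough illustration of how little an endpoint has to do, here is a sketch in Python rather than XQuery. The class, the in-memory store, the example path, and the choice of status codes are all assumptions for the sake of the example. A real endpoint would sit behind an HTTP server and read from the database.

```python
import xml.etree.ElementTree as ET


class Endpoint:
    """Serves one XML resource per URL path from an in-memory store."""

    def __init__(self):
        self.store = {}  # path -> XML string

    def handle(self, method, path, body=None):
        """Return a (status, body) pair for a single request."""
        if method == "GET":
            if path not in self.store:
                return 404, ""
            return 200, self.store[path]
        if method == "PUT":
            try:
                # Reject anything that is not well-formed XML.
                ET.fromstring(body or "")
            except ET.ParseError:
                return 400, ""
            created = path not in self.store
            self.store[path] = body
            return (201 if created else 204), ""
        # POST and DELETE are left out of this sketch.
        return 405, ""


endpoint = Endpoint()
first_put = endpoint.handle("PUT", "/project/sort", "<sort><by>date</by></sort>")
fetched = endpoint.handle("GET", "/project/sort")
```

The URL dispatch is the controller-ish part and the returned XML is the view-ish part, both visible in one short file.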

Template: Since XForms documents already contain MVC mechanics, it’s not a high-payoff situation to further use MVC to construct the XForms and XHTML wrapper themselves. The important stuff happens within XForms, and then you need various templating mechanisms, for example to provide consistent headers, footers, and other pieces across multiple pages. For this, an ordinary templating mechanism suffices. I can imagine dynamic assembly scenarios where this wouldn’t be the case, but again, many apps don’t need this kind of flexibility, and the complexity that comes along with it.

What about separation of concerns? Oh yeah, what about it? :-) Technically both Endpoints and Templates violate classical SOC. In an XRX app, this typically doesn’t lead to the kinds of spaghetti situations that it might otherwise. Endpoints are self contained, and can focus on doing just one thing well; with limited scope comes limited ability to get into trouble. For those times when you need to dig into the XQuery code of an endpoint, it’s actually helpful to see both the controller and view pieces laid out in one file.

As for Templates, simplicity wins. With the specifics of models and endpoints peeled away, the remaining challenge in developing individual pages is getting the XForms right, and again, it’s helpful to minimize the number of files one XForms page is split across. YAGNI applies to what’s left, at least in the stuff I’ve built.

So, I’ve been careful in the title to call this an “organizational pattern”, not a “design pattern” or an (ugh) “architectural pattern”. Nothing too profound here. I’d be happy to start seeing XRX apps laid out with directory names like “models”, “endpoints”, and “templates”.

What do you think? Comments welcome.

-m

Sunday, November 22nd, 2009

How Xanadu Works: technical overview

One particular conversation I’ve overheard several times, often in the context of web and standards development, has always intrigued me. It goes something like this:

You know, Ted Nelson’s hypertext system from the 60s had unbreakable, two-way links. It was elegant. But then came along Tim Berners-Lee and HTML, with its crappy, one-way, breakable links, and it took over the world.

The general moral of the story is usually about avoiding over-thinking problems and striving for simplicity. This has been rolling around in the back of my mind ever since the first time I heard the story. Is it an accurate assessment of reality? And how exactly did Nelson’s system, called Xanadu (R), manage the trick of unbreakable super-links? Even if the web ended up going in a different direction, there still might be lessons to learn for the current generation of people building things that run (and run on) the web.

Nelson’s book Literary Machines describes the system in some detail, but it’s hard to come by in the usual channels like Amazon, or even local bookstores. One place does have it, and for a reasonable price too: Eastgate Systems. [Disclosure: I bought mine from there for full price. I'm not getting anything for writing this post on my blog.] The book has a versioning notation, with 93.1 being the most recent, describing the “1993 design” of the software.

Pause for a moment and think about the history here. 1993 is 16 years ago as I write this, about the same span of time as that between Vannevar Bush’s groundbreaking 1945 article As We May Think (reprinted in full in Literary Machines) and Nelson’s initial work in 1960 on what would become the Xanadu project. As far as software projects go, this one has some serious history.

So how does it work? The basic concepts, in no particular order, are:

  • A heavier-weight publishing process: Other than inaccessible “privashed” (as opposed to “pub”lished) documents, once published, documents are forever, and can’t be deleted except in extraordinary circumstances and with some kind of waiting period.
  • All documents have a specific owner, are royalty-bearing, and work through a micropayment system. Anyone can quote, transclude, or modify any amount of anything, with the payments sorting themselves out accordingly.
  • Software called a “front end” (today we’d call it a “browser”) works on behalf of the user to navigate the network and render documents.
  • Published documents can be updated at will, in which case unchanged pieces can remain unchanged, with inserted and deleted sections in between. Thus, across the history of a document, there are implicit links forward and backward in time through all the various editions and alternatives.
  • In general, links can jump to a new location in the docuverse or transclude part of a remote document into another, with many more configurations possible, including multi-ended links. Links are granular to the character level and attach to particular characters.
  • Document and network addressing are accomplished through a clever numbering system (somewhat reminiscent of organic versioning, but in a way infinitely extensible on multiple axes). These addresses, called tumblers, represent a Node+User+Document+Subdocument, and a minor variant to the syntax can express ranges between two points therein.
  • The system uses its own protocol called FEBE (Front End Back End), which contains several verbs, including (on page 4/61) RETRIEVEV (like HTTP GET), DELETEVSPAN, MAKELINK, FINDNUMOFLINKSTOTHREE, FINDLINKSFROMTOTHREE, and FINDDOCSCONTAINING. [Note that "three" in this context is an unusual notation for a link type.] Maybe 10 more verbs are defined in total.
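
To make the tumbler idea a bit more concrete, here’s a rough Python sketch of a four-field address with range checks. This is my own illustration, not anything from the book: the `Tumbler` class, the `in_span` helper, and the textual form (dot-separated integers, with a 0 marking the boundary between fields) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Field = Tuple[int, ...]  # one field is a sequence of positive integers


@dataclass(frozen=True, order=True)
class Tumbler:
    """Hypothetical model of a Node+User+Document+Subdocument address."""
    node: Field
    user: Field
    document: Field
    subdocument: Field

    @classmethod
    def parse(cls, text: str) -> "Tumbler":
        # Assumed syntax: integers joined by dots, 0 separates the fields.
        fields, current = [], []
        for part in text.split("."):
            number = int(part)
            if number == 0:
                fields.append(tuple(current))
                current = []
            else:
                current.append(number)
        fields.append(tuple(current))
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
        return cls(*fields)

    def __str__(self) -> str:
        parts = (self.node, self.user, self.document, self.subdocument)
        return ".0.".join(".".join(str(n) for n in field) for field in parts)


def in_span(start: Tumbler, end: Tumbler, address: Tumbler) -> bool:
    """True if address falls between start and end, inclusive."""
    return start <= address <= end


first = Tumbler.parse("1.0.2.0.3.0.4")
last = Tumbler.parse("1.0.2.0.3.0.9")
# Extending a field with more digits creates a new address "underneath" 5,
# without renumbering anything that already exists.
nested = Tumbler.parse("1.0.2.0.3.0.5.1")
print(in_span(first, last, nested))
```

The ordering falls out of plain lexicographic comparison of the integer sequences, which is what lets a pair of addresses stand in for everything between them.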

A few common themes emerge. One is the grandiose scope: This really is intended as a system to encompass all of literature past, present, and future, and to thereby create a culture of intellect and reshape civilization. “We think that anyone who actually understands the problems will recognize ours approach as the unique solution.” (italics from original, 1993 preface)

Another theme is simple solutions to incredibly difficult problems. So the basic solution to unbreakable links is to never change documents. Sometimes these solutions work brilliantly, sometimes they fall short, and many times they end up somewhere in between. In terms of sheer vision, nobody else has come close to inspiring as many people working on the web. Descriptions of what today we’d call a browser would sound familiar, if a bit abstract, even to casual users of Firefox or IE.

Nothing like REST seems to have occurred to Nelson or his associates. It’s unclear how widely deployed Xanadu prototypes ever were, or how many nodes were ever online at any point. The set of verbs in the FEBE protocol reads like what a competent engineer would come up with. The benefits of REST, in particular of minimizing verbs and maximizing nouns, are non-obvious without a significant amount of web-scale experience.

Likewise Creative Commons seems like something the designers never contemplated. “Ancient documents, no longer having a current owner, are considered to be owned by the system–or preferably by some high-minded literary body that oversees their royalties.” (page 2/29) While this sounds eerily like the Google Books settlement, it misses the implications of truly free-as-in-beer content, and equally misses the power of free-as-in-freedom documents. In terms of social impact there’s a huge difference between something that costs $0 and something that costs $0.000001.

In this system anyone can include any amount of any published document into their own without special permission. In a world where people writing Harry Potter Lexicons are getting sued by the copyright industry, it’s hard to imagine this coming to pass without kicking and screaming, but it is a nice world to think about. Anyway, in Xanadu per-byte royalties work themselves out according to the proportion of original vs. transcluded bytes.
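
As a back-of-the-envelope illustration of that proportional split, here’s a small Python sketch. The function name, the (owner, byte count) segment list, and the per-byte rate are all made up for the example; the book’s actual accounting rules aren’t covered in this post.

```python
from collections import defaultdict
from fractions import Fraction


def split_royalties(segments, rate_per_byte):
    """Divide a delivery's royalty among owners by bytes contributed.

    segments: iterable of (owner, byte_count) pairs describing which
    owner each run of delivered bytes belongs to (original or transcluded).
    rate_per_byte: hypothetical flat royalty per delivered byte.
    """
    payouts = defaultdict(Fraction)
    for owner, byte_count in segments:
        if byte_count < 0:
            raise ValueError("byte_count must be non-negative")
        payouts[owner] += rate_per_byte * byte_count
    return dict(payouts)


# A 1000-byte document: 700 bytes written by alice, 300 transcluded from bob.
segments = [("alice", 700), ("bob", 300)]
rate = Fraction(1, 1_000_000)  # assumed rate, purely illustrative
payouts = split_royalties(segments, rate)
for owner, amount in sorted(payouts.items()):
    print(owner, float(amount))
```

Using `Fraction` keeps the tiny per-byte amounts exact, which matters when the whole point is that the numbers are very small but not zero.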

Where is Google in this picture? “Two system directories, maintained by the system itself, are anticipated: author and title, no more” (page 2/49) For additional directories or search engines, it’s not clear how that would work: is a search results page a published or privashed document? Does every possible older version of every result page stick around in the system? (If not, links to/from might break.) It’s part of a bigger question about how to represent and handle dynamic documents in the system.

On privacy: “The network will not, may not monitor what is written in private documents.” (page 2/59) A whole section in chapter 3 deals with these kinds of issues, as does Computer Lib, another of Nelson’s works.

He was early to recognize the framing problem: how in a tangle of interlinked documents, to make sense of what’s there, to discern between useful and extraneous chunks. Nelson admits to no general solution, but points at some promising directions, one of which is link typing–the more information there is on individual links, the more handles there are to make sense of the tangle. Some tentative link types include title, author, supersession, correction, comment, counterpart, translation, heading, paragraph, quote, footnote, jump-link, modal jump-link, suggested threading, expansion, citation, alternative version, comment, certification, and mail.

At several points, Nelson mentions algorithmic work that makes the system possible. Page 1/36 states “Our enfilade data structures and methods effectively refute Donald Knuth’s list of desirable features that he says you can’t have all at once (in his book Fundamental Algorithms: Sorting and Searching)”. I’m curious if anyone knows more about this, or if Knuth ever got to know enough details to verify that claim, or revise his.

So was the opening anecdote a valid description of reality? I have to say no, it’s not that simple. Nelson rightly calls the web a shallow imitation of his grand ideas, but those ideas are–in some ways literally–from a different world. It’s not a question of “if only things had unfolded a bit differently…”. To put it even more strongly, a system with that kind of scope cannot be designed all at once; in order to be embraced by the real world, it has to be developed with a feedback loop to the real world. This in no way diminishes the value and influence of big ideas or the place that Roarkian stick-to-your-gunnedness has in our world, industry, and society. We may have gotten ourselves into a mess with the architecture of the present web, but even so, Nelson’s vision will keep us aspiring toward something better.

I intend to return to this posting and update it for accuracy as my understanding improves. Some additional topics to maybe address are: a more detailed linking example (page 2/45), comparing XLink to Xanadu, comparing URIs and tumblers, and mentioning the bizarre (and yet oddly familiar if you’ve ever been inside a FedEx Kinkos) notion of “SilverStands”.

For more on Nelson, there is the epic writeup in Wired. YouTube has some good stuff too.

Comments are welcome. -m

Xanadu is a registered trademark, here used for specific identifying purpose.

Saturday, October 24th, 2009

Are Windows 7 reviewers logic challenged?

At the risk of sounding fanboy, are Windows 7 reviewers logic challenged? Not to pick on any one in particular, but here’s the most recent one I bumped into–I’ve seen similar qualities in other reviews. Under the reasons to get it:

1. Your computer can probably run it. Unlike Vista, which proved a giant slop-feeding resource hog compared to XP, Windows 7’s system requirements haven’t changed much at all since Vista,

So if Vista was a “giant slop-feeding resource hog”, and the Windows 7 requirements haven’t changed much relative to that…how is this a plus again?

2. It costs less than Vista did. Microsoft really seems to have learned its lesson with Vista pricing, which was way too high at first. Although Windows 7 is hardly cheap…

Similar to #1. The argument amounts to ‘it’s not as ridiculous as Vista’. Yay.

3. You’re not stuck with whatever version you choose first. There are a lot of versions of Windows 7, all with different combinations of features. If you buy Home Premium and decide at some future point that you really need Ultimate—who doesn’t need BitLocker at some point?—you don’t have to drop $319.99 on top of the $199.99 you already spent the first time.

Remember the version chart? If for some reason you choose “Professional” over “Ultimate”, saving a cool $20 at retail price, you can always go back and upgrade for a modest $129.99. Remember, this is from the list of reasons to choose Windows.

5. You don’t have to give up Windows XP. Yes, exiting any long-term relationship can be difficult, but sometimes it has to be done.

A reason to upgrade is that you don’t have to give up the thing you are probably upgrading from?

7. Comedic value. Even if Windows 7 can’t be hailed for anything else, it inspired an enlightening and truly hilarious column from PCMag.com Editor-in-Chief Lance Ulanoff…

Comedic value? Seriously? The comedic value in Windows 7 reviews seems to be entirely unintentional… -m

(Posted from 30k feet. Hooray for Virgin America)