Archive for the 'commercialism' Category

Monday, March 1st, 2010

Newsweek should never have been free

Andrew Zolli argues in Newsweek that online content should never have been free. I’m probably not the first one to make this profound observation–but if it were not for the free online edition of Newsweek (and link aggregator sites like Digg) I wouldn’t have read a single word of Newsweek in years, nor would I be linking to it as my previous sentence does… Maybe Newsweek is OK with that. -m

Monday, February 22nd, 2010

Mark Logic User Conference 2010

Are you coming? Link. It starts on May 4 (Star Wars day!) at the InterContinental Hotel in San Francisco. Guest speakers include Chris Anderson, Editor-in-Chief of Wired, and Michelle Manafy, Editor-in-Chief of EContent magazine.

Early bird registration ends Feb 28. -m

Friday, January 15th, 2010

Economic indicators: recruiting picking up again

I got a personal email pitch from recruiters at both Facebook and Google, oddly enough both messages within a 3-minute window on a Monday morning. Hiring is on the uptick again, it seems. My team is still looking for the right front end engineer–someone who knows the JavaScript language in depth, how to use semantic HTML and CSS, AND all about browser quirks. Email me. -m

Sunday, January 3rd, 2010

Geek Thoughts: the ultimate real-time strategy game

Games like Farmville and the iPhone knock-off iFarm add a unique twist to the realm of strategy gaming: planted crops mature in “real time”. If a crop takes 24 hours to grow, then you literally need to wait the full 24 hours. Great for making an app “sticky” and getting users to log in repeatedly. Side fact: Farmville sells more virtual tractors in a day than real tractors are sold in the US in a year.

Game producers keep upping the ante on real-time strategy games interacting with the real world. Take the latest, for instance: a free iPhone app called Lose It!. Everything in this game runs in real time–a game day is always a full 24 hours. Instead of conventional points, it uses “calories”, which are gained from the foods you actually eat and subtracted via actual exercise. The app includes a massive database of food items and exercises to help you keep an accurate record, apparently on the honor system. The goal: set a calorie target for each day and come in under it. A secondary scoring system is based on your own weight, though you will need an accurate scale (not included with the app) to measure it.

So far I’ve done pretty well at the game. I’ve averaged better than 1000 calories under my goal for the last several weeks, and have done well on the weight number too. And it’s pretty interesting to have a log of everything I’ve eaten. What will they think of next?

More collected Geek Thoughts at http://geekthoughts.info.

Monday, December 21st, 2009

Failure as the secret to success

Excellent article in Wired, perhaps a good explanation of my career. :-)

Dunbar observed that the skeptical (and sometimes heated) questions asked during a group session frequently triggered breakthroughs, as the scientists were forced to reconsider data they’d previously ignored.

Which sounds like a fairly typical spec review at Mark Logic. Hint: we’re hiring–email me.

-m

Friday, December 18th, 2009

Mark Logic Careers

Check out the updated careers page, including a quote from YT. If you’re looking for an amazing place to work, get in touch with me. In particular I’m looking for top-notch JavaScript/FE/UI people. -m

Monday, November 30th, 2009

The best thing you can do…

The best thing a user can do to advance the Web is to help move people off IE 6

— Ryan Servatius, senior product manager for Internet Explorer.

Source. -m

Sunday, November 29th, 2009

The Model Endpoint Template (MET) organizational pattern for XRX apps

One of the lead bullets describing why XForms is cool always mentions that it is based on a Model-View-Controller (MVC) framework. When building a full XRX app, though, MVC might not be the best choice for organizing things overall. Why not?

Consider a typical XRX app, like MarkLogic Application Builder. (You can download your copy of MarkLogic, including Application Builder, under the community license at the developer site.) For each page, the cycle goes like this (sketched in code below):

  1. The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
  2. The page loads, including client-side XForms via JavaScript
  3. XForms requests the project state as XML from a designated endpoint; this becomes the XForms Instance Data
  4. Stuff happens on the page that changes the client-side state
  5. Just before leaving the page, XML representing the updated state is HTTP PUT back to the endpoint
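To make the cycle concrete, here’s a minimal sketch of steps 2 through 5 from the browser’s side: an XQuery page module returning XHTML with an embedded XForms model. The endpoint path, id parameter, and element names are my own illustrative assumptions, not Application Builder’s actual code.

xquery version "1.0-ml";
(: hypothetical page sketch; the endpoint URL and instance shape are made up :)
xdmp:set-response-content-type("application/xhtml+xml"),
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>Sort options</title>
    <xf:model>
      <!-- step 3: GET the project state; it becomes the instance data -->
      <xf:instance id="project" src="/endpoints/project.xqy?id=42"/>
      <!-- step 5: PUT the updated state back to the same endpoint -->
      <xf:submission id="save" method="put" replace="none"
                     resource="/endpoints/project.xqy?id=42"/>
    </xf:model>
  </head>
  <body>
    <!-- step 4: controls bound to the instance change client-side state -->
    <xf:select1 ref="sort-order">
      <xf:label>Sort order</xf:label>
      <xf:item><xf:label>Ascending</xf:label><xf:value>ascending</xf:value></xf:item>
      <xf:item><xf:label>Descending</xf:label><xf:value>descending</xf:value></xf:item>
    </xf:select1>
    <xf:submit submission="save"><xf:label>Save</xf:label></xf:submit>
  </body>
</html>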

The benefit of this approach is that you are dealing with XML all the way through; there are no impedance mismatches like you might find in an app that awkwardly transitions from (say) relational data to Java objects to urlencoded name/value pairs embedded in HTML syntax.

So why not do this in straight MVC? Honestly, MVC isn’t a bad choice, but it can get unwieldy. If each endpoint consists of separate model+view+controller files, and each individual page also consists of separate model+view+controller files, it adds up to a lot of stuff to keep track of. In truly huge apps, this much attention to organization might be worth it, but most apps aren’t that big. Thus the MET pattern.

Model: It still makes sense to keep the code that deals with particular models (closely aligned with Schemas) as a separate thing. All of Application Builder, for example, has only one model.

Endpoint: The job of an endpoint is to GET and PUT (and possibly POST and DELETE) XML, or other equivalent resource bundles depending on how many media types you want to deal with. It combines an aspect of controllers by being activated by a particular URL and views by providing the data in a consistent format.
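In code, an endpoint can be quite small. Here’s a minimal sketch in MarkLogic XQuery; the URI scheme and status handling are assumptions for illustration, not lifted from Application Builder.

xquery version "1.0-ml";
(: hypothetical endpoint sketch; URI scheme and status codes are made up :)
let $uri := fn:concat("/projects/", xdmp:get-request-field("id"), ".xml")
return
  if (xdmp:get-request-method() eq "GET") then
    (: serve the stored XML state :)
    fn:doc($uri)
  else if (xdmp:get-request-method() eq "PUT") then (
    (: store the XML body sent by the client :)
    xdmp:document-insert($uri, xdmp:get-request-body("xml")),
    xdmp:set-response-code(204, "No Content")
  )
  else
    xdmp:set-response-code(405, "Method Not Allowed")

With GET and PUT sharing one URI scheme, the browser-side XForms needs to know only a single address per resource.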

Template: Since XForms documents already contain MVC mechanics, it is not a high-payoff move to further use MVC to construct the XForms and XHTML wrapper themselves. The important stuff happens within XForms; beyond that you need various templating mechanisms, for example to provide consistent headers, footers, and other pieces across multiple pages. For this, an ordinary templating mechanism suffices. I can imagine dynamic assembly scenarios where this wouldn’t be the case, but again, many apps don’t need this kind of flexibility, or the complexity that comes along with it.

What about separation of concerns? Oh yeah, what about it? :-) Technically both Endpoints and Templates violate classical SOC. In an XRX app, this typically doesn’t lead to the kinds of spaghetti situations that it might otherwise. Endpoints are self-contained and can focus on doing just one thing well; with limited scope comes limited ability to get into trouble. For those times when you need to dig into the XQuery code of an endpoint, it’s actually helpful to see both the controller and view pieces laid out in one file.

As for Templates, simplicity wins. With the specifics of models and endpoints peeled away, the remaining challenge in developing individual pages is getting the XForms right, and again, it’s helpful to minimize the number of files one XForms page is split across. YAGNI applies to what’s left, at least in the stuff I’ve built.

So, I’ve been careful in the title to call this an “organizational pattern”, not a “design pattern” or an (ugh) “architectural pattern”. Nothing too profound here. I’d be happy to start seeing XRX apps laid out with directory names like “models”, “endpoints”, and “templates”.

What do you think? Comments welcome.

-m

Sunday, November 22nd, 2009

How Xanadu Works: technical overview

One particular conversation I’ve overheard several times, often in the context of web and standards development, has always intrigued me. It goes something like this:

You know, Ted Nelson’s hypertext system from the 60’s had unbreakable, two-way links. It was elegant. But then came along Tim Berners-Lee and HTML, with its crappy, one-way, breakable links, and it took over the world.

The general moral of the story is usually about avoiding over-thinking problems and striving for simplicity. This has been rolling around in the back of my mind ever since the first time I heard the story. Is it an accurate assessment of reality? And how exactly did Nelson’s system, called Xanadu (R), manage the trick of unbreakable super-links? Even if the web ended up going in a different direction, there still might be lessons to learn for the current generation of people building things that run (and run on) the web.

Nelson’s book Literary Machines describes the system in some detail, but it’s hard to come by in the usual channels like Amazon, or even local bookstores. One place does have it, and for a reasonable price too: Eastgate Systems. [Disclosure: I bought mine from there for full price. I’m not getting anything for writing this post on my blog.] The book has a versioning notation, with 93.1 being the most recent, describing the “1993 design” of the software.

Pause for a moment and think about the history here. 1993 is 16 years ago as I write this, about the same span of time between Vannevar Bush’s groundbreaking 1945 article As We May Think (reprinted in full in Literary Machines) and Nelson’s initial work in 1960 on what would become the Xanadu project. As far as software projects go, this one has some serious history.

So how does it work? The basic concepts, in no particular order, are:

  • A heavier-weight publishing process: aside from inaccessible “privashed” (as opposed to “pub”lished) documents, published documents are forever; they can’t be deleted except in extraordinary circumstances, and then only after some kind of waiting period.
  • All documents have a specific owner, are royalty-bearing, and work through a micropayment system. Anyone can quote, transclude, or modify any amount of anything, with the payments sorting themselves out accordingly.
  • Software called a “front end” (today we’d call it a “browser”) works on behalf of the user to navigate the network and render documents.
  • Published documents can be updated at will, in which case unchanged pieces can remain unchanged, with inserted and deleted sections in between. Thus, across the history of a document, there are implicit links forward and backward in time through all the various editions and alternatives.
  • In general, links can jump to a new location in the docuverse or transclude part of a remote document into another, among many other configurations, including multi-ended links. They are granular to the character level and attach to particular characters.
  • Document and network addressing are accomplished through a clever numbering system (somewhat reminiscent of organic versioning, but infinitely extensible on multiple axes). These addresses, called tumblers, represent a Node+User+Document+Subdocument, and a minor variant of the syntax can express ranges between two points therein.
  • The system uses its own protocol called FEBE (Front End Back End), which contains several verbs, including (on page 4/61): RETRIEVEV (like HTTP GET), DELETEVSPAN, MAKELINK, FINDNUMOFLINKSTOTHREE, FINDLINKSFROMTOTHREE, and FINDDOCSCONTAINING. [Note that “three” in this context is an unusual notation for a link type.] Maybe 10 more verbs are defined in total.

A few common themes emerge. One is the grandiose scope: this really is intended as a system to encompass all of literature past, present, and future, and to thereby create a culture of intellect and reshape civilization. “We think that anyone who actually understands the problems will recognize our approach as the unique solution.” (italics in original, 1993 preface)

Another theme is simple solutions to incredibly difficult problems. The basic solution to unbreakable links, for instance, is to never change documents. Sometimes these solutions work brilliantly, sometimes they fall short, and many times they end up somewhere in between. In terms of sheer vision, nobody else has come close to inspiring as many people working on the web. Descriptions of what today we’d call a browser would sound familiar, if a bit abstract, even to casual users of Firefox or IE.

Nothing like REST seems to have occurred to Nelson or his associates. It’s unclear how widely deployed Xanadu prototypes ever were, or how many nodes were ever online at any point. The set of verbs in the FEBE protocol reads like what a competent engineer would come up with. The benefits of REST, in particular of minimizing verbs and maximizing nouns, are non-obvious without a significant amount of web-scale experience.

Likewise, Creative Commons seems like something the designers never contemplated. “Ancient documents, no longer having a current owner, are considered to be owned by the system–or preferably by some high-minded literary body that oversees their royalties.” (page 2/29) While this sounds eerily like the Google Books settlement, it misses the implications of truly free-as-in-beer content, and equally the power of free-as-in-freedom documents. In terms of social impact there’s a huge difference between something that costs $0 and something that costs $0.000001.

In this system anyone can include any amount of any published document into their own without special permission. In a world where people writing Harry Potter Lexicons are getting sued by the copyright industry, it’s hard to imagine this coming to pass without kicking and screaming, but it is a nice world to think about. Anyway, in Xanadu per-byte royalties work themselves out according to the proportion of original vs. transcluded bytes.

Where is Google in this picture? “Two system directories, maintained by the system itself, are anticipated: author and title, no more.” (page 2/49) For additional directories or search engines, it’s not clear how that would work: is a search results page a published or privashed document? Does every possible older version of every result page stick around in the system? (If not, links to and from them might break.) It’s part of a bigger question about how to represent and handle dynamic documents in the system.

On privacy: “The network will not, may not monitor what is written in private documents.” (page 2/59) A whole section in chapter 3 deals with these kinds of issues, as does Computer Lib, another of Nelson’s works.

He was early to recognize the framing problem: how, in a tangle of interlinked documents, to make sense of what’s there and to discern useful from extraneous chunks. Nelson admits to no general solution, but points at some promising directions, one of which is link typing–the more information there is on individual links, the more handles there are to make sense of the tangle. Some tentative link types include title, author, supersession, correction, comment, counterpart, translation, heading, paragraph, quote, footnote, jump-link, modal jump-link, suggested threading, expansion, citation, alternative version, certification, and mail.

At several points, Nelson mentions algorithmic work that makes the system possible. Page 1/36 states “Our enfilade data structures and methods effectively refute Donald Knuth’s list of desirable features that he says you can’t have all at once (in his book Fundamental Algorithms: Sorting and Searching)”. I’m curious whether anyone knows more about this, or whether Knuth ever got to know enough details to verify that claim, or to revise his own.

So was the opening anecdote a valid description of reality? I have to say no, it’s not that simple. Nelson rightly calls the web a shallow imitation of his grand ideas, but those ideas are–in some ways literally–from a different world. It’s not a question of “if only things had unfolded a bit differently…”. To put it even more strongly: a system with that kind of scope cannot be designed all at once; to be embraced by the real world, it has to be developed with a feedback loop to the real world. This in no way diminishes the value and influence of big ideas, or the place that Roarkian stick-to-your-gunnedness has in our world, industry, and society. We may have gotten ourselves into a mess with the architecture of the present web, but even so, Nelson’s vision will keep us aspiring toward something better.

I intend to return to this posting and update it for accuracy as my understanding improves. Some additional topics to maybe address: a more detailed linking example (page 2/45), comparing XLink to Xanadu, comparing URIs and tumblers, and the bizarre (and yet oddly familiar if you’ve ever been inside a FedEx Kinkos) notion of “SilverStands”.

For more on Nelson, there is the epic writeup in Wired. YouTube has some good stuff too.

Comments are welcome. -m

Xanadu is a registered trademark, here used for specific identifying purpose.

Saturday, October 24th, 2009

Are Windows 7 reviewers logic challenged?

At the risk of sounding like a fanboy, are Windows 7 reviewers logic challenged? Not to pick on any one in particular, but here’s the most recent review I bumped into–I’ve seen similar qualities in other reviews. Under the reasons to get it:

1. Your computer can probably run it. Unlike Vista, which proved a giant slop-feeding resource hog compared to XP, Windows 7’s system requirements haven’t changed much at all since Vista,

So if Vista was a “giant slop-feeding resource hog”, and the Windows 7 requirements haven’t changed much relative to that…how is this a plus again?

2. It costs less than Vista did. Microsoft really seems to have learned its lesson with Vista pricing, which was way too high at first. Although Windows 7 is hardly cheap…

Similar to #1. The argument amounts to ‘it’s not as ridiculous as Vista’. Yay.

3. You’re not stuck with whatever version you choose first. There are a lot of versions of Windows 7 , all with different combinations of features. If you buy Home Premium and decide at some future point that you really need Ultimate—who doesn’t need BitLocker at some point?—you don’t have to drop $319.99 on top of the $199.99 you already spent the first time.

Remember the version chart? If for some reason you choose “Professional” over “Ultimate”, saving a cool $20 at retail price, you can always go back and upgrade for a modest $129.99. Remember, this is from the list of reasons to choose Windows.

5. You don’t have to give up Windows XP. Yes, exiting any long-term relationship can be difficult, but sometimes it has to be done.

A reason to upgrade is that you don’t have to give up the thing you are probably upgrading from?

7. Comedic value. Even if Windows 7 can’t be hailed for anything else, it inspired an enlightening and truly hilarious column from PCMag.com Editor-in-Chief Lance Ulanoff…

Comedic value? Seriously? The comedic value in Windows 7 reviews seems to be entirely unintentional… -m

(Posted from 30k feet. Hooray for Virgin America)

Wednesday, October 21st, 2009

Application Builder behind-the-scenes

I’ll be speaking next Tuesday (Oct 27) at the Northern Virginia MarkLogic User Group (NOVAMUG). Here’s what I’ll be talking about.

Application Builder consists of two main parts: Search API to enable Google-style search string processing, and the actual UI wizard that steps users through building a complete search app. It uses a number of technologies that have not (at least not up until now!) been widely associated with MarkLogic. Why some technologies that seem like a perfect fit for XML apps are less used in the Mark Logic ecosystem is anyone’s guess, but one thing App Builder can contribute to the environment is some fresh DNA. Maybe your apps can benefit from these as well.

XForms and XRX. Clicking through the screens of App Builder is really a fancy way of editing XML. Upon first arriving on a page, the client makes a GET request to an “Application XML Endpoint” (axe.xqy) to get the current state of the project, which is rendered in the user interface. Interacting with the page edits the in-memory XML. Afterwards, the updated state is PUT back to the same endpoint upon clicking ‘Save’ or prior to navigating away. This is a classic XRX architecture. MarkLogic ships with a copy of the XSLTForms engine, which makes use of client-side XSLT to transform XForms markup into divs, spans, classes, and JavaScript that can be processed entirely in the browser. Thus XForms works on all supported browsers, all the way back to IE6. The apps built by App Builder don’t use any XForms (yet!), but as App Builder itself demonstrates, it is a great platform for application development.

To be honest, many XForms apps have fallen short in the polished-UI department. Not so with App Builder, IMHO. An early, and in hindsight somewhat misdirected, thrust of XForms advocacy pushed the angle of building apps with zero script needed. But one advantage of using a JavaScript implementation of XForms is that it frees you to use script as needed. So in many places, large amounts of UI, all mapped to XML, can be hidden away with CSS and selectively revealed (or mapped to yet other HTML form controls) in small, self-contained overlays triggered via script. While it doesn’t fulfill the unrealistic promise of completely eliminating script, it’s a useful technique, one I predict we’ll see more of in the future.

Relax NG. XML Schema has its roots deep in the XML infrastructure. The type system of XQuery and XSLT 2.0 is based on it. Even XForms has ties to it. But for all its wide reach, XML Schema 1.0 has some maddening limitations, and “takes some getting used to” before one can sight-read it. The appendices of many recent W3C specifications use Relax NG’s highly readable compact syntax to describe content models in a way that is equally human- and machine-readable.

What are these limitations I speak of? XML Schema 1.1 goes a long way toward resolving these, but isn’t yet widely in use. Take this example, the content model of the <options> element from Search API:

start = Options | Response

# Root element
OptionsType = (
 AdditionalQuery? &
 Annotation* &
 ConcurrencyLevel? &
 Constraint* &
 Debug? &
 DefaultSuggestionSource? &
 Forest* &
 Grammar? &
 Operator* &
 PageLength? &
 QualityWeight? &
 ReturnConstraints? &
 ReturnFacets? &
 ReturnMetrics? &
 ReturnQtext? &
 ReturnQuery? &
 ReturnResults? &
 ReturnSimilar? &
 SearchOption* &
 SearchableExpression? &
 SortOrder* &
 SuggestionSource* &
 Term? &
 TransformResults?
)

The start line indicates that, within this namespace, there are two possible root elements, either <options> or <response> (not shown here). An instance with a root of, say, search:annotation is by definition not valid. Try representing that in XML Schema.

The definition of OptionsType allows a wide variety of child elements, some zeroOrMore times, others optional (zero or one occurrence), with no ordering restrictions at all between anything. XML Schema can’t represent this either. James Clark’s trang tool converts Relax NG into XML Schema, and has to approximate this as an xsd:choice with maxOccurs=”unbounded”, so the elements that can only occur once are not schema-enforced. Thus the Relax NG description of the content model, besides being more readable, actually contains more information than the closest XML Schema. Particularly for XML vocabularies that are optimized for human use, Relax NG is a good choice for schema development.

Out-of-line validation. So if XML Schema doesn’t fully describe the <options> node, how can authors be sure they have constructed one correctly? Take a step back: even if XML Schema could fully represent the content model, for performance reasons you wouldn’t want to repeatedly validate the node on every query. The options node tends to change infrequently, mainly during a development cycle. Both of these problems can be solved with out-of-line validation: a separate function call, search:check-options().

Inside this function you’ll find a validate expression that makes as much use of the schema as it can, but also much more. The full power of XQuery can be leveraged against the proposed <options> node to check for errors or inconsistencies and to provide helpful feedback to the developer. Since it happens out-of-line, these checks can take substantially longer than actually handling the query based on them. The code can go as in-depth as it needs to without performance worries. This is a useful technique in many situations. One potential shortfall is that people might forget to call your validation function, but in practice this hasn’t been too much trouble.
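As a sketch of the shape such a function can take (the namespace is the Search API’s, but the function body here is invented, not the real search:check-options() internals):

xquery version "1.0-ml";
(: illustrative out-of-line validation; the specific checks are made up :)
declare namespace search = "http://marklogic.com/appservices/search";

declare function local:check-options(
  $options as element(search:options)
) as xs:string*
{
  (: lean on the schema as far as it goes :)
  for $err in xdmp:validate($options, "lax")/*
  return fn:string($err),

  (: then apply checks no schema can express :)
  if ($options/search:page-length < 1)
  then "page-length must be a positive number"
  else ()
};

(: example use: report any problems with a stored options node :)
local:check-options(fn:doc("/config/options.xml")/search:options)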

Higher-order functions. The predecessor to the Search API had the problem of being so popular that users would modify it to suit their unique requirements, which led to dozens of minor variations floating around in the wild. Different users have different needs and expectations for the library, and making a one-size-fits-all solution is perhaps not possible. One way to relieve this kind of feature pressure is to provide enough extension hotspots to allow all the kinds of tweaks that users will want, preserving the mainline code. This involves prediction, which is difficult (especially about the future). But a good design makes this possible.

Look inside the built app and you will find a number of function pointers, implemented as a new datatype, xdmp:function. XQuery 1.1 will have a more robust mechanism for this, but it might be a while before that is widespread. By modifying one file, changing a pointer to different code, you can adjust nearly every aspect of the application.
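A hedged sketch of the idea; the namespace, function names, and module path are hypothetical:

xquery version "1.0-ml";
(: hypothetical extension hotspot; names and module path are made up :)
declare namespace snip = "http://example.com/snippets";

(: to customize snippeting, repoint this one variable at your own module :)
declare variable $snippet-fn as xdmp:function :=
  xdmp:function(xs:QName("snip:default-snippet"), "/lib/snippets.xqy");

declare function local:render-result($result as node()) as element(div)
{
  (: late-bound call through the function pointer :)
  <div class="result">{ xdmp:apply($snippet-fn, $result) }</div>
};

local:render-result(<doc>MP3 players come in many colors.</doc>)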

Similarly, a few hotspots in the Search API can be customized, to hook in different kinds of parsers or snippet-generators. This powerful technique can take your own apps to the next level.

-m

Monday, October 12th, 2009

Speaking at Northern Virginia Mark Logic User Group Oct 27

Come learn more about Mark Logic and get a behind-the-scenes look at the new Application Builder. I’ll be speaking at the NOVA MUG (Northern Virginia Mark Logic User Group) on October 27. This turns out to be pretty close to the big Semantic Web conference, so I’ll stick my head in there too. Stop by and look me up!

Details at the developer site.

-m

Wednesday, October 7th, 2009

US Federal Register in XML

Fed Thread is a front end for the newly XMLified Federal Register. Why is this a big deal? It’s a daily publication of the goings-on of the US government. It’s a primary source for all kinds of things that normally only get rehashed through news organizations. And it is bulky–nobody can read through it on a regular basis. A yearly subscription (printed) would cost nearly $1000 and fill over 80,000 pages.

Having it in XML enables all kinds of searching, syndication, and annotation via flexible front ends like this one. Yay for transparency. -m

Sunday, September 27th, 2009

View on Publishing

An editor’s view on the modern publishing market, how it’s changing, and challenges any book faces in running the gauntlet of publication. Worth a read. -m

Sunday, August 9th, 2009

Kindle Flaw

Here’s the scenario:

The night before a long flight, I upload my personal files into a freshly charged Kindle 2. To preserve the battery, I switch off wireless and in the bag it goes. The next day, on the plane, I open the Kindle…and it’s showing an entirely depleted battery, exclamation point and all. Can you spot the design flaw?

-m

Thursday, August 6th, 2009

Kindle software update 2.0.4

According to this page, it’s here. At least the source code is. You heard it here first. -m

Saturday, July 11th, 2009

The decline of the DBMS era

Several folks have been pointing to this article, which has some choice quotes along the lines of:

If we examine the nontrivial-sized DBMS markets, it turns out that current relational DBMSs can be beaten by approximately a factor of 50 in most any market I can think of.

My employer is specifically mentioned:

Even in XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors

And it’s true, but don’t take my word for it. :-) The DBMS world has lots of inertia, but don’t let that blind you to seeing another way to solve problems. Particularly if that extra 50x matters. -m

Tuesday, July 7th, 2009

Demo Jam at Balisage 2009

Come join me at the Demo Jam at Balisage this year. August 11 at 6:30 pm. There will be lots of cool demos, judged by audience participation. I’d love to see you there. -m

Saturday, June 27th, 2009

Steorn: the jury has spoken

The Steorn 300 program is underway, and yes, I am one of the 300 looking at their information, which is coming out in once-a-week bursts in the form of educational modules. So far, nothing interesting: some basic physics lessons, and somewhat more interesting forum activity.

But all signs seem to be pointing in the wrong direction for a miraculous breakthrough. A jury of members selected by Steorn recently unanimously stated:

The unanimous verdict of the Jury is that Steorn’s attempts to demonstrate the claim have not shown the production of energy. The jury is therefore ceasing work.

Even this announcement raises more questions, and at this point in the game, more questions are not a good thing. Of the 22 original jury members, apparently only 16 were left at the end. Those 16 were unanimous, but what did the other 6 think? Were they booted as dissenters? Also, the jury allegedly was never presented with actual hardware, which seems completely crazy and counterproductive from the standpoint of the company that convened the jury.

This kind of story has unfolded many times before, and it doesn’t end well. I’ve spent many hours debunking energy claims and perpetual motion devices. But hey, the company says they are proceeding with plans to commercialize the technology by the end of 2009. No matter what happens, it will be interesting to watch. -m

Previously: How Orbo works, When the experimenter wants to believe, and The downside of free energy.

Thursday, June 25th, 2009

MarkLogic Server 4.1, App Services released

I’m thrilled to announce that MarkLogic 4.1, and with it my project App Services, is here. Top-of-the-post props go out to Colleen, David, and Ryan, who made it happen.

You might already know that MarkLogic Server is a super-powerful database slash search engine powering projects like MarkMail. (But did you know there’s a free-as-in-beer edition?) The next step is to make it easier to use and build your own apps on top of the server.

The first big piece is the Search API, which lets you do “Google-style” searches over your content like this:

search:search(“MP3 OR iPod AND color:black -Zune”)

The built-in grammar includes AND, OR, parens for grouping, - for negation, quotation marks for phrases, and easy ways to define facets like date:today or author:”Bill Shakespeare” or GPA:3.95. By passing in additional options, you can redefine the grammar and control all aspects of the search and how the results are returned. Numerous grass-roots efforts at doing something like this had begun to spring up, so the time was right to come out with an officially-sanctioned API. For those developers who haven’t seen the light yet and don’t fancy XQuery, an API like this is a huge benefit.
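For example, here’s a hedged sketch of passing an options node; the module import reflects the Search API as shipped, to the best of my knowledge, while the particular options shown are just illustrative:

xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search("MP3 OR iPod AND color:black -Zune",
  <options xmlns="http://marklogic.com/appservices/search">
    <page-length>20</page-length>
    <!-- define the color: facet used in the query above -->
    <constraint name="color">
      <value>
        <element ns="" name="color"/>
      </value>
    </constraint>
  </options>)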

The next piece builds on the Search API to offer a graphical App Builder tool that produces a simplified MarkMail-type app around your content. It looks like this:

App Builder screen shot, Search page

The App Builder itself is based on XForms via the excellent XSLTForms library and REST, making it a full-blown XRX application.

Lots more info, videos, screencasts, articles, and more are coming soon.

You can start playing with this now by visiting the download page. Under the Community License, you can put 10 gigs of content into it for noncommercial production use, free-as-in-beer.

Enjoy! I’ll be catching my breath for the next two months*. -m

* Not really

Friday, June 19th, 2009

VoCamp Wrap-up

I spent 2 days at the Yahoo! campus at a VoCamp event, my first. Initially, I was dismayed at the schedule. Spend all the time the first day figuring out why everybody came? It seemed inefficient. But having gone through it, the process seems productive, exactly the way that completely decentralized groups need to get things done. Peter Mika did a great job moderating.

Attendees numbered about 35 and came from widely varying backgrounds, from librarian to linguist to professor to student to CTO, though uniformly geeky. With SemTech this week, the timing was right, and the number of international attendees was impressive.

In community development, nothing gets completely decided just because a few people met. But progress happens. The first day was largely exploratory, but also covered plenary topics that nearly everyone was interested in. Namely:

  • Finding, choosing, and knowing when to create vocabularies
  • Mapping from one vocabulary to another
  • RDBMS to RDF mapping

Much of the shared understanding of these discussions is captured on various wiki pages connected to the one at the top of this article.

For day 2, we split into smaller working groups with more focused topics. I sat in on a discussion of Common Tag (which still feels too complex to me, but does fulfill a richer use case than rel-tag). Next, some vocabulary design, planning a microformat (and eventual RDF vocab) to represent code documentation: classes, functions, parameters, and the like. Tantek Çelik espoused the “scientific method” of vocab design: would a separate group, in similar circumstances, come up with the same design? If the answer is ‘yes’, then you probably designed it right. The way to make that happen is to focus on the basics, keeping everything as simple as possible. If any important features are missed, you will find out quickly. The experience of getting the simple thing out the door will provide the education needed to make the more complicated follow-on version a success.

From the wrap-up: if you are designing a vocabulary, the most useful thing you can do is NOT to unleash a fully-formed proposal on the world, but rather to capture the discussion around it. What were the initial use cases? What are people currently doing? What design goals were explicitly left off the table, deferred to a future version, or immediately shot down? It’s better to capture multiple proposals, even if fragmentary, and let lots of people look them over and gravitate toward the best design.

Lastly, some cool things overheard:

“Relational databases? We call those ‘legacy’.”

“The socially-accepted schema is fairly consistent.”

“It’s just a map, it’s not the territory.”

-m

Sunday, June 14th, 2009

Selling my house

I’m sticking around Sunnyvale, but am selling my house. It’s a smaller “starter home”, a place good for a small family. It’s close to Yahoo!, Google, Ebay, Cisco, and lots of other South Bay companies. In a great neighborhood with lots of parks, restaurants (Giovanni’s Pizza just down the street is fantastic), and a nearby movie theater. If you know anyone moving into the area and looking for a place, here’s a chance to short-circuit a lot of the hassle and get straight into a well-cared-for place from a reputable seller.

I’m hesitant to post my address and pictures of my house, etc. here. Email me if you want to see more. -m

Wednesday, June 10th, 2009

The Inmates are Running the Asylum: review and RFE

The central thesis of The Inmates are Running the Asylum by Alan Cooper is dead on: engineers get too wrapped up in their own worlds, and left entirely to their own whims can easily make a product incomprehensible to ordinary folks. For this reason alone, it’s worth reading.

But I do question parts of his thesis. He (with tongue in cheek) posits the existence of another species of human, called Homo Logicus. Stepping on to an airplane, Homo Logicus turns left into the cockpit with a million buttons but ultimate control over every aspect of the plane. Regular Homo Sapiens, on the other hand, turn right and tuck themselves into a chair–no control but at least they can relax.

But if there were only one “species” of Homo Logicus, members (like me) would never experience usability issues in software created by fellow Logicians. But ordinary fax machines give me fits. The touch-screen copier at work instills dread in my heart. And the software I need to use to file expense reports–written by enterprise software geeks probably very similar to me–is a usability nightmare. Words fail me in expressing my disdain for this steaming heap of fail.

The book is sub-titled “Why High-Tech Products Drive Us Crazy”, but one doesn’t have to look very far to find similar usability bugs in the low-tech world. Seth Godin, for example, likes to talk about different things in life that Just Don’t Work, along with reasons why. Some examples:

  • airport cab stand (75 cabs, 75 people, and it takes an hour)
  • “don’t operate heavy machinery” warning on dog’s prescription medicine
  • excessive fine print on liability agreements–intentionally hard to read and figure out
  • official “Vote for Pedro” shirts that look nothing like the ones in the movie
  • more examples on the web site

If anything, I think Cooper’s work doesn’t go far enough. It is relatively short on good examples, stretching out only four examples over four chapters. If properly-designed software is so hard to come up with examples of, then there are bigger problems in play (that would need to be dealt with by something more manifesto than book).

The book is now 5 years old. Perhaps it’s time for an update. Particularly in the world of web software, lots has happened in 5 years. Flickr. Gmail. Yahoo Pipes. Google Docs. Even SearchMonkey. Instead of focusing on pointing at crappy software, I’d like to see more emphasis on properly-done interfaces. More delving into nuance, and into the common factors behind why both high-tech and low-tech products miss the mark.

But maybe that’s just me. -m

Thursday, June 4th, 2009

Displaced Yahoo Placement Service

I was shocked today to find out that one of my old friends from the Yahoo Search days was let go in the last round. He’s simply brilliant, one of the last people I would have expected the managers-in-purple to decide they could do without.

At the same time, I’m getting hounded by recruiters–five so far just this week.

So let me put these two forces against each other and see if they cancel out. To any former Yahoos: get in touch with me and I’ll do what I can to hook you up with a cool opportunity. This offer is good for June and July–after that I can’t reasonably say I’ll have time for matchmaking. Send me your CV via email and I’ll get started. No promises on results, but I’ll do what I can. :-)

-m

Wednesday, June 3rd, 2009

See you at Balisage

Balisage, formerly Extreme Markup, is the kind of conference I’ve always wanted to attend.

Historically my employers have not been quite involved enough in the deep kinds of topics at this conference (or were too cash-strapped, but let’s not go there) to justify spending a week on the road. So I’m glad that’s no longer the case: Mark Logic is sponsoring the conference this year. I’m looking forward to the show, and since I’m not speaking, I might be able to relax a little and soak in some of the knowledge.

See you there! -m

Friday, May 22nd, 2009

More on the GOOG book settlement

From Brewster Kahle. Good read, so to speak. -m

Thursday, May 21st, 2009

One year at Mark Logic

Another anniversary this week, one year at Mark Logic. Much of it in stealth mode, but more details of what I’ve been up to are forthcoming. -m

Tuesday, May 12th, 2009

Google Rich Snippets powered by RDFa

The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800-pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m

Sunday, May 10th, 2009

Yahoo!: One year gone

As of today, I have been out of Yahoo! for a full year. And what a year it’s been… I guess that means I’m now free to recruit…any good XML people still wearing purple? -m

Sunday, May 3rd, 2009

Playing with Wolfram Alpha

I’ve been experimenting with the preview version of Wolfram Alpha. It’s not like any current search engine because it’s not a search engine at all. Others have already written more eloquent things about it.

The key feature is that it doesn’t just find information, it infers it on the fly. Take for example the query

next solar eclipse in Sunnyvale

AFAIK, nobody has ever written a regular web page describing this important (to me) topic. Try it in Yahoo! or Google and see for yourself. There are a few potentially interesting links based on the abstracts, but they turn out to be spammy. Wolfram Alpha figures out that I’m talking about the combination of a concept (“solar eclipse”) and a place (“Sunnyvale, CA”, but with an offer to switch to Sunnyvale, TX) and combines the two. The result is a simple answer–4:52 pm PDT | Sunday, May 20, 2012 (3.049 years from now). Hey, that’s sooner than I thought! Besides the date, there are many related facts and a cool map.

This is in contrast to SearchMonkey, which I helped create, in two main areas:

  1. Wolfram Alpha uses metadata to produce the result, then renders it through a set of pre-arranged renderers. The response is facts, not web pages.
  2. SearchMonkey focuses on sites providing their own metadata, while Wolfram Alpha focuses on hand-curation.

Search engines have been striving to do a better job at fact-queries. Wolfram’s approach shows that something disjoint from finding web pages in an index can be hugely useful.

The engineers working on this have a sense of humor too. The query

1.21GW

returns a page that includes the text “power required to operate the flux capacitor in the DeLorean DMC-12 time machine” as well as a useful comparison (~ 0.1 x the power of space shuttle at launch).

Yahoo! and Google do various kinds of internal “query rewriting”, but usually don’t let you know other than in the broadest terms (“did you mean …”). Wolfram Alpha shows a diagram of what it understood the query to be. The diagrams make it evident that something like the RDF model is in use, but without peeking under the hood, it’s hard to say something definitive.

One thing I wonder about is whether Wolfram Alpha creates a dynamic (as was a major goal of SearchMonkey) of giving web authors a reason to put more metadata in their sites–a killer app, if you will. It’s not clear at this early date how much web crawling or site metadata extraction (say, RDFa) plays into the curation process.

In any case Wolfram Alpha is something to watch. It’s set to launch publicly this month. -m
