Archive for March, 2008

Sunday, March 23rd, 2008

Review: Little Brother

(sick again…at least I get to catch up on my reading)

Something has always puzzled me: I’ve never solidly connected with a Cory Doctorow story. It’s baffling; we’re practically brothers-in-geekdom. Most every nonfiction thing I read from Cory leaves me nodding in agreement. If we met, we’d have no trouble talking for hours about metacrap, free content licenses, and crypto. But for some reason, Cory’s fiction–short story or freely-downloadable-novel–hasn’t clicked with my peculiar mind.

Until now, that is. I emailed Cory asking him for a prerelease copy of Little Brother, in return for an honest review here. He was happy to oblige. The story pulled me in fast and hard, and by the time it was over, I wished there was more to the tale. I’d call that “solidly connecting”. :-)

The story is aimed at high-school-aged kids, and naturally features a cast of high-school-aged protagonists. This means that I’m quite outside the target audience, so your opinions might vary. Too many reviews fall into plot synopsis mode here, so I’ll try to avoid it. Suffice it to say that the story revolves around a close group of teens who get accused of involvement in a bigger-than-9/11 plot, and quietly fight back against the resulting oppression.

The tale has a lot of (and AFAIK this is a freshly minted word) techsposition. Like any exposition, it is a risky thing to do as a writer, since it halts the forward momentum of your story. It’s doubly hard to stop to explain technical details. Blocks of techsposition were heavy enough to throw me out of the story a few times. There were cases where the plot wouldn’t have suffered by glossing over some details. On the other hand, these not-quite-asides are about real-world (as opposed to fictional) technology, so definitely have some benefit for readers.

The story has a strong message, but it gets spread a little thick toward the middle. All the clever things the kids think to do happen fairly early in the story, but the plot keeps rolling along. There are also a few too many instances of the really-smart teen who has outwitted or escaped from the clutches of The Man, just in time to appear in the action.

There are a number of characters in the story, with complex interactions among them. The main characters are solid, believable, and fully-realized. In fact, I’d point to the characterization as the main thing that kept me up late reading.

Little Brother comes out at the end of April, both in print and freely downloadable. If you’re allowed to choose your own reading material, you might decide it’s worth a look. -m

Friday, March 21st, 2008

Trying Evernote

Evernote looks like a cool application, and for at least a few more hours, you can get it for free via the Giveaway of the Day site. At first glance, this seems like the closest software I’ve seen to the original “Brain Attic” concept I’ve held for years.

My most pressing questions are (big surprise) around data storage. It seems that in the version 3 beta all the data is kept on a remote server, which makes me a little uneasy. In what format is the data kept? Is it some format that will be readable in 50 years? If the Evernote corporation goes offline or out of business, do I lose everything?

I’ll keep reporting back here with my discoveries and experiences. -m

Thursday, March 20th, 2008

Geeking out

I have here a pre-release copy of Cory Doctorow’s novel Little Brother.

With permission.

In plain text.

Being read with the UNIX command less.

On an XO laptop.

And so far it’s awesome. -m

Thursday, March 13th, 2008

The (lowercase) semantic web goes mainstream

So today Yahoo! announced a major facet of what I’ve been working on lately: making the web more meaningful. Lots of fantastic coverage, including TechCrunch and ReadWriteWeb (and others, please link in the comments), and supportive responses and blog posts across the board. It’s been a while since I’ve felt this good about being a Yahoo.

So what exactly is it?

A few months ago I went through the pages on this very blog and added hAtom markup. As a result of this change…well, nothing happened. I had a good experience learning about exactly what is involved in retrofitting an existing site with microformats, but I didn’t get any tangible benefit. With the “SearchMonkey” platform, any site using microformats, or RDFa or eRDF, is exposed to developers who can enhance search results. An enhanced result won’t directly make my site rank higher in search, but it will most certainly make it prone to more clicks, and ultimately more readership, more inlinks, and better organic ranking.
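To make the idea of "structured data a developer can get at" concrete, here is a minimal sketch of pulling hAtom entry titles out of a page. It is not SearchMonkey code and says nothing about how that platform works internally; it only relies on the hAtom class names `hentry` and `entry-title`. The sample markup, the `EntryTitleParser` class, and the `entry_titles` helper are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using hAtom class names (hentry, entry-title).
SAMPLE_HTML = """
<div class="hentry">
  <h2 class="entry-title"><a href="/a" rel="bookmark">Review: Little Brother</a></h2>
  <div class="entry-content"><p>Body text.</p></div>
</div>
<div class="hentry">
  <h2 class="entry-title">Trying Evernote</h2>
  <div class="entry-content"><p>More body text.</p></div>
</div>
"""

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class EntryTitleParser(HTMLParser):
    """Collects the text of every element whose class list has 'entry-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0    # nesting depth inside the current entry-title element
        self._buffer = []  # text pieces seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "entry-title" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.titles.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def entry_titles(html_text):
    """Return the hAtom entry titles found in html_text, in document order."""
    parser = EntryTitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.titles
```

Calling `entry_titles(SAMPLE_HTML)` walks the fragment once and returns the title text of each entry. The point is only that a class name like `entry-title` gives a program something stable to look for, where an unmarked `<h2>` could be anything.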

How about some questions and answers:

Q: Is this Tim Berners-Lee‘s vision of the Semantic Web finally getting fulfilled?

A: No.

Q: Does this presuppose everybody rushing to change their sites to include microformats, RDF, etc?

A: No. After all, there is a developer platform. Naturally, developers will have an easier time with sites that use official and community standards for structuring data, but there is no obligation for any site to make changes in order to participate and benefit.

Q: Why would a site want to expose all its precious data in an easily-extractable way?

A: Because within a healthy ecosystem it results in a measurable increase in traffic and customer satisfaction. Data on the public web is already extractable, given enough eyeballs. An openness strategy pays off (of which SearchMonkey is an existence proof).

Q: What about metacrap? We can never trust sites to provide honest metadata.

A: The system does have significant spam deterrents built in, of which I won’t say more. But perhaps more importantly, the plugin nature of the platform uses the power of the community to shape itself. A spammy plugin won’t get installed by users. A site that mixes in fraudulent RDFa metadata with real content will get exposed as fraudulent, and users will abandon ship.

Q: Didn’t ask.com prove that having a better user interface doesn’t help gain search market share?

A: Perhaps. But this isn’t about user interface–it’s about data (which enables a much better interface).

Q: Won’t (Google|Microsoft|some startup) just immediately clone this idea and take advantage of all the new metadata out there?

A: I’m sure these guys will have some kind of response, and it’s true that a rising tide lifts all boats. But I don’t see anyone else cloning this exactly. The way it’s implemented has a distinctly Yahoo! appeal to it. Nobody has cloned Yahoo! Answers yet, either. In some ways, this is a return to roots, since Yahoo! started off as a human-guided directory. SearchMonkey is similar, except a much broader group of people can now participate. And there are some specific human, technical and financial reasons why as well, but I suggest inviting me out for beers if you want specifics. :-)

Disclaimer: as always, I’m not speaking for my employer. See the standard disclaimer. -m

Update: more Q and A

Q: How is SearchMonkey related to the recently announced Yahoo! Microsearch?

A: In brief, Microsearch is a research project (and a very cool one) with far-reaching goals, while SearchMonkey is targeted as imminently shipping software. I frequently talk to and compare notes with Peter Mika, the lead researcher for Microsearch.

Monday, March 10th, 2008

Dear readers…

You are awesome. Just sayin’. -m

Monday, March 10th, 2008

Getting what you asked for

Some time ago, Doug Crockford’s excellent blog pointed me to this page on “excessive DTD traffic” at the W3C. Go ahead and follow that link, I’ll wait…

All the standard templates that show how to construct a basic XHTML page include a system identifier of http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and often a namespace name of http://www.w3.org/1999/xhtml. As the blog points out, these are not actually hyperlinks, they only play them on TV. Huge quantities of software are requesting these URLs 24×7, putting a load on the W3C’s servers. Oftentimes this results from unfortunate defaults in off-the-shelf XML components such as parsers.
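For contrast, here is a small sketch of software that treats both URLs purely as identifiers. It uses Python's standard `xml.etree.ElementTree`, which does not fetch the external DTD named in the DOCTYPE and only ever compares the namespace name as a string. The sample page and the `page_title` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The namespace name is matched as a plain string; nothing requests this URL.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page, inlined so the sketch needs no files or network.
SAMPLE_PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
"""


def page_title(xhtml_text):
    """Return the <title> text of an XHTML document, or None if it has none."""
    root = ET.fromstring(xhtml_text)
    title = root.find("x:head/x:title", {"x": XHTML_NS})
    return title.text if title is not None else None
```

One caveat with this approach: because the DTD is never loaded, named entities it defines (such as `&nbsp;`) are unknown to the parser, so a page that uses them will fail to parse unless the entities are supplied some other way, for example from a locally cached copy of the DTD.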

But what did you expect?

This is the web equivalent of having a front-desk receptionist hand out stacks of self-addressed, stamped postcards, then complaining about how much mail the company gets from all around the world.

HTTP URLs are great for identifiers on a technical basis: they are based on DNS names and have the important qualities of uniqueness and persistence. But as far as human factors go, they are a terrible choice (though with a great deal of inertia at this point). -m

Thursday, March 6th, 2008

microformat search at Yahoo!

Somehow I missed this posting and the underlying news that a Y Research project has a nice public demo of semantic search, driven by RDF, RDFa, and microformats. Still a rough sketch of a full solution, with multiple-second access times. But I particularly like the query for renaissance faire. -m

Monday, March 3rd, 2008

WebPath and Wikipedia

The WebPath bug reports continue to roll in. For one, queries against *.wikipedia.* don’t seem to work. You get something back, but it has no resemblance to the page you were looking for. The problem comes from the W3C tidy service that I use, specifically that the (understandably overworked and understaffed) admins at the Wikimedia Foundation seem to have blocked it. It seems like more than a simple IP or user-agent-based block. I’ve emailed them about it but haven’t heard back yet.

This highlights the limitation of having a single-source converter in the Platonic Web module of WebPath. So I turn to my readers: do you know of any other tidy servers? Or converters of a non-tidy origin? For any of these to work, they need to return clean XML corresponding to the original page (as opposed to, say, returning something with big headers/footers or ampersand-encoded). This seems like an outstanding need for the open source community.

Please comment below with ideas. Thanks! -m

UPDATE: heard back from the Wikipedia admins, and although professional and helpful-as-can-be-expected, they won’t be changing anything on their end. Still looking for more open source options.
