Why XML still matters after 25 years

December 2022. There’s a definite sense that technology is accelerating. In the last six months alone, a handful of AI technologies have made incredible leaps forward in generating images and fluent human language interaction. Some days it’s a challenge to even keep up with the headlines.

And yet, XML is still here.

Even if, in the coming years, we’re doing computing from a Metaverse by talking to a Star Trek-style computer interface, we’re still going to need text and documents. And we’ll need to say things about that text–markup. And while lots of other technologies from XML’s early days have been replaced or radically evolved, XML is pretty much still the same[1].

The technology world was different in 1998, the year XML 1.0 was finalized. That year, Sun Workstations were cool. Furbys were hot. A company with the questionable name of “Google” was freshly incorporated. The Microsoft antitrust trial overshadowed the release of Windows 98, while AOL was wrapping up its purchase of Netscape. And the first iMacs were rolling out to customers in all the lollipop colors.

I challenge you to spend a day relying on the 1998 version of any of those things.

It’s also hard to remember how fierce the hype about XML became in the early years. Everyone was rushing to include XML in their product in one way or another. The CTO of the small company I worked for agreed without hesitation that I could have the job title “Chief XML Architect,” just for the credibility it bestowed on the company. A flurry of related specifications came out, both at the W3C and other industry groups, with rapidly escalating complexity. A whole ’nother flurry of industry group web services specs, starting with SOAP and going downhill from there, quickly grew into an incomprehensible mess. In short, XML got overextended.

The pendulum of history is not subtle, and the XML backlash struck with force. Especially among programmers, and others tasked with hand-writing angle brackets, ‘XML’ became a muttered curse[2]. Once browser vendors jumped ship, poor XML descended to meme status.

For representing data, most projects have moved on to JSON, YAML, TOML, protocol buffers, and numerous other variants and offshoots. All of these have their benefits over XML, in readability, writability, or conciseness.

For interchanging data, it has not been as clear-cut. These days, agreeing on XML as an interchange format, especially between otherwise-uncoordinated systems, remains a solid–if a little old-fashioned–choice.

Meanwhile, a whole bunch of lightweight languages centered around Markdown have done a better job at being ‘writable’ by limiting their scope to a fixed set of markup primitives, though (tellingly) many of these languages fall back to inline bits of XHTML to express things outside their core definition.

And still, for the kinds of things for which XML hits the sweet spot–namely text with structured annotation–no better technology has been able to displace it, or even come close.
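To make “text with structured annotation” concrete, here is a minimal sketch using Python’s standard-library `xml.etree.ElementTree`. The sample sentence and the element names (`p`, `date`, `org`) and the `when` attribute are made up for illustration; they don’t come from any particular vocabulary. The point is only that the running text and the annotations on it live in one document, and either can be recovered without disturbing the other.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one sentence with two inline annotations.
# The element and attribute names are invented for this sketch.
SAMPLE = (
    '<p>XML 1.0 was finalized in <date when="1998">1998</date> '
    'by the <org>W3C</org>.</p>'
)


def plain_text(elem):
    """Return the running text with all markup stripped."""
    return "".join(elem.itertext())


def annotations(elem):
    """List (tag, attributes, annotated text) for each element nested inside elem."""
    return [
        (child.tag, dict(child.attrib), "".join(child.itertext()))
        for child in elem.iter()
        if child is not elem
    ]


root = ET.fromstring(SAMPLE)
print(plain_text(root))
print(annotations(root))
```

Expressing the same sentence in a key-value format generally means either splitting the text into fragments or keeping character offsets alongside it, which is the kind of awkwardness the paragraph above is getting at.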

In retrospect, the designers of XML did a good job threading a needle.

Does that leave a narrow niche? In the grand scheme of things, yes. Though our texts, our knowledge, our research are among the most durable things we will produce, individually or as a species. It’s fitting that the representation for these kinds of data would remain more stable, even through a maelstrom of technological fads and innovation. A dedicated group of folks still meets annually for the Balisage conference to discuss these and other markup-related issues.

After a few years away, I made it to Balisage this year, and it reminded me how much I miss working with this stuff. I looked around to see what companies were doing interesting things in this space.

So I guess this has been a long-winded way of announcing why I ended up back at MarkLogic. I started as Principal Engineer in October. It’s almost a different company now, but it’s still the best overall implementation of the ideals of XML. But it’s not an “XML Database”–it’s grown into a full data platform that also includes support for document types that aren’t a close fit for XML, including good old rows and tables. It supports JavaScript natively and plays well with Node.js. The Semantics features keep getting more powerful. And the new Optic API brings it all together in a unified way. We’ve barely scratched the surface of what’s possible when one can fluently mix-and-match all these different shapes of data, with the robustness, security, and scale of a traditional RDBMS.

I’ve grown a lot as an engineer and as a person. What makes me happiest is being able to solve problems. There might have been a time when I’d join a company purely because of tech, but that doesn’t seem like a great strategy to build a career on. Teaming up with like-minded folks, though, is a good way to find joy in your work–and tackle tough problems as you go.

Even as the pace of change accelerates, there are still a few bits of stable ground on which to affix a fulcrum.

Oh, and I’m joining just in time for a major new release with MarkLogic 11.

Go check it out.

Footnotes

[1] Modulo some cleanup of Unicode allowed in names and newlines. Surface-level stuff.
[2] In fairness, actual programming languages expressed in terms of XML are pretty terrible to work with.
