Archive for the 'announcement' Category

Thursday, April 26th, 2012

MarkLogic World 2012

I’m getting ready to leave for MarkLogic World, May 1-3 in Washington, DC, and it’s shaping up to be one fabulous conference. I’ve always enjoyed the vibe at these events–it has a, well, cool-in-a-data-geeky-way thing going on (like the XML conference in the early 2000′s where I got to have lunch with James Clark, but that’s a different story). Lots of people with big data problems will be here, and I always enjoy talking to these kinds of people.

I’m speaking on Wednesday at 3:30 with Product Manager extraordinaire Justin Makeig about big data visualization. If you’ll be at the conference, come look me up. And if you won’t, well, forgive me if I need a few extra days to get back to any email you send this way.

Follow me on Twitter and look for the #MLW12 tag for live coverage.

-m

Sunday, April 15th, 2012

Actually using big data

I’ve been thinking a lot about big data, and two recent items nicely capture a slice of the discussion.

1) Alex Milowski recounting working with Big Weather Data. He concludes that ‘naive’ (as-is) data loading is a “doomed” approach. Even small amounts of friction add up at scale, so you should plan on doing som in-situ cleanup. He came up with a slick solution in MarkLogic–go read his post for details.

2) Chris Dixon on Making Large Datasets Useful. Typical approaches like machine learning only solve 80-90% of the problem. So you need to either live with errorful data, or invoke manual clean-up processes.

Both worth a read. There’s more to say, but I’m not ready to tip my hand on a paper I’m working on…

-m

Wednesday, February 1st, 2012

Googlebot submitting Flash forms

I’m sure this is old news by now, but here’s one more data point.

As it turns out, XForms Institute uses an old skool XForms engine written in Flash, dating approximately back to the era when Flash was necessary to do XForms-ey things in the browser. The feedback form for the site is, quite naturally, implemented in XForms. Submissions there ultimately make it into my inbox. Here’s what I see:

Tue Jan 31 12:19:22 2012 66.249.68.249 Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

An iPhone running Flash? I doubt it. That’s quite an agent string! Organic versioning in the wild. -m

Tuesday, November 1st, 2011

5 things to know about MarkLogic 5

MarkLogic 5 is out today. Here’s five things beyond the official announcement that developers should know about it:

  1. If you found the CQ sample useful, you’ll love Query Console, which does everything CQ does and more (syntax highlighting!)
  2. Better Search API support for metadata: MarkLogic has always had support for storing metadata separately from documents. With new Search API support, it’s easy to set up, and it works great with databases of binary documents.
  3. The Hadoop connector, while not officially supported in this configuration, works on Mac. I know a lot of developers use Mac hardware. Once you get Hadoop itself set up (following rules like these), everything works great in my experience.
  4. “Fields” have gotten more general and more powerful. If you haven’t set aside named portions of your documents or metadata for special indexing and access, you should look in to this feature–it will rock your world.
  5. To better understand what your system is doing at any point in time, you can now use the built-in Monitoring Dashboard, which runs in-browser.
And let’s not leave out the Express license, which makes it easier to get started. Check it out.
-m

Thursday, June 9th, 2011

SkunkLink in Belorussian

The awesome thing about the internet is that you never know who’s reading your stuff. Case in point: during the depths of the hypertext linking standards discussions, after folks realized that XLink wasn’t going to work with HTML (not even with XML-flavored XHTML), all kinds of proposals flew around about what to do about it. One was my own SkunkLink, a “skunkworks” attempt to get people thinking in a certain direction.

An enthusiastic follower, Bohdan Zograf, has translated SkunkLink into Belorussian, available here. He’s also mentioned translating all of XForms Essentials, which I completely support, and is just the kind of thing I hoped would happen when I put the text under a liberal content license.

Awesome. -m

Sunday, May 1st, 2011

MarkLogic User Conference coverage

Running commentary on Twitter, but hurry, Twitter’s search infrastructure has the long-term memory of a fruit fly. Posts tagged with MLUC11 will soon be dropping off the search event horizon. -m

Thursday, March 31st, 2011

Follow me on Twitter

Not an April Fools joke. I’m now tweeting in professional capacity. I’ll talk about XML technologies, the web, and various and sundry geeky topics. -m

Thursday, February 3rd, 2011

We’ll always have Prague

Today I exchanged electrons with a major airline, which will ultimately result in them removing a certain amount of abstract currency units from my account.

In other words, see you all at XML Prauge 2011. I’ve never been to this conference before, and each year I hear better and better things. Looking forward to it. -m

Friday, January 7th, 2011

XForms Training: Feb 14, 15 in Maryland

The remarkable C. M. Sperberg-McQueen is offering XForms training in Maryland (at Mulberry Technologies), Feb 14 & 15, 2011. This is a two-day hands-on introduction to XForms. Check it out. This is a great opportunity to learn more about XForms. -m

Wednesday, January 5th, 2011

Why I am abandoning Yahoo! Mail (and why you should too)

This is a non-technical description of why Yahoo! Mail is unsafe to use in a public setting, and indeed at all. I will be pointing people at this page as I go through the long process of changing an address I’ve had for more than a decade.

What’s wrong with Yahoo Mail?

A lot of web addresses start with http://–that’s a signal that the “scheme” used to deliver the page to your browser is something called HTTP, which is a technical specification that turns out is a really good way to move around web pages. As the page flows to the browser, it’s susceptible to eavesdropping, particularly over a wi-fi connection, and much more so in public, including the usual hotspots like coffee shops, but also workplaces and many home environments. It’s the virtual equivalent of a postcard. When you’re reading the news or checking traffic, it’s not a big deal if someone can sneak a glance at your page.

Some addresses start with https://–notice the extra ‘s’ which stands for “secure”. This means two things 1) that the web page being sent over is encrypted, and thus unavailable to eavesdroppers, and 2) that the people running the site had to obtain a certificate, which is a form of proof of their identity as an organization (that they’re not, say, Ukrainian phishers). Many years ago, serving pages over https was considered quite expensive in that servers needed much beefier processors to run all that encryption. Today, while it still requires extra computation, it’s not as big of a deal. Most off-the-shelf servers have plenty of extra power. To be fair, for a truly ginormous application with millions of users like Yahoo Mail, it is not a trivial thing to roll out. But it’s critically important.

First, to dispel a point of confusion, these days nearly every site, including Yahoo Mail, uses https for the login screen. This is the most critical time when encryption is needed, because otherwise you’d be sending your password on a postcard for anyone with even modest technical skills to peek at. So that’s good, but it’s no longer enough. Because sites are written so that you don’t have to reenter your password on every single new page, they use a tiny bit of information called a “cookie” in your browser to stay logged in. Cookies themselves are neither good nor bad, but if an eavesdropper gets a hold of one, they can control most of your account–everything that doesn’t require re-entering a password. In Yahoo Mail this includes reading any of your messages, sending mail on your behalf, or even deleting messages. Are you comfortable allowing strangers to do this?

As I mentioned earlier, new, more powerful tools have been out for months that automate the process of taking over accounts this way. Zero technical prowess is needed, only the ability to install a browser plug-in. If there are any web companies dealing in personal information for which this wasn’t a all-hands-on-deck security wake-up, they are grossly negligent. Indeed, other sites like Gmail work with https all-the-time. But still, in 2011, Yahoo Mail doesn’t. I have a soft spot for Yahoo as a former employer, and I want to keep liking them. Too bad they make it so difficult.

The deeper issue at stake is that if this serious of an issue goes unfixed for months, how many lesser issues lurk in the site and have been around for months or years? The issue is trust, my friend, and Yahoo just overdrew their account. I’m leaving.

FAQ

Q: So what do you want Yahoo to do about this?  A: Well, they should fix their site for their millions of remaining users.

Q: What if they fix it tomorrow? Will you delete this message?  A: No. Since I no longer trust the site, I am leaving, even though it takes time to notify all the people who still send me mail, and no matter what other developments unfold in the meantime. This page will explain my actions.

Q: Do you really want everyone else to leave Yahoo Mail?  A: No, only those who care about their privacy.

Q: What’s your new email address?  A: I have a couple, but <my first name> @ <this domain> is a good general-purpose one.

I will continue to update this page as more information becomes available. -m

Thursday, September 9th, 2010

FCC opens its databases

Good news for big data fans. The FCC has released APIs to several large databases involving broadband statistics, spectrum licenses, and some related topics. I haven’t had a chance for a close look yet, perhaps we can do that together. Link. -m

Sunday, August 22nd, 2010

Eulogy for SearchMonkey

This is indeed a sad day for all of us, for on October 1, a great app will be gone. Though we hardly had enough time during his short life to get to know him, like the grass that withers and fades, this monkey will finish his earthly course.

Updated SearchMonkey logo

Photo by Micah

I know he left many things undone, for example only enhancing 60% of the delivered result pages. He never got a chance to finish his life’s ambition of promoting RDFa and microformats to the masses or to be the killer app of the (lower-case) semantic web. You could say he will live on as “some of this structured data processing will be supported natively by the Microsoft platform”. Part of the monkey we loved will live on as enhanced results continue to flow forth from the Yahoo/Bing alliance.

The SearchMonkey Alumni group on LinkedIn is filled with wonderful mourners. Micah Alpern wrote there

I miss the team, the songs, and the aspiration to solve a hard problem. Everything else is just code.

Isaac Asimov was reported to have said “If my doctor told me I had only six minutes to live, I wouldn’t brood. I’d type a little faster.” Today we can identify with that sentiment. Keep typing.

-m

Thursday, August 5th, 2010

Balisageurs: XML and JSON

At David Lee’s nocturne about XML and JSON round-trippimg, several folks were talking about a site that listed several “off-the-shelf” conversion methods, but nobody could remember the site.

Late that night, with 15 minutes of battery remaining, I found it. The operative search term is XSLTJSON. -m

Saturday, July 31st, 2010

Balisage bound

See all you markup extremists in Montreal. Look me up. -m

Wednesday, July 21st, 2010

Meade Classe August 7

Join me for another Meade Classe at the Los Altos MoreFlavor brew shop.

Saturday, Aug. 7, 2010 2:00 – 4:00 pm

MoreFlavor 991 N. San Antonio Road

Los Altos, CA 94022

We will taste some meads, focusing on sensory evaluation, then walk through the steps of brewing up a batch. As usual, seating is limited, so email me to reserve a spot. To help the brew shop recover the costs of the honey, yeast, and light snacks available, a $10 donation will help make sure these events can continue.

On a personal note, I’ll be traveling back from a conference in Montreal the day before, so I might be a little jet lagged. Could get interesting. :-) -m

Thursday, July 15th, 2010

VP XIV bound

Thrilled, THRILLED to announce that I’ve been accepted to the 2010 Viable Paradise workshop. I sent in the first 8000 words of a manuscript that about half of the 7 readers of this blog have looked at. You know, the one that is Science Fiction–literally, fiction about science. So I’ll be spending some time in early October at Martha’s Vineyard studying at the feet of published authors and honing my craft.

Class size is limited, so I’ve been actively psyching myself down for the last month, not getting my hopes up too high. Then when the acceptance came, I had computer down time, and nearly exploded from holding the news in for 3 days. :-)

Ahhhh. I should say more, but I believe I may still be in shock. -m

P. S. OK, how about 25 great opening lines.

Sunday, June 20th, 2010

RDBMS Alternatives

For anyone trying to get up to speed on the technology side of non-traditional databases, including NoSQL concepts and not-your-father’s-XML, this webinar looks like a good start. Tuesday June 29, 2pm EST, 11am PST. -m

Sunday, May 30th, 2010

Balisage contest: solving the wikiml problem

I wish I could say I had something to do with the planning of this: part of Balisage 2010 is a contest to “encourage markup experts to review and to research the current state of wiki markup languages and to generate a proposal that serves to de-babelize the current state of affairs for the long haul.”  To enter, you must propose a set of concrete steps (organizational, social, and/or technological) that will enable wiki content interchange, a real WYSIWYG editor, and/or wiki syntax standardization.

This pushes all of my buttons. It’s got structured documents, Web, parser geekery, writing, engineering, and standards. There’s a bunch of open source prior art, including PyXMLWiki, which I adapted from some fantastic earlier work from Rick Jelliffe.

Sadly, MarkLogic employees aren’t eligible to enter. Get your write-up done by July 15 and sent to balisage-2010-contest at marklogic dot com. The winner will be announced at Balisage and will take home some serious prize winnings, and also will be strongly encouraged (but not required) to give a brief summary (~10 minutes) of their winning entry.

Can’t wait to see what comes out of this. -m

Tuesday, May 11th, 2010

XProc is ready

Brief note: The W3C XProc specification, edited by my partner-in-crime Norm Walsh, has advanced to Recommendation status. Now go use it. -m

Thursday, April 29th, 2010

DMC = developer.marklogic.com

The new MarkLogic developer site is up, cleaner, better organized, and more social. Even cooler, it’s an XSLT-heavy application running on a pre-release version of MarkLogic. The new blog gives some of the details of the new site and transition.

So, if you’re already a MarkLogic developer, this is a great resource. And if you’re not, the site itself shows how fast and simple it is to put together a XSLT and XQuery-powered app. -m

Monday, February 22nd, 2010

Mark Logic User Conference 2010

Are you coming? Link. It starts on May 4 (Star Wars day!) at the InterContinental Hotel in San Francisco. Guest speakers include Chris Anderson, Editor-in-Chief of Wired and Michelle Manafy, Editor-in-Chief of EContent magazine.

Early bird registration ends Feb 28. -m

Friday, January 15th, 2010

Economic indicators: recruiting picking up again

I got a personal email pitch from recruiters at both Facebook and Google, oddly enough both messages within a 3-minute window on a Monday morning. Hiring is on the uptick again, it seems. My team is still looking for the right front end engineer–someone who knows the JavaScript language in depth, how to use semantic HTML and CSS, AND all about browser quirks. Email me. -m

Sunday, January 3rd, 2010

Geek Thoughts: the ultimate real-time strategy game

Games like Farmville and the iPhone knock-off iFarm throw in a unique twist in the realm of strategy gaming: crops that get planted mature in “real time”. If a crop takes 24 hours to grow, then you need to literally wait the full 24 hours. Great for making an app “sticky” and getting users to repeatedly log in. Side fact: Farmville sells more virtual tractors in a day than real tractors sold in the US in a Year.

Game producers keep upping the ante in terms of real-time strategy games interacting with the real world. Take the latest for instance, a free iPhone app called Lose It!. Everything in this game runs in real-time–a game day is always a full 24 hours. Instead of conventional points, it uses “calories”, which are gained by the actual foods you physically eat, and subtracted via actual exercise. The app includes a massive database of food items and exercises to help you keep an accurate record, apparently on the honor system. The goal: to set a calorie target for each day and come in under it. A secondary scoring system is based on your own weight, though you will need an accurate scale (not included with the app) to measure it.

So far I’ve done pretty well at the game. I’ve averaged better than 1000 calories under my goal for the last several weeks, and have done well on the weight number too. And it’s pretty interesting to have a log of everything I’ve eaten. What will they think of next?

More collected Geek Thoughts at http://geekthoughts.info.

Friday, December 11th, 2009

Tinderbox 5 is out

At first glance, this seems to be the Snow Leopard of Tinderbox releases–lots of behind-the-scenes technology updates and largely the same core features. If you’re looking for a way to get more organized, it’s worth a look. Link. -m

Thursday, December 10th, 2009

500th Post

Celebrating 500 posts since I went to WordPress in May 2006. Prior to that, an additional 730 posts as I floated through a typical evolution of blogging platforms:

  • Easy start: blogger (299 posts in 24 months)
  • Succumbing to the desire to roll your own (259 posts in 12 months)
  • Realizing that rolling your own is too difficult: Pyblosxom (172 posts in 12 months)
  • Moving to a mature platform you don’t need to worry about much: WordPress (500 posts in 42+ months)

-m

Friday, November 27th, 2009

Gandhi

Richard Attenborough’s epic biopic is available to watch instantly on Netflix, but only until November 30. Recommended viewing for the weekend. -m

Sunday, November 8th, 2009

High Temperature Superconductors

If this site is accurate, it’s now possible to have superconducting material at household freezer temperatures: 254k, or a tiny bit below 0F. From power lines to maglevs to supercolliders to energy storage, the potential applications boggle the mind. -m

Note: I’m having trouble finding independent verification of this, other than what appears to be re-hashes of the superconductor.org article. If you have any additional proof or refutation, please post it in the comments.

Thursday, November 5th, 2009

Metadata FTW

Link credit goes to Joho.

This looks pretty significant. The AZ Supreme Court ruled that document metadata must be disclosed under existing public records law. This may start a chain reaction with other states following suit. With the movement toward open data including data.gov and the Federal Register, this fits in well. Quite often metadata including creation date and author and the like make for much better searching and faceting. -m

Wednesday, October 21st, 2009

Application Builder behind-the-scenes

I’ll be speaking next Tuesday (Oct 27) at the Northern Virginia MarkLogic User Group (NOVAMUG). Here’s what I’ll be talking about.

Application Builder consists of two main parts: Search API to enable Google-style search string processing, and the actual UI wizard that steps users through building a complete search app. It uses a number of technologies that have not (at least not up until now!) been widely associated with MarkLogic. Why some technologies that seem like a perfect fit for XML apps are less used in the Mark Logic ecosystem is anyone’s guess, but one thing App Builder can contribute to the environment is some fresh DNA. Maybe your apps can benefit from these as well.

XForms and XRX. Clicking through the screens of App Builder is really a fancy way of editing XML. Upon first arriving on a page, the client makes a GET request to an “Application XML Endpoint” (axe.xqy) to get the current state of the project, which is rendered in the user interface. Interacting with the page edits the in-memory XML. Afterwards, the updated state is PUT back to the same endpoint upon clicking ‘Save’ or prior to navigating away. This is a classic XRX architecture. MarkLogic ships with a copy of the XSLTForms engine, which makes use of client-side XSLT to transform XForms Markup into divs, spans, classes, and JavaScript that can be processed entirely in the browser. Thus XForms works on all supported browsers all the way back to IE6. The apps built by the App Builder don’t use any XForms (yet!) but as App Builder itself demonstrates, it is a great platform for application development.

To be honest, many XForms apps have fallen short on the polished UI department. Not so with App Builder, IMHO. An early, and in hindsight somewhat misdirected, thrust of XForms advocacy pushed the angle of building apps with zero script needed. But one advantage of using a JavaScript implementation of XForms is that it frees you to use script as needed. So in many places, large amounts of UI, all mapped to XML, are able to be hidden away with CSS, and selectively revealed (or mapped to yet other HTML form controls) in small, self-contained overlays triggered via script. While it doesn’t fulfill the unrealistic promise of completely eliminating script, it’s a useful technique, one I predict we’ll see more of in the future.

Relax NG. XML Schema has its roots deep into the XML infrastructure. The type system of XQuery and XSLT 2.0 is based on it. Even XForms has ties to it. But for its wide reach, XML Schema 1.0 has some maddening limitations, and “takes some getting used to” before one can sight read it. In the appendices of many recent W3C specifications use the highly-readable compact syntax to describe content models is a way equally human and machine-readable.

What are these limitations I speak of? XML Schema 1.1 goes a long way toward resolving these, but isn’t yet widely in use. Take this example, the content model of the <options> element from Search API:

start = Options | Response

# Root element
OptionsType = (
 AdditionalQuery? &
 Annotation* &
 ConcurrencyLevel? &
 Constraint* &
 Debug? &
 DefaultSuggestionSource? &
 Forest* &
 Grammar? &
 Operator* &
 PageLength? &
 QualityWeight? &
 ReturnConstraints? &
 ReturnFacets? &
 ReturnMetrics? &
 ReturnQtext? &
 ReturnQuery? &
 ReturnResults? &
 ReturnSimilar? &
 SearchOption* &
 SearchableExpression? &
 SortOrder* &
 SuggestionSource* &
 Term? &
 TransformResults?
)

The start line indicates that, within this namespace, there are two possible root elements, either <options> or <response> (not shown here). An instance with a root of, say search:annotation is by definition not valid. Try representing that in XML Schema.

The definition of OptionsType allows a wide variety of child elements, some zeroOrMore times, other optional (zero or one occurrence), with no ordering restrictions at all between anything. XML Schema can’t represent this either. James Clark’s trang tool converts Relax NG into XML Schema, and has to approximate this as an xsd:choice with maxOccurs=”unbounded”, thus the elements that can only occur once are not schema-enforced. Thus the Relax NG description of the content model, besides being more readable, actually contains more information than the closest XML Schema. So particularly for XML vocabularies that are optimized for human use, Relax NG is a good choice for schema development.

Out of line validation. So if XML Schema doesn’t fully describe the <options> node, how can authors be sure they have constructed one correctly? Take a step back: even if XML Schema could fully represent the content model, for performance reasons you wouldn’t want to repeatedly validate the node on every query. The options node tends to change infrequently, mainly during a development cycle. Both of these problems can be solved with out-of-line validation: a separate function call search:check-options().

Inside this function you’ll find a validate expression that will make as much use of the Schema as it can, but also much more. The full power of XQuery can be leveraged against the proposed <options> node to check for errors or inconsistencies, and provide helpful feedback to the developer. Since it happens out-of-line, these checks can take substantially longer than actually handing the query based on them. The code can go as in-depth as it needs to without performance worries. This is a useful technique in many situations. One potential shortfall is that people might forget to call your validation function, but in practice this hasn’t been too much trouble.

Higher-order functions. The predecessor to Search API had a problem that it was so popular that users would modify it to suit their unique requirements, which lead to dozens of minor variations floating around in the wild. Different users have different needs and expectations for the library, and making a one-size-fits-all solution is perhaps not possible. One way to relieve this kind of feature pressure is to provide enough extension hotspots to allow all the kinds of tweaks that users will want, preserving the mainline code. This involves prediction, which is difficult (especially about the future). But a good design makes this possible.

Look inside the built-app and you will find a number of function pointers, implemented as a new datatype xdmp:function. XQuery 1.1 will have a more robust mechanism for this, but it might be a while before this is widespread. By modifying one file, changing a pointer to different code, nearly every aspect of the application can be adjusted.

Similarly, a few hotspots in the Search API can be customized, to hook in different kinds of parsers or snippet-generators. This powerful technique can take your own apps to the next level.

-m

Wednesday, October 21st, 2009

XForms 1.1 is out

XForms 1.1 is now a full W3C Recommendation. Compared to version 1.0, which went live a bit more than 6 years ago, version 1.1 offers lots of road-tested tools that make development easier and more powerful, including new datatypes and XPath functions, a significantly more powerful submission subsystem, and a more flexible event model.

And XSLTForms already supports almost all of the new goodies. -m