XPath puzzler: solution
Thanks to all the folks who showed interest in the little XPath puzzler published here a few weeks ago. Some asked to see the dataset, but I’m not able to release it at this time (ask me again in 3 months).
Turns out it was a combination of two bugs, one mine, one somebody else’s. Careful observers noted that I wasn’t using any namespace prefixes in the XPath, and since I did specify that it was XPath 1.0, that technically rules out XHTML as the source language. As with nearly all XML I work with these days, the first thing I did was strip off the namespaces to make it easier to work with. Bug #1 was that in a few cases, the namespaces didn’t get stripped.
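For anyone wondering what "stripping namespaces" looks like in practice, here is a minimal sketch in Python using the standard library's ElementTree. It is not the tool I actually used, and the sample document and the `strip_namespaces` helper are made up for illustration. It only rewrites element names; namespaced attributes are left alone.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Rewrite every element tag in place, dropping any '{uri}' prefix.

    Hypothetical helper for illustration; attributes are not touched.
    """
    for el in root.iter():
        # ElementTree stores a namespaced tag as '{uri}localname'.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root

# Made-up sample: a tiny XHTML document with a default namespace.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><meta property="dc:title" content="x"/></head></html>'
)

# An unprefixed path finds nothing while the namespace is still attached...
before = len(doc.findall(".//meta"))
strip_namespaces(doc)
# ...and finds the element once the namespace is gone.
after = len(doc.findall(".//meta"))
```

If a step like this silently skips some elements, you end up with a mix of namespaced and un-namespaced `meta` elements in the same dataset, which is the situation Bug #1 created.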
Bug #2 was in the XPath engine itself. Which one? Uh, whatever one ships with the “XPath” plugin for jEdit. It’s hard to tell directly, but I think it might be an older version of Xalan-J. In the case of the expression
//meta, it properly located only those elements that are in no namespace. But in the case of
//meta/@property, it was including all the nodes that would have been selected by
//*[local-name(.)='meta']/@property. Hence, a larger number of returned nodes.
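To make the difference concrete, here is a rough sketch in Python of what the two selections should return on a document where one `meta` element kept its namespace. The sample XML is invented (the real dataset isn't available), and ElementTree stands in for the engine here; it has no `local-name()` function, so the second selection is emulated by comparing tag names by hand.

```python
import xml.etree.ElementTree as ET

# Invented sample: two meta elements in no namespace, one still in XHTML's.
XML = """<root>
  <meta property="a"/>
  <meta property="b"/>
  <x:meta xmlns:x="http://www.w3.org/1999/xhtml" property="c"/>
</root>"""

root = ET.fromstring(XML)

def local_name(tag):
    """Return the part of an ElementTree tag after any '{uri}' prefix."""
    return tag.rsplit("}", 1)[-1]

# Equivalent of //meta/@property: only elements in no namespace.
no_ns = [el.get("property") for el in root.findall(".//meta")]

# Emulation of //*[local-name(.)='meta']/@property: any namespace or none.
any_ns = [
    el.get("property")
    for el in root.iter()
    if local_name(el.tag) == "meta" and el.get("property") is not None
]

print(no_ns)   # ['a', 'b']
print(any_ns)  # ['a', 'b', 'c']
```

A correct XPath 1.0 engine should give the first result for `//meta/@property`. The engine I was using gave the second, which is where the extra nodes came from.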
Confusing? You bet! -m
P.S. WebPath would not have this problem, since in the default mode it matches local-names only to begin with.