<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MicahLogic &#187; xpath</title>
	<atom:link href="http://dubinko.info/blog/tags/standards/xpath/feed/" rel="self" type="application/rss+xml" />
	<link>http://dubinko.info/blog</link>
	<description>From an XML geek, a reader, a writer, a connector, a man of the people (says keep hope alive)</description>
	<lastBuildDate>Thu, 02 Feb 2012 06:43:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Recalibrating expectations of XML performance</title>
		<link>http://dubinko.info/blog/2010/04/02/xml-performance/</link>
		<comments>http://dubinko.info/blog/2010/04/02/xml-performance/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 06:56:34 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[commercialism]]></category>
		<category><![CDATA[Mark Logic]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[marklogic]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=376</guid>
		<description><![CDATA[Working at MarkLogic has forced me to recalibrate my expectations around XML-related performance issues. Not to brag or anything, but it&#8217;s screaming fast. Conventional wisdom of avoiding // in paths doesn&#8217;t apply, since that&#8217;s the sort of thing the indexes are made to do, and that&#8217;s just the start. Single milliseconds are now a noteworthy [...]]]></description>
			<content:encoded><![CDATA[<p>Working at MarkLogic has forced me to recalibrate my expectations around XML-related performance issues. Not to brag or anything, but it&#8217;s screaming fast. Conventional wisdom of avoiding <code>//</code> in paths doesn&#8217;t apply, since that&#8217;s the sort of thing the indexes are made to do, and that&#8217;s just the start. Single milliseconds are now a noteworthy amount of time for something showing up in the profiler.</p>
<p>This is what XML was supposed to be like. Now that XML has fallen off the hype cycle, we&#8217;re getting some serious work done. -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2010/04/02/xml-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XML 2008 liveblog: Introduction to Schematron</title>
		<link>http://dubinko.info/blog/2008/12/09/xml-2008-liveblog-introduction-to-schematron/</link>
		<comments>http://dubinko.info/blog/2008/12/09/xml-2008-liveblog-introduction-to-schematron/#comments</comments>
		<pubDate>Tue, 09 Dec 2008 21:58:40 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[languages]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[assertion]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[schematron]]></category>
		<category><![CDATA[validation]]></category>
		<category><![CDATA[xml2008]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=403</guid>
		<description><![CDATA[Wendell Piez, Mulberry Technologies Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases. Same architecture as XSLT. (Schematron specifies, does not perform) &#60;schema xmlns="http://purl.oclc.org/dsdl/schematron"&#62; &#60;title&#62;Check sections 12/07&#60;/title&#62; &#60;pattern id="section-check"&#62; &#60;rule context="section"&#62; &#60;assert test="title"&#62;This section has no title&#60;/assert&#62; &#60;report test="p"&#62;This section has paragraphs&#60;/report&#62; ... Demo. [...]]]></description>
			<content:encoded><![CDATA[<p>Wendell Piez, Mulberry Technologies</p>
<p>Assertion-based schema language. A way to test XML documents. Rule-based validation language. Cool report generator. Good for capturing edge cases.</p>
<p>Same architecture as XSLT. (Schematron <em>specifies</em>, does not <em>perform</em>)</p>
<pre>&lt;schema xmlns="http://purl.oclc.org/dsdl/schematron"&gt;
  &lt;title&gt;Check sections 12/07&lt;/title&gt;
  &lt;pattern id="section-check"&gt;
    &lt;rule context="section"&gt;
      &lt;assert test="title"&gt;This section has no title&lt;/assert&gt;
      &lt;report test="p"&gt;This section has paragraphs&lt;/report&gt;
      ...
</pre>
<p>Demo. OxygenXML has support. Assert vs. Report &#8211; essentially opposites. Assert means &#8220;tell me if this is false&#8221;. Report means &#8220;tell me if this is true&#8221;.</p>
<p>&#8220;Almost as if Schematron is a harness for XPath testing.&#8221;</p>
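<p>To make the assert/report distinction concrete, here is a minimal Python sketch of the section rule shown earlier. It is not how a Schematron processor works internally; the function name <code>check_section</code> and the sample elements are made up for illustration, and the standard library ElementTree stands in for a real XPath engine.</p>

```python
import xml.etree.ElementTree as ET


def check_section(section):
    """Return the messages one section element would produce.

    Hypothetical helper: mimics one assert and one report from the rule above.
    """
    messages = []
    # assert: the message fires when the test is false (no title child)
    if section.find("title") is None:
        messages.append("This section has no title")
    # report: the message fires when the test is true (a p child exists)
    if section.find("p") is not None:
        messages.append("This section has paragraphs")
    return messages


# Two sample sections built with the element API (illustrative data only)
with_title = ET.Element("section")
ET.SubElement(with_title, "title").text = "Intro"

with_para = ET.Element("section")
ET.SubElement(with_para, "p").text = "Some text"
```

<p>A section with only a title stays silent, while a section with only a paragraph triggers both messages: the failed assert and the successful report.</p>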
<p>More examples:</p>
<pre>&lt;rule context="note"&gt;
  &lt;report test="ancestor::note"&gt;A note appears in a note. OK?&lt;/report&gt;
&lt;/rule&gt;
</pre>
<p>Binding: Default is XSLT 1, but flexible enough to allow other query languages via the @queryBinding attribute at the top. Many processors allow mix-and-match between XSLT and Schematron. Examples showing just that.</p>
<p>Some tests can be very useful:</p>
<p><code>test="every $line in tokenize(., $newline) satisfies string-length($line) le 72"</code></p>
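<p>For readers who don't think in XPath 2.0 yet, the line-length test above reads roughly like the following Python sketch. The function name <code>lines_fit</code> and its parameters are assumptions of mine, and a plain string split stands in for the regex-based tokenize().</p>

```python
import operator


def lines_fit(text, limit=72, newline="\n"):
    """True when every line of text is at most `limit` characters long.

    operator.le plays the role of the XPath `le` value comparison.
    """
    return all(operator.le(len(line), limit) for line in text.split(newline))
```

<p>Calling <code>lines_fit(some_text)</code> corresponds to the <code>every ... satisfies</code> expression: one over-long line is enough to make the whole test false.</p>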
<p>Q: What if the destination is not a human, but another part of a pipeline? Varies by implementation, but SVRL is standardized as an annex in the ISO spec, part of DSDL.</p>
<p>Use as little or as much as you want, at different times in the document lifecycle. &#8220;Schematron is a feather duster that reaches areas other schema languages cannot.&#8221; &#8211; Rick Jelliffe</p>
<p>&#8220;As time permits&#8221; section of the talk:</p>
<p>Other top-level elements: title, pattern, ns, let, p, include, phase, diagnostics.</p>
<p>-m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/12/09/xml-2008-liveblog-introduction-to-schematron/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XML 2008 liveblog: Exploring the New Features of XSLT 2.0</title>
		<link>http://dubinko.info/blog/2008/12/09/xml-2008-liveblog-exploring-the-new-features-of-xslt-20/</link>
		<comments>http://dubinko.info/blog/2008/12/09/xml-2008-liveblog-exploring-the-new-features-of-xslt-20/#comments</comments>
		<pubDate>Tue, 09 Dec 2008 15:45:24 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[xml2008]]></category>
		<category><![CDATA[XSLT]]></category>
		<category><![CDATA[xslt2]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=398</guid>
		<description><![CDATA[Priscilla Walmsley, Datypic. &#8220;I feel like crying every time I have to go back to 1.0.&#8221; Normally this is a full-day course. Familiarity with XSLT 1.0 assumed here. Venn diagram&#8230; Much of what people think of as &#8220;XQuery&#8221; is actually XPath 2.0. XPath differences: root node -&#62; &#8220;document node&#8221;. Namespace nodes, axis are deprecated. More [...]]]></description>
			<content:encoded><![CDATA[<p>Priscilla Walmsley, Datypic.</p>
<p>&#8220;I feel like crying every time I have to go back to 1.0.&#8221; Normally this is a full-day course. Familiarity with XSLT 1.0 assumed here. Venn diagram&#8230; Much of what people think of as &#8220;XQuery&#8221; is actually XPath 2.0.</p>
<p>XPath differences: root node -&gt; &#8220;document node&#8221;. Namespace nodes and the namespace axis are deprecated. More atomic types, based on XML Schema. Node-set -&gt; sequence. Path steps can be expressions, like <code>product/(if (desc) then desc else name)</code>. Last step can return an atomic value, like <code>sum(//item/(@price * @qty))</code>.</p>
<p>Comparison operators apply to strings, dates, times. (Backwards compatibility note: comparing strings now is done by Unicode code point, not by conversion to number() as in XPath 1.0). Arithmetic possible on dates, durations. Missing value returns empty sequence rather than NaN.</p>
<p>(a,b) to concat sequences. New operators: idiv, union, intersect, except (latter 3 for nodes only)</p>
<p><code>&lt;xsl:for-each select="1 to $count"&gt;</code> is handy. Operators &lt;&lt; and &gt;&gt; test &#8216;precedes&#8217; and &#8216;follows&#8217; based on document order. Operator &#8216;is&#8217; tests node identity.</p>
<p>Statement if/then/else is a more compact xsl:choose. Simplified FLWOR (only one for, no let or where).</p>
<p>Useful functions: ends-with(), string-join(), current-date(), distinct-values(), deep-equal().</p>
<p>From XPath to XSLT: <code>&lt;xsl:for-each-group&gt;</code> with current-group() and current-grouping-key(). Useful for turning a flat document (like HTML with h1, h2, etc.) into a nested structure, using <code>group-starting-with="html:h1"</code>, etc. The instruction <code>&lt;xsl:function&gt;</code> allows defining a new function. Major benefits in reuse, clarity, and handling recursion. Custom functions can be called from more places, like @select, @group-by, @match, but have the same expressive power as a named template.</p>
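<p>The group-starting-with idea is easier to see outside of XSLT. Here is a rough Python sketch of it, using a flat list of (tag, text) pairs in place of real HTML nodes. The helper name <code>group_starting_with</code> and the sample data are hypothetical; this illustrates the grouping rule, not the XSLT processing model.</p>

```python
def group_starting_with(items, starts_group):
    """Split a flat sequence into groups.

    A new group opens at each item for which starts_group(item) is true.
    Items before the first match form a group of their own.
    """
    groups = []
    for item in items:
        if starts_group(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


# Flat "document": headings followed by their paragraphs (illustrative data)
flat = [("h1", "One"), ("p", "a"), ("p", "b"), ("h1", "Two"), ("p", "c")]
nested = group_starting_with(flat, lambda item: item[0] == "h1")
```

<p>Each inner list of <code>nested</code> corresponds to what current-group() would hold for one iteration: a heading plus everything up to the next heading.</p>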
<p>Regular expressions: some XPath functions matches(), tokenize(), replace() (including subexpressions). <code>&lt;xsl:analyze-string&gt;</code> splits a string into matching and non-matching parts, handled separately in <code>&lt;xsl:matching-substring&gt;</code> and <code>&lt;xsl:non-matching-substring&gt;</code> child elements and regex-group().</p>
<p>I/O: Instruction <code>&lt;xsl:result-document&gt;</code> allows multiple output files. unparsed-text() allows input of non-XML documents (particularly in conjunction with regex).</p>
<p>Do I have to pay attention to types? &#8220;Usually, no.&#8221; BUT schemas can help catch errors, improve performance, and open new avenues of processing (like matching a template based on a schema-type).</p>
<p>Odds and ends: tunneling parameters (don&#8217;t have to repeat all the params for named templates), multiple modes, @select in more places, @separator attribute on xsl:attribute and xsl:value-of.</p>
<p>Brief Q&amp;A: No test suite available. Probably better for new users to jump straight into 2.0. But going back to 1.0 is still painful. -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/12/09/xml-2008-liveblog-exploring-the-new-features-of-xslt-20/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XiX: Details about XForms in XQuery</title>
		<link>http://dubinko.info/blog/2008/11/04/xix-details-about-xforms-in-xquery/</link>
		<comments>http://dubinko.info/blog/2008/11/04/xix-details-about-xforms-in-xquery/#comments</comments>
		<pubDate>Tue, 04 Nov 2008 07:05:20 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[XForms]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[XQuery]]></category>
		<category><![CDATA[calculate]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[model]]></category>
		<category><![CDATA[xix]]></category>
		<category><![CDATA[xquery]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=380</guid>
		<description><![CDATA[I was asked offline for more details about what I have in mind around XiX. Take a simple piece of XML, like this: &#60;root&#62;&#60;a&#62;3&#60;/a&#62;&#60;b&#62;4&#60;/b&#62;&#60;total/&#62;&#60;/root&#62;. An XForms Model can be applied, in an out-of-line fashion, to that instance. This is done through a bind element, with XPath to identify the nodes in question, plus other &#8220;model [...]]]></description>
			<content:encoded><![CDATA[<p>I was asked offline for more details about what I have in mind around XiX.</p>
<p>Take a simple piece of XML, like this: <code>&lt;root&gt;&lt;a&gt;3&lt;/a&gt;&lt;b&gt;4&lt;/b&gt;&lt;total/&gt;&lt;/root&gt;</code>.</p>
<p>An XForms Model can be applied, in an out-of-line fashion, to that instance. This is done through a <code>bind</code> element, with XPath to identify the nodes in question, plus other &#8220;model item properties&#8221; to annotate the instance. The <code>calculate</code> property is a good one: <code>&lt;bind nodeset="total" calculate="../a + ../b"/&gt;</code>. When called upon to refresh the instance, as you would expect, the result contains the node <code>&lt;total&gt;7&lt;/total&gt;</code>.</p>
<p>Like lots of algorithms, though, XForms is defined in a thoroughly procedural manner. Functional programming has a stricture against assignment operators, like setting the value &#8220;7&#8221; into the calculated node above. So the challenge is coming up with an implementation that works within these bounds. For example, perhaps a function that takes an original instance as input, and returns a newly-created updated instance. Simple enough for the example here, but in more complex cases with different and interacting model item properties, regenerating the entire instance frequently has performance penalties.</p>
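<p>As a rough illustration of the "instance in, new instance out" idea, here is a Python sketch for the a + b example. It is only a sketch under my own assumptions: the function name <code>recalculate</code> is made up, the calculate expression is hard-coded instead of being read from a bind element, and ElementTree stands in for a real XForms or XQuery processor.</p>

```python
import copy
import xml.etree.ElementTree as ET


def recalculate(instance):
    """Return a new instance with total set to a + b.

    The input tree is copied first, so the original is never modified.
    """
    updated = copy.deepcopy(instance)
    a = float(updated.findtext("a"))
    b = float(updated.findtext("b"))
    # format(..., "g") renders 7.0 as "7"
    updated.find("total").text = format(a + b, "g")
    return updated


# The sample instance from above, built with the element API
root = ET.Element("root")
ET.SubElement(root, "a").text = "3"
ET.SubElement(root, "b").text = "4"
ET.SubElement(root, "total")

result = recalculate(root)
```

<p>The full copy on every call is exactly the cost mentioned above: fine for three nodes, less fine for a large instance with many interacting properties.</p>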
<p>So, I&#8217;m trying to find the right expression of the XForms Model in a functional guise. (As <a href="http://rdfa.info/wiki/Functional_RDFa">with</a> RDFa). I&#8217;m curious about what anyone else has come up with in this area. -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/11/04/xix-details-about-xforms-in-xquery/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Top Down Operator Precedence in Python</title>
		<link>http://dubinko.info/blog/2008/07/15/top-down-operator-precedence-in-python/</link>
		<comments>http://dubinko.info/blog/2008/07/15/top-down-operator-precedence-in-python/#comments</comments>
		<pubDate>Wed, 16 Jul 2008 04:17:41 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[pratt]]></category>
		<category><![CDATA[tdop]]></category>
		<category><![CDATA[webpath]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=302</guid>
		<description><![CDATA[This article made my day. Very similar approach to what I did in WebPath, but even cleaner. Great explanation and performance numbers. -m P.S. Thanks to Crock for pointing this out.]]></description>
			<content:encoded><![CDATA[<p><a title=" Simple Top-Down Parsing in Python " href="http://effbot.org/zone/simple-top-down-parsing.htm">This article</a> made my day. Very similar approach to what I did in <a href="http://sourceforge.net/projects/webpath">WebPath</a>, but even cleaner. Great explanation and performance numbers. -m</p>
<p>P.S. Thanks to <a href="http://360.yahoo.com/profile-TBPekxc1dLNy5DOloPfzVvFIVOWMB0li">Crock</a> for pointing this out.</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/07/15/top-down-operator-precedence-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XForms Validator on Google App Engine?</title>
		<link>http://dubinko.info/blog/2008/05/28/xforms-validator-on-google-app-engine/</link>
		<comments>http://dubinko.info/blog/2008/05/28/xforms-validator-on-google-app-engine/#comments</comments>
		<pubDate>Thu, 29 May 2008 03:58:45 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[languages]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[XForms]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[essentials]]></category>
		<category><![CDATA[relaxng]]></category>
		<category><![CDATA[validator]]></category>
		<category><![CDATA[xfv]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=267</guid>
		<description><![CDATA[I registered &#8216;xfv&#8217; on Google App Engine. Too bad there doesn&#8217;t appear to be any significant XML libraries supported. I have XPath covered by my pure-python WebPath, but what about Relax NG? Anyone know of anything in pure python? -m]]></description>
			<content:encoded><![CDATA[<p>I registered &#8216;xfv&#8217; on Google App Engine. Too bad there doesn&#8217;t appear to be any significant XML libraries supported. I have XPath covered by my pure-python WebPath, but what about Relax NG? Anyone know of anything in pure python? -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/05/28/xforms-validator-on-google-app-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FunctX XQuery library</title>
		<link>http://dubinko.info/blog/2008/05/09/functx-xquery-library/</link>
		<comments>http://dubinko.info/blog/2008/05/09/functx-xquery-library/#comments</comments>
		<pubDate>Sat, 10 May 2008 06:33:41 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[xpath]]></category>
		<category><![CDATA[exslt]]></category>
		<category><![CDATA[functions]]></category>
		<category><![CDATA[xquery]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/?p=254</guid>
		<description><![CDATA[In the new-to-me department, here&#8217;s a library and description of useful XQuery functions from my friend Priscilla Walmsley. XSLT 2, also. -m P.S. Mark my words, more news is coming&#8230;]]></description>
			<content:encoded><![CDATA[<p>In the new-to-me department, here&#8217;s a <a href="http://www.xqueryfunctions.com/xq/">library and description of useful XQuery functions</a> from my friend Priscilla Walmsley. XSLT 2, also. -m</p>
<p>P.S. Mark my words, more news is coming&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/05/09/functx-xquery-library/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WebPath and Wikipedia</title>
		<link>http://dubinko.info/blog/2008/03/03/webpath-and-wikipedia/</link>
		<comments>http://dubinko.info/blog/2008/03/03/webpath-and-wikipedia/#comments</comments>
		<pubDate>Mon, 03 Mar 2008 17:15:08 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[annoyance]]></category>
		<category><![CDATA[intentional web]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2008/03/03/webpath-and-wikipedia/</guid>
		<description><![CDATA[The WebPath bug reports continue to roll in. For one, queries against *.wikipedia.* don&#8217;t seem to work. You get something back, but it has no resemblance to the page you were looking for. The problem comes from the W3C tidy service that I use, specifically that the (understandably overworked and understaffed) admins at the Wikimedia [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://sourceforge.net/projects/webpath">WebPath</a> bug reports continue to roll in. For one, queries against *.wikipedia.* don&#8217;t seem to work. You get <em>something</em> back, but it has no resemblance to the page you were looking for. The problem comes from the <a href="http://cgi.w3.org/cgi-bin/tidy">W3C tidy service</a> that I use, specifically that the (understandably overworked and understaffed) admins at the Wikimedia Foundation seem to have blocked it. It seems like more than a simple IP or user-agent-based block. I&#8217;ve emailed them about it but haven&#8217;t heard back yet.</p>
<p>So, this highlights the limitation of having a single-source converter in the <a href="http://webpath.svn.sourceforge.net/viewvc/webpath/trunk/platonicweb.py?revision=10&amp;view=markup">Platonic Web</a> module of WebPath. So I turn to my readers: do you know of any other tidy servers? Or converters of a non-tidy origin? For any of these to work, they need to return clean XML corresponding to the original page (as opposed to, say, returning something with big headers/footers or ampersand-encoded). This seems like an outstanding need for the open source community.</p>
<p>Please comment below with ideas. Thanks! -m</p>
<p>UPDATE: heard back from the Wikipedia admins, and although professional and helpful-as-can-be-expected, they won&#8217;t be changing anything on their end. Still looking for more open source options.</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/03/03/webpath-and-wikipedia/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WebPath on next.yahoo</title>
		<link>http://dubinko.info/blog/2008/02/13/webpath-on-nextyahoo/</link>
		<comments>http://dubinko.info/blog/2008/02/13/webpath-on-nextyahoo/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 06:54:07 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2008/02/13/webpath-on-nextyahoo/</guid>
		<description><![CDATA[It&#8217;s been an exhausting past couple of weeks, but life goes on. WebPath made front page at next.yahoo. I&#8217;m starting to get feedback from developers who are actually using it, filing bugs, suggesting features, and it&#8217;s gratifying. The community is still building up. Won&#8217;t you join too? -m]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been an exhausting past couple of weeks, but life goes on. WebPath made <a href="http://next.yahoo.net/archives/94/webpath-goes-open-source" title="WebPath Goes Open Source">front page at next.yahoo</a>. I&#8217;m starting to get feedback from developers who are actually using it, filing bugs, suggesting features, and it&#8217;s gratifying. The community is still building up. Won&#8217;t you join too? -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/02/13/webpath-on-nextyahoo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WebPath: Python XPath 2 engine now up on Sourceforge</title>
		<link>http://dubinko.info/blog/2008/01/25/webpath-python-xpath-2-engine-now-up-on-sourceforge/</link>
		<comments>http://dubinko.info/blog/2008/01/25/webpath-python-xpath-2-engine-now-up-on-sourceforge/#comments</comments>
		<pubDate>Sat, 26 Jan 2008 06:57:47 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2008/01/25/webpath-python-xpath-2-engine-now-up-on-sourceforge/</guid>
		<description><![CDATA[I&#8217;ve taken this opportunity to ditch CVS on all my existing Sourceforge projects (pyxmlwiki, xfv) while setting up my newest project. Here&#8217;s the browsable subversion source. Have at it. Where should you start with this code? Step zero, if you haven&#8217;t already, is to look through my XML 2007 slides on my site. First thing [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve taken this opportunity to ditch CVS on all my existing Sourceforge projects (<a href="http://sourceforge.net/projects/pyxmlwiki" title="Flexible structured text parser">pyxmlwiki</a>, <a href="http://sourceforge.net/projects/xfv" title="XForms Validator">xfv</a>) while setting up my <a href="http://sourceforge.net/projects/webpath" title="WebPath">newest project</a>. Here&#8217;s the <a href="http://webpath.svn.sourceforge.net/">browsable subversion source</a>. Have at it.</p>
<p>Where should you start with this code? Step zero, if you haven&#8217;t already, is to look through my <a href="http://dubinko.info/events/XML2007/">XML 2007 slides</a> on my site. First thing is to grab a copy of <a href="http://www.dabeaz.com/ply/">PLY</a>, which is a dependency. Then with all these files in your current directory, run python with no parameters. At the interpreter prompt type <code>import demo</code> then <code>demo.demo1()</code>, <code>demo.demo2()</code>, and so on. This will give you a feel for how the system works. Look at the source of demo.py to see how it works at the high level.</p>
<p>To actually get into the code, I suggest opening webpath.py and scrolling down to the end, where a large series of unit tests begins. Tracing through these will be (I hope!) instructive on how the various details of the engine are put together.</p>
<p>There are many missing pieces (a few intentionally so). So have a look around the code and start thinking about what you could do with it. One thing I would love to have happen soon is getting rid of minidom, replacing it with something more robust.</p>
<p>If you want developer access on Sourceforge, drop me a note with your sf username. -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/01/25/webpath-python-xpath-2-engine-now-up-on-sourceforge/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WebPath wants to be free (BSD licensed, specifically)</title>
		<link>http://dubinko.info/blog/2008/01/24/webpath-wants-to-be-free-bsd-licensed-specifically/</link>
		<comments>http://dubinko.info/blog/2008/01/24/webpath-wants-to-be-free-bsd-licensed-specifically/#comments</comments>
		<pubDate>Fri, 25 Jan 2008 06:42:23 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[intentional web]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2008/01/24/webpath-wants-to-be-free-bsd-licensed-specifically/</guid>
		<description><![CDATA[WebPath, my experimental XPath 2.0 engine in Python is now an open source project with a liberal BSD license. I originally developed this during a Yahoo! Hack Day, and now I get to announce it during another Hack Day. Seems appropriate. The focus of WebPath was rapid development and providing an experimental platform. There remains [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://farm3.static.flickr.com/2044/2214215295_e322799f6d_m.jpg" style="cursor: pointer" class="yfsc_image" id="yfsc_1_70651647@N00" align="left" /></p>
<p>WebPath, my experimental XPath 2.0 engine in Python, is now an open source project with a liberal BSD license. I originally developed this during a Yahoo! Hack Day, and now I get to announce it during another Hack Day. Seems appropriate.</p>
<p>The focus of WebPath was rapid development and providing an experimental platform. There remain tons of potential work left to do on it&#8230;watch this space for continued discussion. I&#8217;d like to call out special thanks to the Yahoo! management for supporting me on this, and to Douglas Crockford for turning me on to <a href="http://javascript.crockford.com/tdop/index.html" title="as seen in _Beautiful Code_">Top Down Operator Precedence</a> parsers. Have a look at the code. You might be pleasantly surprised at how small and simple a basic XPath 2 engine can be. So, who&#8217;s up for some XPath hacking?</p>
<p><a href="http://dubinko.info/events/XML2007/WebPath.zip" title="20k zip file">Code download</a>. (Coming to SourceForge with CVS, etc., in however many days it takes them to approve a new project.) I hope this inspires more developers to work on similar projects, or better yet, on this one! -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2008/01/24/webpath-wants-to-be-free-bsd-licensed-specifically/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>XPath puzzler: solution</title>
		<link>http://dubinko.info/blog/2007/12/31/xpath-puzzler-solution/</link>
		<comments>http://dubinko.info/blog/2007/12/31/xpath-puzzler-solution/#comments</comments>
		<pubDate>Mon, 31 Dec 2007 07:39:43 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[annoyance]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2007/12/31/xpath-puzzler-solution/</guid>
		<description><![CDATA[Thanks to all the folks who showed interest in this little XPath puzzler published here a few weeks ago. Some asked to see the dataset, but I&#8217;m not able to release it at this time (but ask me again in 3 months). Turns out it was a combination of two bugs, one mine, one somebody [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to all the folks who showed interest in this little <a href="http://dubinko.info/blog/2007/12/15/xpath-puzzler/" title="XPath puzzler">XPath puzzler</a> published here a few weeks ago. Some asked to see the dataset, but I&#8217;m not able to release it at this time (but ask me again in 3 months).</p>
<p>Turns out it was a combination of two bugs, one mine, one somebody else&#8217;s. Careful observers noted that I wasn&#8217;t using any namespace prefixes in the XPath, and since I did specify that it was XPath 1.0, that technically rules out XHTML as the source language. Like nearly all XML I work with these days, the first thing I do is strip off the namespaces to make it easier to work with. Bug #1 was that in a few cases, the namespaces didn&#8217;t get stripped.</p>
<p>Bug #2 was in the XPath engine itself. Which one? Uh, whatever one ships with the &#8220;XPath&#8221; plugin for JEdit. It&#8217;s hard to tell directly, but I think it might be an older version of Xalan-J. In the case of the expression <code>//meta</code>, it properly located only those elements part of no namespace. But in the case of <code>//meta/@property</code>, it was including all the nodes that would have been selected by <code>//*[local-name(.)='meta']/@property</code>. Hence, a larger number of returned nodes.</p>
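<p>The difference between the two matching behaviors can be reproduced in a few lines of Python. This is a sketch with the standard library ElementTree (Python 3.8 or later for the <code>{*}</code> wildcard), not the JEdit plugin or Xalan-J, and the two sample elements and their attribute values are invented for illustration.</p>

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# One meta element in no namespace, one in the XHTML namespace
root = ET.Element("root")
ET.SubElement(root, "meta", {"property": "dc:title"})
ET.SubElement(root, "{%s}meta" % XHTML, {"property": "dc:date"})

# Name test: matches only the element that is in no namespace
plain = root.findall(".//meta")

# Wildcard namespace: matches on local name, like local-name(.)='meta'
any_ns = root.findall(".//{*}meta")
```

<p>Here <code>plain</code> holds one element and <code>any_ns</code> holds two. An engine that switches from the first behavior to the second partway through a path would return more attribute nodes than elements, which is what I saw.</p>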
<p>Confusing? You bet!  -m</p>
<p>P.S. WebPath would not have this problem, since in the default mode it matches local-names only to begin with.</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2007/12/31/xpath-puzzler-solution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slides from XML 2007: WebPath: Querying the Web as XML</title>
		<link>http://dubinko.info/blog/2007/12/16/slides-from-xml-2007-webpath-querying-the-web-as-xml/</link>
		<comments>http://dubinko.info/blog/2007/12/16/slides-from-xml-2007-webpath-querying-the-web-as-xml/#comments</comments>
		<pubDate>Mon, 17 Dec 2007 02:19:54 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[intentional web]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2007/12/16/slides-from-xml-2007-webpath-querying-the-web-as-xml/</guid>
		<description><![CDATA[Here&#8217;s the slides from my presentation at XML 2007, dealing with an implementation of XPath 2.0 in Python. I hope to have even more news in this area soon. WebPath (html) WebPath (OpenDocument, 4.7 megs) Did you notice the OpenOffice has nice slide export, that generates both graphically-accurate slides and highly indexable and accessible text [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s the slides from my presentation at XML 2007, dealing with an implementation of XPath 2.0 in Python. I hope to have even more news in this area soon.</p>
<p><a href="http://dubinko.info/events/XML2007/WebPath_XML2007.html">WebPath</a> (html)</p>
<p><a href="http://dubinko.info/events/XML2007/WebPath_XML2007.odp">WebPath</a> (OpenDocument, 4.7 megs)</p>
<p>Did you notice that OpenOffice has a nice slide export that generates both graphically-accurate slides and highly indexable and accessible text versions? -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2007/12/16/slides-from-xml-2007-webpath-querying-the-web-as-xml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XPath puzzler</title>
		<link>http://dubinko.info/blog/2007/12/15/xpath-puzzler/</link>
		<comments>http://dubinko.info/blog/2007/12/15/xpath-puzzler/#comments</comments>
		<pubDate>Sun, 16 Dec 2007 05:25:10 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[everythingismiscellaneous]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2007/12/15/xpath-puzzler/</guid>
		<description><![CDATA[While I&#8217;ve got your attention, here&#8217;s an XPath (1.0) puzzler. I have an RDFa dataset compiled from various and sundry sources. It&#8217;s all wrapped up in a single XML file. I run this XPath to see how many meta elements are present: //meta and it returns a node-set of size 762. Now, I want to [...]]]></description>
			<content:encoded><![CDATA[<p>While I&#8217;ve got your attention, here&#8217;s an XPath (1.0) puzzler. I have an RDFa dataset compiled from various and sundry sources. It&#8217;s all wrapped up in a single XML file. I run this XPath to see how many meta elements are present: <code>//meta</code> and it returns a node-set of size 762. Now, I want to see how many property attributes are present on those elements, so I run the query: <code>//meta/@property</code> and it returns a node-set of size 764. How is it that the second node-set can be bigger than the first? -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2007/12/15/xpath-puzzler/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>XPath 2.0 implementation details</title>
		<link>http://dubinko.info/blog/2007/11/29/xpath-20-implementation-details/</link>
		<comments>http://dubinko.info/blog/2007/11/29/xpath-20-implementation-details/#comments</comments>
		<pubDate>Thu, 29 Nov 2007 19:29:14 +0000</pubDate>
		<dc:creator>mdubinko</dc:creator>
				<category><![CDATA[software]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://dubinko.info/blog/2007/11/29/xpath-20-implementation-details/</guid>
		<description><![CDATA[Well, my plans for a series of postings about details of implementing XPath 2.0 fell rather short, so let&#8217;s skip straight to the good stuff. Here&#8217;s an article by Mike Kay giving the details of the Saxon architecture. On the surface it&#8217;s about performance, but it also has an excellent section on internals. Worth a look. [...]]]></description>
			<content:encoded><![CDATA[<p>Well, my plans for a series of postings about details of implementing XPath 2.0 fell rather short, so let&#8217;s skip straight to the good stuff.</p>
<p>Here&#8217;s an <a href="http://idealliance.org/papers/dx_xmle04/papers/02-03-02/02-03-02.html">article by Mike Kay</a> giving the details of the Saxon architecture. On the surface it&#8217;s about performance, but it also has an excellent section on internals. Worth a look. This has been quite influential for me, and maybe it will be for you too. -m</p>
]]></content:encoded>
			<wfw:commentRss>http://dubinko.info/blog/2007/11/29/xpath-20-implementation-details/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

