WebPath: Python XPath 2 engine now up on Sourceforge

While setting up my newest project, I took the opportunity to ditch CVS on all my existing Sourceforge projects (pyxmlwiki, xfv) in favor of Subversion. Here’s the browsable Subversion source. Have at it.

Where should you start with this code? Step zero, if you haven’t already, is to look through my XML 2007 slides on my site. Step one is to grab a copy of PLY, which WebPath depends on. Next, with all of these files in your current directory, run python with no arguments. At the interpreter prompt, type import demo, then demo.demo1(), demo.demo2(), and so on. This will give you a feel for how the system works. Read the source of demo.py to see how it fits together at a high level.

To really dig into the code, open webpath.py and scroll to the end, where a long series of unit tests begins. Tracing through these should (I hope!) show how the various parts of the engine fit together.

There are many missing pieces (a few intentionally so), so have a look around the code and start thinking about what you could do with it. One thing I’d love to see soon is replacing minidom with something more robust.

If you want developer access on Sourceforge, drop me a note with your sf username. -m

2 Replies to “WebPath: Python XPath 2 engine now up on Sourceforge”

Hey! I stumbled across this while looking for a lightweight XPath implementation written in Python, as opposed to one that relies on C or reimplements everything under the sun along the way (such as the entire DOM).

On the subject of minidom, I’d say just require an implementation of the DOM: Level 3, I assume, since you’d want compareDocumentPosition() so that NodeLists are returned in document order. Admittedly, this rules out minidom, but what are the real issues with it? Just how “mini” it is?
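Where the DOM doesn’t offer compareDocumentPosition(), one stand-in is to number every node in a pre-order walk and sort by that number, since pre-order is what XPath calls document order. The sketch below assumes minidom; document_order_index and sort_in_document_order are hypothetical names, not part of WebPath, and attribute nodes are left unnumbered because minidom keeps them outside childNodes.

```python
from xml.dom import minidom


def document_order_index(doc):
    """Map each node (keyed by id) to its position in a pre-order walk.

    Hypothetical helper: attribute nodes are not numbered, because
    minidom keeps them outside childNodes.
    """
    order = {}
    stack = [doc]
    while stack:
        node = stack.pop()
        order[id(node)] = len(order)
        # Push children in reverse so the leftmost child is visited next.
        stack.extend(reversed(node.childNodes))
    return order


def sort_in_document_order(nodes, doc):
    """Drop duplicate nodes and return the rest in document order."""
    order = document_order_index(doc)
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: order[id(n)])


doc = minidom.parseString("<r><a/><b><c/></b><d/></r>")
a, b, d = doc.documentElement.childNodes
c = b.firstChild
print([n.tagName for n in sort_in_document_order([d, c, a, b, c], doc)])
# ['a', 'b', 'c', 'd']
```

This walks the whole tree on every sort; an engine that sorts often would want to cache the index, or call compareDocumentPosition() when the DOM provides it.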

One final note: a possible bug (though my XPath may just be wrong :)). The expression //span[@title='XHTML'] causes:

File "/Volumes/Data/Source/webpath/wpcore.py", line 42, in string
    return ''.join([n.nodeValue for n in allnodes if n.nodeType==3])
AttributeError: Text instance has no attribute 'nodeValue'
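For context, XPath defines the string-value of an element as the concatenation of all its descendant text nodes in document order, which appears to be what the failing line computes. A minimal sketch of the same idea over minidom, assuming it is acceptable to read each text node’s standard DOM data attribute instead of nodeValue (string_value is a hypothetical name, not WebPath’s API):

```python
from xml.dom import minidom
from xml.dom.minidom import Node

TEXTUAL = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def string_value(node):
    """Return the XPath string-value of a minidom node.

    Hypothetical helper: reads the standard DOM `data` attribute of text
    nodes instead of `nodeValue`.
    """
    if node.nodeType in TEXTUAL:
        return node.data
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return node.value
    parts = []
    # Depth-first, left-to-right walk, so text is collected in document order.
    stack = list(reversed(node.childNodes))
    while stack:
        child = stack.pop()
        if child.nodeType in TEXTUAL:
            parts.append(child.data)
        else:
            # Comments and processing instructions have no children, so
            # their content is skipped, as XPath requires.
            stack.extend(reversed(child.childNodes))
    return "".join(parts)


doc = minidom.parseString('<p title="XHTML">a<b>b</b>c<!--skip--></p>')
print(string_value(doc.documentElement))  # abc
```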

Alas, I’d offer to help, but my knowledge of XPath is too limited to be of much use (and the source is stretching my knowledge of Python, though I can just about cope with that).

Ah, the issue above seems to lie with minidom, not WebPath. I’ve switched to pxdom, and that, at least, works all right.

