Archive for the 'python' Category

Wednesday, January 7th, 2009

On porting WebPath to Python 3k

I’ve started looking into porting the WebPath code (and eventually XForms Validator) over to Python 3. The first step is external libraries, of which there is only one. WebPath uses the module from PLY. I had got it into my head that Python 2.x and 3.x were thoroughly incompatible, but leave it to the remarkable David Beazley to blow that assumption out of the water: the latest version of from SVN works in both 2.x and 3.x.

From there the included 2to3 tool was easy enough to run. (Relatively more difficult was getting 2.6 and 3.0 versions of Python frameworks installed on Mac, but even that wasn’t too bad.) The tool made some moderate changes, and I can run the unit tests, and a few even pass!

The primary remaining problem stems from code where the documentation is a little unclear, and my inexperience is severe. The part of the code in that reads nasty, grotty HTML via Tidy and produces a clean DOM throws an exception every time. Seems to be a mismatch between String and Byte (encoded string) types, but manifested as a failed XML parse. Sans exception handling, the code looks like:

    page = urllib.request.urlopen(fullurl)
    markup =
    dom = xml.dom.minidom.parseString(markup)

urlopen() returns a file-like object, but the docs didn’t seem clear on whether it’s like a file opened in byte or string mode. In any case, I’m almost certainly doing it wrong. Suggestions?


Friday, December 5th, 2008

Mystery Python Theatre 3K

The long-awaited Python 3.0 is out. It fixes almost every annoyance I have with the language, particularly around Unicode handling, which is important in the kinds of projects I work on.

Now, to revisit some of my Open Source projects… -m

Wednesday, September 17th, 2008

The case for native higher-order functions in XQuery

The XQuery Working Group is debating the need for higher-order functions in the language. I’m working on honing my description of why this is an important feature. Does this work? What would work better?

Imagine you are writing a smallish widget app, in an environment without a standard library. When you need to sort your widgets, you’d write a simple function with a signature like sort(sequence-of-widgets). That’s great.

Now imagine you find your app to be steadily growing. An accumulation of smaller one-off solutions won’t work anymore, you need a general solution. What you’ll end up with is something like qsort in C, which takes a pointer to a comparator function. By providing different comparators, you can sort anything any way you like, all through only a single sort function. C and C++ have something like this, as do PHP, Python, Java, JavaScript, and even assembly language. XSLT has it, as proven by Dimitre.

XQuery doesn’t. It should, because people are now using it for more than short queries. People are writing programs in it. -m

P. S. Comment please.

Tuesday, July 15th, 2008

Top Down Operator Precedence in Python

This article made my day. Very similar approach to what I did in WebPath, but even cleaner. Great explanation and performance numbers. -m

P.S. Thanks to Crock for pointing this out.

Wednesday, May 28th, 2008

XForms Validator on Google App Engine?

I registered ‘xfv’ on Google App Engine. Too bad there doesn’t appear to be any significant XML libraries supported. I have XPath covered by my pure-python WebPath, but what about Relax NG? Anyone know of anything in pure python? -m

Wednesday, February 13th, 2008

WebPath on

It’s been an exhausting past couple of weeks, but life goes on. WebPath made front page at I’m starting to get feedback from developers who are actually using it, filing bugs, suggesting features, and it’s gratifying. The community is still building up. Won’t you join too? -m

Thursday, January 24th, 2008

WebPath wants to be free (BSD licensed, specifically)

WebPath, my experimental XPath 2.0 engine in Python is now an open source project with a liberal BSD license. I originally developed this during a Yahoo! Hack Day, and now I get to announce it during another Hack Day. Seems appropriate.

The focus of WebPath was rapid development and providing an experimental platform. There remains tons of potential work left to do on it…watch this space for continued discussion. I’d like to call out special thanks to the Yahoo! management for supporting me on this, and to Douglas Crockford for turning me on to Top Down Operator Precedence parsers. Have a look at the code. You might be pleasantly surprised at how small and simple a basic XPath 2 engine can be. So, who’s up for some XPath hacking?

Code download. (Coming to SourceForge with CVS, etc., in however many days it takes them to approve a new project) I hope this inspires more developers to work on similar projects, or better yet, on this one! -m

Monday, December 24th, 2007

OLPC is here

I’m taking some time off from work to relax a bit. And just in time for that, my OLPC arrived. Check out the photoset on Flickr. It’s an impressive little machine, and I’m very happy to have got this instead of a Kindle. :)


Sunday, December 16th, 2007

Slides from XML 2007: WebPath: Querying the Web as XML

Here’s the slides from my presentation at XML 2007, dealing with an implementation of XPath 2.0 in Python. I hope to have even more news in this area soon.

WebPath (html)

WebPath (OpenDocument, 4.7 megs)

Did you notice the OpenOffice has nice slide export, that generates both graphically-accurate slides and highly indexable and accessible text versons? -m

Friday, September 21st, 2007

Come see me at XML 2007

Watch this space for details. I’ll be speaking about something related to Python and XPath 2.0. Watch this blog for tidbits on the subject. :) -m

Monday, July 23rd, 2007

Prototypical inheritance in Python

Based on Doug Crockford’s chapter in Beautiful Code, I wanted to take a crack at implementing Top Down Operator Precedence in Python. After all, Python and JavaScript are quite similar, right?

Not really. As you can imagine, Doug’s code makes great use of JavaScript’s strengths, in this case the ability to assign new methods to any object. For an initial version, I wanted to make the Python version behave the same way, as opposed to a deeper redesign that would be more pythonic. (That would come later.)
My initial approach was a __getattr__ method that consisted simply of return getattr(self.prototype, name). When reattaching a new method to an instance, I needed an extra wrapper, done through a wrap method which consisted of return new.instancemethod(method, self, self.__class__). It would be used like this: obj.method = obj.wrap(some_func).

This caused a subtle problem that took me a while to track down. In JavaScript, any function can reference the built-in this variable, which works whether the function is bound to some specific object or not. (Even global functions are bound to the global object.) But Python doesn’t have such a keyword. The language prefers the explicit, and uses a explicitly passed parameter, called by convention self. The call to wrap a specific function also had the effect of binding the self parameter to that particular object; even if it later became a prototype for some other object. This manifested itself as all kinds of broken behavior. For example, the original code has a global scope object, and every time a new scope was entered, the global pointed to a newer object that kept a reference to the rest of the scope chain. But in the object’s methods, self pointed to something different than the global. Messy.

Before I get into solutions, I’d like to see what readers say. How would you go about implementing prototypical inheritance in Python? And what is a more pythonic way to accomplish the same thing? Comment below. Thanks! -m

Wednesday, January 24th, 2007

Histogram of top 10 words used in the 2007 State of the Union address:

I’ve always had a thing for text analysis.

  • the 352
  • and 250
  • to 225
  • of 188
  • in 118
  • a 108
  • we 100
  • is 76
  • our 75
  • that 72

Source. -m

Tuesday, September 19th, 2006

Shiver me timbers! Python 2.5 be here

Link. -m

Wednesday, September 6th, 2006

Disk Usage, Python, and SVG (oh my)

Check out this script. -m