Microformat validation with Python and XPath

Python+XPath is a surprisingly powerful combination for doing all kinds of arbitrary validation tasks. I should know. I’ve recently figured out a few things that make it even better.

Line numbers in error messages. Libxml2 docs aren’t exactly forthcoming in this area. It’s pretty easy to register an error callback, but maddeningly it doesn’t include line numbers (except when piping errors directly to stdout, as several examples show). The C APIs have a whole notion of Structured Error Handling, which doesn’t seem to come across to the Python bindings. Getting the line number of a node is also straightforward, but I couldn’t figure out how to get the line of an error. Fortunately, the answer is simple:

import libxml2
# ...after a parse or validation call has reported an error...
e = libxml2.lastError()
print(e.line())  # in contrast to node.lineNo()...
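
Once you have the error object, turning it into a conventional "file:line: message" string is a small step. Here is a minimal sketch; the helper name format_error and the "<input>" placeholder are my own inventions, and I'm assuming the error object also exposes file() and message() accessors alongside line(), and that file() may come back empty when parsing from memory:

```python
def format_error(err):
    """Format a libxml2-style error object as 'file:line: message'.

    Assumes err provides file(), line() and message() accessors;
    only line() is shown in the post itself.
    """
    # Assumption: file() can be None/empty for in-memory documents.
    source = err.file() or "<input>"
    # Messages typically end with a newline, so strip whitespace.
    message = (err.message() or "").strip()
    return "%s:%d: %s" % (source, err.line(), message)
```

After a failed parse, something like print(format_error(libxml2.lastError())) should then give you an error line you can jump to in an editor.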

Checking for a class. Another common task in validating microformats is checking whether an element has a certain class applied to it. Since the class attribute takes a space-separated list of class values, a plain string search isn’t enough: you really need a tokenizer. Again, the Python libxml2 bindings come through. It’s reasonably simple to write an XPath extension function in Python:

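def hasClass(ctx, content, cssclass):
    # Returns "1" (true in XPath) if cssclass is one of the
    # whitespace-separated tokens in content, otherwise "" (false).
    rc = ""
    if isinstance(content, str):
        tokens = content.split()
        for token in tokens:
            if token == cssclass:
                rc = "1"
    return rc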

# register the function on an XPath context (ctx)
ctx.registerXPathFunction("hasClass", "http://some.uri", hasClass)
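
To see why tokenizing matters, here is a small standalone sketch that needs no libxml2 at all. The helper has_class_token and the sample attribute value are made up for illustration; the point is only that a substring test reports a false positive where a token test does not:

```python
def has_class_token(class_attr, cssclass):
    """True if cssclass is one of the whitespace-separated tokens."""
    # str.split() with no arguments splits on any run of whitespace,
    # so tabs, newlines and doubled spaces are all handled.
    return cssclass in class_attr.split()

# Hypothetical class attribute value, with messy whitespace.
class_attr = "vcard-wrapper  fn\torg"

print("vcard" in class_attr)                 # True: substring false positive
print(has_class_token(class_attr, "vcard"))  # False: no such token
print(has_class_token(class_attr, "org"))    # True
```

As for calling the registered function: since hasClass above only tokenizes plain strings, the attribute needs wrapping in string(). Assuming the namespace URI is bound to a prefix first (say mf, via ctx.xpathRegisterNs("mf", "http://some.uri")), an expression along the lines of //*[mf:hasClass(string(@class), 'vcard')] should select the elements carrying that class.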

Props to Kimbro Staken, who did all the initial hard work. Comments? -m
