Mark Birbeck, Web Backplane.
Problem statement: You shouldn’t have to “scrape” government sites.
Solution: RDFa
<div typeof="arg:Vacancy"> Job title: <span property="dc:title">Assistant Officer</span> Description: <span property="dc:description">To analyse... </span> </div>
This resolves to two full RDF triples. No separate feeds, uses existing publishing systems. Two of the most ambitious RDFa projects are taking place in the UK. Flexible arrangements possible.
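To make the "markup resolves to triples" point concrete, here is a minimal Python sketch that reads the snippet above and lists the statements it carries. It is an illustration only, not a conforming RDFa processor: the class name `VacancyExtractor` and the blank-node labels (`_:b1`) are assumptions made for this example, prefixes are left unexpanded, and nested markup inside a `property` element is not handled. Run on the snippet, it yields the two property triples (`dc:title`, `dc:description`) plus a type statement from `typeof`.

```python
from html.parser import HTMLParser

# The snippet from the talk, copied as-is.
SNIPPET = (
    '<div typeof="arg:Vacancy"> Job title: '
    '<span property="dc:title">Assistant Officer</span> Description: '
    '<span property="dc:description">To analyse... </span> </div>'
)


class VacancyExtractor(HTMLParser):
    """Toy extractor (hypothetical name): collects (subject, predicate, object) tuples."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._count = 0        # number of typed subjects seen so far
        self._subject = None   # current blank-node label, e.g. "_:b1"
        self._property = None  # property whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            # typeof starts a new (blank-node) subject and states its type.
            self._count += 1
            self._subject = f"_:b{self._count}"
            self.triples.append((self._subject, "rdf:type", attrs["typeof"]))
        if "property" in attrs:
            # Start collecting the element's text as the literal object.
            self._property = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the current property.
        if self._property is not None:
            literal = "".join(self._text).strip()
            self.triples.append((self._subject, self._property, literal))
            self._property = None


parser = VacancyExtractor()
parser.feed(SNIPPET)
triples = parser.triples
for triple in triples:
    print(triple)
```

A real deployment would use a full RDFa parser, which also resolves prefixes such as `dc:` and `arg:` to complete URIs.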
Steps: 1. Create vocabulary. 2. Create demo. 3. Evangelize.
Vocabulary under Google Code: Argot Hub. Reuses existing terms (dc:title, foaf:name) where possible; developed in public.
Demos: Yahoo! SearchMonkey (good for helping not-so-technical people “get it”), then a Drupal-hosted one (a little more control).
Next level: a new server that aggregates specific info (like all job openings for electricians), including geocoding. Ubiquity RDFa helps here.
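The aggregation idea can be sketched in a few lines of Python: once triples have been harvested from several sites, group them by subject and keep the vacancies whose title matches. Everything here is hypothetical and for illustration only: the function name `vacancies_matching`, the `(source, node)` subject keys, and the sample data are all made up, and geocoding is left out entirely.

```python
from collections import defaultdict


def vacancies_matching(triples, keyword):
    """Return property dicts for arg:Vacancy subjects whose dc:title contains keyword."""
    by_subject = defaultdict(dict)
    for subject, predicate, obj in triples:
        by_subject[subject][predicate] = obj
    return [
        props
        for props in by_subject.values()
        if props.get("rdf:type") == "arg:Vacancy"
        and keyword.lower() in props.get("dc:title", "").lower()
    ]


# Made-up sample data; subjects are (source site, blank node) pairs so that
# nodes harvested from different sites do not collide.
harvested = [
    (("site-a.example", "_:b1"), "rdf:type", "arg:Vacancy"),
    (("site-a.example", "_:b1"), "dc:title", "Electrician"),
    (("site-b.example", "_:b1"), "rdf:type", "arg:Vacancy"),
    (("site-b.example", "_:b1"), "dc:title", "Assistant Officer"),
]

matches = vacancies_matching(harvested, "electrician")
print(matches)
```

Because every site publishes with the same vocabulary, the aggregator needs no per-site scraping rules; only the harvesting step touches the individual pages.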
Evangelizing: Detailed tutorials. Drupal code will go open source. More opportunities with companies currently screen-scraping. More info at rdfa.info.
Q&A: A question about predicate overloading (dc:title). This is a general SemWeb issue; context helps. Is RDFa tied to HTML? No. SearchMonkey itself uses RDFa; it’s just attributes.
-m