Some great data from my one-time colleague Peter Mika. Based on data culled from 12 billion web pages, RDFa is on 3.5 percent of them, even after discounting “trivial” uses of it. Just look at how much that dark blue bar shot up since the last measurement, some 18 months earlier. Also of note: eRDF…
Category: metadata
Good news for big data fans. The FCC has released APIs to several large databases involving broadband statistics, spectrum licenses, and some related topics. I haven’t had a chance for a close look yet; perhaps we can do that together. Link. -m
This is indeed a sad day for all of us, for on October 1, a great app will be gone. Though we hardly had enough time during his short life to get to know him, like the grass that withers and fades, this monkey will finish his earthly course. I know he left many things…
Thought experiment: are there any commonly-expressed semantic queries–the kind of queries you’d run over a triple store, or perhaps a SearchMonkey-annotated web site–expressible in common type-in-a-searchbox query grammar? As a refresher, here are some things that Google and other search engines can handle. The square brackets represent the search box into which the queries are typed,…
I wish I could say I had something to do with the planning of this: part of Balisage 2010 is a contest to “encourage markup experts to review and to research the current state of wiki markup languages and to generate a proposal that serves to de-babelize the current state of affairs for the long…
Link credit goes to Joho. This looks pretty significant. The AZ Supreme Court ruled that document metadata must be disclosed under existing public records law. This may start a chain reaction with other states following suit. With the movement toward open data including data.gov and the Federal Register, this fits in well. Quite often metadata…
Come learn more about Mark Logic and get a behind-the-scenes look at the new Application Builder. I’ll be speaking at the NOVA MUG (Northern Virginia Mark Logic User Group) on October 27. This turns out to be pretty close to the big Semantic Web conference, so I’ll stick my head in there too. Stop by…
I had been asking around earlier for large RDF datasets. Here’s one. Looks like a great contest to build an app around this, but unfortunately, the deadline looks like it’s soonish (1 Oct). What is it? The major part of the dataset was crawled during February/March 2009 based on datasets provided by Falcon-S, Sindice, Swoogle,…
A great introductory article. Maybe it’s just the crowd I hang with, but RDFa looks like it’s moving from trendy to serious tooling. -m
I spent 2 days at the Yahoo! campus at a VoCamp event, my first. Initially, I was dismayed at the schedule. Spend all the time the first day figuring out why everybody came? It seemed inefficient. But having gone through it, the process seems productive, exactly the way that completely decentralized groups need to get…
This brilliant bit is almost a throwaway paragraph on page 304, near the end. [Two men in a satirical dialog] managed only to demonstrate that the mathematical limit of an infinite sequence of “doubting the certainty with which something doubted is known to be unknowable when the ‘something doubted’ is still a preceding statement ‘unknowability’…
The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800 pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m
I’ve been experimenting with the preview version of Wolfram Alpha. It’s not like any current search engine because it’s not a search engine at all. Others have already written more eloquent things about it. The key feature of it is that it doesn’t just find information, it infers it on the fly. Take for example…
The remarkable (and prolific) Stephen Wolfram has an idea called Wolfram Alpha. People used to assume the “Star Trek” model of computers: that one would be able to ask a computer any factual question, and have it compute the answer. Which has proved to be quite distant from reality. But armed with Mathematica and…
At least, that’s how I’ve summarized John Allsopp’s article on HTML5 semantics. -m
After a delay, the code to my RDFa parser in XQuery is now available under an Apache license. Go get it. This is some of the earliest XQuery code I ever wrote, so go easy on me. It follows the earlier work on a functional definition of RDFa. And feel free to send in patches….
Mark Birbeck, Web Backplane. Problem statement: You shouldn’t have to “scrape” government sites. Solution: RDFa <div typeof="arg:Vacancy"> Job title: <span property="dc:title">Assistant Officer</span> Description: <span property="dc:description">To analyse… </span> </div> This resolves to two full RDF triples. No separate feeds, uses existing publishing systems. Two of the most ambitious RDFa projects are taking place in the UK….
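To make the "no scraping" point concrete, here is a rough Python sketch that pulls the property/value pairs out of the vacancy snippet above. It is not a conformant RDFa parser: it assumes flat markup (no nested elements inside an annotated span), ignores subjects, typeof, and prefix resolution, and the names PropertyCollector and extract_pairs are made up for illustration.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects (property, text) pairs from elements carrying a property attribute.

    Simplifying assumption: annotated elements contain only text, so the
    first end tag seen after an annotated start tag closes that element.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None   # property name of the element being read
        self._buffer = []      # text chunks seen inside that element

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def extract_pairs(html):
    """Return the list of (property, literal) pairs found in the markup."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


# The snippet from the talk notes, with the description left elided as in the original.
SAMPLE = (
    '<div typeof="arg:Vacancy"> Job title: '
    '<span property="dc:title">Assistant Officer</span> Description: '
    '<span property="dc:description">To analyse\u2026 </span> </div>'
)

if __name__ == "__main__":
    for prop, value in extract_pairs(SAMPLE):
        print(prop, "=", ascii(value))
```

Each pair corresponds to one predicate and literal object; a real parser would also work out the subject and expand dc: to a full URI.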
Ronald Reck, SAP; Kenneth Sall, SAIC “I wish I knew when people were saying bad things about me.” Sentiment analysis. Kapow used initially. From 800k news articles (from 1996 and 1997), extracted 450M RDF assertions. The 13 Reuters standard metadata elements not used in this case. Used Redland for heavy RDF lifting. Inxight ThingFinder (commercial)…
I’ve been playing lately with this site, and it’s a fantastic resource. The word carboy probably comes from Persian qarabah “large flagon.” Who knew? -m
This post will be continuously updated to contain the most recent details about an XQuery 1.0 RDFa parser I wrote for Mark Logic. It follows the Functional RDFa pattern. At present there is little to say, but eventually code and more will be available. Stay tuned. -m
It would be awesome if someone made a site that catalogued all the common mis-encodings. Even in 2008, I see these things all over the web–mangled quotation marks, apostrophes, em-dashes. I’d love to see a pictorial guide. curly apostrophe looks like ?’ – original encoding=_________ mislabeled as __________ . That sort of thing. Surely somebody…
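As a small start on such a guide, here is a Python sketch that reproduces one very common case: text that was really UTF-8 but got labeled as Windows-1252. The choice of that pair is my assumption about what is most often seen in the wild; the function and table names (mojibake, SAMPLES, guide) are just illustrative.

```python
def mojibake(text, actual="utf-8", assumed="cp1252"):
    """Show how text encoded as `actual` appears when decoded as `assumed`."""
    return text.encode(actual).decode(assumed)


# A few of the usual victims: punctuation outside ASCII.
SAMPLES = {
    "curly apostrophe": "\u2019",
    "em dash": "\u2014",
    "left double quote": "\u201c",
}


def guide(actual="utf-8", assumed="cp1252"):
    """Build a name -> garbled-form table for the sample characters."""
    return {name: mojibake(ch, actual, assumed) for name, ch in SAMPLES.items()}


if __name__ == "__main__":
    for name, garbled in guide().items():
        # ascii() keeps the output printable on any terminal encoding.
        print(f"{name}: original encoding=utf-8, mislabeled as cp1252 -> {ascii(garbled)}")
```

Under this assumption the curly apostrophe comes out as the three-character sequence â€™, which is probably the single most recognizable bit of mangling on the web. Other pairs (say, Windows-1252 mislabeled as UTF-8) fail differently and would need error handling that this sketch leaves out.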
On the eRDF discussion posting, Toby Inkster, an implementer of eRDF, talks about why it’s bad to steal the id attribute, and why RDFa is better suited for general purpose metadata. Worth a read. -m
Through the weekend I put most of the final touches on an implementation of RDFa in XQuery. The implementation is based on the functional specification of RDFa, an offshoot of the excellent work coming out of the W3C task force. The spec contains a procedural description of the parsing algorithm, and several have successfully followed…
The W3C RDFa specification is now in Candidate Recommendation phase, with an explicit call for implementations (of which there are several). Momentum for RDFa is steadily building. What about eRDF, which favors the existing HTML syntax over new attributes? There’s still a place for a simpler syntactic approach to embedding RDF in HTML, as evidenced…
I haven’t seen an announcement about this, but try the following query on Yahoo Search: [searchmonkeyid:com.yahoo.rdf.rdfa] (link). It shows documents containing RDFa, with Digg at the top. Since this is a Searchmonkey ID, it’s also usable in Searchmonkey to actually extract the metadata and use it to customize search results. Does your site use RDFa…
The result of tons of work by lots of smart people. Go forth and implement. And I need to put in a plug for Metadata for Grandma which (indirectly, as it turned out) influenced the spec. RDFa is already a big deal, used in places like SearchMonkey. The subset of RDFa used by SearchMonkey is…
Reminder: Thursday evening at Yahoo! Sunnyvale headquarters is the launch party for the developer-facing side of SearchMonkey. In case you haven’t been paying attention, SearchMonkey is a new platform that lets developers craft their own awesomized search results. If you’re interested in SEO or general lowercase semantic web tools, you’ll love it. Meet me there….
I haven’t mentioned it yet, but SearchMonkey (now an official name, not just a project name) is in external limited beta. Keep an eye on ysearchblog, lots more technical content is on the way. -m
So today Yahoo! announced a major facet of what I’ve been working on lately: making the web more meaningful. Lots of fantastic coverage, including TechCrunch and ReadWriteWeb (and others, please link in the comments), and supportive responses and blog posts across the board. It’s been a while since I’ve felt this good about being a…