I haven’t mentioned it yet, but SearchMonkey (now an official name, not just a project name) is in external limited beta. Keep an eye on ysearchblog; lots more technical content is on the way. -m
Thursday, March 13th, 2008
The (lowercase) semantic web goes mainstream
So today Yahoo! announced a major facet of what I’ve been working on lately: making the web more meaningful. There has been lots of fantastic coverage, including TechCrunch and ReadWriteWeb (and others; please add links in the comments), along with supportive responses and blog posts across the board. It’s been a while since I’ve felt this good about being a Yahoo.
So what exactly is it?
A few months ago I went through the pages on this very blog and added hAtom markup. As a result of this change…well, nothing happened. I had a good experience learning exactly what is involved in retrofitting an existing site with microformats, but I didn’t get any tangible benefit. With the “SearchMonkey” platform, any site using microformats, RDFa, or eRDF is exposed to developers who can enhance its search results. An enhanced result won’t directly make my site rank higher in search, but it will most certainly draw more clicks, and ultimately more readership, more inlinks, and better organic ranking.
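To make “adding hAtom markup” concrete: hAtom works by putting agreed-upon class names (`hentry`, `entry-title`, `published`, and so on) on ordinary HTML, so any consumer that knows the vocabulary can pull the data back out. Below is a minimal sketch in Python using only the standard library. The sample markup and the `HAtomParser` class are hypothetical illustrations, not how SearchMonkey itself extracts data, and the parser is deliberately simplified (no nested entries, plain-text titles only).

```python
from html.parser import HTMLParser

# Hypothetical snippet of a blog page retrofitted with hAtom class names.
SAMPLE = """
<div class="hentry">
  <h2 class="entry-title">The (lowercase) semantic web goes mainstream</h2>
  <abbr class="published" title="2008-03-13">March 13th, 2008</abbr>
  <div class="entry-content"><p>So today Yahoo! announced...</p></div>
</div>
"""

class HAtomParser(HTMLParser):
    """Collects entry titles and published dates from hAtom entries.

    Simplified sketch: assumes entries are not nested and that
    entry-title elements contain only plain text.
    """

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "hentry" in classes:
            # Each hentry element starts a new entry record.
            self.entries.append({"title": "", "published": None})
        elif not self.entries:
            return  # ignore markup outside any hentry
        if "entry-title" in classes:
            self._in_title = True
        if "published" in classes:
            # The machine-readable date lives in the title attribute.
            self.entries[-1]["published"] = attrs.get("title")

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the title text.
        self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.entries[-1]["title"] += data

parser = HAtomParser()
parser.feed(SAMPLE)
parser.close()
print(parser.entries)
# [{'title': 'The (lowercase) semantic web goes mainstream', 'published': '2008-03-13'}]
```

The point is that the markup change on the publishing side is small; the payoff only arrives once something on the consuming side actually reads those class names.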
How about some questions and answers:
Q: Is this Tim Berners-Lee’s vision of the Semantic Web finally getting fulfilled?
A: No.
Q: Does this presuppose everybody rushing to change their sites to include microformats, RDF, etc.?
A: No. After all, there is a developer platform. Naturally, developers will have an easier time with sites that use official and community standards for structuring data, but there is no obligation for any site to make changes in order to participate and benefit.
Q: Why would a site want to expose all its precious data in an easily-extractable way?
A: Because within a healthy ecosystem it results in a measurable increase in traffic and customer satisfaction. Data on the public web is already extractable, given enough eyeballs. An openness strategy pays off, and SearchMonkey is itself an existence proof.
Q: What about metacrap? We can never trust sites to provide honest metadata.
A: The system does have significant spam deterrents built in, about which I won’t say more. But perhaps more importantly, the plugin nature of the platform uses the power of the community to shape itself. A spammy plugin won’t get installed by users. A site that mixes fraudulent RDFa metadata in with real content will get exposed as fraudulent, and users will abandon ship.
Q: Didn’t ask.com prove that having a better user interface doesn’t help gain search market share?
A: Perhaps. But this isn’t about user interface; it’s about data (which enables a much better interface).
Q: Won’t (Google|Microsoft|some startup) just immediately clone this idea and take advantage of all the new metadata out there?
A: I’m sure these guys will have some kind of response, and it’s true that a rising tide lifts all boats. But I don’t see anyone else cloning this exactly. The way it’s implemented has a distinctly Yahoo! appeal to it. Nobody has cloned Yahoo! Answers yet, either. In some ways, this is a return to roots, since Yahoo! started off as a human-guided directory. SearchMonkey is similar, except that a much broader group of people can now participate. There are also some specific human, technical, and financial reasons, but if you want specifics, I suggest inviting me out for beers. :-)
Disclaimer: as always, I’m not speaking for my employer. See the standard disclaimer. -m
Update: more Q and A
Q: How is SearchMonkey related to the recently announced Yahoo! Microsearch?
A: In brief, Microsearch is a research project (and a very cool one) with far-reaching goals, while SearchMonkey is software targeted to ship imminently. I frequently talk to and compare notes with Peter Mika, the lead researcher for Microsearch.
Thursday, March 6th, 2008
microformat search at Yahoo!
Somehow I missed this posting and the underlying news that a Y Research project has a nice public demo of semantic search, driven by RDF, RDFa, and microformats. Still a rough sketch of a full solution, with multiple-second access times. But I particularly like the query for renaissance faire. -m
Tuesday, February 26th, 2008
Yahoo! Announces Open Search Platform
As spotted on TechCrunch (full article). This is a game-changer, folks. Check out the comments attached to the article. -m
Wednesday, January 23rd, 2008
Machine tags
Take a look at this URL, and the page behind it. This is a list of all the Flickr photos with the tag “xmlns:dc=http://purl.org/dc/elements/1.1/”. Although these have been around for a while, I hadn’t been aware of this kind of tagging until recently.
Why “xml” in the namespace declaration? This doesn’t have much to do with XML. How many tags are there in the world that start with “dc:” and are not referring to Dublin Core? At least the tag declaring the namespace provides a good hook for finding things with machine tags. It’s only a small step up to RDFa from here, which is good! -m
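Machine tags follow a `namespace:predicate=value` shape, so a tag like “xmlns:dc=…” can be read as a prefix declaration, and a tag like “dc:creator=…” can then be expanded to a full property URI, much as an RDFa processor expands a prefixed name. Here is a minimal Python sketch of that idea. The regular expression is a simplified assumption (Flickr’s actual character rules may differ), and `parse_machine_tag`, `expand_tags`, and the “Jane Doe” tag are hypothetical.

```python
import re

# Simplified pattern for "namespace:predicate=value" (an assumption;
# Flickr's own rules for allowed characters may differ).
MACHINE_TAG = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):([A-Za-z_][A-Za-z0-9_]*)=(.+)$")

def parse_machine_tag(tag):
    """Split a machine tag into (namespace, predicate, value), or return None."""
    m = MACHINE_TAG.match(tag)
    return m.groups() if m else None

def expand_tags(tags):
    """Turn machine tags into (property URI, value) pairs.

    Tags of the form xmlns:prefix=URI are treated as prefix declarations;
    other machine tags are expanded only if their namespace was declared.
    """
    parsed = [p for p in map(parse_machine_tag, tags) if p]
    prefixes = {pred: value for ns, pred, value in parsed if ns == "xmlns"}
    return [
        (prefixes[ns] + pred, value)  # prefix URI + local name
        for ns, pred, value in parsed
        if ns != "xmlns" and ns in prefixes
    ]

# Hypothetical tags on a single photo; "sunset" is an ordinary tag.
tags = [
    "xmlns:dc=http://purl.org/dc/elements/1.1/",
    "dc:creator=Jane Doe",
    "sunset",
]
print(expand_tags(tags))
# [('http://purl.org/dc/elements/1.1/creator', 'Jane Doe')]
```

This sketch drops `dc:` tags that lack a declaration; given how few `dc:` tags are likely to mean anything other than Dublin Core, a practical consumer might instead default `dc` to the Dublin Core namespace.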