Archive for the 'yahoo' Category
Sunday, August 22nd, 2010
This is indeed a sad day for all of us, for on October 1, a great app will be gone. Though we hardly had enough time during his short life to get to know him, like the grass that withers and fades, this monkey will finish his earthly course.

Photo by Micah
I know he left many things undone, for example only enhancing 60% of the delivered result pages. He never got a chance to finish his life’s ambition of promoting RDFa and microformats to the masses or to be the killer app of the (lower-case) semantic web. You could say he will live on as “some of this structured data processing will be supported natively by the Microsoft platform”. Part of the monkey we loved will live on as enhanced results continue to flow forth from the Yahoo/Bing alliance.
The SearchMonkey Alumni group on LinkedIn is filled with wonderful mourners. Micah Alpern wrote there
I miss the team, the songs, and the aspiration to solve a hard problem. Everything else is just code.
Isaac Asimov was reported to have said “If my doctor told me I had only six minutes to live, I wouldn’t brood. I’d type a little faster.” Today we can identify with that sentiment. Keep typing.
-m
Permalink
Filed under announcement, metadata, microformats, microsoft, search, yahoo
Friday, June 19th, 2009
I spent 2 days at the Yahoo! campus at a VoCamp event, my first. Initially, I was dismayed at the schedule. Spend all the time the first day figuring out why everybody came? It seemed inefficient. But having gone through it, the process seems productive, exactly the way that completely decentralized groups need to get things done. Peter Mika did a great job moderating.
Attendees numbered about 35, and came from widely varying backgrounds from librarian to linguist to professor to student to CTO, though uniformly geeky. With SemTech this week, the timing was right, and the number of international attendees was impressive.
In community development, nothing gets completely decided just because a few people met. But progress happens. The first day was largely exploratory, but also covered plenary topics that nearly everyone was interested in. Namely:
- Finding, choosing, and knowing when to create vocabularies
- Mapping from one vocabulary to another
- RDBMS to RDF mapping
Much of the shared understanding of these discussions is captured on various wiki pages connected to the one at the top of this article.
For day 2, we split into smaller working groups with more focused topics. I sat in on a discussion of Common Tag (which still feels too complex to me, but does fulfill a richer use case than rel-tag). Next, some vocabulary design, planning a microformat (and eventual RDF vocab) to represent code documentation: classes, functions, parameters, and the like. Tantek Çelik espoused the “scientific method” of vocab design: would a separate group, in similar circumstances, come up with the same design? If the answer is ‘yes’, then you probably designed it right. The way to make that happen is to focus on the basics, keeping everything as simple as possible. If any important features are missed, you will find out quickly. The experience of getting the simple thing out the door will provide the education needed to make the more complicated follow-on version a success.
From the wrap-up: if you are designing a vocabulary, the most useful thing you can do is NOT to unleash a fully-formed proposal on the world, but rather to capture the discussion around it. What were the initial use cases? What are people currently doing? What design goals were explicitly left off the table, or deferred to a future version, or immediately shot down? It’s better to capture multiple proposals, even if fragmentary, and let lots of people look them over and gravitate toward the best design.
Lastly, some cool things overheard:
“Relational databases? We call those ‘legacy’.”
“The socially-accepted schema is fairly consistent.”
“It’s just a map, it’s not the territory.”
-m
Permalink
Filed under aswemaythink, everythingismiscellaneous, intentional web, metadata, yahoo
Thursday, June 4th, 2009
I was shocked today to find out that one of my old friends from the Yahoo Search days was let go in the last round. He’s simply brilliant, and one of the last people I would have expected the managers-in-purple to do without.
At the same time, I’m getting hounded by recruiters–five so far just this week.
So let me put these two forces against each other and see if they cancel out. To any former Yahoos: get in touch with me and I’ll do what I can to hook you up with a cool opportunity. This offer is good for June and July–after that I can’t reasonably say I’ll have time for matchmaking. Send me your CV via email and I’ll get started. No promises on results, but I’ll do what I can. :-)
-m
Permalink
Filed under announcement, commercialism, yahoo
Tuesday, May 12th, 2009
The new feature called rich snippets shows that SearchMonkey has caught the eye of the 800 pound gorilla. Many of the same microformats and RDF vocabularies are supported. It seems increasingly inevitable that RDFa will catch on, no matter what the HTML5 group thinks. -m
Permalink
Filed under commercialism, google, intentional web, languages, metadata, microformats, search, yahoo
Sunday, May 10th, 2009
As of today, I have been out of Yahoo! for a full year. And what a year it’s been… I guess that means I’m now free to recruit…any good XML people still wearing purple? -m
Permalink
Filed under announcement, yahoo
Saturday, April 25th, 2009
Lots of news reports about Geocities claim it was purchased for “4 billion” dollars. But not really–that’s a pretty hefty rounding from 3.57 B. Also, that wasn’t cash, but magic boom time inflated stock. Yahoo was at $335.875 on announcement, so the deal amounted to about 10.6 million shares. Or at today’s values, a little over $150 million. Your call on whether they got their money’s worth. -m
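For anyone who wants to check the arithmetic, here is a rough sketch that takes the figures quoted above at face value. The post never states the current share price, so the last step only derives the price implied by the "$150 million" figure; it is not a quoted market value.

```python
# Back-of-the-envelope check of the figures quoted in the post.
deal_value_usd = 3.57e9          # reported deal value
price_at_announcement = 335.875  # YHOO share price quoted in the post

# Number of shares the deal amounted to at the announcement price.
shares = deal_value_usd / price_at_announcement
print(f"shares: {shares / 1e6:.1f} million")

# Assumption: the "$150 million" figure is simply shares * current price,
# so the current price it implies can be backed out by division.
implied_price = 150e6 / shares
print(f"implied current price: ${implied_price:.2f} per share")
```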
Permalink
Filed under commercialism, yahoo
Sunday, March 8th, 2009
The remarkable (and prolific) Stephen Wolfram has an idea called Wolfram Alpha. People used to assume the “Star Trek” model of computers:
that one would be able to ask a computer any factual question, and have it compute the answer.
Which has proved to be quite distant from reality. Instead
But armed with Mathematica and NKS [A New Kind of Science] I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.
It’s not easy to do this. Every different kind of method and model—and data—has its own special features and character. But with a mixture of Mathematica and NKS automation, and a lot of human experts, I’m happy to say that we’ve gotten a very long way.
I’m still a SearchMonkey guy at heart, so I wonder how much Wolfram’s team is familiar with existing Semantic Web research and practice–because at a high level this seems very much like RDF with suitable queries thereupon. If that’s a good characterization, that’s A Good Thing, since practical application has been one of SemWeb’s weak spots.
-m
Permalink
Filed under AI, Mark Logic, aswemaythink, commercialism, intentional web, languages, math, metadata, software, yahoo
Tuesday, September 2nd, 2008
I prefer the Yahoo! Search iPhone interface. Search Assist and SearchMonkey goodness abound, and make a concrete improvement to the experience.
But why can’t I get Yahoo! Go for iPhone? I’m gobsmacked that such a strategic app isn’t available this far into the game. Yahoo! Go was first announced in 2006. Then 2007. Then 2008. Maybe 2009 will be the year. -m
Permalink
Filed under mobile, yahoo
Monday, July 28th, 2008
The W3C RDFa specification is now in Candidate Recommendation phase, with an explicit call for implementations (of which there are several). Momentum for RDFa is steadily building. What about eRDF, which favors the existing HTML syntax over new attributes?
There’s still a place for a simpler syntactic approach to embedding RDF in HTML, as evidenced by projects like Yahoo! SearchMonkey. And eRDF is still the only game in town when it comes to annotating RDF within HTML-without-the-X.
One thing the RDFa folks did was define src as a subject-bearing node, rather than an object. At first I didn’t like this inversion, but the more I worked with it, the more it made sense. When you have an image, which can’t have children in (X)HTML, it’s very often useful to use the src URL as the subject, with a predicate of perhaps cc:license.
So I propose one single change to eRDF 1.1. Well, actually several changes, since one thing leads to another. The first is to specify that you are using a different version of eRDF. A new profile string of:
"http://purl.org/NET/erdf11/profile"
The next is changing the meaning of a src value to be a subject, not an object. Perhaps swapping the subject and object. Many existing uses of eRDF involving src already involve properties with readily available inverses. For example:
<!-- eRDF 1.0 -->
<img class="foaf.depiction" src="http://example.org/picture" />
<!-- eRDF 1.1 -->
<img src="http://example.org/picture" class="foaf.depicts" />
With the inherent limitations of existing syntax, the use case of having a full image URL and a license URL won’t happen. But XHTML2 as well as an HTML5 proposal suggest that adding href to many elements might come to pass. In which case this possibility opens:
<img src="http://example.org/picture" class="cc.license"
href="http://creativecommons.org/licenses/by/2.0/" />
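To make the proposed inversion concrete, here is a minimal Python sketch of how a consumer might turn those img elements into triples under each reading. It is an illustration only, not an eRDF implementation: the PAGE URL, the ImgTriples class, and the rule that a missing href falls back to the enclosing page are all assumptions made for the example.

```python
from html.parser import HTMLParser

PAGE = "http://example.org/page"  # hypothetical URL of the enclosing document


class ImgTriples(HTMLParser):
    """Collect (subject, predicate, object) triples from <img> elements."""

    def __init__(self, version):
        super().__init__()
        self.version = version
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        cls = attributes.get("class")
        href = attributes.get("href")
        if not (src and cls):
            return
        # eRDF writes foaf.depiction where RDF would write foaf:depiction.
        predicate = cls.replace(".", ":", 1)
        if self.version == "1.0":
            # 1.0 reading: src is the object, the page is the subject.
            self.triples.append((PAGE, predicate, src))
        else:
            # Proposed 1.1 reading: src is the subject; the object is href
            # if present, otherwise (assumed here) the enclosing page.
            self.triples.append((src, predicate, href or PAGE))


def triples(html, version):
    parser = ImgTriples(version)
    parser.feed(html)
    return parser.triples


old = triples('<img class="foaf.depiction" src="http://example.org/picture" />', "1.0")
new = triples('<img src="http://example.org/picture" class="cc.license" '
              'href="http://creativecommons.org/licenses/by/2.0/" />', "1.1")
print(old)
print(new)
```

With src as the subject, the license example becomes a statement about the picture itself rather than about the page that embeds it.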
Comments? -m
Permalink
Filed under browsers, everythingismiscellaneous, intentional web, metadata, trends, web20, yahoo
Thursday, July 3rd, 2008
I haven’t seen an announcement about this, but try the following query on Yahoo Search: [searchmonkeyid:com.yahoo.rdf.rdfa] (link). It shows documents containing RDFa, with Digg at the top. Since this is a Searchmonkey ID, it’s also usable in Searchmonkey to actually extract the metadata and use it to customize search results.
Does your site use RDFa yet? -m
Permalink
Filed under everythingismiscellaneous, intentional web, metadata, trends, yahoo
Wednesday, July 2nd, 2008
Commentators, having long since run out of useful things to say about YHOO+MSFT, only bemoan how it continues to drag out. In reality, deals of this size do tend to take a while. Microsoft (and specifically Ballmer) aren’t walking. Why?
Because they need Yahoo. They need search share–the deal with Google only puts on more pressure. But they also need a non-schizophrenic brand under which to put all their audience attractors. In short, I’d say MSFT has been terrible at tactics (and non-intimidation-based negotiating), and YHOO has been mediocre at strategy and terrible at execution. Maybe they are meant for each other…
Prediction: by the end of the year 1) some kind of deal happens, and 2) Yang is out as CEO. $28.
Disclosure: I still hold long YHOO shares
Disclosure: The irony of this post is not lost on me
-m
Permalink
Filed under google, microsoft, stuff, yahoo
Saturday, June 28th, 2008
Several folks, including me, have experienced increased CPU usage on Firefox 3, especially on OSX. Try disabling it, going back to the bookmarklet. -m
Permalink
Filed under firefox, yahoo
Thursday, June 26th, 2008
Even though the timing is about perfect, it’s not gonna happen. But if it did, would that be awesome or what? -m
Permalink
Filed under microsoft, yahoo
Thursday, June 19th, 2008
A common point of debate within Yahoo! was whether employees should feel compelled to use Y properties (“eat your own dogfood”) or whether said properties should have to compete on pure merit to earn internal usage. But in any case, there’s always pressure, even if subliminal, to use internal products.
I’ve been free of such influence for six weeks now. What Yahoo! services do I still use? Which ones not so much?
Yahoo Answers: not so much. Even the 1 point-per-day for visiting doesn’t entice me. If I had a burning question that would be a good fit for a community answer, I’d go back.
Yahoo Mail: all the time. I used Yahoo mail long before I worked there, and I’ll be using it long after.
Yahoo News: almost daily. Still a good collection of global, national, and local news.
My Yahoo & Finance: multiple times daily. I’ve peeked at iGoogle, but the Y is too comfy, and the competition isn’t easy enough to get comfortable with. But often the page takes up to 30 seconds to load. If that doesn’t improve, I’ll leave.
Yahoo Search: still my default. But only because of tweaks I put in place with SearchMonkey. The baseline quality of results is right on par with Google. I still recommend Y search to friends and family.
Yahoo Maps: rarely used. Google maps is just better, particularly street view.
Yahoo 360: Abandoned. Tons of site bugs, no fixes on the horizon. In fact, they’ve announced shuttering of the service, to be replaced with some unspecified alternative. But who knows when that will happen? So the Meadblog is on hold until further notice. I’ll still check once in a while for postings from friends and family.
Yahoo front page: Still use it to check whether wireless is working. Most often with ping, not HTTP though. :-)
What Yahoo services do you still use? Comment below. -m
Update: a few more inspired by the comments.
Delicious: still use, mainly through the browser extension.
Flickr: still use, but I’m not much of a photos guy. I’ll be using it again shortly to upload screenshots for a blog-post tutorial I’m writing.
Permalink
Filed under yahoo
Tuesday, June 17th, 2008
According to Ars Technica, Google captured 61% of mobile search market share in the first four months of 2008. Yahoo! came in at a distant 18%, so pretty much reflecting desktop search market share. This is due, of course, to Google being the default provider on the iPhone, and the iPhone accounting for the bulk of mobile internet usage.
So Jerry (or whoever is on deck as CEO), you should probably look into this mobile thing and see what’s up with leadership there and whether anything is salvageable… -m
Permalink
Filed under apple, google, hardware, mobile, trends, yahoo
Thursday, June 5th, 2008
From the Yahoo! Developer blog, new search keywords you can use to home in on indexed microformats.
For example, to see every hAtom-bearing page that mentions ‘dubinko’ use the query [searchmonkeyid:com.yahoo.uf.hatom dubinko]. Works similarly for hCard, hCalendar, hReview, and XFN. I’m sure more are coming soon too. -m
Permalink
Filed under announcement, microformats, yahoo
Sunday, May 18th, 2008
You probably noticed the byline on my recent Yahoo! developer network posting. It, and a few more posts still in the pipe, list me as a “SearchMonkey Team Alumnus”. So yeah, it’s official, I’ve hung up my exclamation point and moved on to something else.
Specifically, Mark Logic, where a group of impressively talented people reside, recently including Norm Walsh. My first day there is tomorrow, so I don’t fully know what I’ll be working on, though it does involve
the core server, and taking it from its current state of awesome raw bare-metal power into something more akin to an application development platform.
Mark Logic strikes me like this: think back 10 years or so to all the hype and introductory articles around this new thing called XML–how it would enable whole new kinds of applications through the miraculous abilities of “markup” and perform realtime structured search over the results. It turns out that all these dreams were missing one critical piece, a way to do all the fancy indexing and repository management needed to make that happen. And the MarkLogic Server, to a very good approximation, IS that piece.
So what do I think of SearchMonkey at this point? No change, really. Good riddance to the ten-blue-links result pages. It’s breaking new ground in search, and Google will have a hard time stomaching an equally radical (and potentially revenue-impacting) change. SearchMonkey is really good news for the lowercase semantic web, including microformats and RDFa. It’s doing all the right things for the right reasons. The project will do fine without me. :-)
I had a good run at Yahoo! and I’m proud to have accomplished all I did there. Onward. -m
Permalink
Filed under Mark Logic, announcement, yahoo
Saturday, May 17th, 2008
Yeah, more than ever before. See my article on Yahoo! developer net. The stuff I talk about here is currently live in the indexer. -m
Permalink
Filed under announcement, microformats, yahoo
Wednesday, May 14th, 2008
Reminder: Thursday evening at Yahoo! Sunnyvale headquarters is the launch party for the developer-facing side of SearchMonkey. In case you haven’t been paying attention, SearchMonkey is a new platform that lets developers craft their own awesomized search results. If you’re interested in SEO or general lowercase semantic web tools, you’ll love it. Meet me there. Upcoming link. Party starts at 5:30. -m
Update: The developer tool is live. Rasmus has a nice walkthrough.
Permalink
Filed under announcement, aswemaythink, browsers, intentional web, metadata, stuff, yahoo
Friday, May 2nd, 2008
If you have webdev skillz, you might be interested in the SearchMonkey launch party on May 15. Good food, good drink, good coding. Space is limited, but I have a few invites to share. Comment here or contact me offline if interested. -m
Permalink
Filed under announcement, browsers, intentional web, yahoo
Monday, April 28th, 2008
I haven’t mentioned it yet, but SearchMonkey (now an official name, not just a project name) is in external limited beta. Keep an eye on ysearchblog; lots more technical content is on the way. -m
Permalink
Filed under announcement, intentional web, metadata, software, yahoo
Saturday, April 26th, 2008

I’m not involved in the corporate wrangling about Microsoft and Yahoo! talks. Which leaves me relatively free to comment on it. [Disclosure: I am, not too surprisingly, a Yahoo! shareholder.]
Lots of things have been happening lately. A deadline of, well, today. Talks of Google adsense trials. And all kinds of merger speculation involving Rupert Murdoch in some fashion, or else AOL.
But I haven’t seen anyone point out this connection: Google owns 5% of AOL, having invested a billion bucks and taken over search there a couple of years ago. So if Yahoo! and AOL merged, there would already be a Google advertising connection in place. Running pre-trials now is just due diligence on something that might happen anyway.
Having both an in-house advertising network and an outsourced one has some advantages too, namely in the form of “knobs” that can be adjusted to tune margins as conditions warrant. And maintaining the in-house system keeps Google honest and makes sure that relatively good deals can be negotiated in the future.
Lots of pundits talk about regulatory scrutiny, but honestly, it’s been years since any antitrust machinery in this country has been effective. And the recent spectrum auctions showcased Google’s skill at turning regulatory tables in their favor. If it came down to it, the smart people on both sides of the table shouldn’t have a problem crafting an agreement in a way that meets muster, even in the stricter EU.
Summary: based solely on public reports, it seems like the AOL connection might be a credible threat to Microsoft’s appetite. The ball is firmly in Steve’s court now. We’ll see what he does.
Permalink
Filed under commercialism, microsoft, stuff, trends, yahoo
Thursday, March 13th, 2008
So today Yahoo! announced a major facet of what I’ve been working on lately: making the web more meaningful. Lots of fantastic coverage, including TechCrunch and ReadWriteWeb (and others, please link in the comments), and supportive responses and blog posts across the board. It’s been a while since I’ve felt this good about being a Yahoo.
So what exactly is it?
A few months ago I went through the pages on this very blog and added hAtom markup. As a result of this change…well, nothing happened. I had a good experience learning about exactly what is involved in retrofitting an existing site with microformats, but I didn’t get any tangible benefit. With the “SearchMonkey” platform, any site using microformats, or RDFa or eRDF, is exposed to developers who can enhance search results. An enhanced result won’t directly make my site rank higher in search, but it will most certainly make it prone to more clicks, and ultimately more readership, more inlinks, and better organic ranking.
How about some questions and answers:
Q: Is this Tim Berners-Lee’s vision of the Semantic Web finally getting fulfilled?
A: No.
Q: Does this presuppose everybody rushing to change their sites to include microformats, RDF, etc?
A: No. After all, there is a developer platform. Naturally, developers will have an easier time with sites that use official and community standards for structuring data, but there is no obligation for any site to make changes in order to participate and benefit.
Q: Why would a site want to expose all its precious data in an easily-extractable way?
A: Because within a healthy ecosystem it results in a measurable increase in traffic and customer satisfaction. Data on the public web is already extractable, given enough eyeballs. An openness strategy pays off (of which SearchMonkey is an existence proof).
Q: What about metacrap? We can never trust sites to provide honest metadata.
A: The system does have significant spam deterrents built in, of which I won’t say more. But perhaps more importantly, the plugin nature of the platform uses the power of the community to shape itself. A spammy plugin won’t get installed by users. A site that mixes in fraudulent RDFa metadata with real content will get exposed as fraudulent, and users will abandon ship.
Q: Didn’t ask.com prove that having a better user interface doesn’t help gain search market share?
A: Perhaps. But this isn’t about user interface–it’s about data (which enables a much better interface.)
Q: Won’t (Google|Microsoft|some startup) just immediately clone this idea and take advantage of all the new metadata out there?
A: I’m sure these guys will have some kind of response, and it’s true that a rising tide lifts all boats. But I don’t see anyone else cloning this exactly. The way it’s implemented has a distinctly Yahoo! appeal to it. Nobody has cloned Yahoo! Answers yet, either. In some ways, this is a return to roots, since Yahoo! started off as a human-guided directory. SearchMonkey is similar, except a much broader group of people can now participate. And there are some specific human, technical and financial reasons why as well, but I suggest inviting me out for beers if you want specifics. :-)
Disclaimer: as always, I’m not speaking for my employer. See the standard disclaimer. -m
Update: more Q and A
Q: How is SearchMonkey related to the recently announced Yahoo! Microsearch?
A: In brief, Microsearch is a research project (and a very cool one) with far-reaching goals, while SearchMonkey is targeted as imminently shipping software. I frequently talk to and compare notes with Peter Mika, the lead researcher for Microsearch.
Permalink
Filed under announcement, everythingismiscellaneous, intentional web, metadata, microformats, search, standards, trends, web20, yahoo
Thursday, March 6th, 2008
Somehow I missed this posting and the underlying news that a Y Research project has a nice public demo of semantic search, driven by RDF, RDFa, and microformats. Still a rough sketch of a full solution, with multiple-second access times. But I particularly like the query for renaissance faire. -m
Permalink
Filed under announcement, everythingismiscellaneous, metadata, microformats, yahoo
Tuesday, February 26th, 2008
As spotted on TechCrunch, full article. This is a game-changer folks. Check out the comments attached to the article. -m
Permalink
Filed under announcement, metadata, search, stuff, trends, yahoo
Wednesday, February 13th, 2008
It’s been an exhausting past couple of weeks, but life goes on. WebPath made front page at next.yahoo. I’m starting to get feedback from developers who are actually using it, filing bugs, suggesting features, and it’s gratifying. The community is still building up. Won’t you join too? -m
Permalink
Filed under announcement, python, xpath, yahoo
Thursday, January 24th, 2008

WebPath, my experimental XPath 2.0 engine in Python, is now an open source project with a liberal BSD license. I originally developed this during a Yahoo! Hack Day, and now I get to announce it during another Hack Day. Seems appropriate.
The focus of WebPath was rapid development and providing an experimental platform. There remains tons of potential work left to do on it…watch this space for continued discussion. I’d like to call out special thanks to the Yahoo! management for supporting me on this, and to Douglas Crockford for turning me on to Top Down Operator Precedence parsers. Have a look at the code. You might be pleasantly surprised at how small and simple a basic XPath 2 engine can be. So, who’s up for some XPath hacking?
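For readers who have not met Top Down Operator Precedence parsing, here is a minimal sketch of the technique in Python. It is not WebPath's code and handles only arithmetic, not XPath; the names (Parser, nud, led, BINDING_POWER) follow the usual terminology for this style of parser and are chosen purely for illustration.

```python
import re

# A token is either a number or any single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(\S))")

# Left binding powers: a higher number binds more tightly.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}


def tokenize(text):
    for number, op in TOKEN.findall(text):
        if number:
            yield ("num", float(number))
        else:
            yield ("op", op)
    yield ("end", None)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.current = next(self.tokens)

    def advance(self):
        token = self.current
        self.current = next(self.tokens, ("end", None))
        return token

    def expression(self, rbp=0):
        # Parse a prefix form, then keep absorbing infix operators
        # that bind more tightly than the caller's right binding power.
        kind, value = self.advance()
        left = self.nud(kind, value)
        while self.current[0] == "op" and BINDING_POWER.get(self.current[1], 0) > rbp:
            _, op = self.advance()
            left = self.led(op, left)
        return left

    def nud(self, kind, value):
        """Handle a token that starts an expression."""
        if kind == "num":
            return value
        if value == "-":
            return -self.expression(30)  # unary minus binds tightest
        if value == "(":
            inner = self.expression(0)
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")

    def led(self, op, left):
        """Handle an infix operator with its left operand already parsed."""
        right = self.expression(BINDING_POWER[op])
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
        return left / right


def evaluate(text):
    parser = Parser(text)
    result = parser.expression()
    if parser.current[0] != "end":
        raise SyntaxError(f"unexpected token {parser.current[1]!r}")
    return result


print(evaluate("1 + 2 * 3"))
print(evaluate("(1 + 2) * 3"))
```

The whole grammar lives in the binding-power table plus the two small handlers, which is a large part of why a parser in this style stays short.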
Code download. (Coming to SourceForge with CVS, etc., in however many days it takes them to approve a new project) I hope this inspires more developers to work on similar projects, or better yet, on this one! -m
Permalink
Filed under announcement, intentional web, python, xml, xpath, yahoo
Wednesday, January 23rd, 2008
Take a look at this URL, and the page behind it. This is a list of all the Flickr photos with the tag “xmlns:dc=http://purl.org/dc/elements/1.1/”. Although these have been around for a while, I hadn’t been aware of this kind of tagging until recently.
Why “xml” in the namespace declaration? This doesn’t have much to do with XML. How many tags are there in the world that start with “dc:” and are not referring to Dublin Core? At least the tag declaring the namespace provides a good hook for finding things with machine tags. It’s only a small step up to RDFa from here, which is good! -m
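As a rough sketch of why the namespace tag is such a useful hook, here is how a consumer might resolve prefixed tags into full property URIs. Only the xmlns:dc tag comes from the page above; the other tag values and the expand_machine_tags helper are made up for illustration, and this is not a description of how Flickr itself processes tags.

```python
def expand_machine_tags(tags):
    """Resolve prefix:name=value tags against xmlns:prefix=uri tags."""
    # First pass: collect namespace declarations.
    namespaces = {}
    for tag in tags:
        if tag.startswith("xmlns:") and "=" in tag:
            prefix, uri = tag[len("xmlns:"):].split("=", 1)
            namespaces[prefix] = uri

    # Second pass: expand every prefixed tag whose prefix was declared.
    expanded = []
    for tag in tags:
        if tag.startswith("xmlns:") or "=" not in tag:
            continue
        name, value = tag.split("=", 1)
        if ":" not in name:
            continue
        prefix, local = name.split(":", 1)
        if prefix in namespaces:
            expanded.append((namespaces[prefix] + local, value))
    return expanded


tags = [
    "xmlns:dc=http://purl.org/dc/elements/1.1/",
    "dc:title=Sunset",   # hypothetical example tag
    "vacation",          # plain tag, ignored
    "geo:lat=37.4",      # prefix never declared, so skipped
]
print(expand_machine_tags(tags))
```

A tag that starts with "dc:" but has no matching declaration is left alone, which sidesteps the ambiguity about what "dc:" might otherwise mean.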
Permalink
Filed under everythingismiscellaneous, intentional web, metadata, yahoo
Monday, January 7th, 2008
Admittedly, their marketing folks wouldn’t describe it that way, but essentially that’s what was announced today. (documentation in PDF format, closely related to what-used-to-be Konfabulator tech; here’s the interesting part in HTML) The press release talks about reaching “billions” of mobile consumers; even if you don’t put too much emphasis on press releases (you shouldn’t) it’s still talking about serious use of and commitment to XForms technology.
Shameless plug: Isn’t it time to refresh your memory, or even find out for the first time about XForms? There is this excellent book available in printed format from Amazon, as well as online for free under an open content license. If you guys express enough interest, good things might even happen, like a refresh to the content. Let’s make it happen.
From a consumer standpoint, this feels like a welcome play against Android, too. Yahoo! looks like it’s placing a bet on working with more devices while making development easier at the same time. I’ll bet an Android port will be available, at least in beta, before the end of the year.
Disclaimer: I have been out of Yahoo! mobile for several months now, and can’t claim any credit for or inside knowledge of these developments. -m
P. S. Don’t forget the book.
Permalink
Filed under XForms, amazon, browsers, mobile, software, standards, web20, yahoo
Monday, December 31st, 2007
Thanks to all the folks who showed interest in this little XPath puzzler published here a few weeks ago. Some asked to see the dataset, but I’m not able to release it at this time (but ask me again in 3 months).
Turns out it was a combination of two bugs, one mine, one somebody else’s. Careful observers noted that I wasn’t using any namespace prefixes in the XPath, and since I did specify that it was XPath 1.0, that technically rules out XHTML as the source language. Like nearly all XML I work with these days, the first thing I do is strip off the namespaces to make it easier to work with. Bug #1 was that in a few cases, the namespaces didn’t get stripped.
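The namespace-stripping step can be sketched roughly as follows, using Python's standard ElementTree. This is an illustration of the general idea, not the script I actually used; the strip_namespaces helper and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite every element and attribute name to its local name, in place."""
    for element in root.iter():
        # ElementTree stores namespaced names as "{uri}local".
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
        for name in list(element.attrib):
            if name.startswith("{"):
                element.attrib[name.split("}", 1)[1]] = element.attrib.pop(name)
    return root


SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><meta property="dc:title" content="x"/></head>'
    "</html>"
)

root = ET.fromstring(SAMPLE)
before = len(root.findall(".//meta"))  # unprefixed name misses XHTML elements
strip_namespaces(root)
after = len(root.findall(".//meta"))   # now the element is found
print(before, after)
```

If a document slips through without this step, an unprefixed path such as //meta quietly matches nothing in it, which is the shape of bug #1.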
Bug #2 was in the XPath engine itself. Which one? Uh, whatever one ships with the “XPath” plugin for JEdit. It’s hard to tell directly, but I think it might be an older version of Xalan-J. In the case of the expression //meta, it properly located only those elements part of no namespace. But in the case of //meta/@property, it was including all the nodes that would have been selected by //*[local-name(.)='meta']/@property. Hence, a larger number of returned nodes.
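The difference between the two selections is easy to reproduce. The sketch below uses Python's ElementTree rather than the jEdit plugin, with a made-up document that mixes one no-namespace meta element and two XHTML-namespaced ones; the local_name helper is a stand-in for XPath's local-name() function.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/1999/xhtml">
  <meta property="a"/>
  <h:meta property="b"/>
  <h:meta property="c"/>
</root>"""

root = ET.fromstring(DOC)


def local_name(tag):
    """Return the part of an ElementTree tag after any "{uri}" prefix."""
    return tag.rsplit("}", 1)[-1]


# Equivalent of //meta: only elements that are in no namespace.
no_namespace = root.findall(".//meta")

# Equivalent of //*[local-name(.)='meta']: any namespace at all.
any_namespace = [el for el in root.iter() if local_name(el.tag) == "meta"]

print([el.get("property") for el in no_namespace])
print([el.get("property") for el in any_namespace])
```

A correct XPath 1.0 engine should return the first, smaller set for both //meta and //meta/@property; the buggy behavior amounted to getting the second set for the attribute query.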
Confusing? You bet! -m
P.S. WebPath would not have this problem, since in the default mode it matches local-names only to begin with.
Permalink
Filed under annoyance, xml, xpath, yahoo