I’ve always had a thing for text analysis.
- the 352
- and 250
- to 225
- of 188
- in 118
- a 108
- we 100
- is 76
- our 75
- that 72
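A tally like the one above can be produced with a few lines of Python. This is only a minimal sketch, not the method actually used for these numbers: the `top_words` helper, the tokenizing regex, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of letters or apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text; a real run would read the blog's full archive.
sample = "the cat and the hat and the bat"
for word, count in top_words(sample, 3):
    print(f"- {word} {count}")
```

A different tokenizer (for example, one that keeps hyphenated words together) would shift the counts slightly, so the regex is the part most worth adjusting.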
Wednesday, January 24th, 2007
Tuesday, January 23rd, 2007
So, about a year ago, I wanted to use XPath 2.0 on a project. Turns out no non-toy, non-alpha versions existed except in Java land (where Saxon is quite good). Has the situation changed at all? Anything on the horizon? Libxml2? Anybody?? -m
Tuesday, January 23rd, 2007
The nofollow setting on an outbound link should be a user-editable option, subject to the same community process that all other content on Wikipedia already is. (Site guidelines, dispute resolution, restricted editing on certain articles for unregistered users, etc.) By default, links would get nofollow, but over time, they could be ‘blessed’, perhaps after a certain amount of time or human review. Wasn’t this how nofollow was supposed to work in the first place?
The community process works. Why maneuver around it? -m
Monday, January 8th, 2007
(Press release) Starting today, Y! is the exclusive search partner for Opera Mini across more than 100 countries. The release also names “oneSearch”, going live later in Q1–definitely something to keep an eye on. -m
Sunday, November 26th, 2006
This Wednesday, I’m visiting Berkeley to speak with visiting professor Erik Wilde and his School of Information students. It’s an open-ended discussion, but will almost certainly center on XForms, the intentional web, and related information flow technologies. If you’re in Berkeley this Wednesday, drop me a line. -m
Sunday, October 1st, 2006
Today Softbank Mobile launched a new mobile service, delivering tons of Yahoo! Japan content, powered by Yahoo! US technology, to Softbank Mobile phones. This is notable for a few reasons:
So, watch this space. More good things are coming. -m
Monday, September 11th, 2006
For the first time today, I momentarily wished that jEdit had a particular Emacs key binding, not the other way around. -m
Wednesday, September 6th, 2006
I’ve written before about the xslt2xforms project by Sébastien Cramatte. The project is not only still alive, but expanded into an entire utility kit including a PHP5 framework and forming “a complete xforms/xml toolbox based only on w3c standards”. Check it out on sourceforge. -m
Friday, September 1st, 2006
Most of the censorship stories you hear on the news involve public libraries, but right now I’m writing this from a hospital, which has free wi-fi. Someone providing a service like this has latitude to do pretty much as they please, including censorship, but is it a good idea?
The system here evidently consists of a monitor observing every HTTP access, either forwarding it on or bouncing to another server, one that seems to be down. That second server, referred to only by numeric IP, has yet to actually respond, so trying to load any page from a blocked site requires a lengthy timeout of about two minutes before landing on a browser error page with a URL something like this:
Let’s take a look at what kind of sites this inane system prevents hospital visitors from viewing directly:
At some point, somebody must have pointed out a flaw in their system–that any named site can also be viewed through a numeric IP. Instead of actually thinking about the problem, they also banned all numeric IPs, even for sites that would otherwise work.
The upside to retarded filtering is that it’s easy to get around. Techniques that work here include using a search engine cached page, Coral Cache (.nyud.net:8080), SSH tunneling, VPN, and adding a new entry to hosts to access the same site under a different name. The access is so slow, however (hmm… in a way another form of censorship) that the strain of the additional measures often leads to timeouts and various other errors.
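The Coral Cache trick amounts to appending the suffix mentioned above to a site's hostname. A rough sketch of that rewrite follows; the `coralize` helper name is made up here, and the choice to drop any port from the original URL is a simplifying assumption. The sketch only builds the URL and does not check whether the service answers.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix taken from the post; everything else here is illustrative.
CORAL_SUFFIX = ".nyud.net:8080"


def coralize(url):
    """Rewrite a plain http URL so the request goes through Coral Cache."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    # Assumption: any port on the original URL is ignored.
    netloc = parts.hostname + CORAL_SUFFIX
    return urlunsplit(("http", netloc, parts.path, parts.query, parts.fragment))


print(coralize("http://example.com/page?id=1"))
```

Since the filter described above bans numeric IPs, this only helps for sites addressed by name.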
Fortunately, the filtermasters haven’t caught on to dubinko.info yet, thus allowing this post to appear. I hear that site is pretty subversive.
What’s the net?
Friday, August 25th, 2006
I dug into my mail configuration a bit more and made a few changes. In the past, I had been lazy, so when I needed new email addresses like webmaster at xformsinstitute.com and contact at xformsinstitute.com, I just set up a catch-all. I knew catch-alls would collect lots of spam, but I didn’t know (until now) that the particular skew of that spam is such that it tends to get around the filters.
So all the catch-alls are turned off. I set up explicit forwards for used email addresses, and I think I got them all, but if you get a bounce from any email address on any of my sites, let me know. After another 24 hours, I had:
A significant improvement. I wonder if it’s worth resetting the training data from scratch at this point? -m
Thursday, August 24th, 2006
Yes, I’ve been painstakingly training positive and negative cases for weeks. This is a standard TBird setup on imap with the adaptive filter enabled. Here are the results from a 24-hour experiment:
Is this typical performance, or has something gone bad? Sifting through ~100 spammy messages a day is bad; losing 3 important things a day is worse. -m
Wednesday, August 9th, 2006
How hard could this be? A six month project if three engineers are doing it in a garage. Five years if you put one hundred programmers on it.
Tuesday, August 8th, 2006
This is excellent: a Python Developer Center at Yahoo!. -m
Saturday, August 5th, 2006
Thursday, August 3rd, 2006
Hmm, this seems like a new feature, auto-installed after my last mail client restart. Unfortunately, there’s no “what’s this?” link for further information.
I find it interesting that the scam message wasn’t also labeled as “Junk”. Also, for some reason, the word ‘scam’ feels unexpectedly slangy in this setting. Great feature, I just wish it was a little more transparent. -m
Thursday, July 13th, 2006
According to the authoritative site. Looks like the virtualization markup is getting interesting. -m
Saturday, June 24th, 2006
I wonder, will this lead to better libraries for dealing with HTTP headers? Or at least better developer understanding of the benefits of not just taking whatever Apache or Tomcat or whatever yields by default? -m
Sunday, June 18th, 2006
I spend a Pareto portion of my work day in three applications: jEdit, Firefox, and a terminal.
I hang around Emacs (and VI)-loving folks all day. Emacs. jEdit. Emacs. jEdit. The tension is palpable. :)
Maybe their influence is starting to rub off on me. Here’s what I want: Dear readers, can you provide comments on any tips to achieve any of these in Emacs?
I’ve talked about this before, though my environment now is a little different. (For one, I am now making basic use of GNU Screen for my terminal sessions.) Basically, I want an editor that works like all the other software I use all day, instead of making me remember an entirely different set of key bindings. Every extra bit of my limited wetware storage claimed by my tools detracts from the stuff I really need to be thinking about. Comments? -m
Friday, June 16th, 2006
Tuesday, June 6th, 2006
Wednesday, May 17th, 2006
Seen on Bill Trippe’s blog.
Gray Knowlton, who identified himself as a Senior Product Manager for InfoPath 2007, said the next version of SharePoint will “include InfoPath Forms Services, which will render InfoPath forms to browsers and html-enabled mobile devices, and this will not require InfoPath on the form fillers’ desktop, nor will it require any advance download on the part of the person completing the form.”
This is, as far as I know, breaking news. Nice work, Bill!
Now, the big question is, how well will it work outside of IE? -m