Archive for the 'software' Category
Sunday, January 15th, 2012
Check out these tips. The article talks about iPad, but they work on iPhone too, even an old 3G.
On one hand, it shows the intense amount of careful thought Apple puts into the user experience. But on the other hand, it highlights the discovery problem. I know people who have been using iOS since before it was called iOS, and still didn’t know about these. How do you put these kinds of finishing touches into a product and make sure the target audience can find out about them? -m
Permalink
Filed under apple, software, trends
Wednesday, January 5th, 2011
This is a non-technical description of why Yahoo! Mail is unsafe to use in a public setting, and indeed at all. I will be pointing people at this page as I go through the long process of changing an address I’ve had for more than a decade.
What’s wrong with Yahoo Mail?
A lot of web addresses start with http://–that’s a signal that the “scheme” used to deliver the page to your browser is something called HTTP, a technical specification that turns out to be a really good way to move web pages around. As the page flows to the browser, it’s susceptible to eavesdropping, particularly over a wi-fi connection, and much more so in public, including the usual hotspots like coffee shops, but also workplaces and many home environments. It’s the virtual equivalent of a postcard. When you’re reading the news or checking traffic, it’s not a big deal if someone can sneak a glance at your page.
Some addresses start with https://–notice the extra ‘s’, which stands for “secure”. This means two things: 1) that the web page being sent over is encrypted, and thus unavailable to eavesdroppers, and 2) that the people running the site had to obtain a certificate, which is a form of proof of their identity as an organization (that they’re not, say, Ukrainian phishers). Many years ago, serving pages over https was considered quite expensive in that servers needed much beefier processors to run all that encryption. Today, while it still requires extra computation, it’s not as big a deal. Most off-the-shelf servers have plenty of extra power. To be fair, for a truly ginormous application with millions of users like Yahoo Mail, it is not a trivial thing to roll out. But it’s critically important.
First, to dispel a point of confusion, these days nearly every site, including Yahoo Mail, uses https for the login screen. This is the most critical time when encryption is needed, because otherwise you’d be sending your password on a postcard for anyone with even modest technical skills to peek at. So that’s good, but it’s no longer enough. Because sites are written so that you don’t have to reenter your password on every single new page, they use a tiny bit of information called a “cookie” in your browser to stay logged in. Cookies themselves are neither good nor bad, but if an eavesdropper gets a hold of one, they can control most of your account–everything that doesn’t require re-entering a password. In Yahoo Mail this includes reading any of your messages, sending mail on your behalf, or even deleting messages. Are you comfortable allowing strangers to do this?
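For the technically inclined, here is a minimal sketch of one ingredient of the fix: marking the login cookie so that browsers only send it back over https. It uses Python's standard http.cookies module. The cookie name "session" and the value "abc123" are made up for illustration, and nothing here describes how Yahoo Mail or any other site is actually built.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie header line for a login cookie (illustrative only)."""
    cookie = SimpleCookie()
    cookie["session"] = session_id         # hypothetical cookie name
    cookie["session"]["secure"] = True     # browser sends it only over https
    cookie["session"]["httponly"] = True   # page scripts cannot read it
    cookie["session"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


header = build_session_cookie("abc123")
print(header)
```

The Secure flag only helps if the rest of the session is also served over https; a site that drops back to http after login has no way to keep the cookie off the postcard.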
As I mentioned earlier, new, more powerful tools have been out for months that automate the process of taking over accounts this way. Zero technical prowess is needed, only the ability to install a browser plug-in. If there are any web companies dealing in personal information for which this wasn’t an all-hands-on-deck security wake-up, they are grossly negligent. Indeed, other sites like Gmail work with https all-the-time. But still, in 2011, Yahoo Mail doesn’t. I have a soft spot for Yahoo as a former employer, and I want to keep liking them. Too bad they make it so difficult.
The deeper issue at stake is that if an issue this serious goes unfixed for months, how many lesser issues lurk in the site and have been around for months or years? The issue is trust, my friend, and Yahoo just overdrew their account. I’m leaving.
FAQ
Q: So what do you want Yahoo to do about this? A: Well, they should fix their site for their millions of remaining users.
Q: What if they fix it tomorrow? Will you delete this message? A: No. Since I no longer trust the site, I am leaving, even though it takes time to notify all the people who still send me mail, and no matter what other developments unfold in the meantime. This page will explain my actions.
Q: Do you really want everyone else to leave Yahoo Mail? A: No, only those who care about their privacy.
Q: What’s your new email address? A: I have a couple, but <my first name> @ <this domain> is a good general-purpose one.
I will continue to update this page as more information becomes available. -m
Permalink
Filed under announcement, software, stuff, trends, yahoo
Thursday, September 2nd, 2010
This epic posting on MVC helped me better understand the pattern, and all the variants that have flowed outward from the original design. One interesting observation is that the earlier designs used Views primarily as output-only, and Controllers primarily as input-only, and as a consequence the Controller was the one true path for getting data into the Model.
But with browser forms, input and output are tightly intermingled. The View takes care of input and output. Something else has primary responsibility for mediating the data flow to and from the model–and that something has been called a Presenter. This yields the MVP pattern.
The terminology gets confusing quickly, but roughly
XForms Instance == MVP Model
XForms Model == MVP Presenter
XForms User Interface == MVP View
It’s not wrong to associate XForms with MVC–the term has become so blurry that it’s easy to lump variants like MVP into the same bucket. But to the extent that it makes sense to talk about more specific patterns, maybe we should be calling the XForms design pattern MVP instead of MVC. Comments? Criticism? Fire away below. -m
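To make the mapping above concrete, here is a minimal MVP sketch in Python. It is an illustration under assumptions, not XForms itself: the class names follow the MVP roles, and the "name" field and the whitespace-trimming rule are invented for the example.

```python
class Model:
    """MVP Model (the XForms Instance role): just holds the data."""

    def __init__(self):
        self.data = {"name": ""}


class Presenter:
    """MVP Presenter (the XForms Model role): mediates data flow to and from the Model."""

    def __init__(self, model):
        self.model = model
        self.view = None

    def attach(self, view):
        self.view = view

    def update(self, field, text):
        # A made-up rule standing in for real validation or calculation.
        self.model.data[field] = text.strip()
        self.view.show(self.model.data)


class View:
    """MVP View (the XForms User Interface role): handles both input and output."""

    def __init__(self, presenter):
        self.presenter = presenter
        self.rendered = ""

    def user_typed(self, field, text):
        # Input goes to the Presenter, never straight to the Model.
        self.presenter.update(field, text)

    def show(self, values):
        self.rendered = ", ".join(f"{k}={v}" for k, v in values.items())


model = Model()
presenter = Presenter(model)
view = View(presenter)
presenter.attach(view)
view.user_typed("name", "  Ada ")
print(view.rendered)
```

The point of the sketch is the wiring: the View takes input and renders output, while only the Presenter touches the Model.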
Permalink
Filed under intentional web, languages, patternalia, software, stuff, XForms
Wednesday, July 7th, 2010
As the world of web apps gets more framework-y, I need to get up to speed on contemporary automation testing tools. One of the most popular ones right now is the open source Selenium project. From the look of it, that project is going through an awkward adolescent phase. For example:
- Selenium IDE lets you record tests in a number of languages, but only HTML ones can be played back. For someone using only Selenium IDE, it’s a confusing array of choices for no apparent reason.
- Selenium RC has bindings for lots of different languages but not for the HTML tests that are most useful in Selenium IDE. (Why not include the ability to simply play through an entire recorded script in one call, instead of fine grained commands like selenium.key_press(input_id, 110), etc.?)
- The list of projects prominently mentions Selenium Core (a JavaScript implementation), but when you click through to the documentation, it’s not mentioned. Elsewhere on the site it’s spoken of in deprecating terms.
- If you look at the developer wiki, all the recent attention is on Web Drivers, a new architecture for remote-controlling browsers, but those aren’t mentioned in the docs (yet) either.
So yeah, right now it’s awkward and confusing. The underlying architecture of the project is undergoing a tectonic shift, something that would never see public light of day in a proprietary project. In the end it will come out leaner and meaner. What the project needs in the short term is more help from fresh outsiders who can visualize the desirable end state and help the ramped and productive developers on the project get there.
By the way, if this kind of problem seems interesting to you, let me know. We’re hiring. If you have any tips for getting up to speed in Selenium, comment below.
-m
Permalink
Filed under browsers, commercialism, software, web20
Saturday, June 12th, 2010
This came from a comment on the prior post, and it’s worth a shout of its own. Don Norman on the importance of command lines, including the ubiquitous search box, in modern UI. -m
Permalink
Filed under search, software, stuff
Wednesday, June 9th, 2010
Thought experiment: are there any commonly-expressed semantic queries–the kind of queries you’d run over a triple store, or perhaps a SearchMonkey-annotated web site–expressible in common type-in-a-searchbox query grammar?
As a refresher, here’s some things that Google and other search engines can handle. The square brackets represent the search box into which the queries are typed, not part of the queries themselves.
[term]
[term -butnotthis]
[term1 OR term2]
["phrase term"]
[term1 OR term2 -"but not this" site:dubinko.info filetype:html]
So what kind of semantic queries would be usefully expressed in a similar way, avoiding SPARQL and the like? For example, maybe [by:"Micah Dubinko"] could map to a document containing a triple like <this document> <dc:author> “Micah Dubinko”. What other kinds of graph queries are interesting, common, and simple to express like this? Comments welcome.
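As a rough sketch of the idea, the following Python turns prefix:value tokens from a search box into triple patterns. The prefix table is an assumption made up for this example (only the by: to dc:author mapping comes from the text above), and "?doc" is just a placeholder for the matching document.

```python
import re

# Hypothetical mapping from search-box prefixes to predicates.
PREFIX_TO_PREDICATE = {"by": "dc:author", "title": "dc:title"}

# Matches prefix:"quoted value" or prefix:bareword.
TOKEN = re.compile(r'(\w+):"([^"]*)"|(\w+):(\S+)')


def to_triple_patterns(query):
    """Return (subject, predicate, object) patterns for the known prefixes in a query."""
    patterns = []
    for match in TOKEN.finditer(query):
        if match.group(1):
            prefix, value = match.group(1), match.group(2)
        else:
            prefix, value = match.group(3), match.group(4)
        predicate = PREFIX_TO_PREDICATE.get(prefix)
        if predicate:  # unknown prefixes such as site: are left to the ordinary engine
            patterns.append(("?doc", predicate, value))
    return patterns


patterns = to_triple_patterns('xforms by:"Micah Dubinko" site:dubinko.info')
print(patterns)
```

Plain terms and unrecognized prefixes pass through untouched, so the semantic part layers on top of the familiar grammar instead of replacing it.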
-m
Permalink
Filed under everythingismiscellaneous, intentional web, Mark Logic, metadata, search, software, stuff
Friday, May 14th, 2010
Facebook (v): to deliberately create an impenetrable computer user interface for purposes of manipulating users.
More collected Geek Thoughts at http://geekthoughts.info.
Permalink
Filed under annoyance, commercialism, geekthoughts, software
Wednesday, May 5th, 2010
According to this article, a recent terror suspect almost got on a plane despite being recently added to the no-fly list. Why is it so difficult to administer a no-fly list? The CAP Theorem has answers. (Disclaimer: as always, this blog is apolitical–this isn’t about whether no-fly lists are a good idea or not, only a matter of technical interest)
Without stretching the imagination too much, one can think of a no-fly list as a distributed database. The list apparently changes frequently, and it needs to be accessible from thousands of airport gates and reservation desks. Thus CAP Theorem applies. In a nutshell, that theorem states that of Consistency, Availability, and Partition-tolerance, you can only pick, at most, two. Hit the link above for a much better, more complete description.
If there were one centralized list, the system would be Consistent and Available, but every time a name needed to be checked it would require an immediate network round-trip–should the connection to that central list go down, no further checks would be possible–no Partition tolerance.
Of course, the airline could set a policy that if said network connection goes down, no passengers at all would be able to get on planes. This would be a case of lack of Availability.
Or, the complete list could be periodically copied to each location that needs it. This provides good Availability and Partition tolerance, but fails Consistency, since it’s possible to miss out on late-breaking updates. Apparently, something like this is what happened.
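The trade-off between the central list and the periodically copied list can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the class names, the passenger names, and the "reachable" switch that stands in for a network partition are all invented for illustration.

```python
class CentralList:
    """The single authoritative list; checks fail when the network is down."""

    def __init__(self):
        self.names = set()
        self.reachable = True  # stand-in for the network connection

    def add(self, name):
        self.names.add(name)

    def contains(self, name):
        if not self.reachable:
            raise ConnectionError("central list unreachable")
        return name in self.names


class GateWithLocalCopy:
    """A gate that works from a periodically refreshed copy of the list."""

    def __init__(self, central):
        self.central = central
        self.copy = set()

    def sync(self):
        if not self.central.reachable:
            raise ConnectionError("cannot sync during a partition")
        self.copy = set(self.central.names)

    def may_board(self, name):
        return name not in self.copy


central = CentralList()
central.add("Alice")
gate = GateWithLocalCopy(central)
gate.sync()

central.add("Bob")                       # late-breaking update after the last sync
stale_answer = gate.may_board("Bob")     # the copy has not heard about Bob yet

central.reachable = False                # simulate the partition
still_answers = gate.may_board("Alice")  # the local copy keeps working
try:
    central.contains("Bob")
    central_failed = False
except ConnectionError:
    central_failed = True
```

The gate stays available through the partition but gives a stale answer for the late addition, while the central list gives no answer at all once it is unreachable.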
More collected Geek Thoughts at http://geekthoughts.info.
Permalink
Filed under geekthoughts, software
Friday, December 11th, 2009
At first glance, this seems to be the Snow Leopard of Tinderbox releases–lots of behind-the-scenes technology updates and largely the same core features. If you’re looking for a way to get more organized, it’s worth a look. Link. -m
Permalink
Filed under announcement, software, writing
Sunday, November 29th, 2009
One of the lead bullets describing why XForms is cool always mentions that it is based on a Model View Controller framework. When building a full XRX app, though, MVC might not be the best choice to organize things overall. Why not?
Consider a typical XRX app, like MarkLogic Application Builder. (You can download your copy of MarkLogic, including Application Builder, under the community license at the developer site.) For each page, the cycle goes like this:
- The browser requests a particular page, say the one that lets you configure sorting options in the app you’re building
- The page loads, including client-side XForms via JavaScript
- XForms requests the project state as XML from a designated endpoint; this becomes the XForms Instance Data
- Stuff happens on the page that changes the client-side state
- Just before leaving the page, XML representing the updated state is HTTP PUT back to the endpoint
The benefit of this approach is that you are dealing with XML all the way through, no impedance mismatches like you might find on an app that awkwardly transitions from (say) relational data to Java objects to urlencoded name/value pairs embedded in HTML syntax.
So why not do this in straight MVC? Honestly, MVC isn’t a bad choice, but it can get unwieldy. If an endpoint consists of separate model+view+controller files, and each individual page consists of separate model+view+controller files, it adds up to a lot of stuff to keep track of. In truly huge apps, this much attention to organization might be worth it, but most apps aren’t that big. Thus the MET pattern.
Model: It still makes sense to keep the code that deals with particular models (closely aligned with Schemas) as a separate thing. All of Application Builder, for example, has only one model.
Endpoint: The job of an endpoint is to GET and PUT (and possibly POST and DELETE) XML, or other equivalent resource bundles depending on how many media types you want to deal with. It combines an aspect of controllers by being activated by a particular URL and views by providing the data in a consistent format.
Template: Since XForms documents already contain MVC mechanics, it’s not a high-payoff situation to further use MVC to construct the XForms and XHTML wrapper themselves. The important stuff happens within XForms, and then you need various templating mechanisms, for example to provide consistent headers, footers, and other pieces across multiple pages. For this, an ordinary templating mechanism suffices. I can imagine dynamic assembly scenarios where this wouldn’t be the case, but again, many apps don’t need this kind of flexibility, and the complexity that comes along with it.
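As a rough illustration of the Endpoint role, here is an in-memory sketch in Python. A real XRX endpoint would be XQuery code activated by a URL; the class, the "/project/sorting" URL, and the sorting XML below are hypothetical stand-ins that only show the shape of the GET and PUT cycle.

```python
import xml.etree.ElementTree as ET


class Endpoint:
    """Hypothetical in-memory endpoint: one URL, XML in and XML out."""

    def __init__(self, url, initial_xml):
        self.url = url
        self._state = initial_xml

    def get(self):
        # The "view" half: hand back the state in a consistent format.
        return self._state

    def put(self, xml_text):
        # The "controller" half: accept new state, rejecting ill-formed XML.
        ET.fromstring(xml_text)  # raises xml.etree.ElementTree.ParseError if malformed
        self._state = xml_text


sorting = Endpoint("/project/sorting", "<sorting><by>date</by></sorting>")

# The page cycle: GET the state, change it client-side, PUT it back before leaving.
state = ET.fromstring(sorting.get())
state.find("by").text = "title"
sorting.put(ET.tostring(state, encoding="unicode"))
print(sorting.get())
```

Both halves sit in one small unit, which is the organizational point: with so little scope there is not much room for the controller and view pieces to tangle.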
What about separation of concerns? Oh yeah, what about it? :-) Technically both Endpoints and Templates violate classical SOC. In an XRX app, this typically doesn’t lead to the kinds of spaghetti situations that it might otherwise. Endpoints are self contained, and can focus on doing just one thing well; with limited scope comes limited ability to get into trouble. For those times when you need to dig into the XQuery code of an endpoint, it’s actually helpful to see both the controller and view pieces laid out in one file.
As for Templates, simplicity wins. With the specifics of models and endpoints peeled away, the remaining challenge in developing individual pages is getting the XForms right, and again, it’s helpful to minimize the number of files one XForms page is split across. YAGNI applies to what’s left, at least in the stuff I’ve built.
So, I’ve been careful in the title to call this an “organizational pattern”, not a “design pattern” or an (ugh) “architectural pattern”. Nothing too profound here. I’d be happy to start seeing XRX apps laid out with directory names like “models”, “endpoints”, and “templates”.
What do you think? Comments welcome.
-m
Permalink
Filed under browsers, Mark Logic, software, XForms, XQuery
Saturday, September 26th, 2009
My personal stability theory, as it applies to software engineering: in a multilayered software architecture, the likelihood that layer N works well can be expressed as a probability (less than 1 in practice) relative to the lower level layer N-1. For example, if you attempt to write a mission critical Tcl app on a flaky Tcl interpreter, you’re in for some long nights. Via multiplication, a corollary is that the more layers a system has, the less likely it is to work well. (As an aside, I’m not arguing that all software architectures should have fewer layers–other forces outside the scope of this article work against systems with too few layers.)
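The multiplication is easy to try out. This is a toy calculation, assuming each layer's probability of working well (given the layer beneath it works) can be written as a single number; the 0.99 figure is made up for illustration.

```python
import math


def system_reliability(layer_probabilities):
    """Chance the top layer works well: the product of every layer's probability."""
    return math.prod(layer_probabilities)


three_layers = system_reliability([0.99] * 3)
ten_layers = system_reliability([0.99] * 10)
print(f"3 layers: {three_layers:.4f}, 10 layers: {ten_layers:.4f}")
```

Even with every layer at 99 percent, ten layers come out noticeably worse than three, and a single flaky layer near the bottom drags down everything stacked on it.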
Joel said something similar lately in the article The Duct Tape Programmer. There is a strong tendency for many coders to over-engineer a system, building towering heights of abstraction. In contrast, a Duct Tape Programmer gets the job done by making something ugly (and with fewer layers) but at least it works. So far this is a fit with what stability theory predicts.
But then he speaks out against unit testing, referring to it in similar terms to the extravagant tower. Quoting JWZ: “If there’s no unit test the customer isn’t going to complain about that.” Here stability theory makes a different prediction. Particularly in the lower levels of the system, flakiness is disastrous. You have to be sure that your foundation is stable before building upon it, or you’re in for keyboard-on-forehead-induced head trauma. This is true no matter how tight the deadlines are or how much pressure is on. In fact, when you don’t have time for a write-over, it’s even more important to get it right the first time.
The top accomplishment for a coder is shipping software. Duct Tape Programmers make this happen by avoiding needless complexity, which is a great principle to live by. I’m reminded of a saying attributed to Brian Kernighan:
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
Debugging, or more generally making software that works well all the way to the user-facing layer, is hard. Anything that provides fundamental assertions about the stability of your foundation is a useful tool, so don’t slack off on the unit testing.
What about you? Have you found stability theory to be supported by the facts? Comment below.
-m
Permalink
Filed under geekthoughts, software
Wednesday, April 22nd, 2009
Hey readers, all seven of you, can you help me out?
I’m perhaps finally switching to a Mac-native text editor, TextWrangler, or if I really like it, BBEdit. Within that app, what’s the easiest way to enter unusual characters not found on a keyboard, say š (Latin s with háček) or ḫ (h-breve below)? In jEdit, one can set up longer strings that get automatically converted into harder-to-type ones. What’s the equivalent in TextWrangler or BBEdit? -m
Permalink
Filed under software, writing
Sunday, March 8th, 2009
The remarkable (and prolific) Stephen Wolfram has an idea called Wolfram Alpha. People used to assume the “Star Trek” model of computers:
that one would be able to ask a computer any factual question, and have it compute the answer.
Which has proved to be quite distant from reality. Instead
But armed with Mathematica and NKS [A New Kind of Science] I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.
It’s not easy to do this. Every different kind of method and model—and data—has its own special features and character. But with a mixture of Mathematica and NKS automation, and a lot of human experts, I’m happy to say that we’ve gotten a very long way.
I’m still a SearchMonkey guy at heart, so I wonder how familiar Wolfram’s team is with existing Semantic Web research and practice–because at a high level this seems very much like RDF with suitable queries thereupon. If that’s a good characterization, that’s A Good Thing, since practical application has been one of SemWeb’s weak spots.
-m
Permalink
Filed under AI, aswemaythink, commercialism, intentional web, languages, Mark Logic, math, metadata, software, yahoo
Wednesday, January 7th, 2009
I’ve started looking into porting the WebPath code (and eventually XForms Validator) over to Python 3. The first step is external libraries, of which there is only one. WebPath uses the lex.py module from PLY. I had got it into my head that Python 2.x and 3.x were thoroughly incompatible, but leave it to the remarkable David Beazley to blow that assumption out of the water: the latest version of lex.py from SVN works in both 2.x and 3.x.
From there the included 2to3 tool was easy enough to run. (Relatively more difficult was getting 2.6 and 3.0 versions of Python frameworks installed on Mac, but even that wasn’t too bad.) The tool made some moderate changes, and I can run the unit tests, and a few even pass!
The primary remaining problem stems from code where the documentation is a little unclear, and my inexperience is severe. The part of the code in platonicweb.py that reads nasty, grotty HTML via Tidy and produces a clean DOM throws an exception every time. Seems to be a mismatch between String and Byte (encoded string) types, but manifested as a failed XML parse. Sans exception handling, the code looks like:
import urllib.request
import xml.dom.minidom
# fullurl holds the address of the page to fetch
page = urllib.request.urlopen(fullurl)
markup = page.read()
dom = xml.dom.minidom.parseString(markup)
urlopen() returns a file-like object, but the docs didn’t seem clear on whether it’s like a file opened in byte or string mode. In any case, I’m almost certainly doing it wrong. Suggestions?
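For what it's worth, in Python 3 the object returned by urlopen() behaves like a file opened in byte mode, so read() gives back bytes. The sketch below avoids the network by using io.BytesIO as a stand-in for the response, with a made-up one-element document. It shows that minidom's parseString accepts either bytes or a decoded string. It does not diagnose the exception described above, which may well come from the Tidy step or the markup itself.

```python
import io
import xml.dom.minidom

# Stand-in for urllib.request.urlopen(fullurl): a byte-mode file-like object.
page = io.BytesIO("<p>caf\u00e9</p>".encode("utf-8"))

markup = page.read()
print(type(markup))  # bytes, not str

# parseString takes the raw bytes directly...
dom_from_bytes = xml.dom.minidom.parseString(markup)

# ...or a str, if the bytes are decoded first (assuming the page is UTF-8).
text = markup.decode("utf-8")
dom_from_text = xml.dom.minidom.parseString(text)

word = dom_from_bytes.documentElement.firstChild.data
print(word)
```

If the encoding of a fetched page is unknown, passing the bytes through and letting the XML parser honor the document's own encoding declaration is usually the safer of the two routes.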
-m
Permalink
Filed under languages, python
Tuesday, December 30th, 2008
After a delay, the code to my RDFa parser in XQuery is now available under an Apache license. Go get it. This is some of the earliest XQuery code I ever wrote, so go easy on me. It follows the earlier work on a functional definition of RDFa. And feel free to send in patches. -m
Permalink
Filed under announcement, IPR, Mark Logic, metadata, software
Monday, December 8th, 2008
I was on the panel with Bob DuCharme, Frank Miller, and Evan Lenz discussing content authoring, from DITA to DocBook with some WordML sprinkled in for good measure. It was a good discussion, nothing earth-shaking. This session was laptopless, so I don’t have any significant notes. -m
Permalink
Filed under infopath, software, writing, xml
Friday, December 5th, 2008
The long-awaited Python 3.0 is out. It fixes almost every annoyance I have with the language, particularly around Unicode handling, which is important in the kinds of projects I work on.
Now, to revisit some of my Open Source projects… -m
Permalink
Filed under python
Thursday, October 30th, 2008
I’m pondering implementing the computational parts of the XForms Model in XQuery. Doing so in a largely functional environment poses some challenges, though. Has anybody tackled this before? How about in any functional language, including ML, Haskell, Scheme, XSLT, or careful Python?
I borrowed the book Purely Functional Data Structures from a friend–this looks to be a good start. What else is out there? Comment below. -m
Permalink
Filed under intentional web, patternalia, software, XForms, XQuery
Monday, October 20th, 2008
I haven’t seen this anywhere else: jEdit doesn’t start up under the recent Mac Java 1.6. It bounces in the dock a few times then goes away.
The solution: manually run the main jar with java -jar path-to/jedit.jar, which will work. Go to the plugin manager and delete the MacOSX plugin. Java integration is good enough in 1.6 that this really isn’t needed anyway. Quit jEdit and now it will start up fine the usual way. -m
Permalink
Filed under annoyance, software
Monday, October 13th, 2008
Without any exception I can think of: every top-notch software developer I know is also a skilled technical writer. Technical writing requires skill in choosing words, constructing sentences and paragraphs, and putting together the pieces in the right order to most effectively present the material.
In contrast, narrative writing requires an eye towards the bigger picture, an overall story arc. To put it another way, beginnings, middles, and ends. Hollywood screenwriters have got this down to a science, dividing screenplays into three acts. Next time you visit the movies, look for the parts and how they connect.
Act I, comprising about 1/4 of the whole work, introduces the characters and situation. Between Act I and Act II a key event happens to propel the story forward. Neo swallows the pill. Luke Skywalker finds his Aunt and Uncle killed. In Act II, comprising about 1/2 of the story, the “real story” begins. Another key moment happens to introduce the final Act III, which culminates during the final 1/4 of the story. Three acts: beginning, middle, and end. Other aspects of fiction writing, say characterization, are relatively less important in technical narratives.
A great introduction to these concepts is Syd Field’s Screenplay, to give one a broader view on what story is really all about, and why some stories move people more than others. Many of the concepts apply equally to software narratives. And like I wrote about earlier, such narratives are a powerful (if underused) tool in software development. -m
Permalink
Filed under software, writing
Friday, October 10th, 2008
I haven’t tried this, but these guys claim to have a solution where
The form definitions are saved and exchanged as XForms, and the data as XForm[s] models. The data can be exchanged over http (if the phone users can afford GPRS and have a data connection) or over compressed SMS messages.
Sounds like they have the right idea… -m
Permalink
Filed under browsers, mobile, software, XForms
Wednesday, October 1st, 2008
Evernote now has import/export (in an XML format), meaning it now passes the generation test for data availability and lock-in-avoidance, as I wrote about some years ago. There’s a server API, as well as client-side scripting. I need to look into the details more, but as a start it looks like a home run. -m
Update: looking at the actual export XML, I’m disappointed. Each note is CDATA-escaped XML? Why???
Permalink
Filed under aswemaythink, software
Thursday, September 25th, 2008
I’m working on a piece of software that, while not the answer to world peace, is still pretty neat and approaches a specific problem in a fresh way. The project is at the stage where it needs to get unveiled to early adopters in the target audience. So how does one introduce possibly unfamiliar concepts in the form of a new API?
The approach we ended up using for the initial documentation is essentially a narrative–telling a story. Narrative fills the gap between use case and solution in an engaging way. People are naturally inclined to listen to stories, and to expect certain story structures, such as having a beginning, middle, and end with suitable transitions. Thus, if the listener senses a gap in the story, it’s easy for them to speak up. When the story works, people find it easier to map their personal story on to the narrative, leading to better absorption of new concepts, and a more positive impression of the software.
And it’s working. So far we’ve gotten far more useful feedback than we would have otherwise. Even before showing others, the exercise of writing the narrative has exposed gaps and flaws in our thinking, leading to a better, more cohesive design.
If you think back to how you learned about, say, object oriented programming, or event-driven programming, likely there was a story or detailed use case involved that helped you get on board with a new way of thinking. Software + story: it’s a powerful combination, and I recommend it.
BTW, my team is hiring full-time positions. Especially if you’ve got XML skills, you could be part of this team. Send me email if interested. -m
Permalink
Filed under Mark Logic, software, writing
Wednesday, September 17th, 2008
The XQuery Working Group is debating the need for higher-order functions in the language. I’m working on honing my description of why this is an important feature. Does this work? What would work better?
Imagine you are writing a smallish widget app, in an environment without a standard library. When you need to sort your widgets, you’d write a simple function with a signature like sort(sequence-of-widgets). That’s great.
Now imagine you find your app to be steadily growing. An accumulation of smaller one-off solutions won’t work anymore; you need a general solution. What you’ll end up with is something like qsort in C, which takes a pointer to a comparator function. By providing different comparators, you can sort anything any way you like, all through only a single sort function. C and C++ have something like this, as do PHP, Python, Java, JavaScript, and even assembly language. XSLT has it, as proven by Dimitre.
XQuery doesn’t. It should, because people are now using it for more than short queries. People are writing programs in it. -m
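Here is what the general solution looks like in Python, one of the languages listed above. It is a minimal sketch: the widget records and the two comparators are invented for illustration, and functools.cmp_to_key adapts a qsort-style comparator to Python's built-in sorted().

```python
from functools import cmp_to_key


def sort_with(items, compare):
    """One general sort: the caller supplies the comparator, qsort-style."""
    return sorted(items, key=cmp_to_key(compare))


widgets = [
    {"name": "gear", "weight": 3},
    {"name": "axle", "weight": 7},
    {"name": "bolt", "weight": 1},
]


def by_weight(a, b):
    # Negative if a sorts first, positive if b sorts first, zero if tied.
    return a["weight"] - b["weight"]


def by_name_descending(a, b):
    return (b["name"] > a["name"]) - (b["name"] < a["name"])


lightest_first = [w["name"] for w in sort_with(widgets, by_weight)]
reverse_alpha = [w["name"] for w in sort_with(widgets, by_name_descending)]
print(lightest_first, reverse_alpha)
```

The sort function never changes; new orderings are just new functions passed in, which is exactly what a language without higher-order functions cannot express directly.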
P. S. Comment please.
Permalink
Filed under languages, python, standards, XQuery
Tuesday, July 15th, 2008
This article made my day. Very similar approach to what I did in WebPath, but even cleaner. Great explanation and performance numbers. -m
P.S. Thanks to Crock for pointing this out.
Permalink
Filed under python, xpath
Thursday, May 29th, 2008
Bumped into XRX today. XForms + REST + XQuery. I like the sound of this, and XForms on the client just got a whole bunch easier…
I’m seeing multiple signs that the confluence of XForms and XQuery has legs. (And REST just plain makes sense in any situation). -m
Permalink
Filed under browsers, software, XForms, XQuery
Wednesday, May 28th, 2008
I registered ‘xfv’ on Google App Engine. Too bad there don’t appear to be any significant XML libraries supported. I have XPath covered by my pure-python WebPath, but what about Relax NG? Anyone know of anything in pure python? -m
Permalink
Filed under announcement, google, languages, python, XForms, xpath
Wednesday, April 30th, 2008
“Rails is a lot of fun, and lets me do cool new things – but it’s hard to eat it.”
Simon St. Laurent
-m
Permalink
Filed under software, stuff
Monday, April 28th, 2008
I haven’t mentioned it yet, but SearchMonkey (now an official name, not just a project name) is in external limited beta. Keep an eye on ysearchblog, lots more technical content is on the way. -m
Permalink
Filed under announcement, intentional web, metadata, software, yahoo
Monday, March 3rd, 2008
The WebPath bug reports continue to roll in. For one, queries against *.wikipedia.* don’t seem to work. You get something back, but it has no resemblance to the page you were looking for. The problem comes from the W3C tidy service that I use, specifically that the (understandably overworked and understaffed) admins at the Wikimedia Foundation seem to have blocked it. It seems like more than a simple IP or user-agent-based block. I’ve emailed them about it but haven’t heard back yet.
So, this highlights the limitation of having a single-source converter in the Platonic Web module of WebPath. So I turn to my readers: do you know of any other tidy servers? Or converters of a non-tidy origin? For any of these to work, they need to return clean XML corresponding to the original page (as opposed to, say, returning something with big headers/footers or ampersand-encoded). This seems like an outstanding need for the open source community.
Please comment below with ideas. Thanks! -m
UPDATE: heard back from the Wikipedia admins, and although professional and helpful-as-can-be-expected, they won’t be changing anything on their end. Still looking for more open source options.
Permalink
Filed under annoyance, intentional web, software, xpath