Archive for the 'everythingismiscellaneous' Category

Sunday, April 24th, 2016

The physics of the impossibly tiny

According to Newtonian gravitation, the attraction between two bodies is proportional to the product of their masses and inversely proportional to the square of the distance between them. Einstein refined this somewhat, but as long as there aren’t crazy speeds or non-flattish spacetime involved, Newton’s formulation is accurate. As far as we know.

I read this interesting article, which spun off the thought experiment below. The EM Drive mechanism keeps confounding some Really Smart people who would genuinely like to discredit it. What’s going on here? I’m probably butchering the explanation, but the article posits that very small accelerations don’t behave in a completely smooth manner. In other words, they’re subject to quantum effects.

Here’s a thought experiment. What is the Newtonian gravitational attraction between this hydrogen atom in my little finger, and one in, say, the Andromeda galaxy? (Pedantically, one that was in the Andromeda galaxy 2.5 million years ago, from the Earth’s frame of reference.)

I put this into Wolfram Alpha (which helpfully supplied a value for the big-G constant) and came up with an answer of:

1.419 × 10^-109 N (newtons)

That’s not a typo. The first nonzero digit sits 109 places to the right of the decimal point.
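The calculation can be reproduced in a few lines of Python. This is a minimal sketch that assumes round textbook values: G = 6.674 × 10^-11 N·m²/kg², a hydrogen-atom mass of 1.674 × 10^-27 kg, and a distance of 2.5 million light-years. Wolfram Alpha's exact inputs weren't recorded, so these are stand-ins. With them the force comes out near 3.3 × 10^-109 N rather than 1.419 × 10^-109 N, presumably because of a different distance or mass value, but the exponent lands in the same place.

```python
G = 6.674e-11                  # gravitational constant, N·m²/kg²
M_HYDROGEN = 1.674e-27         # mass of one hydrogen atom, kg
LIGHT_YEAR = 9.461e15          # meters per light-year
DISTANCE = 2.5e6 * LIGHT_YEAR  # assumed finger-to-Andromeda distance, m

def newton_force(m1, m2, r):
    """Newtonian attraction F = G * m1 * m2 / r**2, in newtons."""
    return G * m1 * m2 / r ** 2

force = newton_force(M_HYDROGEN, M_HYDROGEN, DISTANCE)
print(f"{force:.3e} N")
```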

I don’t even have a good analogy for how slight that force is. A trillionth trillionth of the seismic energy of a dandelion seed landing, measured from a trillion km away?? Still too much, by a lot. It’s a reasonable question whether the universe even goes to that level of detail. Could such a slight force even be said to exist in a meaningful sense? In other words, could it be measured, even in principle? I’m no expert, but I doubt it. So maybe physics isn’t completely smooth down to that level.

Is there a limit to how small a force can be and still act like a force?

To accurately compute the complete gravitational effect on this hydrogen atom in my little finger, I’d have to take into account the 10^80 particles in the observable universe, the vast majority of which make unmeasurably tiny contributions to the overall sum. That’s for a single instant. For a continuous simulation, I’d need to repeat this enormous calculation something like 10^43 times per second. So all you universe simulator writers out there, take note. Some simplification is probably warranted. :)
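The back-of-envelope cost can be spelled out as arithmetic. This sketch assumes the 10^43 figure means one update per Planck time (about 5.391 × 10^-44 s); that interpretation is mine, not the post's. It counts the force terms needed for just the one atom:

```python
PLANCK_TIME = 5.391e-44   # seconds; assumed as the simulation's update interval
PARTICLES = 1e80          # rough particle count of the observable universe

updates_per_second = 1 / PLANCK_TIME
# One force term per other particle, recomputed every update, for a single atom.
terms_per_second = PARTICLES * updates_per_second

print(f"{updates_per_second:.2e} updates/s, {terms_per_second:.2e} force terms/s")
```

That comes to roughly 10^123 force terms per second, for one atom.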

(A realistic simulator would only need to take into account the mass of the Earth, Moon, Sun, and for a few decimal places more, Jupiter. But I’m talking about laws running the universe, not engineering hacks.)
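To see why a handful of bodies is enough in practice, here is a rough sketch comparing the Newtonian pull each one exerts on an object at Earth's surface. The masses and distances are rounded textbook values I've assumed: mean center-to-center distances for the Moon and Sun, and roughly closest approach for Jupiter. That is close enough at this precision.

```python
G = 6.674e-11  # gravitational constant, N·m²/kg²

# (mass in kg, distance in m); assumed rounded values, not precise ephemerides.
BODIES = {
    "Earth": (5.972e24, 6.371e6),    # distance = Earth's mean radius
    "Moon": (7.348e22, 3.844e8),     # mean Earth-Moon distance
    "Sun": (1.989e30, 1.496e11),     # 1 astronomical unit
    "Jupiter": (1.898e27, 5.88e11),  # roughly closest approach to Earth
}

def acceleration(mass, distance):
    """Magnitude of Newtonian gravitational acceleration, G*M / r**2, in m/s²."""
    return G * mass / distance ** 2

ranked = sorted(BODIES, key=lambda name: acceleration(*BODIES[name]), reverse=True)
for name in ranked:
    print(f"{name:8s} {acceleration(*BODIES[name]):.2e} m/s²")
```

Each step down the ranking drops by at least two orders of magnitude (the Sun's direct pull edges out the Moon's), which is why truncating the sum after a few bodies works.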

It seems there are still some quite interesting things left to discover in the universe. I keep going back to the chapter Surprises from the Real World in Lee Smolin’s The Trouble with Physics. Exciting times ahead! -m

Thursday, February 5th, 2015

DIY fizzy yerba mate drink

Filed under need-to-try this: A homebrew version of a popular hacker drink called Club Mate.

I already have a carbonator cap and CO2 setup, as part of my beer brewing hardware.

One variation I would experiment with is cutting down on the sugar. Even though Club Mate isn’t very sweet, it still has a fair amount of sugar in it. Stevia could be a good alternative. -m

Tuesday, February 18th, 2014

Fitness

I can’t blog about secret projects I’m working on, so how about something completely different?

I’ve improved my fitness level substantially over the last five years. (On index cards, I have my daily weight and body fat percentage, according to the bathroom scale, back to November 2009.) Here are some things I’ve learned:

  • Moving counts. A lot. The difference between being completely sedentary and moving a bit (easy walks, standing desk, etc.) is the biggest leap. Everything after that is incremental.
  • Spending $99 on a Fitbit is the best health investment I’ve made, dollar-for-dollar, ever.
  • Expensive shoes don’t help much. My current main shoes were $40 online, and they’re just as good as, if not better than, the $120 shoes from Roadrunner.
  • Pilates looks easy if you’ve never tried it.
  • Once you reach a certain level, you will plateau there unless you challenge yourself further.
  • Strength training is helpful for just about everything, even improving your running times.
  • Foam rollers are super useful for managing sore muscles and tendons. Highly-recommended.
  • Boosting your VO2Max is painful–interval training is the gasping-for-air kind of torture many people think of when they hear the word ‘exercise’–but it’s also important if you want to improve your run times.
  • But you shouldn’t try to improve your run times or anything else unless you have specific bigger-picture goals in mind.
  • Seriously–sitting is terrible for you. Get a standing desk.

Invest in yourself. -m

Sunday, March 31st, 2013

Introducing node-node:node.node

Naming is hard to do well, almost as hard as designing good software in the first place. Take, for instance, the term ‘node’, which, depending on the context, can mean:

  1. A fundamental unit of the DOM (Document Object Model) used in creating rich HTML5 applications.
  2. A basic unit of the Semantic Web–a thing you can say stuff about. Some nodes are even unlabeled, and hence ‘blank nodes’.
  3. In operations, a node means, roughly, a machine on the network. E.g. “sixteen-node cluster”
  4. A software library for event-driven, asynchronous development with JavaScript.

I find myself at the forefront of a growing chorus of software architects and API designers who are fed up with this overloading of a perfectly good term. So I’m happy today to announce node-node:node.node.

The system is still in pre-alpha, but it solves all of the most pressing problems that software developers routinely run into. In this framework, every node represents a node, for the ultimate in scalable distributed document storage. In addition, every node serves as a node, which provides just enough context to make open-world assumption metadata assertions at node-node-level granularity. Using the power of Node, every node modeled as a node has instant access to other node-node:nodes. The network really is the computer. You may never write a program the old way again. Follow my progress on SourceForge, the latest and most cutting-edge social code-sharing site. -m

Tuesday, July 5th, 2011

Geek Thoughts: how I take my tea

Having been recently accused of “vile” habits in regard to tea-drinking, I feel that I need to clear the air. :)

I am almost certainly a supertaster. (This explains, among other things, my aversion to most vegetables and my status as a nationally ranked beer judge.) I’ve never been medically tested, but I did go through the BBC test and some rough taste-bud counting with blue dye and a mirror.

So I do not generally follow accepted wisdom with tea. To prepare tea, I get a nice glass of cold water and plunk in a tea bag. Same goes for other tea-like substances, such as yerba mate. The result is a much slower steeping process, where subtle flavors shift throughout the day and with different refills. Does it get bitter? While tannins are part of the tea flavor, you don’t get that intense, mouth-puckering astringency like you would hot-steeping tea for too long. It’s more gradual and interesting.

Different kinds of tea have different spectrums of flavor, as revealed over the course of a day. Earl Grey and green tea are particularly nice. Some interesting combinations are possible too, by combining two teas which reach their flavor peaks at different times.

I say keep an open mind, and don’t knock it if you haven’t tried it. :) -m

 

Sunday, October 24th, 2010

Geek Thoughts: statistical argument against link shortener sustainability

I’ve seen lots of discussion for and against link shorteners, but not specifically this line of argument:

Let me grab a random shortened link from Twitter. Don’t go away, I’ll be right back.

http://bit.ly/b1fYi1

OK, that’s six characters in the domain, a slash, and six more characters. 50 years from now, if bit.ly is still in operation, the URLspace will be rather more crowded, and the part after the slash might be eight or nine characters. That is a significant cliff, since most people have trouble holding more than 6 or 7 things in their heads at a time. Thus, one could conclude that 50 years from now, newly minted bit.ly URLs will be less fashionable than those from newer link-shortening services, particularly if more short TLDs come online, which seems likely. In that scenario, fewer and fewer people will use bit.ly, and it will become a resource pit as costs go up (for more database storage, among other things) while usage drops. That economic trend has only one eventual outcome, and it ends with all the external links relying on this service breaking.
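A quick way to see where the cliff falls is to count codes. This sketch assumes codes are drawn from the 62 characters a-z, A-Z, and 0-9. That is consistent with a code like b1fYi1, though I don't know the service's actual scheme. Under that assumption, each extra character multiplies the capacity by 62.

```python
ALPHABET_SIZE = 62  # assumption: a-z, A-Z, 0-9

def capacity(length, alphabet=ALPHABET_SIZE):
    """Number of distinct codes of exactly this length."""
    return alphabet ** length

def min_code_length(num_links, alphabet=ALPHABET_SIZE):
    """Shortest fixed code length that can give every link its own code."""
    length = 1
    while capacity(length, alphabet) < num_links:
        length += 1
    return length

for n in range(6, 10):
    print(n, f"{capacity(n):,}")
```

Under that assumption, six characters run out after about 56.8 billion links, and a service handing out codes sequentially would then have to move to seven.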

I’ve been picking on bit.ly here, but the same principle applies to any shortener service. In fact, the more popular, the more quickly the URLspace will fill.

The moral: don’t use link shorteners for anything that needs to be more durable than something you’d scribble on a scrap of paper at your desk.

More collected Geek Thoughts at http://geekthoughts.info.

Tuesday, August 3rd, 2010

Heard, overheard, and misheard at Balisage

The opening day of the conference was not Balisage proper, but a separate symposium on “XML for the long haul”.

Some interesting tidbits overheard, in no particular order…

“it is not necessarily clear that this approach would capture the difference between the ridiculous and the merely implausible.”

Complexity: what is the relationship between complexity and long-term data storage?

“Narratives with fancy words in them”

How do you store, say, a video in a format that will be readable in 100 years?

Order of magnitude scale changes produce discontinuities

“The Da Vinci Schema”

Dandelion DNA (Free license)

“Indispensible” — “I don’t think that means what you think it does”

“Keeping electrons alive is really difficult”

“I wondered…with my Topic Map brain damage…”

-m

Saturday, June 26th, 2010

Steve Martin mead joke

Steve Martin leaves an awesome list of demands for venue staff when he’s on tour, including

BEVERAGE SERVICE must include a thoughtful assortment of meads and bendy straws.

IMPORTANT NOTE: Bendy straws must be strong enough to be able to be used as blowguns.  ADDITIONAL IMPORTANT NOTE: Local paramedic aid may be required.

Read the rest, it’s great. -m

Wednesday, June 9th, 2010

“Google syntax” for semantic queries?

Thought experiment: are there any commonly-expressed semantic queries–the kind of queries you’d run over a triple store, or perhaps a SearchMonkey-annotated web site–expressible in common type-in-a-searchbox query grammar?

As a refresher, here are some things that Google and other search engines can handle. The square brackets represent the search box into which the queries are typed; they are not part of the queries themselves.

[term]

[term -butnotthis]

[term1 OR term2]

[“phrase term”]

[term1 OR term2 -“but not this” site:dubinko.info filetype:html]

So what kind of semantic queries would be usefully expressed in a similar way, avoiding SPARQL and the like? For example, maybe [by:”Micah Dubinko”] could map to a document containing a triple like <this document> <dc:author> “Micah Dubinko”. What other kinds of graph queries are interesting, common, and simple to express like this? Comments welcome.
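As a rough illustration of the by: idea, here is a small Python sketch. It pulls field operators out of a search-box string and checks them against a list of triples. The operator table, the sample triples, and the extra title: entry are hypothetical; mapping by: to dc:author simply follows the example above. Python's shlex does the tokenizing, so "quoted phrases" stay together, though it will reject unbalanced quotes such as a lone apostrophe.

```python
import shlex

# Hypothetical operator table: "by" -> dc:author follows the post's example;
# "title" -> dc:title is an illustrative extra.
OPERATORS = {"by": "dc:author", "title": "dc:title"}

def parse_query(query):
    """Split a search-box query into (predicate, value) constraints and leftover terms."""
    constraints, terms = [], []
    for token in shlex.split(query):  # shlex keeps quoted phrases together
        op, sep, value = token.partition(":")
        if sep and op in OPERATORS:
            constraints.append((OPERATORS[op], value))
        else:
            terms.append(token)  # ordinary terms, -exclusions, site:, etc.
    return constraints, terms

def matching_subjects(triples, constraints):
    """Subjects that have a matching triple for every (predicate, value) constraint."""
    result = None
    for predicate, value in constraints:
        hits = {s for s, p, o in triples if p == predicate and o == value}
        result = hits if result is None else result & hits
    return result or set()

# Hypothetical sample data.
TRIPLES = [
    ("<doc1>", "dc:author", "Micah Dubinko"),
    ("<doc1>", "dc:title", "Google syntax for semantic queries?"),
    ("<doc2>", "dc:author", "Someone Else"),
]

constraints, terms = parse_query('by:"Micah Dubinko" semantic')
print(constraints, terms, matching_subjects(TRIPLES, constraints))
```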

-m

Thursday, June 3rd, 2010

Reverse Engineering Corexit 9500

If you dig a bit, there’s all kinds of interesting background material about the terrible disaster ongoing in the Gulf of Mexico. For example, a map of the thousands of rigs and tens-of-thousands of miles of pipelines. Some of the best infographics are from BP itself. And for when you can no longer stand the overwhelming sense of disaster, a fake twitter feed.

But this really caught my eye, from Nalco, the manufacturer of the oil dispersant Corexit 9500 which is being used both in unprecedented quantities and depths in the Gulf. Here’s how they cleverly describe the ingredients of their product, an ingredient list they protect as a trade-secret:

  1. One ingredient is used as a wetting agent in dry gelatin, beverage mixtures, and fruit juice drinks.
  2. A second ingredient is used in a brand-name dry skin cream and also in a body shampoo.
  3. A third ingredient is found in a popular brand of baby bath liquid.
  4. A fourth ingredient is found extensively in cosmetics and is also used as a surface-active agent and emulsifier for agents used in food contact.
  5. A fifth ingredient is used by a major supplier of brand name household cleaning products for “soap scum” removal.
  6. A sixth ingredient is used in hand creams and lotions, odorless paints and stain blockers.

That is one impressive bit of verbal agility; my compliments to their staff writer(s). It would be a fun exercise some day to see what kinds of toxic sludge could be described in similar terms. But let’s see if we can figure out the exact ingredient list: here’s the MSDS for the substance. According to it, Propylene Glycol is clearly one of the ingredients, as are “Distillates, petroleum, hydrotreated light” and “Organic sulfonic acid salt”. “Wetting agent” and “surface-active” are both code words for a surfactant. A little knowledge of chemistry along with household product label reading might go a long way… Got insight? Add a comment here to describe what you find.

-m

6/10 Update: Nalco released the full ingredient list and cheat sheet:

CAS # | Name | Common day-to-day use examples
1338-43-8 | Sorbitan, mono-(9Z)-9-octadecenoate | Skin cream, body shampoo, emulsifier in juice
9005-65-6 | Sorbitan, mono-(9Z)-9-octadecenoate, poly(oxy-1,2-ethanediyl) derivs. | Baby bath, mouth wash, face lotion, emulsifier in food
9005-70-3 | Sorbitan, tri-(9Z)-9-octadecenoate, poly(oxy-1,2-ethanediyl) derivs. | Body/face lotion, tanning lotions
577-11-7 * | Butanedioic acid, 2-sulfo-, 1,4-bis(2-ethylhexyl) ester, sodium salt (1:1) | Wetting agent in cosmetic products, gelatin, beverages
29911-28-2 | Propanol, 1-(2-butoxy-1-methylethoxy) | Household cleaning products
64742-47-8 | Distillates (petroleum), hydrotreated light | Air freshener, cleaner
111-76-2 ** | Ethanol, 2-butoxy | Cleaners

The * footnote indicates, essentially, “contains propylene glycol”.

The ** footnote indicates that this chemical is found only in Corexit 9527, not the one most commonly used in the Deepwater Horizon cleanup.

Monday, April 19th, 2010

The Rick Wakeman clause?

Phrase seen in this article about whether video games are art, and Roger Ebert’s opinions thereon.

“Video games by their nature require player choices, which is the opposite of the strategy of serious film and literature…”

Hmm, Mr. Ebert doesn’t seem to be up on the concept of hypertext, which has manifold connections with cinema. See for instance the scholarly paper Cinematic Paradigms for Hypertext. In fact, making a hypertext or branching narrative requires even greater amounts of authorial skill.

But I’m still curious, what is the Rick Wakeman clause? From where did that term originate? -m

Monday, February 8th, 2010

Economics 101 question

Let’s say you have a box that (completely legally) spits out 1 dollar per day. I’m using “box” in an abstract sense here: maybe it’s an investment or a business opportunity. How much would you pay for this box? In other words, what’s its fair market value?

What if it spit out one dollar per hour? Would you pay exactly 24x as much for it then? Or one per week–would you pay 1/7th as much?
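One conventional way to price such a box is the present value of a perpetuity, payment / i, where i is the discount rate per payment period. This assumes the payout is certain and that you discount future dollars at a fixed rate. Here's a sketch that assumes an effective annual discount rate of 5% (an arbitrary choice) and a 365-day year:

```python
def perpetuity_value(payment, annual_rate, payments_per_year):
    """Present value of a level payment received at the end of every period, forever.

    The per-period rate compounds back up to annual_rate over one year,
    so different payment frequencies are discounted consistently.
    """
    per_period_rate = (1 + annual_rate) ** (1 / payments_per_year) - 1
    return payment / per_period_rate

RATE = 0.05  # assumed effective annual discount rate

daily = perpetuity_value(1, RATE, 365)
hourly = perpetuity_value(1, RATE, 365 * 24)
weekly = perpetuity_value(1, RATE, 365 / 7)

print(f"daily box: ${daily:,.0f}")
print(f"hourly/daily ratio: {hourly / daily:.4f}")
print(f"weekly/daily ratio: {weekly / daily:.4f}")
```

Under these assumptions the hourly box is worth slightly more than 24 times the daily one, because each dollar arrives a little sooner. The weekly box is worth slightly less than a seventh. The last question, a box that sometimes takes money, is outside what this sketch models.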

What if it’s hard to measure how much money comes out of it–maybe sometimes it emits a dollar, but sometimes you have to put one in. Then what? -m

Thursday, December 10th, 2009

500th Post

Celebrating 500 posts since I went to WordPress in May 2006. Prior to that, an additional 730 posts as I floated through a typical evolution of blogging platforms:

  • Easy start: Blogger (299 posts in 24 months)
  • Succumbing to the desire to roll your own (259 posts in 12 months)
  • Realizing that rolling your own is too difficult: Pyblosxom (172 posts in 12 months)
  • Moving to a mature platform you don’t need to worry about much: WordPress (500 posts in 42+ months)

-m

Sunday, November 8th, 2009

High Temperature Superconductors

If this site is accurate, it’s now possible to have superconducting material at household freezer temperatures: 254 K, or a tiny bit below 0 °F. From power lines to maglevs to supercolliders to energy storage, the potential applications boggle the mind. -m
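For the record, here is the Kelvin-to-Fahrenheit arithmetic behind "a tiny bit below 0F":

$$
T_{^\circ\mathrm{F}} = \left(T_{\mathrm{K}} - 273.15\right)\times\frac{9}{5} + 32 = (254 - 273.15)\times\frac{9}{5} + 32 \approx -2.5\ ^\circ\mathrm{F}
$$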

Note: I’m having trouble finding independent verification of this, other than what appears to be re-hashes of the superconductor.org article. If you have any additional proof or refutation, please post it in the comments.

Thursday, November 5th, 2009

Metadata FTW

Link credit goes to Joho.

This looks pretty significant. The AZ Supreme Court ruled that document metadata must be disclosed under existing public records law. This may start a chain reaction with other states following suit. With the movement toward open data including data.gov and the Federal Register, this fits in well. Quite often metadata including creation date and author and the like make for much better searching and faceting. -m

Friday, July 31st, 2009

Pragmatic Namespaces

In case any of the 7 regular readers here aren’t following xml-dev, check out and add to the discussion about Pragmatic Namespaces, proposed as a solution for the “distributed extensibility” problem in HTML5.

For years people have been pointing to Java as the model for how XML namespaces should work, so this proposal goes that direction. Either it will work, or else it will get people to finally shut up about the whole idea. :)

It’s heavily based on Tom Bradford’s Clean Namespaces proposal, which doesn’t have a living URL anymore but is available on archive.org.

-m

Friday, June 19th, 2009

VoCamp Wrap-up

I spent 2 days at the Yahoo! campus at a VoCamp event, my first. Initially, I was dismayed at the schedule. Spend all the time the first day figuring out why everybody came? It seemed inefficient. But having gone through it, the process seems productive, exactly the way that completely decentralized groups need to get things done. Peter Mika did a great job moderating.

Attendees numbered about 35, and came from widely varying backgrounds from librarian to linguist to professor to student to CTO, though uniformly geeky. With SemTech this week, the timing was right, and the number of international attendees was impressive.

In community development, nothing gets completely decided just because a few people met. But progress happens. The first day was largely exploratory, but also covered plenary topics that nearly everyone was interested in. Namely:

  • Finding, choosing, and knowing when to create vocabularies
  • Mapping from one vocabulary to another
  • RDBMS to RDF mapping

Much of the shared understanding of these discussions is captured on various wiki pages connected to the one at the top of this article.

For day 2, we split into smaller working groups with more focused topics. I sat in on a discussion of Common Tag (which still feels too complex to me, but does fulfill a richer use case than rel-tag). Next, some vocabulary design, planning a microformat (and eventual RDF vocab) to represent code documentation: classes, functions, parameters, and the like. Tantek Çelik espoused the “scientific method” of vocab design: would a separate group, in similar circumstances, come up with the same design? If the answer is ‘yes’, then you probably designed it right. The way to make that happen is to focus on the basics, keeping everything as simple as possible. If any important features are missed, you will find out quickly. The experience of getting the simple thing out the door will provide the education needed to make the more complicated follow-on version a success.

From the wrap-up: if you are designing a vocabulary, the most useful thing you can do is NOT to unleash a fully-formed proposal on the world, but rather to capture the discussion around it. What were the initial use cases? What are people currently doing? What design goals were explicitly left off the table, or deferred to a future version, or immediately shot down? It’s better to capture multiple proposals, even if fragmentary, and let lots of people look them over and gravitate toward the best design.

Lastly, some cool things overheard:

“Relational databases? We call those ‘legacy’.”

“The socially-accepted schema is fairly consistent.”

“It’s just a map, it’s not the territory.”

-m

Wednesday, June 10th, 2009

The Inmates are Running the Asylum: review and RFE

The central thesis of The Inmates are Running the Asylum by Alan Cooper is dead on: engineers get too wrapped up in their own worlds, and left entirely to their own whims can easily make a product incomprehensible to ordinary folks. For this reason alone, it’s worth reading.

But I do question parts of his thesis. He (with tongue in cheek) posits the existence of another species of human, called Homo Logicus. Stepping onto an airplane, Homo Logicus turns left into the cockpit with a million buttons but ultimate control over every aspect of the plane. Regular Homo Sapiens, on the other hand, turn right and tuck themselves into a chair–no control but at least they can relax.

But if there were only one “species” of Homo Logicus, members (like me) would never experience usability issues in software created by fellow Logicians. Yet ordinary fax machines give me fits. The touch-screen copier at work instills dread in my heart. And the software I need to use to file expense reports, written by enterprise software geeks probably very similar to me, is a usability nightmare. Words fail me in expressing my disdain for this steaming heap of fail.

The book is sub-titled “Why High-Tech Products Drive Us Crazy”, but one doesn’t have to look very far to find similar usability bugs in the low-tech world. Seth Godin, for example, likes to talk about different things in life that Just Don’t Work, along with reasons why. Some examples:

  • airport cab stand (75 cabs, 75 people, and it takes an hour)
  • “don’t operate heavy machinery” warning on dog’s prescription medicine
  • excessive fine print on liability agreements–intentionally hard to read and figure out
  • official “Vote for Pedro” shirts that look nothing like the ones in the movie
  • more examples on the web site

If anything, I think Cooper’s work doesn’t go far enough. It is relatively short on good examples, stretching out only four examples over four chapters. If properly-designed software is so hard to come up with examples of, then there are bigger problems in play (that would need to be dealt with by something more manifesto than book).

The book is now 5 years old. Perhaps it’s time for an update. Particularly in the world of web software, lots has happened in 5 years. Flickr. Gmail. Yahoo Pipes. Google Docs. Even SearchMonkey. Instead of focusing on pointing at crappy software, I’d like to see more emphasis on properly-done interfaces. More delving into nuance, and common factors behind why both high-tech and low-tech products miss the mark.

But maybe that’s just me. -m

Friday, May 15th, 2009

A nugget from _A Canticle for Leibowitz_

This brilliant bit is almost a throwaway paragraph on page 304, near the end.

[Two men in a satirical dialog] managed only to demonstrate that the mathematical limit of an infinite sequence of “doubting the certainty with which something doubted is known to be unknowable when the ‘something doubted’ is still a preceding statement ‘unknowability’ of something doubted,” that the limit of this process at infinity can only be equivalent to a statement of absolute certainty, even though phrased as an infinite series of negations of certainty.

It’s not like the whole book is like this…far from it. But it is chock full of little gems.

-m

Thursday, March 26th, 2009

Signs of life in cold fusion research

This article seems encouraging. I’ve never been able to come to grips with the anti-CF bias of the scientific community. Sure, a few researchers made fools of themselves two decades ago, but what has that got to do with falsifiable hypotheses? A small amount of research goes on with minimal funding, under the newer name of Low Energy Nuclear Reactions (LENR), and the signs are encouraging.

From the article, researchers used plastic as a permanent record of neutron movement and found that, indeed, neutrons are being produced, leaving tiny tracks behind.

Another recent article from Jeffrey Kooistra has more details of current research. Good stuff, and important if it works. Heck, it’s important if it doesn’t work, because that still expands what we know. -m

Wednesday, February 25th, 2009

Brian May explains relativity

This is fantastic. Brian May (yes THAT Brian May) not only blogs, but talks about all kinds of challenging subjects. Like how and why space and time are linked. Worth a read. -m

Monday, February 23rd, 2009

How Orbo works

I’m (just barely) enough of a writer that I can spend cycles on Steorn’s claims without being branded a crackpot. After all, the novel I’m working on involves a similar device being invented 4,000 years ago. It’s all research.

Imagine if Earth’s gravitational field, instead of being a constant 1.0G, rocked back and forth between 0.99G and 1.01G at some fixed interval. That’d be perhaps not enough to feel, but enough to extract “free energy”. Arrange a heavy weight on a wheel, and time it so that it moves downward (doing work) during the heavier phase and returns to the top during the lighter phase. You’d have more than perpetual motion, you would be able to extract real work out of the device on a continuous basis.
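The fluctuating-gravity thought experiment reduces to one line of arithmetic. The weight releases m·g_high·h on the way down and costs m·g_low·h on the way back up. Here is a minimal sketch with made-up numbers: a 100 kg weight, a 1 m drop, and the ±1% swing from the example. Real gravity doesn't fluctuate like this; with swing=0 the net is exactly zero, as conservation of energy requires.

```python
G0 = 9.80665  # standard gravity, m/s²

def net_work_per_cycle(mass_kg, drop_m, swing=0.01, g0=G0):
    """Net energy (J) per cycle of the hypothetical fluctuating-gravity wheel.

    The weight descends while gravity is (1 + swing) * g0, releasing m * g_high * h,
    and is lifted back while gravity is (1 - swing) * g0, costing m * g_low * h.
    """
    g_high = (1 + swing) * g0
    g_low = (1 - swing) * g0
    return mass_kg * drop_m * (g_high - g_low)

print(f"{net_work_per_cycle(100, 1):.2f} J per cycle")
```

At one cycle per second, the roughly 19.6 J per cycle would be about 20 W of continuous output.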

Steorn’s claims are similar, but with permanent magnets instead of gravity.

Orbo is based upon time variant magnetic interactions, i.e. magnetic interactions whose efficiency varies as a function of transaction timeframes.

I get the feeling that they are being very, very careful about what they write. In particular, the word “efficiency” is very odd in this sentence. In my earlier example, it would sound unnatural to talk about the “efficiency of the gravitational interaction”. Unless one talks about the kinds of efficiency that go above 100%…. So let’s roll with it.

It is this variation of energy exchanged as a function of transaction time frame that lies at the heart of Orbo technology, and its ability to contravene the principle of the conservation of energy. Why? Conservation of energy requires that the total energy exchanged using interactions are invariant in time. This principle of time invariance is enshrined in Noether’s Theorem.

So some hitherto unknown process temporarily nudges a magnetic interaction in one direction, only for it to bounce back in the opposite direction, like in the gravity example. Get the timing right and presto, free energy. I don’t understand why they are so cavalier about “contravening” the principle of conservation of energy though. It seems to me that more observations would be in order. As in “the device produced 100 watts for 6 months straight, with no input power sources”–which could be true in various ways that don’t contravene conservation of energy. It’s almost as if they are deliberately being provocative in their statements. Go figure. -m

Wednesday, February 18th, 2009

French, British nuclear subs collide

Honestly, I don’t even need to write a punchline for this one, it sounds so much like the setup of a Monty Python-esque joke. Give it your best shot in the comments… -m

Tuesday, December 9th, 2008

XML 2008 liveblog: Automating Content Analysis with Trang and Simple XSLT Scripts

Bob DuCharme, Innodata Isogen

Content analysis: why? You’ve “inherited” content. Need to save time or effort.

Handy tool 1: “sort”. As in the Unix command line tool. (Even Windows)

Handy tool 2: “uniq -c”  (flag -c means include counts)

Elsevier contest: interface for reading journals. Download a bunch of articles, and see what’s all in there.

Handy tool 3: Trang. Schema language converter. But can infer a schema from one or more input documents. Concat all sample documents under one root, and infer–this gives a list of all doctypes in use.

trang article.dtd article.rng
trang issueContents.xml issueContents.rng
saxon article.rng compareElsRNG.xsl | sort > compareElsRNG.out

compareElsRNG.xsl has text mode output, ignores input text nodes, and checks whether the RNG has references to each element, outputting “Yes: elementname” or “No: elementname” (which gets sorted in step 3).

Helps ferret out places where the schema says 40 different child elements are possible but in practice only 4 are used.
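The stylesheet itself isn't shown, so here is a hypothetical stand-in: the same Yes/No check sketched in Python. It assumes both schemas use RELAX NG's XML syntax with element names in name attributes, as in typical Trang output, and it compares declared element names only. The sample schemas are made up.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"  # RELAX NG XML-syntax namespace

def declared_elements(rng_text):
    """Names from every <element name="..."> in a RELAX NG schema (XML syntax)."""
    root = ET.fromstring(rng_text)
    return {el.get("name") for el in root.iter(RNG + "element") if el.get("name")}

def compare_elements(full_schema, inferred_schema):
    """'Yes: name' if the inferred schema also declares the element, else 'No: name', sorted."""
    used = declared_elements(inferred_schema)
    return sorted(
        f"{'Yes' if name in used else 'No'}: {name}"
        for name in declared_elements(full_schema)
    )

# Made-up schemas: the full one allows <sidebar>, the inferred one never saw it.
FULL = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><element name="article">
    <element name="title"><text/></element>
    <zeroOrMore><element name="sidebar"><text/></element></zeroOrMore>
  </element></start>
</grammar>"""

INFERRED = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><element name="article">
    <element name="title"><text/></element>
  </element></start>
</grammar>"""

print("\n".join(compare_elements(FULL, INFERRED)))
```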

Handy tool 4: James Clark’s sx, converts SGML to XML.

Another stylesheet counts elements producing a histogram. [Ed. I would do this in XQuery in CQ.] Again, can help prioritize parts of the XML to use first. Similar logic for parent/child counts; where @id gets used; find all values for a particular attribute.
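For readers without an XSLT or XQuery setup, the same counts can be sketched with Python's standard library. The sketch produces an element histogram, parent/child pair counts, and the values of one attribute. It is an illustration on a made-up sample document, not the stylesheet from the talk.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def element_histogram(xml_text):
    """How many times each element name occurs."""
    return Counter(el.tag for el in ET.fromstring(xml_text).iter())

def parent_child_counts(xml_text):
    """How many times each (parent, child) element pairing occurs."""
    root = ET.fromstring(xml_text)
    return Counter((parent.tag, child.tag) for parent in root.iter() for child in parent)

def attribute_values(xml_text, attr):
    """Every value of one attribute, with counts."""
    root = ET.fromstring(xml_text)
    return Counter(el.get(attr) for el in root.iter() if el.get(attr) is not None)

SAMPLE = "<article><title/><sec id='s1'><p/><p/></sec><sec id='s2'><p/></sec></article>"

hist = element_histogram(SAMPLE)
pairs = parent_child_counts(SAMPLE)
ids = attribute_values(SAMPLE, "id")
for name, n in hist.most_common():
    print(f"{n:4d} {name}")
```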

Another stylesheet goes through multiple converted-to-rng schemas, looking for common substructure. Lists generated this way can be pulled into a stylesheet.

Analyze an SGML DTD? dtd2html -> tidy -> XSLT. Clients like reports (especially spreadsheets). The approach is more like Lego bricks.

-m

Tuesday, December 9th, 2008

XML 2008 liveblog: Using RDFa for Government Information

Mark Birbeck, Web Backplane.

Problem statement: You shouldn’t have to “scrape” government sites.

Solution: RDFa

<div typeof="arg:Vacancy">
  Job title: <span property="dc:title">Assistant Officer</span>
  Description: <span property="dc:description">To analyse... </span>
</div>

This resolves to two full RDF triples. No separate feeds, uses existing publishing systems. Two of the most ambitious RDFa projects are taking place in the UK. Flexible arrangements possible.
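To make the triples concrete, here is a toy extractor in Python that handles just this shape of markup. typeof opens a blank-node subject and, per RDFa's processing rules, also yields an rdf:type triple. Each property attribute yields a literal triple. It is a hypothetical sketch for illustration, not a conforming RDFa processor: prefixes stay unexpanded and only this simple nesting is handled.

```python
from html.parser import HTMLParser
from itertools import count

class TinyRDFaExtractor(HTMLParser):
    """Toy extractor for a tiny RDFa subset (typeof + property), for illustration only."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = []   # stack of (tag, blank node) for open typeof elements
        self._pending = None  # [subject, predicate, text parts, tag] for an open property element
        self._bnodes = count(1)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            subject = f"_:b{next(self._bnodes)}"
            self.triples.append((subject, "rdf:type", attrs["typeof"]))
            self._subjects.append((tag, subject))
        elif "property" in attrs and self._subjects:
            self._pending = [self._subjects[-1][1], attrs["property"], [], tag]

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[2].append(data)  # collect the literal's text

    def handle_endtag(self, tag):
        if self._pending is not None and tag == self._pending[3]:
            subject, predicate, parts, _ = self._pending
            self.triples.append((subject, predicate, "".join(parts)))
            self._pending = None
        elif self._subjects and tag == self._subjects[-1][0]:
            self._subjects.pop()

SNIPPET = """<div typeof="arg:Vacancy">
  Job title: <span property="dc:title">Assistant Officer</span>
  Description: <span property="dc:description">To analyse... </span>
</div>"""

parser = TinyRDFaExtractor()
parser.feed(SNIPPET)
parser.close()
for triple in parser.triples:
    print(triple)
```

Run against the snippet above, it prints the rdf:type triple followed by the two property triples, all sharing the blank node _:b1.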

Steps: 1. Create vocabulary. 2. Create demo. 3. Evangelize.

Vocabulary under Google Code: Argot Hub. Reuse terms (dc:title, foaf:name) where possible, developed in public.

Demos: Yahoo! SearchMonkey, (good for helping not-so-technical people to “get it”) then a Drupal hosted one (a little more control).

Next level, a new server that aggregates specific info (like all job openings for Electricians), including geocoding. Ubiquity RDFa helps here.

Evangelizing: Detailed tutorials. Drupal code will go open source. More opportunities with companies currently screen-scraping. More info @ rdfa.info.

Q&A: Asking about predicate overloading (dc:title). A general SemWeb issue. Context helps. Is RDFa tied to HTML? No, SearchMonkey itself uses RDFa–it’s just attributes.

-m

Tuesday, December 9th, 2008

XML 2008 liveblog: Sentiment Analysis in Open Source Information for the US Government

Ronald Reck, SAP; Kenneth Sall, SAIC

“I wish I knew when people were saying bad things about me.” Sentiment analysis. Kapow used initially. From 800k news articles (from 1996 and 1997), extracted 450M RDF assertions. The 13 Reuters standard metadata elements not used in this case. Used Redland for heavy RDF lifting. Inxight ThingFinder (commercial) for entity extraction, supplemented with enumerated lists (Bush Cabinet, Intelligence Agencies, negative adjectives, positive admire verbs, etc.) End result was RDF/XML.

(Kenneth takes the mic) SPARQL Sentiment Query Web UI. Heavy SPARQL ahead… Redland hasn’t implemented the UNION operator yet, making the examples more convoluted.

PREFIX sap: <http://iama.rrecktek.com/ont/sap#>
SELECT ?ent ?type ?name
WHERE {
?ent sap:Method "Name Catalog" .
?ent sap:Type ?type .
?ent sap:Name ?name
}
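The WHERE clause is a basic graph pattern: three triple patterns that must all match with consistent variable bindings. A pure-Python sketch of that join shows what the query returns. It uses made-up sample triples, ignores the literal vs. IRI distinction, and leaves prefixes unexpanded.

```python
def match(pattern, triple, binding):
    """Try to extend binding so pattern matches triple; terms starting with '?' are variables."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to something else
        elif p_term != t_term:
            return None
    return extended

def evaluate(patterns, triples):
    """Join triple patterns left to right, as a SPARQL basic graph pattern does."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# The WHERE clause from the query above.
QUERY = [
    ("?ent", "sap:Method", "Name Catalog"),
    ("?ent", "sap:Type", "?type"),
    ("?ent", "sap:Name", "?name"),
]

# Hypothetical sample data.
TRIPLES = [
    ("ex:entity1", "sap:Method", "Name Catalog"),
    ("ex:entity1", "sap:Type", "Person"),
    ("ex:entity1", "sap:Name", "Jane Example"),
    ("ex:entity2", "sap:Method", "Pattern Match"),
    ("ex:entity2", "sap:Type", "Organization"),
    ("ex:entity2", "sap:Name", "Example Agency"),
]

rows = [(b["?ent"], b["?type"], b["?name"]) for b in evaluate(QUERY, TRIPLES)]
print(rows)
```

Only the entity extracted by the name catalog survives all three patterns, so rows holds a single (ent, type, name) tuple.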

Difficult learning curve. Need ability to do substring from entity URI -> article URI.

Next steps: current news stories. Leverage existing metadata. RDF at the sentence level. Improve name catalogs. Use rule-based pattern matching engine. Slides.

-m

Sunday, November 9th, 2008

Bronze Medal at California State Homebrew Competition

I won a bronze medal (white ribbon actually) in the Mixed Styles category for my Dusseldorf Altbier, the first non-mead-related beverage I’ve ever entered. It’s a deep copper-colored ale made with a special Alt yeast and with a strong balance of clean malt and hops. There are very few bottles of it left at this point.

The competition itself was a blast–I got to spend the day judging barleywines, including a spectacular one that went on to win the category. Official results should be posted soon. -m

Thursday, November 6th, 2008

Geek Thoughts: “I screwed up”

A special comment. My most vivid memory of my late Grandpa.

Even after retiring, Grandpa needed to do small jobs around town to make ends meet. One was cleaning a small sporting goods store. Once, with all the excitement of visiting family from out of town (that would be us), he forgot to clean one night. The next day, the shopkeeper was understandably irate, and waited around to speak face-to-face. “I screwed up,” Grandpa simply said. A short exchange followed, with typical Midwestern bluntness and politeness, the resolution being that it would never happen again. The shop got exceptionally well cleaned that day. Crisis averted.

That’s been a powerful lesson for me. When you screw something up, admit it, fix it the best you can, and move on.

-m

P.S. No, I haven’t done any major screw-ups lately, at least any that I know about. I was reminded of this by a much-publicized interview on Letterman.

Friday, October 24th, 2008

Online etymology database

I’ve been playing lately with this site, and it’s a fantastic resource. The word carboy probably comes from Persian qarabah “large flagon.” Who knew? -m

Monday, October 6th, 2008

To any recently downsized eBayers

I know what it’s like to be laid off; I’ve been through it twice. If you need help connecting up with a new gig, whether at MarkLogic or a hand-off to one of the zillion headhunters that constantly harry me, let me know. Send me email and I’ll do what I can. -m
