Actually using big data

By : mdubinko April 15, 2012

I’ve been thinking a lot about big data, and two recent items nicely capture a slice of the discussion.

1) Alex Milowski recounting working with Big Weather Data. He concludes that ‘naive’ (as-is) data loading is a “doomed” approach. Even small amounts of friction add up at scale, so you should plan on doing som in-situ cleanup. He came up with a slick solution in MarkLogic–go read his post for details.

2) Chris Dixon on Making Large Datasets Useful. Typical approaches like machine learning only solve 80-90% of the problem. So you need to either live with errorful data, or invoke manual clean-up processes.

Both worth a read. There’s more to say, but I’m not ready to tip my hand on a paper I’m working on…

-m

bigdata

Proudly powered by WordPress | Theme: Shree Clean by Canyon Themes.

Actually using big data

Related Posts

Open to work: What I’m Looking For

Cleaning Data by David Mertz now available