{"id":968,"date":"2012-04-15T20:20:35","date_gmt":"2012-04-16T03:20:35","guid":{"rendered":"https:\/\/dubinko.info\/blog\/?p=968"},"modified":"2012-04-15T20:20:35","modified_gmt":"2012-04-16T03:20:35","slug":"actually-using-big-data","status":"publish","type":"post","link":"https:\/\/dubinko.info\/blog\/2012\/04\/actually-using-big-data\/","title":{"rendered":"Actually using big data"},"content":{"rendered":"<p>I&#8217;ve been thinking a lot about big data, and two recent items nicely capture a slice of the discussion.<\/p>\n<blockquote><p>1) Alex Milowski recounting working with <a title=\"Experiments with Big Weather Data in MarkLogic - Doomed Approach\" href=\"http:\/\/www.milowski.com\/journal\/entry\/2012-04-13T15:49:24.758-07:00\/\">Big Weather Data<\/a>. He concludes that &#8216;naive&#8217; (as-is) data loading is a &#8220;doomed&#8221; approach. Even small amounts of friction add up at scale, so you should plan on doing some in-situ cleanup. He came up with a slick solution in MarkLogic&#8211;go read his post for details.<\/p>\n<p>2) Chris Dixon on <a title=\"There are two ways to make large datasets useful\" href=\"http:\/\/cdixon.org\/2012\/04\/14\/there-are-two-ways-to-make-large-datasets-useful\/\">Making Large Datasets Useful<\/a>. Typical approaches like machine learning only solve 80-90% of the problem. So you need to either live with errorful data, or invoke manual clean-up processes.<\/p><\/blockquote>\n<p>Both worth a read. There&#8217;s more to say, but I&#8217;m not ready to tip my hand on a paper I&#8217;m working on&#8230;<\/p>\n<p>-m<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been thinking a lot about big data, and two recent items nicely capture a slice of the discussion. 1) Alex Milowski recounting working with Big Weather Data. He concludes that &#8216;naive&#8217; (as-is) data loading is a &#8220;doomed&#8221; approach. 
Even small amounts of friction add up at scale, so you should plan on doing some&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[23,113],"tags":[1084,1085,1087,1086],"class_list":["post-968","post","type-post","status-publish","format-standard","hentry","category-announcement","category-mark-logic","tag-bigdata","tag-dataset","tag-manual","tag-ml"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8eo8l-fC","_links":{"self":[{"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/posts\/968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/comments?post=968"}],"version-history":[{"count":1,"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/posts\/968\/revisions"}],"predecessor-version":[{"id":969,"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/posts\/968\/revisions\/969"}],"wp:attachment":[{"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/media?parent=968"}],"wp:term":[{"taxonomy":"ca
tegory","embeddable":true,"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/categories?post=968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dubinko.info\/blog\/wp-json\/wp\/v2\/tags?post=968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}