Big Data: Farming vs. Mining

Big Data is suffering under the weight of more breathless, dewy-eyed hype than any genuinely promising technology should ever have to bear. Sadly, most C-level healthcare executives, even the ones who know they want it, couldn’t give a quick and cogent description of what “it” is. Sadder still, the US, despite a golden, well-funded, once-in-a-generation chance to realize real benefit from Big Data analytics, is in danger of completely undermining its ability to do so.

What is at the heart of this issue? My sense is that there is a fundamental flaw in the metaphor that seems to underpin much of the current hype surrounding Big Data. There is a line of reasoning, usually put forward as if it were self-evident, that Big Data is like Big Dirt: if you have enough of it, there must be gold in there somewhere. The truth, however, is that a big pile of dirt is usually just a big pile of dirt. Purveyors of analytics products who pitch some sort of something-from-nothing value proposition are, on the most charitable reading, suggesting that they can mine gold from dirt regardless of the dirt. At worst, some of the present marketing hype is more akin to data alchemy than data science.

Perhaps we need to revisit our Big Data metaphor. Rather than thinking about data mining, we should be thinking about data farming. As any farmer (or preacher) can tell you: you reap what you sow. Soberingly, the corollary is also true: you don’t reap what you don’t sow. How does this relate to Big Data analytics and to whether the US is frittering away a golden opportunity? To put it bluntly, for America to realize the promise of Big Data analytics in healthcare, there must be pervasive adoption of eHealth standards and health information exchanges (HIEs).

Why should we care? What’s at stake? The 2010 President’s Council of Advisors on Science and Technology (PCAST) report describes what some believe may be the most significant benefits to be realized from America’s present $35 billion eHealth investment program:

“If the data gathered by healthcare providers and the decisions made at the point of care by providers and patients were gathered and aggregated, they could reveal patterns of illness in a community or nationally, identify potential epidemics at very early stages, enable comparisons of different treatments or medical devices in large and diverse populations, and evaluate the effectiveness of specific treatments and make information about hospitals, physicians, and other providers more comprehensive and accurate.”

In healthcare, we will not be able to reap the benefits of powerful analytic techniques if we do not, today, sow the seeds of our success. How do we do this? The 2010 PCAST report outlines the two necessary prerequisites:

“The first is the adoption by providers of interoperability standards that enable data to be shared across institutions. The second is the creation of network infrastructure and administration that enable distributed data to be indexed and accessed subject to appropriate data access restrictions.”

Tellingly, without these two prerequisites there is no “network effect” to leverage. No network effect means no analytics-supported population health pay-off. It is as simple as that.

So, where is the danger? We’re making progress, aren’t we? To be fair, there has been a lot of investment, physician EMR adoption is increasing, and there is more eHealth message traffic than there used to be. But for those of us watching the shift in America’s eHealth agenda, there remains a sobering and disquieting question: does Direct really satisfy the second of the two necessary prerequisites?
Derek Ritz is the principal at ecGroup Inc., a boutique consultancy that provides advisory services to public and private sector clients regarding eHealth strategy, architecture, implementation and adoption.