Tuesday, October 7, 2014

Don’t Fall in Love Just BECAUSE It’s Big Data

In the YouTube video ‘The Data Scientist’s Toolbox’, one of the salient statements for me was “Be careful not to fall in love with it just BECAUSE it’s Big Data”.  This week I came across Kalev Leetaru’s ForeignPolicy.com article ‘Why Big Data Missed the Early Warning Signs of Ebola’.  Mr. Leetaru begins with media reports that Harvard’s HealthMap program picked up early reports of a mystery hemorrhagic fever 9 days before the World Health Organization did.  He goes on to state that HealthMap was simply picking up on tweets and retweets of a newswire article (in French) reporting on a press conference held by the Guinea Department of Health.  Further, he criticizes the program for not being able to read anything but English.  He’s missing the point.

To begin with, there have been at least 10 outbreaks of Ebola in the past 10 years, so early detection of this disease was more of a test case for the software.  I don’t think anybody was surprised by an Ebola outbreak in March when the Harvard program detected chatter.  Certainly this outbreak has become a major news story with global implications, but in March it was just another Ebola story.

What I think Mr. Leetaru is really missing is that, even though it was only catching reports of official statements, the program worked!  The system mined data from multiple sources and detected something unusual.  Could it be improved upon?  Certainly.  Is translation a missing component?  No doubt.  Still, think of the success of the program.  Health officials at the CDC could be notified days before the WHO report was able to work its way through bureaucracies.  Virologists who specialize in hemorrhagic fevers could be notified and placed on alert, or begin communicating with colleagues in African countries.

The title ‘Why Big Data Missed the Early Warning Signs of Ebola’ is misleading.  Big Data didn’t miss the signs; it picked them up.  It just happened that what it picked up were official, remote, regional, page 52 below-the-fold news items, and in this, it succeeded.  The Harvard team is doing ground-breaking work.  If it’s not perfect yet, we cannot fault them.  Imagine what their software will be doing in 2 years.  In 5?  In 10?  They are creating a system that will one day (in the not too distant future) link what seem like unrelated news stories and recognize them as the beginnings of new epidemics.  It will provide researchers with invaluable data on when and where outbreaks began, helping them to more quickly locate each patient zero and determine the source of infections.

Big Data didn’t miss the signs, it just wasn’t the first one to see them.  One of these days – it will be.
