Thursday, September 18, 2014

You've Got the Data - Now What?

Collecting and aggregating data is a fairly straightforward process.  Yes, you need the servers, the databases, the storage - but these days the storage infrastructure is less expensive and easier to build than ever before.  Cloud storage, and even cloud clustered computing solutions, make it reasonable for anyone to store massive amounts of data and to get the horsepower to crunch that data.  But then what?  What do you do with all that data you've collected?

One approach is to turn to the crowd.  Instead of having 10 sets of eyes looking at the data, figure out a way to segment it, define rules for what you're trying to see, and distribute the task of looking to possibly billions of Web users.  The Chronicle of Higher Education profiled one such attempt, in which Alexander S. Szalay challenged the status quo on sharing astronomy data.  He began by stitching together observations from multiple telescopes to provide a clearer picture of the sky, but he realized he needed more eyes looking at all the data he was collecting.


Mr. Szalay's experience led to the creation of Galaxy Zoo, a digital catalog of images from the Sloan Digital Sky Survey.  A brief tutorial teaches visitors what to look for, and multiple independent observers must agree on the classification of an image before it is included.  When the site opened in 2007, so many people used it that the servers actually overheated and caught fire.  More than 270,000 people have signed up to classify galaxies so far.  One even found a highly unusual object, so significant that the Hubble telescope was tasked with observing it.
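
The "multiple independent observers must agree" rule is easy to picture in code.  What follows is only a minimal sketch of one way such a rule could work, not Galaxy Zoo's actual method: the function names, the image IDs, the labels, and the thresholds (at least 3 votes, at least 80% agreement) are all assumptions made up for illustration.

```python
from collections import Counter


def consensus_label(votes, min_votes=3, min_agreement=0.8):
    """Return the majority label if enough volunteers agree, else None.

    min_votes and min_agreement are illustrative thresholds, not values
    taken from any real project.
    """
    if len(votes) < min_votes:
        return None  # too few independent observers so far
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # volunteers disagree too much to accept a label


def triage(classifications, **thresholds):
    """Split images into accepted labels and ones that need more attention."""
    accepted = {}
    needs_review = []
    for image_id, votes in classifications.items():
        label = consensus_label(votes, **thresholds)
        if label is None:
            needs_review.append(image_id)
        else:
            accepted[image_id] = label
    return accepted, needs_review


# Hypothetical volunteer votes, keyed by a made-up image ID.
sample_votes = {
    "img-001": ["spiral", "spiral", "spiral", "spiral", "spiral"],
    "img-002": ["spiral", "elliptical", "spiral", "merger", "elliptical"],
    "img-003": ["elliptical", "elliptical"],
}

accepted, needs_review = triage(sample_votes)
print("accepted:", accepted)
print("needs review:", needs_review)
```

With these made-up votes, only img-001 is accepted.  The other two images land in the review pile: one because the volunteers disagree, the other because too few people have looked at it yet.  Images that never reach agreement are exactly the ones you would want a trained scientist to see.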


Collecting and aggregating data is a fairly straightforward process.  Getting a thousand, ten thousand, or a hundred thousand sets of eyes searching that data is brilliant.  Give users a brief primer and let them perform triage on the data.  The volunteers don't need to know everything - they just need to know enough to find things that are out of place.  Those objects can be passed to the trained scientists to name, study and dissect.  Opportunities abound - anywhere you can build a simple, straightforward search rule, you can tap into the horsepower of countless human minds to look for patterns, freeing the professional researchers to spend their time on candidate objects.  Win!


Source: Chronicle article - http://chronicle.com/article/The-Rise-of-Crowd-Science/65707/
