Thursday, September 25, 2014

Visualizing Data

Hans Rosling is one of those people, those magnificent geniuses, who see the world differently - and want the rest of us to see it their way.  He is a 'statistics guru' who uses data to truly understand the world.  Understanding data is one thing; getting average people to understand what you've spent an entire career trying to figure out is no easy feat.  Dr. Rosling is a showman, taking his audience on a visual roller coaster ride through data.  His presentations are fun, and his visuals are both convincing and easy to understand.  He draws his pictures so clearly that it's difficult to argue with his conclusions.

Hans Rosling: The best stats you've ever seen | TED Talks

We can use Dr. Rosling's techniques to explain data more clearly and cleanly.  For instance, when reporting admission trends or current headcount, many business analysts would present a complex spreadsheet to decision makers.  That data is great, and when they have time, it might be just the kind of detail they're looking for.  Often, though, senior managers are busy and simply want a digestible bite of data.  In these cases a dashboard provides quick access to decision-making data in a form key stakeholders can use daily to understand the current state of the business.  They say "a picture is worth a thousand words" - visual data provides that same kind of compression and gives us the tools to understand data without having to understand all the underlying business rules.



Thursday, September 18, 2014

You've Got the Data - Now What?

Collecting and aggregating data is a fairly straightforward process.  Yes, you need the servers, the databases, the storage - but these days the storage infrastructure is less expensive and easier to build than ever before.  Cloud storage, and even cloud clustered computing solutions, make it reasonable for anyone to store massive amounts of data and to get the horsepower to crunch it.  But then what?  What do you do with all that data you've collected?

One approach is to turn to the crowd.  Instead of having 10 sets of eyes looking at it, figure out a way to segment the data, define rules for what you're trying to see, and distribute the task of looking to possibly billions of Web users.  The Chronicle of Higher Education profiled one such attempt - Alexander S. Szalay challenged the status quo on sharing astronomy data.  He began by stitching together observations from multiple telescopes to provide a clearer picture of the sky, but he realized he needed more eyes looking at all the data he was collecting.


Mr. Szalay's experience led to the creation of Galaxy Zoo, a digital catalog of images from the Sloan Digital Sky Survey.  A brief tutorial teaches visitors what to look for, and multiple independent observers must agree on the classification of an image before it is included.  When the site opened in 2007, so many people used it that the servers actually overheated and caught fire.  More than 270,000 people have signed up to classify galaxies so far.  One even found a highly unusual object, so significant the Hubble telescope was tasked with observing it.


Collecting and aggregating data is a fairly straightforward process.  Getting a thousand, ten thousand, or a hundred thousand sets of eyes searching data is brilliant.  Give users a brief primer and let them perform triage on the data.  The volunteers don't need to know everything - they just need enough to find things that are out of place.  Those objects can be passed to the trained scientists to name, study and dissect.  Opportunities abound - anywhere you can build a simple, straightforward search rule, you can tap into the dynamic horsepower of countless human minds to look for patterns, freeing the professional researchers to spend their time on candidate objects.  Win!
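
For the curious, here is a rough sketch in Python of the "multiple observers must agree" idea.  It is not how Galaxy Zoo actually does it - the labels, image IDs, and thresholds (min_votes, min_agreement) are all made up for illustration.  An image is accepted only when enough volunteers have voted and most of them agree; everything else goes on a pile for the professionals.

```python
from collections import Counter


def consensus_label(votes, min_votes=3, min_agreement=0.8):
    """Return the agreed label, or None if volunteers have not converged.

    min_votes and min_agreement are hypothetical thresholds, not values
    taken from any real project.
    """
    if len(votes) < min_votes:
        return None  # not enough independent observers yet
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # observers disagree too much


def triage(classifications, min_votes=3, min_agreement=0.8):
    """Split images into accepted labels and a pile for expert review."""
    accepted, needs_review = {}, []
    for image_id, votes in classifications.items():
        label = consensus_label(votes, min_votes, min_agreement)
        if label is None:
            needs_review.append(image_id)
        else:
            accepted[image_id] = label
    return accepted, needs_review


# Made-up volunteer votes for three made-up images.
sample = {
    "img-001": ["spiral", "spiral", "spiral", "spiral"],
    "img-002": ["spiral", "elliptical", "merger", "spiral"],
    "img-003": ["elliptical", "elliptical"],
}

accepted, needs_review = triage(sample)
print(accepted)
print(needs_review)
```

Raising min_agreement trades volume for confidence: fewer images get through automatically, and more land in front of the trained scientists.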


Source: Chronicle article - http://chronicle.com/article/The-Rise-of-Crowd-Science/65707/

Tuesday, September 9, 2014

Considering Data

In Asimov's Foundation novels, the character Hari Seldon develops a new science called 'psychohistory' which allows him to predict the future for large-scale events. When I first read those books, they seemed fantastical - no closer to reality than Gulliver's Lilliputians. In 2008, Nate Silver's fivethirtyeight.com called not only the presidential election, but every Senate race. How did he do it? Data. Months before the election, Silver had a statistical model that proved eerily accurate, and weeks before the election he had predictions that would give even Asimov goose-flesh.

Where does fivethirtyeight.com think the next generation of data miners will come from? UCLA's DataFest. Give students 48 hours locked away with data and have them try to make sense of it. To sweeten the deal, have the data owners present.  To quote fivethirtyeight.com's article, “It’s really rare to get really current data actually being used in the real-live corporate world, so that makes it really special,” Robert Gould, DataFest’s founder and a professor of statistics at UCLA, the host for this weekend’s event, said in a phone interview. “Somehow it’s just not that thrilling for the students to learn all we’ve done is point them to a public data set. There’s something really special to have someone who owns the data present the challenge. It makes the students feel they’re being paid attention to and listened to.”

What a fantastic idea.  Create a relatively low pressure environment, provide loads of data, and just see what undergraduate students can make of it.  I've been working with data for so many years, and in such predictable ways, that I wonder sometimes if I can still see outside of the box.  Big data can show us so much, for good and bad.  We've all read the story of Target, after reviewing data on what newly expectant mothers were likely to buy, figuring out a teen was pregnant before her father even knew.  That's the creepier side of big data.  On the brighter side, imagine aggregating statistics on childhood cancers, incidents of asthma, the spread of disease, or electricity usage in a region.  The next few years promise to be exciting times.