Big Data and Small: Collaborations between ethnographers and data scientists

This article first appeared in the journal Big Data and Society, published by Sage, and is licensed by the author under a Creative Commons Attribution license.

Abstract

In the past three years, Heather Ford—an ethnographer and now a PhD student—has worked on ad hoc collaborative projects around Wikipedia sources with two data scientists from Minnesota, Dave Musicant and Shilad Sen. In this essay, she talks about how the three met, how they worked together, and what they gained from the experience. Three themes became apparent through their collaboration: that data scientists and ethnographers have much in common, that their skills are complementary, and that discovering the data together rather than compartmentalizing research activities was key to their success.


DataEDGE: A conversation about the future of data science

First posted at the Google Policy blog.

With all the hype around “Big Data” lately, you may be inclined to shrug it off as a business fad. But there is more to it than a buzzword. Data science is emerging as a new field, changing the ways that companies get to know their customers, governments their citizens, and relief organizations their constituents. It is a field which will demand entirely new skill sets and information professionals trained to collect, curate, combine, and analyze massive amounts of data.

Today, we create data both actively (as we socialize, conduct business, and organize online) and passively (via a host of remote sensing devices). McKinsey projects a 40% growth in global data generated annually. Companies and organizations are racing to find new ways to make sense of this data and use it to drive decision-making. In the health sector, that includes investigating the clinical and cost effectiveness of new drugs using large datasets. (McKinsey estimates that the efficient and effective use of data could provide as much as $300 billion in value to the United States healthcare sector.) In the public sector, it could mean using historical unemployment data to reduce the amount of time it takes unemployed workers to find new employment. And in the retail sector, it leads to tools that help suppliers understand demand in stores so they know when to restock items.