Full disclosure: Diary of an internet geography project #1

Reblogged from ‘Connectivity, Inclusivity and Inequality’

OII research fellow Mark Graham and DPhil student Heather Ford (both part of the CII group) are working with a group of computer scientists, including Brent Hecht, Dave Musicant and Shilad Sen, to understand how far Wikipedia has come in representing ‘the sum of all human knowledge’. As part of the project, they will be making explicit the methods that they use to analyse millions of data records from Wikipedia articles about places in many languages. The hope is that by experimenting with a reflexive method of doing a multidisciplinary ‘big data’ project, others might be able to use this as a model for pursuing their own analyses in the future. This is the first post in a series in which Heather outlines the team’s plans and processes.

It was a beautiful day in Oxford and we wanted to show our Minnesotan friends some Harry Pottery architecture, so Mark and I sat on a bench in the Balliol gardens while we called Brent, Dave and Shilad, who are based in Minnesota, for our inaugural Skype meeting. I have worked with Dave and Shilad on a paper about Wikipedia sources in the past, and Mark and Brent know each other because they have both produced great work on Wikipedia geography, but we’ve never all worked together as a team. A recent grant from Oxford University’s John Fell Fund provided impetus for the five of us to get together and pool efforts in a short, multidisciplinary project that will hopefully catalyse further collaborative work in the future.

In last week’s meeting, we talked about our goals and timing and how we wanted to work as a team. Since we’re a multidisciplinary group who really value both quantitative and qualitative approaches, we thought that it might make sense to present our goals as consisting of two main strands: 1) to investigate the origins of knowledge about places on Wikipedia in many languages, and 2) to do this in a way that is both transparent and reflexive.

In her eight ‘big tent’ criteria for excellent qualitative research, Sarah Tracy (2010, PDF) includes self-reflexivity and transparency in her conception of researcher ‘sincerity’. Tracy believes that sincerity is a valuable quality that relates to researchers being earnest and vulnerable in their work and ‘considering not only their own needs but also those of their participants, readers, coauthors and potential audiences’. Despite the focus on qualitative research in Tracy’s influential paper, we think that practicing transparency and reflexivity can have enormous benefits for quantitative research as well. One of the challenges, though, is finding ways to pursue transparency and reflexivity as a team rather than as individual researchers.


Tracy writes that transparency is about researchers being honest about the research process.

‘Transparent research is marked by disclosure of the study’s challenges and unexpected twists and turns and revelation of the ways research foci transformed over time.’

She writes that, in practice, transparency requires a formal audit trail of all research decisions and activities. For this project, we’ve set up a series of Google docs folders for our meeting agendas, minutes, Skype calls, screenshots of our video call as well as any related spreadsheets and analyses produced during the week. After each session, I clean up the meeting minutes that we’ve co-produced on the Google doc while we’re talking, and write a more narrative account about what we did and what we learned beneath that.

Although we’re co-editing these documents as a team, it’s important to note that, as the documenter of the process, it’s my perspective that is foregrounded and I have to be really mindful of this as I reflect on what happened. Our team meetings are occasions for discussion of the week’s activities, challenges and revelations, which I try to document as accurately as possible, but I will probably also need to conduct interviews with individual members of the team further along in the process in order to capture individual responses to the project and the process that aren’t necessarily accommodated in the weekly meetings.


According to Tracy, self-reflexivity involves ‘honesty and authenticity with one’s self, one’s research and one’s audience’. Apart from the focus on interrogating our own biases as researchers, reflexivity is about being frank about our strengths and weaknesses, and, importantly, about examining our impact on the scene and asking for feedback from participants.

Soliciting feedback from participants is something quite rare in quantitative research but we believe that gaining input from Wikipedians and other stakeholders can be extremely valuable for improving the rigor of our results and for providing insight into the humans behind the data.

As an example, a few years ago when I was at a Wikimedia Kenya meetup, I asked what editors thought about Mark Graham’s Swahili Wikipedia maps. One respondent was immediately able to explain the concentration of geolocated articles from Turkey because he knew the editor who was known as a specialist of Turkey geography stubs. Suddenly the map took on a more human form: a reflection of the relationships between real people trying to represent their world. More recently, a Swahili Wikipedian contacted Mark about the same maps and engaged him in a conversation about how they could be made better. Inspired by these engagements, we want to really encourage those conversations and invite people to comment on our process as it evolves. To do this, we’ll be blogging about the progress of the project and inviting particular groups of stakeholders to provide comments and questions. We’ll then discuss those comments and questions in our weekly meetings and try to respond to as many of them as possible in thinking about how we move the analysis forward.

In conclusion, transparency and reflexivity are two really important aspects of researcher sincerity. The challenge with this project is trying to put them into practice in a quantitative rather than qualitative project, and one driven by a team rather than an individual researcher. Potential risks are that I inaccurately report on what we’re doing, or expose something about our process that is considered inappropriate. What I’m hoping is that we can mark these entries clearly as my initial, necessarily incomplete reflections on our process and that this can feed into the team’s reflections going forward. Knowing the researchers in the team and having worked with all of them in the past, my goal is to reflect the ways in which they bring what Tracy values in ‘sincere’ researchers: the empathy, kindness, self-awareness and self-deprecation that I know all of these team members display in their daily work.

WikiSym Redefined

Ward Cunningham, inventor of the wiki, at the first WikiSym in 2005 which was co-located with ACM OOPSLA in San Diego, California. Pic by Peter Kaminski CC BY on Flickr.

There has been much reflecting and soul-searching about the future of WikiSym in the past year (and probably before that as well). Many felt that the conference was becoming dominated by Wikipedia research and that it needed to grow to encompass more research in the open source, open data and open content realm. I felt that the conference needed to attract more social scientists and qualitative researchers in order to reach a more detailed understanding of how Wikipedia is being integrated into everyday life.

Despite the negatives, everyone felt that WikiSym was and still is the best place for people who do research about Wikipedia and other wikis to gather and that there was a lot of promise in broadening our mandate. This is why I feel so excited about co-chairing a new dedicated Wikipedia track at next year’s WikiSym in Hong Kong along with Mark Graham, also at the Oxford Internet Institute. And that’s why I was also happy that Dirk Riehle, a veteran of WikiSym, is at the helm again next year, leading an effort to redesign the event around a changing research landscape.

There are a few key differences to next year’s event:

1. WikiSym 2013 will be held jointly with a new conference called ‘OpenSym’ and the entire event will consist of four tracks dedicated to different research trajectories:

  • Open collaboration (wikis, social media, etc.) research (WikiSym 2013), chaired by Jude Yew of National University of Singapore
  • Wikipedia research (WikiSym 2013), chaired jointly by myself and Mark Graham of the Oxford Internet Institute at the University of Oxford
  • Free, libre, and open source software research (OpenSym 2013), chaired jointly by Jesus M. Gonzalez-Barahona and Gregorio Robles of Universidad Rey Juan Carlos
  • Open access, data, and government research (OpenSym 2013), chaired by Anne Fitzgerald of Queensland University of Technology

This means that Mark and I can focus on getting the very best of Wikipedia research to WikiSym and on thinking hard about what is missing and what needs to be encouraged in the years to come.

Where does ethnography belong? Thoughts on WikiSym 2012

First posted at Ethnographymatters

On the first day of WikiSym last week, as we started preparing for the open space track and the crowd was being petitioned for new sessions over lunch, I suddenly thought that it might be a good idea for researchers who used ethnographic methods to get together to talk about the challenges we were facing and the successes we were having. So I took the mic and asked how many people used ethnographic methods in their research. After a few raised their hands, I announced that lunch would be spent talking about ethnography for those who were interested. Almost a dozen people – many of whom are big data analysts – came to listen and talk at a small Greek restaurant in the center of Linz. I was impressed that so many quantitative researchers came to listen and try to understand how they might integrate ethnographic methods into their research. It made me excited about the potential of ethnographic research methods in this community, but by the end of the conference, I was worried about the assumptions on which much of the research on Wikipedia is based, and about what this means for the way that we understand Wikipedia in the world.

WikiSym (Wiki Symposium) is the annual meeting of researchers, practitioners and wiki engineers to talk about everything to do with wikis and open collaboration. Founded by the father of the wiki, Ward Cunningham, and others, the conference started off as a place where wiki engineers would gather to advance the field. Seven years later, WikiSym is dominated by big data quantitative analyses of English Wikipedia.

Some participants were worried about the movement away from engineering topics (like designing better wiki platforms), while others were worried about the fact that Wikipedia (and its platform, MediaWiki) dominates the proceedings, leaving other equally valuable sites like Wikia and platforms like TikiWiki under-studied.

So, in the spirit of the times, I drew up a few rough analyses of papers presented.

It would be interesting to look at this for other years to see whether the recent Big Data trend is having an impact on Wikipedia research and whether research related to Wikipedia (rather than other open collaboration communities) is on the rise. One thing I did notice was that the demo track was a lot larger this year than in the previous two years. Hopefully that is a good sign for the future because it is here that research is put into practice through the design of alternative tools. A good example is Jodi Schneider’s research on Wikipedia deletions, which she then used to conceptualize alternative interfaces that would simplify the process and help to ensure that each article would be dealt with more fairly.