The politics of truth: Who wins on Wikipedia? A study of what Wikipedia deletes and who it bans

Below is the research proposal that I wrote when I applied to the Oxford Internet Institute (OII) DPhil Programme in November last year. I’m guessing it’s going to evolve some (especially since I’m wanting to add some statistical work surrounding citations and translations between languages), but I’m really excited about it as it stands. The wonderful Dr Mark Graham is my supervisor at the OII and I’m lucky to also have Dr Chris Davies as my college advisor (I’m at Kellogg College here). Thank you to the OII for putting me forward for the Clarendon Award and to one of my heroes, Bishop Desmond Tutu, for inspiring part of the award that got me here. Thanks, lastly and mostly, to Dror for inspiring me 🙂 With all these thanks it sounds like I’m at the end. But it’s only the beginning. I’m looking forward to comments and suggestions on how I might discover the answers to this question. I think I’ll certainly hear them in the months and years to come.


Abstract: Wikipedia is, in many ways, the poster child of the Internet Age. It has been singled out as the ultimate working example of the collaborative power of the Internet (Shirky, Tapscott) and of what Yochai Benkler calls ‘commons-based peer production’, a term describing the radical new opportunities the Internet has created for how we make and exchange information, knowledge, and culture (Benkler, 2009). Part of its popularity comes from its power to influence and inform. As the sixth largest website in the world, with over million users and 90,000 active editors, Wikipedia is becoming one of the most influential reference works in history.

For every broad statement about Wikipedia, however, there are examples on the ground that hint at an alternative reality. The ideal that commentators (many of whom are not involved in editing the encyclopaedia on a daily basis) project is of a unified group of rational, detached, individual editors building a neutral, free encyclopaedia that is “the sum of all human knowledge”. But the organic nature of the encyclopaedia, its culture, politics and architecture have produced and continue to produce an encyclopaedia in which particular tactics, identities and relationships, many of which are in defiance of original rules, often prevail over reasoned and rational dialogue. Wikipedia still has a number of “dark spots”: from uneven geographies of articles written about places (Graham, 2011), to low numbers of female contributors (Lam et al, 2011) and vastly different levels of quality (Duguid, 2006). But there are other dark spots too – spots within the encyclopaedia itself: knowledges that are silenced, perspectives that are marginalised and people that are banned.

Who wins and who loses in this open environment? How do culture, politics, regulations, architecture and identity influence who wins or loses? And what does this mean for the way we think about online collaboration, its power and pitfalls?

Where does ethnography belong? Thoughts on WikiSym 2012

First posted at Ethnography Matters

On the first day of WikiSym last week, as we started preparing for the open space track and the crowd was being petitioned for new sessions over lunch, I suddenly thought that it might be a good idea for researchers who used ethnographic methods to get together to talk about the challenges we were facing and the successes we were having. So I took the mic and asked how many people used ethnographic methods in their research. After a few raised their hands, I announced that lunch would be spent talking about ethnography for those who were interested. Almost a dozen people – many of whom are big data analysts – came to listen and talk at a small Greek restaurant in the center of Linz. I was impressed that so many quantitative researchers came to listen and try to understand how they might integrate ethnographic methods into their research. It made me excited about the potential of ethnographic research methods in this community, but by the end of the conference, I was worried about the assumptions on which much of the research on Wikipedia is based, and about what this means for the way that we understand Wikipedia in the world.

WikiSym (Wiki Symposium) is the annual meeting of researchers, practitioners and wiki engineers to talk about everything to do with wikis and open collaboration. Founded by the father of the wiki, Ward Cunningham, and others, the conference started off as a place where wiki engineers would gather to advance the field. Seven years later, WikiSym is dominated by big data quantitative analyses of English Wikipedia.

Some participants were worried about the movement away from engineering topics (like designing better wiki platforms), while others were worried about the fact that Wikipedia (and its platform, MediaWiki) dominates the proceedings, leaving other equally valuable sites like Wikia and platforms like TikiWiki under-studied.

So, in the spirit of the times, I drew up a few rough analyses of papers presented.

It would be interesting to look at this for other years to see whether the recent Big Data trend is having an impact on Wikipedia research and whether research related to Wikipedia (rather than other open collaboration communities) is on the rise. One thing I did notice was that the demo track was a lot larger this year than the previous two years. Hopefully that is a good sign for the future because it is here that research is put into practice through the design of alternative tools. A good example is Jodi Schneider’s research on Wikipedia deletions that she then used to conceptualize alternative interfaces that would simplify the process and help to ensure that each article would be dealt with more fairly.

Beyond reliability: An ethnographic study of Wikipedia sources

First published on and 

Almost a year ago, I was hired by Ushahidi to work as an ethnographic researcher on a project to understand how Wikipedians managed sources during breaking news events. Ushahidi cares a great deal about this kind of work because of a new project called SwiftRiver that seeks to collect and enable the collaborative curation of streams of data from the real-time web about a particular issue or event. If another Haiti earthquake happened, for example, would there be a way for us to filter out the irrelevant and the misinformation, and build a stream of relevant, meaningful and accurate content about what was happening for those who needed it? And on Wikipedia’s side, could the same tools be used to help editors curate a stream of relevant sources as a team rather than as individuals?

Original designs for voting a source up or down in order to determine “veracity”

When we first started thinking about the problem of filtering the web, we naturally thought of a ranking system which would rank sources according to their reliability or veracity. The algorithm would consider a variety of variables involved in determining accuracy as well as whether sources have been chosen, voted up or down by users in the past, and eventually be able to suggest sources according to the subject at hand. My job would be to determine what those variables are, i.e. what editors were looking at when deciding whether to use a source or not.
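As an illustration only, the vote-based part of such a ranking system could be sketched as follows. This is not the SwiftRiver design: the `Source` fields, the function names and the choice of the Wilson score lower bound (one common way to rank items by up and down votes without over-rewarding sources that have only a handful of votes) are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Source:
    """A candidate source and the votes cast on it (hypothetical fields)."""
    name: str
    upvotes: int
    downvotes: int


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of upvotes.

    Returns 0.0 when there are no votes, so unvoted sources rank last.
    z = 1.96 corresponds to a 95% confidence level (an assumed default).
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def rank_sources(sources: list[Source]) -> list[Source]:
    """Order sources from most to least trusted, judged by vote history alone."""
    return sorted(
        sources,
        key=lambda s: wilson_lower_bound(s.upvotes, s.downvotes),
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder sources and vote counts, invented purely for illustration.
    candidates = [
        Source("source-a", upvotes=90, downvotes=10),
        Source("source-b", upvotes=3, downvotes=0),
        Source("source-c", upvotes=0, downvotes=0),
    ]
    for source in rank_sources(candidates):
        print(source.name, round(wilson_lower_bound(source.upvotes, source.downvotes), 3))
```

A score like this captures only the voting history. The other variables mentioned above, whatever editors actually look at when deciding whether to use a source, would have to be added as further inputs once they are known.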

What does it mean to be a participant observer in a place like Wikipedia?

This post first appeared on Ethnography Matters on May 1.

The vision of an ethnographer physically going to a place, establishing themselves in the activities of that place, talking to people and developing deeper understandings seems so much simpler than the same activities in multifaceted spaces like Wikipedia. Researching how Wikipedians manage and verify information in rapidly evolving news articles for my latest ethnographic assignment, I sometimes wish I could simply go to the article as I would to a place, sit down and have a chat to the people around me.

Wikipedia conversations are asynchronous (sometimes with whole weeks or months between replies among editors) and it has proven extremely complicated to work out who said what when, let alone to contact the editors and have live conversations with them. I’m beginning to realise how much physical presence is a part of the trust building exercise. If I want to connect with a particular Wikipedia editor, I can only email them or write a message on their talk page, and I often don’t have a lot to go on when I’m doing these things. I often don’t know where they’re from or where they live or who they really are beyond the clues they give me on their profile pages.

New geographies

Cross-posted from Ethnography Matters

xkcd’s Updated Map of Online Communities

I arrived in Nairobi last night after an absence of about five years. As I left the plane through the walkway, I took a deep breath and inhaled the familiar southern African smell that I always miss so much living in America. I walked through to customs and baggage claim and to my taxi and hotel and became aware of all the things I was noticing: my slight frustration at the absence of instructions about which line to stand in at the immigration hall; the fact that there was not enough room for my place of birth in the immigration paperwork; the fact that, in stark contrast to the Amsterdam Schiphol Airport that I had come from, this airport seems not to have changed in a decade or so.

I noticed how long we had to wait for our bags to come through, the nationalities of the people coming here, how closely they stood next to one another. And my driver, patiently waiting for me, familiar sign in hand. On the car ride to the hotel, I looked at billboards and noticed what was being advertised and who was being represented, the state of repair of the roads and the roadside flowers and how people drive and the smells of food and industry and bodies.

I thought: Is this the collection of noticings that constitutes a place? And if what defines a place is its signposts, its boundaries, the taken-for-granted ways of doing things, the expected and the unexpected, what are the equivalents in online spaces? How do we know that we have left one space and arrived at another? How does the experience of outsiders (or n00bs) differ from that of locals?

This new way of thinking about social media (new for me, at least) came about when I was asked to speak at a conference about the ‘crucial role of social media’ in the Middle East and elsewhere. Buried in the description of the session was the question: ‘Does what happened in the London Riots diminish the power of social media?’ As I thought about what to say and what was expected of me, it struck me that the problem with the current way questions around social media are framed is that they require defining technological artefacts as good or bad, when it might be more appropriate to talk about technology as a place where good and bad things can, and do, happen.

If we frame social media as places, we can understand more fully the role of people in those places, rather than talking about the technical characteristics of Facebook or Wikipedia as determining a particular type of behaviour. Looking only at the “bad” privacy features of Facebook, for example, we are tempted to assume that “privacy is dead” because of the “forced sharing” that is happening through changes in the technology. But this view fails to represent the ways that people self-censor or move to more intimate spaces in order to protect their privacy, something I noticed in my study of privacy in an educational context, for example.

Framing social media as places enables us to realise how we move between platforms (for example, Facebook and Google+) not only because of the new shiny gadgets we find there, but because of the people who inhabit those spaces. It is the flow of people and practices that defines the place as much as it is its landscape and architectural features. Facebook, for example, is defined by particular boundaries (my page, your page, a photograph that belongs to a particular group), taken-for-granted ways of doing things that define deviance and compliance among particular groups (don’t friend your teacher, don’t send too many updates and flood your friends’ streams, don’t tag drunk pictures of friends) and artefacts (the activity stream, wall and photo albums) that, taken together, define the place.

It seems kind of obvious when you think about it, and it isn’t a new way of thinking about technology: we’ve been talking about going online and migrating from different operating systems for a while. But the fact that we’re surprised that Google+ isn’t currently teeming with people, or that more Kenyans aren’t contributing to Swahili Wikipedia, or that women make up such a small percentage of Wikipedia edits suggests that we are thinking too much of social media as things rather than as places. If we thought about Google+ as a big, shiny, new complex, we’d begin to understand that people won’t necessarily move there just because the technology is better when few of their friends are there.

The key aspect that we miss in thinking of social sites as technological artefacts is that we tend to ignore culture and power – two really big and slippery aspects of what makes certain types of people have certain types of conversations in particular online spaces, and of what defines who feels welcome or unwelcome to participate. It has caused us to define Wikipedia or Facebook at a level of granularity that isn’t deep enough to really get an understanding of what is happening there, where the power is located and how we might engineer to encourage particular creations and conversations. This is not just about understanding the affordances of the software. In order to understand Wikipedia collaboration, I can’t only look at the MediaWiki software – in the same way that to understand Kenya, I couldn’t just read about its legal framework or look at the statistics about the country. Being there, experiencing how people speak to me, noticing what the signposts say and what they leave out, is part of the necessarily long journey toward a full understanding of the place.

Perhaps most importantly, it is the culture of a place that will dominate my decision to come back or not. And this, in its essence, is at the heart of what every online community seeks, and is the same reason why it’s so hard to control. The government of Kenya can build better roads and speak on television about being welcoming to tourists, for example, but the majority of the experience of being in Kenya as a tourist or a local, is outside of government control. Culture, we find out, is a mysterious mix of so many different qualities in varying proportions that act together to define a place. Understanding culture is probably more art than science, but however we learn about it, it’s an important part of what makes us stay in some places to become loyal nationalists or merely return as tourists.

Introducing Ethnography Matters

This post introduces the new group blog I’m working on called Ethnography Matters.

On the first of June this year, I became an ethnographer – but probably only an ethnographer in the sense that I got my first job with that title. My ethnography shoes (a pair of bright green sneakers) are new for me and yet, from the moment I learned about ethnography, I knew that these were the shoes I wanted and that, even though it would take some time to wear them in, they were the right style for me (flats FTW!). And so I did what I always do when faced with a big challenge: I asked some very special people to join me in my journey.

Rachelle Annechino and I are recent graduates of the School of Information at UC Berkeley. We met one cold, sunny summer day in August (only in San Francisco!) when I arrived to find friends to learn Python with. Rachelle and I meet to co-work and chat at Brown Couch Café in Oakland where we talk about fascinating bits and pieces from our lives and work. For her final project, Rachelle and her project partner, Yo-Shang Cheng, interviewed San Francisco residents and asked them to draw pictures of their internal images or “mental maps” of the neighborhoods they lived in and of the city as a whole. They then visualized these mental maps according to concepts like ‘corridors’ (where are the hearts of each neighbourhood?), ‘barriers’ (is it really that close? It’s not always as simple as it looks getting from one neighbourhood to another in San Francisco) and ‘boundaries’ (what neighbourhood are you in? according to whom?). Rachelle is simply one of the most insightful, brilliant people I know. And she rocks at Python – which makes her a good friend to have.

I met Jenna Burrell when Rachelle and I took her Qualitative Research Methods class last year. Jenna has been doing research on Internet use in Ghana for the past decade or so and was one of the most inspiring teachers that I had at the I School. Jenna’s forthcoming book ‘Invisible Users: Youth in the Internet Cafes of Urban Ghana’ is an incredibly rich contribution to our understanding of African Internet culture. Mostly when I think of Jenna, I think of the fact that while I was in Accra speaking in staid conference rooms during the Africa preparatory conference for the World Summit on the Information Society in 2005, Jenna was also out talking to young Ghanaians in Internet cafes and in the streets who were disconnected from a discussion which was ostensibly about them. Jenna is an incredible mentor and her writing about ‘The Fieldsite as a Network’ has been so helpful in thinking about how to ‘do’ digital ethnography. She continues to push the boundaries of the discipline and ask important questions about how digital technologies might become part of the grassroots, self-organizing efforts of populations marginalized from the global economy.

Tricia Wang was introduced to me in one of Jenna’s classes when Jofish Kaye suggested I read about the work she had done on Internet censorship in China. I looked her up and just knew we would be friends. Tricia’s critique of the Google China debacle and her call for Google to employ more ethnographers in order to better understand Chinese internet culture was so powerful, and her PhD work on migrant workers is inspiring to say the least. As I write this, Tricia is in China doing her fieldwork, sleeping in Internet cafes and accompanying migrant workers as they move through the city. She’s trying to understand how the use of technology changes how people interact with the physical city, a concept she calls ‘digital urbanism on the margins’: migrants’ urban lives mediated through communications technologies like mobile phones and computers in Internet cafes.

And then there’s me, the budding ethnographer, finding herself lucky to know these incredible people and looking forward to the little journey we’re going to go on at this site. Ethnography Matters will be a place where we can share what we’re reading and writing about and how we’re thinking about ethnography, and hopefully give others who are thinking about a career in ethnography a little insight into what this even means today. We’ll have others join us in the future, and if you’re interested in contributing, please let us know. We’re looking forward to walking around in your shoes too!

What should be remembered?

I’ve been thinking a lot about the disputes around Ushahidi’s role in humanitarian efforts and came round to thinking that we may be looking in the wrong place to discover the work that tools like Ushahidi’s Crowdmap are doing in the world. Whereas humanitarian organisations are asking (good) questions about whether Ushahidi’s tools help or hinder their efforts, another way to look at it might be to look from the perspective of the people making the maps and reports themselves. What work is Ushahidi doing for them? How do they see Ushahidi’s effectiveness? What social role does reporting play and how could we begin to measure effectiveness?

This morning I read a wonderful article by Tamar Ashuri from Ben-Gurion University for an upcoming edition of the journal New Media and Society entitled ‘(Web)sites of memory and the rise of moral mnemonic agents’. Ashuri looked at how two websites set up by Israelis – one to monitor human rights of Palestinians at Israeli checkpoints; the other to collect testimonies of Israeli soldiers who served in the Occupied Territories – act as agents of collective memory. Ashuri argues that digital networked technologies are challenging the mechanisms that society employs to deny memories of immoral acts, and that the online archives created by moral witnesses become a space of living memory and a sphere of moral engagement.

Ashuri explains that ‘collective memory is a social necessity; neither an individual nor a society can do without it.’ She quotes from Kansteiner (2002: 180) to describe how collective memory is different from history:

Collective memory is not history, though sometimes made from similar material […]. It can take hold of historically and socially remote events but it often privileges the interest of the contemporary. It is as much a result of conscious manipulation as unconscious absorption and it is always mediated.

Ashuri describes how Avishai Margalit distinguishes between “common memory” (a group of people who recall a certain episode that each of them experienced) and “shared memory” (which requires communication). Shared memory is not just an aggregate of individual memories because it requires those who remember the episode to come together to create one version (or at least a few versions) through an active presentation and retelling of a story that Margalit terms ‘a division of mnemonic labor’ (2002: 52). Margalit wrote that whereas in traditional society there was a direct line from the people to their priest, storyteller or shaman, shared memory in modern society ‘travels from person to person through institutions, such as archives, and through communal mnemonic devices, such as monuments and the names of streets’ (2002: 52). Ashuri posits a new term “joint memory” to describe a new type of memory that is a ‘compilation of personal histories made public for the public’ (2011: 4). She argues that digital networked technology is challenging the exclusive role of professional mnemonic agents designated by the church, state, monarchy etc.

Significantly, joint memory is not motivated by personal interests – the desire to tell an interesting story or reveal new information – but is driven by a social purpose: Witnesses who add their recollections to an accessible and shareable compilation of memories attempt to expose events that the default collective (such as the nation) denies or wishes to forget.

I don’t agree that such reporting is not motivated at all by personal interests but I do agree with the fact that the social/moral purpose of witnessing is really critical here. Ashuri builds on Margalit’s conception of a moral witness whose testimony is ‘essentially driven by a moral purpose. It reflects hope for the witness to be a social agent who, in testifying to his or her harsh experience, transforms (passive) addressees into active audiences’. She says that what is happening now is slightly different because the moral witness now performs memories of suffering experienced in a public space.

In my conceptualisation, the (moral) mnemonic agent is the one who recalls his or her memories regarding events in which others have suffered and by that act of witnessing renders this suffering visible and hence difficult to marginalize or deny. The moral aspect of this act, in my estimation, derives from the content of the mnemonic text (testimony about suffering inflicted by evil) and from its objective (calling on the audience to shed their observer garb and re-enact the experience of the harsh realities). (p5)

I think this is a really useful way to think about how websites like Ushahidi, as well as engagement on social media sites like Twitter, are acting as platforms for this kind of performance and the communication of suffering, and how this is one way of looking at how collective narratives about the world are being wrested from those who traditionally controlled them in the Middle East. Whether it is reporting on human rights violations in Saudi Arabia, harassment of women in Egypt on Harassmap or reports of arrests and casualties in Syria, I think that looking at the maps through the lens of moral witnessing and Ashuri’s “joint memory” could be a wonderful entry point for re-thinking Ushahidi’s role and effectiveness in the world.


Wikipedia narratives

I’ve been spending the last few days thinking about my upcoming research into how Wikipedians currently use and understand sources and citations in different situations (directly after a major international news event like the Japan earthquake and in conflict situations such as the Middle East conflict) and what kinds of software tools could be helpful in advancing some of the goals and philosophies of Wikipedia globally. I’ve learned that there are a number of problems that Wikipedians encounter with the current policies and tools – including what some view as conservatism around sources (bias in favor of traditional, often inaccessible print-based materials and against online sources and commercial research, for example) and some specific observations of how blunt a tool MediaWiki is at supporting citations in emotionally-charged edit wars and in rapidly evolving events.

The big questions that I’m trying to answer in this research are:

1. What debates are Wikipedians having around sources and what does this say about how Wikipedians understand the verifiability policy?
2. What is the effect of the technical features and affordances of current wiki tools on issues of quality (for example, the ability of Wikipedia to be a current and accurate source of information on rapidly-evolving events) and diversity (how Wikipedia might incorporate a wider range of viewpoints that may be situated outside of traditional academic publications)?
3. How might alternative policies and tools affect those principles of quality and diversity?

These are great questions! And great questions are always a good place to start. But deciding on how to answer these questions with limited time and resources is where I need to get creative and hopefully ask for help from wise academics, Wikipedians and friends. These are some of my initial ideas and as a newbie ethnographer I’m hoping for some kind but realistic responses.

I was originally going to start off with a bunch of interviews of Wikipedians working on topics related to big international events. But after chatting to a number of Wikipedians at Wikimania and doing a few more open-ended interviews, I’ve actually realised that starting with particular articles and telling the story of those articles could actually get me to a better understanding of what’s going on in “source talk” a lot quicker. I think that I was initially caught up in the regular social science and usability methods of conducting research where you decide on your sample, go out and collect responses to a specific set of questions and then analyse the data. Spending the past two days reading and analysing talk pages for the ‘hummus’ articles in English, Hebrew and Arabic has made me realise that there is a wealth of incredible data about how Wikipedians actually talk about sources and that a better approach could be more like a detective – starting with the little pieces of evidence and then following the story as I interview the characters reflected in the articles.

Like a detective (or at least the ones in the movies) I’m working towards understanding motivations in order to piece together narratives about what happened and why. When I first thought of Wikipedia article debates, I envisioned a large boardroom table with people sitting around it rationally discussing what to add and what to leave out. Actually, the debates are much more like noisy town hall meetings. You have the crazy person who keeps shouting all sorts of completely irrelevant details and then complaining that they’re being systematically ignored, and the exhausted public administrator who has seen the same arguments play out over and over again and who is snippy and terse when newcomers try to cover old territory. There’s the polite newbie who is surely too polite to be making genuine statements, and the loud Westerners who drop into the meeting to make sarcastic remarks about how stupid everyone is for fighting about such trivial matters. I think that these narratives – who is allied to who, what happens to the debate when related national or international events hit, how disruptive editors can deadlock entire articles – are actually at the heart of bigger questions about verifiability. Especially when we’re thinking of designing new tools to fit into the current working methods of users, and not the other way round, understanding exactly how the articles tick makes sense to me.

I started by printing out articles and talk pages for ‘hummus’ in WP English, Hebrew and Arabic (Google translated version at least) and did some rough analysis and coding (using present participles to denote what I thought was happening) plus notes relating to what was interesting in comparison to the other language versions. I will go over this again and then type up themes with related quotations and summarised stories, then follow up leads with some of the editors who were involved, code all of their interviews, add to the thematic groupings and then do the final analysis.

After I’ve done the same thing for a page relating to an international news event (either the 2011 Egyptian Revolution or the 2011 Japanese Earthquake, which I have started to look at), I’m hoping to be able to draw some good conclusions about how Wikipedians understand verifiability and what the effects of current policies and tools are on issues of quality and diversity. You’ll notice that I’m choosing to go deep rather than wide, but in order to really ‘people’ this analysis and understand who is behind the pages and what the dynamics are, I’m thinking that this might be the best way of going about it.

Would love any (kind) thoughts, suggestions and even, yes, encouragement in my lonely space over here 🙂

What’s an ethnographer doing working for a software company anyway?

I wrote a short memo to the Ushahidi team about what exactly an ethnographer does and how ethnography as a discipline could be useful to Ushahidi (and Crowdmap in particular). I’m thinking of actually writing more about this and interviewing ethnographers working at technology companies to shed some light on this growing field.

What is ethnography?

Ethnography is a research method, with roots in anthropology, that aims to gain a rich perspective of user communities. Ethnographic research projects require the researcher to be deeply immersed in a specific research context (also called “participant observation”) and to develop an understanding that would not be achievable with other, more limited research approaches (Lazar, Feng, Hochheiser: 2010). Ethnography emerged from the practice of early anthropologists who studied “new” cultures

The Spaces Between: Towards Private Spaces for Peer Learning

Alex and I completed our masters projects report on Thursday night. I thought I’d post the research that I did looking at information flows at the I School and the role of architecture in shaping the kinds of interactions that were taking place.

Interviewing students, staff and faculty and observing what was going down in the student lounge, the classroom, the co-lab and corridors, I concluded that the “spaces between” class play an important role in the learning experience because it is here that students can construct knowledge with their peers and practice the performance of their new identities. The fact that these spaces are located outside the purview of those in authority and that they enable students to choose who they can be intimate with is critical to the success of these spaces for enabling peer learning. In contrast, private digital spaces are unavailable to students, with the result that students attempted to use spaces like Facebook to engage with one another, resulting in harms including exclusion, identity crises and self-censorship.

I noticed that the architecture of online-only educational spaces (looking at learning management systems, social media learning systems and open educational learning environments) seemed to replicate only the classroom space during class but without the protective walls available in conventional learning environments. This is really just exploratory research but I believe that the lack of nuanced social environments in online learning systems is a big part of what is leading to high dropout rates in distance/online learning programs and that we really need to build for “intimacy” rather than either the “private/closed” or “public/open” architecture characterised by current systems.

I’d love to carry on this research in the next few months but would love any feedback in the meantime.

And, yay! I’m going to graduate!
