Ward Cunningham, inventor of the wiki, at the first WikiSym in 2005, which was co-located with ACM OOPSLA in San Diego, California. Photo by Peter Kaminski (CC BY) on Flickr.
There has been much reflecting and soul-searching about the future of WikiSym in the past year (and probably before that as well). Many felt that the conference was becoming dominated by Wikipedia research and that it needed to grow to encompass more research in the open source, open data and open content realm. I felt that the conference needed to attract more social scientists and qualitative researchers in order to reach a more detailed understanding of how Wikipedia is being integrated into everyday life.

Despite the negatives, everyone felt that WikiSym was and still is the best place for people who do research about Wikipedia and other wikis to gather, and that there was a lot of promise in broadening our mandate. This is why I feel so excited about co-chairing a new dedicated Wikipedia track at next year’s WikiSym in Hong Kong along with Mark Graham, also at the Oxford Internet Institute. And that’s why I was also happy that Dirk Riehle, a WikiSym veteran, is at the helm again next year, leading an effort to redesign the event around a changing research landscape.

There are a few key differences to next year’s event:

1. WikiSym 2013 will be held jointly with a new conference called ‘OpenSym’ and the entire event will consist of four tracks dedicated to different research trajectories:

  • Open collaboration (wikis, social media, etc.) research (WikiSym 2013), chaired by Jude Yew of National University of Singapore
  • Wikipedia research (WikiSym 2013), chaired jointly by myself and Mark Graham of the Oxford Internet Institute at the University of Oxford
  • Free, libre, and open source software research (OpenSym 2013), chaired jointly by Jesus M. Gonzalez-Barahona and Gregorio Robles of Universidad Rey Juan Carlos
  • Open access, data, and government research (OpenSym 2013), chaired by Anne Fitzgerald of Queensland University of Technology

This means that Mark and I can focus on bringing the very best Wikipedia research to WikiSym and on thinking hard about what is missing and what needs to be encouraged in the years to come.

Language, identity and Wikipedia: Some perspectives from the Cairo “Wikipedia in the Arab World” workshop

Mark Graham talks about the stated goal of Wikipedia to become the “sum of all human knowledge” while Ahmed Medat waits to translate into Arabic

It was the end of the final day of our workshop on the outskirts of Cairo, and we were all feeling that curious mixture of inspiration, energy and exhaustion that follows those meetings where a world of ideas, people and things is thrown together in a concentrated few days. Mark Graham asked each of us if we’d like to say a few parting words, and the participants spoke about how they had enjoyed meeting Wikipedians from so many places in the Middle East, that they were happy to come to an event with academics and that they were excited about doing something to make a change in the real world. The majority of participants spoke in English – for many of them a third or fourth language – while some had their Arabic translated on the fly by other participants.

I was surprised when we got round to Mohamed Amarochan, Wikipedian, Mozilla hacker and blogger from Morocco, and he said that he would like to speak in Arabic. I knew that Mohamed had a really good command of English because I’d spent a fascinating ride with him from the airport on the way to the workshop, during which we commiserated with one another about visa hardships. When he chose to speak in Arabic and allowed others to translate into English, I realized that Mohamed was making an important statement about how small decisions, like which language you choose to speak in a conversation like this one, have big consequences.

As Clive Holes writes, ‘How we speak is an important part of who we are: in a sense, speech is the oral counterpart of how we dress. Both are intimately linked to our sense of self, and of how we present ourselves to, and are seen by, others.’ (Holes, 2011)

The politics of truth: Who wins on Wikipedia? A study of what Wikipedia deletes and who it bans

Below is the research proposal that I wrote when I applied to the Oxford Internet Institute (OII) DPhil Programme in November last year. I’m guessing it’s going to evolve somewhat (especially since I want to add some statistical work on citations and translations between languages), but I’m really excited about it as it stands. The wonderful Dr Mark Graham is my supervisor at the OII and I’m lucky to also have Dr Chris Davies as my college advisor (I’m at Kellogg College here). Thank you to the OII for putting me forward for the Clarendon Award and to one of my heroes, Bishop Desmond Tutu, for inspiring part of the award that got me here. Thanks, lastly and mostly, to Dror for inspiring me 🙂 With all these thanks it sounds like I’m at the end. But it’s only the beginning. I’m looking forward to comments and suggestions on how I might discover the answers to this question. I think I’ll certainly hear them in the months and years to come.


Abstract: Wikipedia is, in many ways, the poster child of the Internet Age. It has been singled out as the ultimate working example of the collaborative power of the Internet (Shirky, Tapscott) and of what Yochai Benkler calls ‘commons-based peer production’, a term he uses to describe how the Internet has created radical new opportunities for how we make and exchange information, knowledge, and culture (Benkler, 2009). Part of its popularity comes from its power to influence and inform. As the sixth largest website in the world, with over million users and 90,000 active editors, Wikipedia is becoming one of the most influential reference works in history.

For every broad statement about Wikipedia, however, there are examples on the ground that hint at an alternative reality. The ideal that commentators (many of whom are not involved in editing the encyclopaedia on a daily basis) project is of a unified group of rational, detached, individual editors building a neutral, free encyclopaedia that is “the sum of all human knowledge”. But the organic nature of the encyclopaedia, its culture, politics and architecture have produced and continue to produce an encyclopaedia in which particular tactics, identities and relationships, many of which are in defiance of original rules, often prevail over reasoned and rational dialogue. Wikipedia still has a number of “dark spots”: from uneven geographies of articles written about places (Graham, 2011), to low numbers of female contributors (Lam et al, 2011) and vastly different levels of quality (Duguid, 2006). But there are other dark spots too – spots within the encyclopaedia itself: knowledges that are silenced, perspectives that are marginalised and people that are banned.

Who wins and who loses in this open environment? How do culture, politics, regulations, architecture and identity influence who wins or loses? And what does this mean for the way we think about online collaboration, its power and pitfalls?

Where does ethnography belong? Thoughts on WikiSym 2012

First posted at Ethnography Matters

On the first day of WikiSym last week, as we started preparing for the open space track and the crowd was being petitioned for new sessions over lunch, I suddenly thought that it might be a good idea for researchers who used ethnographic methods to get together to talk about the challenges we were facing and the successes we were having. So I took the mic and asked how many people used ethnographic methods in their research. After a few raised their hands, I announced that lunch would be spent talking about ethnography for those who were interested. Almost a dozen people – many of whom are big data analysts – came to listen and talk at a small Greek restaurant in the center of Linz. I was impressed that so many quantitative researchers came to listen and try to understand how they might integrate ethnographic methods into their research. It made me excited about the potential of ethnographic research methods in this community, but by the end of the conference, I was worried about the assumptions on which much of the research on Wikipedia is based, and about what this means for the way that we understand Wikipedia in the world.

WikiSym (Wiki Symposium) is the annual meeting of researchers, practitioners and wiki engineers to talk about everything to do with wikis and open collaboration. Founded by Ward Cunningham, the father of the wiki, and others, the conference started off as a place where wiki engineers would gather to advance the field. Seven years later, WikiSym is dominated by big data quantitative analyses of English Wikipedia.

Some participants were worried about the movement away from engineering topics (like designing better wiki platforms), while others were worried about the fact that Wikipedia (and its platform, MediaWiki) dominates the proceedings, leaving other equally valuable sites like Wikia and platforms like TikiWiki under-studied.

So, in the spirit of the times, I drew up a few rough analyses of papers presented.

It would be interesting to look at this for other years to see whether the recent Big Data trend is having an impact on Wikipedia research and whether research related to Wikipedia (rather than other open collaboration communities) is on the rise. One thing I did notice was that the demo track was a lot larger this year than in the previous two years. Hopefully that is a good sign for the future, because it is here that research is put into practice through the design of alternative tools. A good example is Jodi Schneider’s research on Wikipedia deletions, which she then used to conceptualize alternative interfaces that would simplify the process and help to ensure that each article would be dealt with more fairly.

“Writing up rather than writing down”: Becoming Wikipedia Literate

Stuart Geiger and I will be presenting our paper about Wikipedia literacy at WikiSym 2012 in Linz, Austria (link below). It’s in the short paper track, and in it we introduce the concept of “trace literacy”, a multi-faceted theory of literacy that sheds light on what new knowledges and organizational forms are required to improve participation in Wikipedia’s communities. The paper focuses on three short case studies about the misunderstandings resulting from article deletions in the past year and relates them to three key problems that literacy practitioner and scholar Richard Darville outlined in his English literacy research. Two of the case studies are from interviews that we did with Kenyan Wikipedians, and the other concerns the Haymarket affair article controversy. Literacy, we believe, has a lot more to do with users being able to understand the complex traces left by experienced editors, and knowing how, where and when to argue their case, than with simply learning how MediaWiki syntax works.

“Writing up rather than writing down”: Becoming Wikipedia Literate H. Ford and S. Geiger, WikiSym ’12, Aug 27–29, 2012, Linz, Austria

Beyond reliability: An ethnographic study of Wikipedia sources

First published on and 

Almost a year ago, I was hired by Ushahidi to work as an ethnographic researcher on a project to understand how Wikipedians managed sources during breaking news events. Ushahidi cares a great deal about this kind of work because of a new project called SwiftRiver that seeks to collect, and enable the collaborative curation of, streams of data from the real-time web about a particular issue or event. If another Haiti earthquake happened, for example, would there be a way for us to filter out irrelevant content and misinformation, and build a stream of relevant, meaningful and accurate content about what was happening for those who needed it? And on Wikipedia’s side, could the same tools be used to help editors curate a stream of relevant sources as a team rather than as individuals?

Original designs for voting a source up or down in order to determine “veracity”

When we first started thinking about the problem of filtering the web, we naturally thought of a ranking system that would rank sources according to their reliability or veracity. The algorithm would consider a variety of variables involved in determining accuracy, as well as whether sources had been chosen or voted up or down by users in the past, and would eventually be able to suggest sources according to the subject at hand. My job would be to determine what those variables were, i.e. what editors were looking at when deciding whether or not to use a source.
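To make that first idea concrete, here is a minimal Python sketch of the kind of vote-based ranking we had in mind. Everything in it is hypothetical: the `Source` fields, the weights and the add-one smoothing are illustrative assumptions, not SwiftRiver's actual design, and choosing the real variables is exactly the question the fieldwork was meant to answer.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    upvotes: int     # hypothetical: users voting the source up
    downvotes: int   # hypothetical: users voting the source down
    times_used: int  # hypothetical: how often editors chose it before


def reliability_score(src, vote_weight=1.0, use_weight=0.5):
    """Toy score: smoothed share of upvotes plus a damped usage bonus."""
    # Add-one (Laplace) smoothing so an unvoted source starts at 0.5
    vote_share = (src.upvotes + 1) / (src.upvotes + src.downvotes + 2)
    # log1p damps the effect of very frequently used sources
    usage = math.log1p(src.times_used)
    return vote_weight * vote_share + use_weight * usage


def rank_sources(sources):
    """Return sources ordered from highest to lowest toy score."""
    return sorted(sources, key=reliability_score, reverse=True)


# Hypothetical example sources (placeholder URLs)
candidates = [
    Source("https://example.org/blog-post", upvotes=3, downvotes=9, times_used=0),
    Source("https://example.org/new-site", upvotes=0, downvotes=0, times_used=0),
    Source("https://example.org/wire-report", upvotes=40, downvotes=2, times_used=12),
]

for s in rank_sources(candidates):
    print(s.url)
# prints wire-report, then new-site, then blog-post
```

The hard part is not the sorting but the inputs: a score like this is only as good as the variables fed into it, which is why the project began by asking what editors actually look at.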

What does it mean to be a participant observer in a place like Wikipedia?

This post first appeared on Ethnography Matters on May 1.

The vision of an ethnographer physically going to a place, establishing themselves in the activities of that place, talking to people and developing deeper understandings seems so much simpler than the same activities in multifaceted spaces like Wikipedia. Researching how Wikipedians manage and verify information in rapidly evolving news articles in my latest ethnographic assignment, I sometimes wish I could simply go to the article as I would to a place, sit down and have a chat to the people around me.

Wikipedia conversations are asynchronous (sometimes with whole weeks or months between replies among editors), and it has proven extremely complicated to work out who said what and when, let alone to contact the editors and have live conversations with them. I’m beginning to realise how much physical presence is part of building trust. If I want to connect with a particular Wikipedia editor, I can only email them or write a message on their talk page, and I often don’t have a lot to go on when I’m doing these things. I often don’t know where they’re from or where they live or who they really are beyond the clues they give me on their profile pages.

Update on the Wikipedia sources project

This post first appeared on the Ushahidi blog.

Last month I presented the first results of the WikiSweeper project, an ethnographic research project to understand how Wikipedia editors track, evaluate and verify sources on rapidly evolving pages of Wikipedia, the results of which will inform ongoing development of the SwiftRiver (then Sweeper) platform. Wikipedians are some of the most sophisticated managers of online sources and we were excited to learn how they collaboratively decide which sources to use and which to dismiss in the first days of the 2011 Egyptian Revolution. In the past few months, I’ve interviewed users from the Middle East, Kenya, Mexico and the United States, studied hundreds of ‘talk pages’ from the article and analysed edits, users and references from the article, and compared these findings to what Wikipedia policy says about sources. In the end, I came up with four key findings that I’m busy refining for the upcoming report:

1. The source of the page (the original version of the article and its author) can play a significant role: Wikipedia policy indicates that characteristics of the book, author and publishers of an article’s citations all affect reliability. But the 2011 Egyptian Revolution article showed how influential the Wikipedia editor who writes the first version of the page can be. Making Wikipedia editors’ reputations, edit histories, etc. more easily readable is critical to understanding points of view while editing and reading rapidly evolving Wikipedia articles.

Wikipedia Isn’t Journalism, But Are Wikipedians Reluctant Journalists?

Cross-posted from PBS Idea Lab

Wikipedia articles on breaking news stories dominate page views on the world’s sixth-largest website. Perhaps more importantly, these articles drive the most significant editor contribution — especially among new editors.


In the first three months of this year, English Wikipedia articles with the most contributors were the 2011 Tucson shooting, the 2011 Egyptian revolution and the 2011 Tōhoku earthquake and tsunami articles with 460, 405 and 785 editors contributing to the growth of the article respectively.

Interestingly, a number of Wikipedia policies discourage writing articles on breaking news. One of Wikipedia’s 42 policies, titled “What Wikipedia is not” (or WP:NOT), highlights that the site is, above all, an encyclopedia, not a newspaper (Wikipedia:NotNewspaper). The policy states that although the encyclopedia needs to include current and up-to-date information as well as standalone articles on “significant current events,” not all verifiable events are suitable for inclusion in Wikipedia.

Wikipedia articles are not journalism

According to the policy, “Wikipedia should not offer first-hand news reports on breaking stories” because “Wikipedia is not a primary source.” The encyclopedia has a tenuous relationship with primary sources. Policy states that primary sources, “accounts written by people who are directly involved in an event, offering an insider’s view of an event” are (mostly) inappropriate because Wikipedia strives to represent a “Neutral Point of View” (NPOV), and primary sources can be misused to reflect a fringe theory as mainstream. NPOV is one of the five pillars of Wikipedia and frames to a large degree what is allowed into the encyclopedia and what is left out.

News reports on a breaking news story require that Wikipedians use primary sources to update the rapidly evolving articles on issues like death counts after an earthquake. While journalists are able to use primary sources to make a judgment on the death count at the time of publishing and then do the same using new sources when they write successive stories, Wikipedians must do the same collectively and iteratively as new versions are created every few seconds.

In the Japanese earthquake article, this challenge resulted in contradictory facts about the height of the tsunami and the death tolls in the same article, prompting one editor (“Dcoetzee”) to create templates for the numbers of missing and dead that could be edited once, with changes immediately reflected in every part of the page (see Keegan, Gergle and Contractor’s Hot off the Wiki: Dynamics, Practices, and Structures in Wikipedia’s Coverage of the Tohoku Catastrophes).
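The idea behind those templates, storing a figure once and referencing it wherever it appears, can be illustrated with a rough Python analogy. This is not MediaWiki template syntax, and the sentences and numbers below are placeholders rather than real casualty figures: the point is only that every mention reads from one shared value, so two sections can no longer contradict each other.

```python
from string import Template

# Placeholder values for illustration only, not the real casualty figures.
casualties = {"dead": 10, "missing": 25}

# Hypothetical sentences from different sections of the same article,
# each pointing at the shared values instead of repeating the numbers.
sections = [
    Template("The disaster killed $dead people."),
    Template("Officials list $dead dead and $missing missing."),
]


def render(values):
    """Fill every section from the same record of values."""
    return [section.substitute(values) for section in sections]


# One update to the shared record changes every sentence at once.
casualties.update(dead=12)
for line in render(casualties):
    print(line)
# The disaster killed 12 people.
# Officials list 12 dead and 25 missing.
```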

Wikipedia articles are not news reports

The barrier to entry into Wikipedia articles is notability: Subjects must be notable enough to create enduring articles on the encyclopedia. According to policy, while news reporting covers announcements, sports news or celebrities, the fact that something is “in the news” is not a sufficient basis for inclusion in the encyclopaedia. Notability is difficult, perhaps impossible, to predict directly after an event, and this can result in historical events being described in purely modern terms, or in an article being created about something noteworthy at a particular time that later might not meet notability requirements.

Wikipedians call this “recentism” and have a tag to make it transparent to readers that the article might be skewed towards “recent perspectives.” In an essay on “recentism,” Wikipedians describe the phenomenon as “writing or editing without a long-term, historical view, thereby inflating the importance of a topic that has received recent public attention.”

Both the “Wikipedia articles are not: Journalism” and “Wikipedia articles are not: News reports” policies recommend moving timely news subjects to Wikinews, a sister project to Wikipedia that allows the use of primary sources and is intended to be a primary source. But Wikinews has suffered from a low contributor base and disagreement among contributors about the best way to build the news portal.

In September, a large portion of the Wikinews contributor base announced on the Foundation-l mailing list that they had forked the project and started OpenGlobe after becoming “deeply dissatisfied with Wikinews.”

Wikipedia articles are not who’s who

The third item of “Wikipedia:NotNewspaper” explains that, even when an event is notable, individuals involved in it may not be. This policy speaks to the need for enduring articles that will still be notable in the years after the event. While newspapers are often concerned with explaining events through the people affected by such events, Wikipedia wishes to take the long-term view, attempting to avoid cases that give undue weight to the person or event and thus conflict with NPOV.


The first rough draft of history?

It took just 11 minutes for the Japanese Wikipedia to create an article after the 9.0-magnitude undersea megathrust earthquake occurred off the coast of Japan on March 11. Twenty-one minutes later, the English Wikipedia article was created, and although the wire services reported the earthquake within minutes, The New York Times did not file a full story until more than three hours after the earthquake hit.

Despite the distinct discouragement of reporting on current news items for the reasons mentioned above, Wikipedia has become the site of major activity around large news events like this one. The ability of anyone to edit the encyclopedia, the lack of restrictions on editing articles, and the fact that notability is a relative concept mean that Wikipedia policy cannot stop the hundreds of editors who flock to the encyclopedia with the single purpose of working on a particular page.

But if Wikipedia and not the news media is the first rough draft of history, what does this mean for Neutral Point of View? If Wikipedians are evaluating and synthesizing primary sources rather than sources that have already evaluated the importance of an event, is Wikipedia at risk of becoming subjective? Consensus may be more easily achieved when the event is a natural disaster, but when it’s a war or a revolution and the editors’ motivations differ, the same architectural flexibilities can lock articles into disagreement.

Wikipedia may be a reluctant journalist, but its influence on the media landscape is unmistakable.

Why Wikipedia articles are deleted

Stuart Geiger and I just presented some research at WikiSym on why Wikipedia articles are deleted, both through the speedy deletion or “CSD” process, a unilateral process whereby administrators can delete problematic articles without discussion, and through the articles for deletion or “AfD” process, whereby editors discuss whether articles should be deleted. You might imagine that most CSDs are deleted because of spam or vandalism, but interestingly, we found that the largest share (the blue chunk in the chart on the left here) are deleted because they lack any ‘indication of importance’. We also found that the deletion process is heavily frequented by a relatively small number of longstanding users.

Our key findings include:

1. About half of all deleted articles from June ’07 to Jan ’11 were unilaterally deleted by administrators via the CSD process.
2. Surprisingly, spam, vandalism and patent nonsense make up only 8.00%, 5.69% and 5.36% of CSDs respectively, while the more subjective ‘No indication of importance’ makes up 38.47% of all CSD criteria.
3. With some outliers, AfD discussions have few participants, and those participants are overwhelmingly regulars to the process. 74% of all AfDs are made up entirely of users who have previously participated in an AfD, and 18% of all AfDs have only one newcomer.

You can read more in the PDFs below, but there’s also a lot of great research by other authors at WikiSym and from the Wikimedia Foundation’s Summer of Research program.
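As a quick check on the figures in the findings above, the arithmetic can be spelled out in a few lines of Python. The percentages are the ones reported; the “two or more newcomers” share is my own derivation, which assumes that the 74% and 18% groups don’t overlap and that, together with the remainder, they cover all AfDs (rounding aside).

```python
# CSD criteria shares (% of all CSD criteria), as reported in the findings
csd_shares = {
    "spam": 8.00,
    "vandalism": 5.69,
    "patent nonsense": 5.36,
    "no indication of importance": 38.47,
}


def combined_share(shares, criteria):
    """Sum the percentage shares of the given criteria, rounded to 2 d.p."""
    return round(sum(shares[c] for c in criteria), 2)


# Spam, vandalism and patent nonsense together
obvious_cases = combined_share(csd_shares, ["spam", "vandalism", "patent nonsense"])
# How many times larger the 'no indication of importance' share is
importance_ratio = round(csd_shares["no indication of importance"] / obvious_cases, 2)

# AfD participation shares (% of all AfDs), as reported
all_regulars = 74  # every participant had been in an AfD before
one_newcomer = 18  # exactly one newcomer
# Derived, assuming the two groups are disjoint and ignoring rounding
two_or_more_newcomers = 100 - all_regulars - one_newcomer

print(obvious_cases, importance_ratio, two_or_more_newcomers)
# prints: 19.05 2.02 8
```

In other words, the three “obvious” criteria together account for about 19% of CSD criteria, roughly half the share of ‘no indication of importance’, and on these assumptions only around 8% of AfDs involve more than one newcomer.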

Poster PDF |  Participation in Wikipedia’s Article Deletion Processes (WikiSym accepted poster research) PDF