Full disclosure: Diary of an internet geography project #2

Reblogged from ‘Connectivity, Inclusivity and Inequality’

In this series of blog posts, Heather Ford documents the process by which a group of computer and social scientists are working together on a project to understand the geography of Wikipedia citations. Their aim is not only to better understand how far Wikipedia has come towards representing ‘the sum of all human knowledge’, but to do so in a way that lays bare the processes by which ‘big data’ is selected and visualized. In this post, Heather discusses how the group are focusing their work on a series of exploratory research questions.

In last week’s call, we had a conversation about articulating the initial research questions that we’re trying to answer. At its simplest level, we decided that what we’re interested in is:

‘How much information about a place on Wikipedia comes from that place?’

In the English Wikipedia article about Guinea-Bissau, for example, how many of the citations originate from organisations or authors in Guinea-Bissau? In the Spanish Wikipedia article about Argentina, what proportion of editors are from Argentina? Cumulatively, can we see any patterns among different language versions that indicate that some language versions contain more ‘local’ voices than others? We think that these are important questions because they point to the extent to which Wikipedia projects can be said to be a reflection of how people from a particular place see the world; they also point to the importance of particular countries in shaping information about certain places from outside their borders. We think it makes a difference to the content of Wikipedia that the US’s Central Intelligence Agency (CIA) is responsible for such a large proportion of the citations, for example.

Past research by Luyt and Tan (2010, PDF) is instructive here. Luyt and Tan took a random sample of national history articles on the English Wikipedia and found that 17% of the cited sources were government sites and that, of those, four of the top five were US government sites, including the US Department of State and the CIA World Factbook. The authors write that this is problematic because ‘the nature of the institutions producing these documents makes it difficult for certain points of view to be included. Being part of the US government, the CIA World Fact Book, for example, is unlikely to include a reading of Guatemalan history that stresses the role of US intervention as an explanation for that country’s long and brutal civil war.’ (p719) Instead of Luyt and Tan’s history articles, we’re looking at country articles, zeroing in on citations and trying to ‘locate’ those citations in different ways. While we were talking on Skype, Shilad drew a really great diagram of how we are approaching this question of information geography: we are locating all three elements (the location of the article, of the sources/citations and of the editors) and then establishing the relationships between them, i.e.

RQ1a What proportion of editors from a place edit articles about that place?

RQ1b What proportion of sources in an article about a place come from that place?

RQ1c What proportion of sources from particular places are added by editors from that place?

We started out by taking the address of the administrative contact in a source’s domain registration as the signal for the source’s location, but we’ve come up against a number of issues as we’ve discussed the initial results. A precursor to the questions above is how we define ‘location’ in the context of a citation contained within an article about a particular place. There are numerous signals that we might use to associate a citation with a particular place: the location of the publisher’s headquarters, for example, or the nationality of the author; the place in which the article/paper/book is set, or the place from which it was published. An added complexity is that websites often host content produced elsewhere. Are we using ‘author’ or ‘publisher’ when we attempt to locate a citation? If we attribute the source to the HQ of the website that hosts it rather than to the text itself, are we still accurately portraying the location of the source?
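
To make the domain-registration signal concrete, here is a minimal sketch of the kind of lookup involved. It is an illustration rather than our actual pipeline, and it assumes the third-party python-whois package; whether a registry exposes a ‘country’ field at all varies widely, which is part of the problem described above.

```python
# Minimal sketch (not the project's actual pipeline): look up the registrant
# country recorded in a cited domain's WHOIS record. Assumes the third-party
# `python-whois` package (pip install python-whois); which fields a registry
# exposes varies widely, so missing values are treated as "unknown".
from typing import Optional
from urllib.parse import urlparse

import whois  # provided by python-whois (an assumption, not part of our toolchain)


def registration_country(url: str) -> Optional[str]:
    """Return the WHOIS registrant country for the URL's domain, if available."""
    domain = urlparse(url).hostname
    if not domain:
        return None
    try:
        record = whois.whois(domain)
        country = record.get("country")
        return country if isinstance(country, str) else None
    except Exception:
        # Many ccTLD registries block, rate-limit or simply omit these fields.
        return None


if __name__ == "__main__":
    # Hypothetical citation URLs, purely for illustration.
    for url in ["https://www.cia.gov/the-world-factbook/",
                "https://example.wordpress.com/mombasa-history/"]:
        print(url, "->", registration_country(url) or "unknown")
```

As the Mombasa example below shows, this is also exactly the signal that breaks down when locally authored material sits on platforms registered elsewhere.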

In order to understand which signals to use in our large-scale analysis, then, we’ve decided to try to get a better understanding of both the shape of these citations and the context in which they occur by looking more closely at a random sample of citations from articles about places and asking the following questions:

RQ0a To what extent might signals like ‘administrative contact of the domain registrant’ or ‘country domain’ accurately reflect the location of authors of Wikipedia sources about places?

RQ0b What alternative signals might more accurately capture the locations of sources?

Already in my own initial analysis of the English Wikipedia article on Mombasa, I noticed that citations to some articles written by locals were hosted on domains such as wordpress.com and wikidot.com, which are registered in the US and Poland respectively. There was also a citation to the Kenyan 2009 census, authored by the Kenya National Bureau of Statistics but hosted on Scribd.com, and a link to an article about Mombasa written by a Ugandan on a US-based travel blog. All this means that we are going to under-represent East Africans’ participation in the writing of this place-based article about Mombasa if we use signals like domain registration.
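
To see how much (or how little) work the country-domain signal does on its own, here is a rough sketch of computing it for a single article. It assumes the public MediaWiki API’s extlinks property and the requests library, ignores result continuation, and would miss exactly the WordPress-, Scribd- and travel-blog-hosted sources described above, so a figure produced this way is a floor on ‘local’ sourcing rather than an estimate of it.

```python
# Rough sketch of the country-domain signal only (an illustration, not our
# method): fetch an article's external links from the public MediaWiki API
# and compute the share whose host ends in a given country-code TLD.
# Assumes the `requests` library; ignores API result continuation, hosting
# platforms (WordPress, Scribd, ...) and authorship entirely.
from urllib.parse import urlparse

import requests

API = "https://en.wikipedia.org/w/api.php"


def external_links(title: str) -> list:
    """Return the external links recorded for one article (first batch only)."""
    params = {
        "action": "query",
        "prop": "extlinks",
        "titles": title,
        "ellimit": "max",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    links = []
    for page in data["query"]["pages"].values():
        for link in page.get("extlinks", []):
            links.append(link.get("*", ""))
    return links


def cctld_share(title: str, cctld: str) -> float:
    """Fraction of an article's external links hosted on the given ccTLD."""
    links = external_links(title)
    if not links:
        return 0.0
    local = sum(1 for url in links
                if (urlparse(url).hostname or "").endswith("." + cctld))
    return local / len(links)


if __name__ == "__main__":
    # e.g. what share of the Mombasa article's cited links sit on .ke domains?
    print(round(cctld_share("Mombasa", "ke"), 3))
```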

We can, of course, ‘solve’ each of these problems by removing hosting sites like WordPress from our analysis, but the concern is that this will negatively affect the representation of the efforts of those few in developing countries who are doing their best to produce local content on Wikipedia. Right now, though, we’re starting with micro-level instances and developing a deeper understanding that way, rather than the other way around. And that I really appreciate.

Full disclosure: Diary of an internet geography project #1

Reblogged from ‘Connectivity, Inclusivity and Inequality’

OII research fellow Mark Graham and DPhil student Heather Ford (both part of the CII group) are working with a group of computer scientists including Brent Hecht, Dave Musicant and Shilad Sen to understand how far Wikipedia has come towards representing ‘the sum of all human knowledge’. As part of the project, they will be making explicit the methods that they use to analyse millions of data records from Wikipedia articles about places in many languages. The hope is that by experimenting with a reflexive method of doing a multidisciplinary ‘big data’ project, others might be able to use this as a model for pursuing their own analyses in the future. This is the first post in a series in which Heather outlines the team’s plans and processes.

It was a beautiful day in Oxford and we wanted to show our Minnesotan friends some Harry Potter-style architecture, so Mark and I sat on a bench in the Balliol gardens while we called Brent, Dave and Shilad, who are based in Minnesota, for our inaugural Skype meeting. I have worked with Dave and Shilad on a paper about Wikipedia sources in the past, and Mark and Brent know each other because they have both produced great work on Wikipedia geography, but we’ve never all worked together as a team. A recent grant from Oxford University’s John Fell Fund provided the impetus for the five of us to get together and pool efforts in a short, multidisciplinary project that will hopefully catalyse further collaborative work in the future.

In last week’s meeting, we talked about our goals and timing and how we wanted to work as a team. Since we’re a multidisciplinary group that really values both quantitative and qualitative approaches, we thought it might make sense to present our goals as two main strands: 1) to investigate the origins of knowledge about places on Wikipedia in many languages, and 2) to do this in a way that is both transparent and reflexive.

In her eight ‘big tent’ criteria for excellent qualitative research, Sarah Tracy (2010, PDF) includes self-reflexivity and transparency in her conception of researcher ‘sincerity’. Tracy believes that sincerity is a valuable quality that relates to researchers being earnest and vulnerable in their work and ‘considering not only their own needs but also those of their participants, readers, coauthors and potential audiences’. Despite the focus on qualitative research in Tracy’s influential paper, we think that practicing transparency and reflexivity can have enormous benefits for quantitative research as well, but one of the challenges is finding ways to pursue transparency and reflexivity as a team rather than as individual researchers.

Transparency

Tracy writes that transparency is about researchers being honest about the research process.

‘Transparent research is marked by disclosure of the study’s challenges and unexpected twists and turns and revelation of the ways research foci transformed over time.’

She writes that, in practice, transparency requires a formal audit trail of all research decisions and activities. For this project, we’ve set up a series of Google Docs folders for our meeting agendas, minutes, Skype calls and screenshots of our video calls, as well as any related spreadsheets and analyses produced during the week. After each session, I clean up the meeting minutes that we’ve co-produced on the Google Doc while we’re talking, and then write a more narrative account beneath them of what we did and what we learned.

Although we’re co-editing these documents as a team, it’s important to note that, as the documenter of the process, it’s my perspective that is foregrounded, and I have to be really mindful of this as I reflect on what happened. Our team meetings are occasions for discussion of the week’s activities, challenges and revelations, which I try to document as accurately as possible, but I will probably also need to conduct interviews with individual members of the team further along in the process in order to capture individual responses to the project and the process that aren’t necessarily accommodated in the weekly meetings.

Reflexivity

According to Tracy, self-reflexivity involves ‘honesty and authenticity with one’s self, one’s research and one’s audience’. Apart from the focus on interrogating our own biases as researchers, reflexivity is about being frank about our strengths and weaknesses, and, importantly, about examining our impact on the scene and asking for feedback from participants.

Soliciting feedback from participants is quite rare in quantitative research, but we believe that gaining input from Wikipedians and other stakeholders can be extremely valuable for improving the rigor of our results and for providing insight into the humans behind the data.

As an example, a few years ago when I was at a Wikimedia Kenya meetup, I asked what editors thought about Mark Graham’s Swahili Wikipedia maps. One respondent was immediately able to explain the concentration of geolocated articles about Turkey because he knew the editor responsible, who was known as a specialist in Turkey geography stubs. Suddenly the map took on a more human form: a reflection of the relationships between real people trying to represent their world. More recently, a Swahili Wikipedian contacted Mark about the same maps and engaged him in a conversation about how they could be made better. Inspired by these engagements, we want to really encourage those conversations and invite people to comment on our process as it evolves. To do this, we’ll be blogging about the progress of the project and inviting particular groups of stakeholders to provide comments and questions. We’ll then discuss those comments and questions in our weekly meetings and try to respond to as many of them as possible as we think about how to move the analysis forward.

In conclusion, transparency and reflexivity are two really important aspects of researcher sincerity. The challenge with this project is putting them into practice in a quantitative rather than a qualitative project, and one driven by a team rather than an individual researcher. Potential risks are that I inaccurately report on what we’re doing, or expose something about our process that is considered inappropriate. What I’m hoping is that we can mark these entries clearly as my initial, necessarily incomplete reflections on our process, and that they can feed into the team’s reflections going forward. Knowing the researchers in the team and having worked with all of them in the past, my goal is to reflect the ways in which they bring what Tracy values in ‘sincere’ researchers: the empathy, kindness, self-awareness and self-deprecation that I know all of these team members display in their daily work.

Review of ‘code/space: software and everyday life’

This review was published in Environment and Planning B last year. I really loved the book and think that it’s a powerful reminder of the importance of context in thinking about how code does work in the world. 

Code/space: Software and Everyday Life by Rob Kitchin and Martin Dodge; MIT Press, Cambridge, MA and London, 2011, 290 pages, ISBN 978-0262042482

Kitchin and Dodge’s important new book, Code/space: Software and Everyday Life, opens with the crucial phrase – “software matters”. It matters, they argue, because software increasingly mediates our everyday lives – from the digital trail that extends just that little bit further when we order our morning coffee, to the data about our electricity and gas usage that so-called “smart meters” send to remote locations, and the airport security databases that determine whether or not we are allowed to travel. The power of software lies in its ability to make our lives easier and to improve efficiency and productivity, but such efficiencies come at the cost of pervasive surveillance, a feature that is producing a society that “never forgets”.

The book’s key premise is that there are two gaps in the way that we talk about technology and society. The first critique is aimed at social science and humanities approaches that deal too much with the technologies that software enables rather than explaining the particular code that affects activity and behavior in different contexts. This is akin, the authors argue, to looking only at the effects of ill health on society, rather than also considering ‘the specifics of different diseases, their etiology (causes, origins, evolution, and implications), and how these manifest themselves in shaping social relations’ (p13).

Software studies, on the other hand, is a nascent research field that seeks ‘to open the black box of processors and arcane algorithms to understand how software – its lines and routines of code – does work in the world by instructing various technologies how to act’ (p13). The problem, they write, is that the majority of software studies are aspatial, presuming that space is merely a neutral backdrop against which human activity occurs. Here, Kitchin and Dodge’s critique is directed at scholars such as Lawrence Lessig, whose book Code and Other Laws of Cyberspace (Lessig, 1999) refers to code (in the form of software) as having the ability to automatically regulate activity online. But code, argue Kitchin and Dodge, is not law. It is neither universal nor deterministic, but rather contingent and relational. And space is not simply a container in which things happen but rather ‘subtly evolving layers of context and practices that fold together people and things and actively shape social relations’ (p13).

Kitchin and Dodge formulate two important concepts in arguing for a spatial approach to software studies. The first is ‘code/space’, the moment when space and code are mutually dependent on one another (a check-in desk at an airport, for example); the second is ‘coded spaces’, which do not depend entirely on code to function (the use of PowerPoint slides during a presentation, for example). After detailing the types of software employed in the areas of home, travel and consumption, Kitchin and Dodge conclude with a Manifesto for Software Studies that sets out an agenda for the field to produce ‘detailed case studies of how software does work in the world’ as well as ‘theoretical tools for describing how and explaining why, and the effects, of that work’ (p249). Here they propose studies comparing the effects of code in rural Ireland and urban Manchester, for example, where code is analyzed in a manner that is sensitive to place and scale, cultural histories, and modes of activity (p249).

The idea that code is not universal or immutable but contingent and contextual is powerful, but Kitchin and Dodge have a tendency to analyze code in the domains of home, travel and consumption without any reference to differences in code’s impact in different places or spaces (the use of international passenger record databases when traveling from Rio de Janeiro, or trying to buy books on Amazon.com from Johannesburg, for example). Although they maintain that their intention is to provide a broad field for future research, the book would have been stronger with some analysis of the different ways in which code codes differing practices, how those practices are understood and used, and the different moments of code’s instantiation in particular places and/or spaces.

Where the book succeeds, and makes its most useful contribution, is in its lucid explanation of how detailed analyses of code are essential to understanding how software is becoming ingrained into our everyday lives, and how it has both an empowering and disciplining effect. Kitchin and Dodge are charting new disciplinary territory here, bringing together the fields of computer science, social science and spatial studies in highly promising ways. Theirs is an inspiring approach to how we might come to unveil the hidden choices behind the code that governs our everyday lives, and how we might come to understand a phenomenon that has an increasingly powerful role in society.

References

Lessig L, 1999 Code and Other Laws of Cyberspace (Basic Books, New York)

How Wikipedia’s Dr Jekyll became Mr Hyde: Vandalism, sock puppetry and the curious case of Wikipedia’s decline

This is a (very) short paper that I will be presenting at Internet Research in Denver this week. I want to write something longer about the story because I feel that in many ways it is emblematic of the experience of so many of us who lived through our own Internet bubble: when everything seemed possible and there was nothing to lose. This is (a small slice of) Drork’s story.

Richard Mansfield starring in The Strange Case of Dr. Jekyll and Mr. Hyde. Wikipedia. Public Domain.

Abstract This paper concerns the rise and fall of Wikipedia editor ‘drork’, who was blocked indefinitely from the English version of the encyclopedia after seven years of constructive contributions, movement leadership and intense engagement. It acts as a companion piece to recent statistical analyses of patterns of conflict and vandalism on Wikipedia, reflecting on why someone who was once committed to the encyclopedia might want to vandalize it. The paper compares two perspectives on the experience of being a Wikipedian: on the one hand, the more commonly espoused view of a virtuous experience that enables positive character formation; on the other, an experience dominated by in-fighting, personal attacks and the use of Wikipedia to pursue political goals. It concludes by arguing that the latter behavior is necessary in order to survive as a Wikipedian editing in highly conflict-ridden areas of the encyclopedia.

Introduction

Recent scholarship has painted two competing pictures of what Wikipedia and Wikipedians are “like” and what motivates them. On the one hand, Benkler and Nissenbaum argue that because people contribute to projects like Wikipedia with motivations “ranging from the pure pleasure of creation, to a particular sense of purpose, through to the companionship and social relations that grow around a common enterprise”, the practice of commons-based peer production fosters virtue and enables “positive character formation” (Benkler and Nissenbaum, 2006). On the other hand, we have heard more recently about how “free and open” communities like Wikipedia have become a haven for aggressive, intimidating behavior (Reagle, 2013) and that reversions of newcomers’ contributions have been growing steadily and may be contributing to Wikipedia’s decline (Halfaker, Geiger, Morgan, & Riedl, in press).

Isolated vs overlapping narratives: the story of an AFD

Editor’s Note: This month’s Stories to Action edition starts off with Heather Ford’s (@hfordsa) story about her experience of watching a story unfold on Wikipedia and in person. While working as an ethnographer at Ushahidi, Heather was in Nairobi, Kenya when she heard news of Kenya’s army invading Somalia. She found out that the Wikipedia article about this event was being nominated for deletion because it didn’t meet the encyclopedia’s “notability” criteria. This local story became a way for Heather to understand why there was a disconnect between what Wikipedia editors and Kenyans recognised as “notable”. She argues that, although Wikipedia frowns on using social media as sources, the “word on the street” can be an important way for editors to find out what is really happening and how important a story is when it first breaks. She also talks about how her ethnographic work helped her develop insights for a report that Ushahidi would use in its plans to develop new tools for rapidly evolving, real-time events.

Heather shared this story at Microsoft’s annual Social Computing Symposium organized by Lily Cheng at NYU’s ITP. Watch the video of her talk, in which she refers to changing her mind on an article she wrote a few years ago, The Missing Wikipedians.

________________________________________________________________

A few of us were on a panel at Microsoft’s annual Social Computing Symposium led by the inimitable Tricia Wang. In an effort to reach across academic (and maybe cultural) divides, Tricia urged us to spend five minutes telling a single story and explaining what that experience made us realize about the project we were working on. It was a wonderful way of highlighting the ethnographic principle of reflexivity, in which the ethnographer reflects on their attitudes/thoughts/reactions in response to the experiences that they have in the field. I told this story about the misunderstandings faced by editors across geographical and cultural divides, and how I’ve come to understand Articles for Deletion (AfD) debates on Wikipedia that relate to Kenya. I’ve also added thoughts that I had after the talk/conference based on what I learned there.

In November 2011, I arrived in Nairobi to visit Ushahidi’s HQ and to conduct interviews for a project I was involved with on how Wikipedians manage sources during rapidly evolving news events. We were trying to figure out how to build tools to help people who collaboratively curate stories about such events – especially when they are physically distant from one another. When I arrived in Nairobi, I went straight to the local supermarket and bought copies of every local newspaper. It was a big news day in the country because of reports that the Kenyan army had invaded southern Somalia to try to root out the militant Al-Shabaab terrorist group. The newspapers all showed Kenyan military tanks and other scenes from the offensive, matched by the kind of bold headlines that characterize national war coverage the world over.

A quick search on Wikipedia showed that a page had been created, but that it had been nominated for deletion on the grounds that it did not meet Wikipedia’s notability criteria. The nominator noted that the event was not being reported as an “invasion” but rather as an “incursion”, and that it was “routine” for troops from neighboring countries to cross the border for military operations.

In the next few days in Nairobi, I became steeped in the narratives around this event – on television, in newspapers, in bars, on Twitter and on Facebook. I learned that this was not actually a story about the invasion of one country by another, and that there were more salient stories that only people living in Kenya were aware of:

  1. This was a story about the Kenyan military trying to prove itself: it was the first time since independence that the military had been involved in an active campaign, and the country was watching to see whether it would succeed.
  2. The move had been preceded by a series of harrowing stories about the kidnapping of foreign aid workers and tourists on the border with southern Somalia – one of Kenya’s major tourist destinations – and the subsequent move by the British government to advise against Britons traveling to coastal areas near the Somali border. [Another narrative, which Mark Kaigwa pointed out, was that some Kenyans believed this was a move by the government to prevent spending cuts to the military and that, with an election year approaching in Kenya, the military wanted to prove itself.]
  3. There were threats of retaliation by Al-Shabaab, many of whose sympathizers were living inside Kenya. I remember sitting in a bar with friends and remarking how quiet it was. My friends answered that everyone had been urged not to go out – and especially not to bars – because of the threat of attacks, at which point I wondered aloud why we were there. Al-Shabaab acted on those threats that night, at a bar in the city center only a few miles away from us.

I used to think that these kinds of deletions were just examples of ignorance, of cultural imperialism and even of racism. Although some of the responses could definitely be viewed that way, the editor who nominated the article for deletion, Middayexpress, was actively engaged in the AfD (Articles for Deletion) discussion and had contributed the highest number of edits. His/her actions could not be explained by ignorance and bad faith alone.

What I realized when I was interviewing Wikipedians about these and other articles that were threatened with deletion for so-called “lack of notability” was that editors in countries outside of Kenya didn’t have access to these narratives that would make it obvious that this event was notable enough to deserve its own page. People outside of Kenya would have seen the single narrative about the incursion/invasion without any of these supporting narratives that made this stand out in Kenya as obviously important in the history of the country.

The Facebook page for Operation Linda Nchi has 1,825 Likes and contains news with a significant nationalistic bent about the campaign

These narratives don’t travel well for three reasons:

a) The volume of international news being covered by traditional media in the West is declining. The story that Western editors were getting was a single story about a military offensive, one they thought must fit within a broader narrative about the Somali war;

b) Much of the local media that people in Kenya were exposed to (to say nothing of the buzz in the streets and in bars, or the threat of bodily harm by terrorists) did not go online in traditional formats but was available on platforms like Facebook and Twitter; and

c) Even where it did, the front pages of news websites are especially ineffective at showing readers when a single story is really important. In print, when there is a war or another huge story, we fill the entire front page with it, shorten the headline, run it across the whole page and use a massive photograph. The front page of the Kenyan Daily Nation website, by contrast, is always busy with a lot of competing stories, making it really difficult to tell just by looking at the site whether one story is relatively more important than the others.

This story made me realize how important it is for Wikipedians to expose themselves to social media sources so that they can get access to some of these supporting narratives that you just don’t find in traditional online sources, and that, despite Wikipedia’s general aversion to social media, this kind of contextual understanding is essential to gaining a more nuanced understanding of local notability. This finding influenced the eventual report for Ushahidi on how Wikipedians manage and debate sources and citations, and lent legitimacy to Ushahidi’s plans to develop news filtering tools for use during rapidly evolving news events such as disasters, elections and political violence.

Featured pic by NS Newsflash (CC-BY) on Flickr

February 2013: The Openness Edition


First published on ethnographymatters.net.

Last month on Ethnography Matters, we started a monthly thematic focus in which each of the EM contributing editors elicits posts about a particular theme. I kicked us off with ‘The Openness Edition’, in which we investigated what openness means for the ethnographic community. I ended up editing some wonderful posts on the topic of openness last month – from Rachelle Annechino’s great post questioning what “informed consent” means in health research, to Jenna Burrell’s post about open access journals related to ethnography and Sarah Kendzior’s stimulating piece about the legitimacy and place of Internet research by anthropologists. We also had two really wonderful pieces sharing methods for more open, transparent research: Juliano Spyer on YouTube “video tags” as an open survey tool, and Jeff Hall, Elizabeth Gin and An Xiao on how they facilitated story-building exercises with homeless youth in Boyle Heights (complete with PDF instructions!). Below is the editorial that I wrote at the beginning of the month, in which I try to tease out some of the complexities of my own relationship with the open access/open content movement. Comments welcome!

On Saturday the 12th of January, almost a month ago, I woke to news of Aaron Swartz’s death the previous day. In the days that followed, I experienced the mixed emotions that accompany such horrific moments: sadness for him and the pain he must have gone through in struggling with depression and anxiety, anger at those who had waged an exaggerated legal campaign against him, uncertainty as I posted about his death on Facebook and felt like I was trying to claim some part of him and his story, and finally resolution that I needed to clarify my own policy on open access.

Crowd Wisdom

I’ve just posted the article about Ushahidi and its future challenges that was published in Index on Censorship last month (‘Crowd Wisdom’ by Heather Ford, Index on Censorship, December 2012, vol. 41, no. 4, pp. 33-39, doi: 10.1177/0306422012465800). I wrote about Ushahidi’s emergence as a powerful tool used in countries around the world to document elections, disasters and food – among other things – and about the coming challenges as the majority of Ushahidi implementations remain ‘small data’ projects and as tools move towards automatic verification, something only possible with ‘Big Data’.