BBC Click on Wikipedia interventions

BBC Click interviewed me for a segment on possible manipulation of Wikipedia by the Chinese state (below). Manipulation of Wikipedia by states is not new. What does seem to be new here, though, is the way in which strategies for intervening in Wikipedia (both through the election of administrators and at individual articles) are so explicitly outlined.

Remember, though, that we can never know who is editing these articles. Even wikiedits bots only pick up edits within government IP address ranges. We have no way of knowing whether the person represented by that IP address in that sitting is employed by the government. The point is that there is a lot to be gained from influencing Wikipedia’s representation of people, places, events and things given Wikipedia’s prioritised role as data source for digital assistants and search engines.

It makes sense, then, that institutions (including governments, corporations and other organisations) will try to give weight to their version of the truth by taking advantage of the weak points of the peer produced encyclopedia. Guarding against that kind of manipulation is critical but not a problem that can be easily solved. More thoughts on that soon…

PhD Scholarships on “Data Justice” and “Living with Pervasive Media Technologies from Drones to Smart Homes”

I’m excited to announce that I will be co-supervising up to four very generous and well-supported PhD scholarships at the University of New South Wales (Sydney) on the themes of “Living with Pervasive Media Technologies from Drones to Smart Homes” and “Data Justice: Technology, policy and community impact”. Please contact me directly if you have any questions. Expressions of Interest are due before 20 July, 2017 via the links below. Please note that you have to be eligible for post-graduate study at UNSW in order to apply – those requirements are slightly different for the Scientia programme but require that you have a first class honours degree or a Master’s by research. There may be some flexibility here but that would be ideal.

Living with Pervasive Media Technologies from Drones to Smart Homes

Digital assistants, smart devices, drones and other autonomous and artificial intelligence technologies are rapidly changing work, culture, cities and even the intimate spaces of the home. They are 21st century media forms: recording, representing and acting, often in real-time. This project investigates the impact of living with autonomous and intelligent media technologies. It explores the changing situation of media and communication studies in this expanded field. How do these media technologies refigure relations between people and the world? What policy challenges do they present? How do they include and exclude marginalized peoples? How are they transforming media and communications themselves? (Supervisory team: Michael Richardson, Andrew Murphie, Heather Ford)

Data Justice: Technology, policy and community impact

With growing concerns that data mining, ubiquitous surveillance and automated decision making can unfairly disadvantage already marginalised groups, this research aims to identify policy areas where injustices are caused by data- or algorithm-driven decisions, examine the assumptions underlying these technologies, document the lived experiences of those who are affected, and explore innovative ways to prevent such injustices. Innovative qualitative and digital methods will be used to identify connections across community, policy and technology perspectives on ‘big data’. The project is expected to deepen social engagement with disadvantaged communities, and strengthen global impact in promoting social justice in a datafied world. (Supervisory team: Tanja Dreher, Heather Ford, Janet Chan)

Further details on the UNSW Scientia Scholarship scheme are available on the titles above and here:
https://www.2025.unsw.edu.au/apply/?interest=scholarships 

Wikipedia’s relationship to academia and academics

I was recently quoted in an article for Science News about the relationship between academia and Wikipedia by Bethany Brookshire. I was asked to comment on a recent paper by MIT Sloan‘s Neil Thompson and Douglas Hanley who investigated the relationship between Wikipedia articles and scientific papers using examples from chemistry and econometrics. There are a bunch of studies on a similar topic (if you’re interested, here is a good place to start) and I’ve been working on this topic – but from a very different angle – for a qualitative study to be published soon. I thought I would share my answers to the interview questions here since many of them are questions that friends and colleagues ask regularly about citing Wikipedia articles and about quality issues on Wikipedia.

Have you ever edited Wikipedia articles?  What do you think of the process?

Some, yes. Being a successful editor on English Wikipedia is a complicated process, particularly if you’re writing about topics that are either controversial or outside the purview of the majority of Western editors. Editing is complicated not only because it is technical (even with the excellent new tools that have been developed to support editing without having to learn wiki markup) – most of the complications come with knowing the norms, the rules and the power dynamics at play.

You’ve worked previously with Wikipedia on things like verification practices. What are the verification practices currently?

That’s a big question 🙂 Verification practices involve a complicated set of norms, rules and technologies. Editors may (or may not) verify their statements by checking sources, but the power of Wikipedia’s claim-making practice lies in the norms of questioning  unsourced claims using the “citation needed” tag and by any other editor being able to remove claims that they believe to be incorrect. This, of course, does not guarantee that every claim on Wikipedia is factually correct, but it does enable the dynamic labelling of unverified claims and the ability to set verification tasks in an iterative fashion.

Many people in academia view Wikipedia as an unreliable source and do not encourage students to use it. What do you think of this?

Academic use of sources is a very contextual practice. We refer to sources in our own papers and publications not only when we are supporting the claims they contain, but also when we dispute them. That’s the first point: even if Wikipedia was generally unreliable, that is not a good reason for denying its use. The second point is that Wikipedia can be a very reliable source for particular types of information. Affirming the claims made in a particular article, if that was our goal in using it, would require verifying the information that we are reinforcing through citation and in citing the particular version (the “oldid” in Wikipedia terms) that we are referring to. Wikipedia can be used very soundly by academics and students – we just need to do so carefully and with an understanding of the context of citation – something we should be doing generally, not only on Wikipedia.

You work in a highly social media savvy field, what is the general attitude of your colleagues toward Wikipedia as a research resource? Do you think it differs from the attitudes of other academics?

I would say that Wikipedia is widely recognized by academics, including those of my colleagues who don’t specifically conduct Wikipedia research, as a source that is fine to visit but not to cite.

What did you think of this particular paper overall?

I thought that it was a really good paper. Excellent research design and very solid analysis. The only weakness, I would argue, would be that there are quite different results for chemistry and econometrics and that those differences aren’t adequately accounted for. More on that below.

The authors were attempting a causational study by adding Wikipedia articles (while leaving some written but unadded) and looking at how the phrases translated to the scientific literature six months later. Is this a long enough period of time?

This seems to be an appropriate amount of time to study, but there are probably quite important differences between fields of study that might influence results. The volume of publication (social scientists and humanities scholars tend to produce much lower volumes of publications and publications thus tend to be extended over time than natural science and engineering subjects, for example), the volume of explanatory or definitional material in publications (requiring greater use of the literature), the extent to which academics in the particular field consult and contribute to Wikipedia – all might affect how different fields of study influence and are influenced by Wikipedia articles.

Do you think the authors achieved evidence of causation here?

Yes. But again, causation in a single field i.e. chemistry.

It is important to know whether Wikipedia is influencing the scientific literature? Why or why not?

Yes. It is important to know whether Wikipedia is influencing scientific literature – particularly because we need to know where power to influence knowledge is located (in order to ensure that it is being fairly governed and maintained for the development of accurate and unbiased public knowledge).

Do you think papers like this will impact how scientists view and use Wikipedia?

As far as I know, this is the first paper that attributes a strong link between what is on Wikipedia and the development of science. I am sure that it will influence how scientists and other academic view and use Wikipedia – particularly in driving initiatives where scientists contribute to Wikipedia either directly or via initiatives such as PLoS’s Topic Pages.

Is there anything especially important to emphasize?

The most important thing is to emphasize the differences between fields that I think needs to be better explained. I definitely think that certain types of academic research are more in line with Wikipedia’s way of working, forms and styles of publication and epistemology and that it will not have the same influence on other fields.

Towards software that supports interpretation rather than quantification

Towards software that supports interpretation rather than quantification

[Reblogged from the Software Sustainability Institute blog]

My research involves the study of the emerging relationships between data and society that is encapsulated by the fields of software studies, critical data studies and infrastructure studies, among others. These fields of research are primarily aimed at interpretive investigations into how software, algorithms and code have become embedded into everyday life, and how this has resulted in new power formations, new inequalities, new authorities of knowledge [1]. Some of the subjects of this research include the ways in which Facebook’s News Feed algorithm influences the visibility and power of different users and news sources (Bucher, 2012), how Wikipedia delegates editorial decision-making and moral agency to bots (Geiger and Ribes, 2010), or the effects of Google’s Knowledge Graph on people’s ability to control facts about the places in which they live (Ford and Graham, 2016).

As the only Software Sustainability Institute fellows working in this area, I set myself the goal of investigating what tools, methods and infrastructure researchers working in these fields were using to conduct their research. Although Big Data is a challenge for every field of research, I found that the challenge for social scientists and humanities scholars doing interpretive research in this area is unique and perhaps even more significant. Two key challenges stand out. The first is that data requiring interpretation tends to be much larger than traditionally analysed. This often requires at least some level of quantification in order to ‘zoom out’ to obtain a bigger picture of the phenomenon or issues under study. Researchers in this tradition often lack the skills to conduct such analyses – particularly at scale. The second challenge is that online data is subject to ethical and legal restrictions, particularly when research involves interpretive research (as opposed to the anonymized data collected for statistical research).

In many universities it seems that mathematics, engineering, physics and computer science departments have started to build internal infrastructure to deal with Big Data, and some universities have established good Digital Humanities programs that are largely about the quantitative study of large corpuses of images/films/videos or other cultural objects. But infrastructure and expertise is severely lacking for those wishing to do interpretive rather than quantitative research using mixed, experimental, ethnographic or qualitative research using online data. The software and infrastructure required for doing interpretive research is patchy, departments are typically ill-equipped to support researchers and students with the expertise required to conduct social media research, and significant ethical questions remain about doing social media research, particularly in the context of data protection laws.

Data Carpentry offers some promise here. I organized, with the support of the Software Sustainability Institute, a “Data Carpentry for the Social Sciences workshop” with Dr Brenda Moon (Queensland University of Technology) and Martin Callaghan (University of Leeds) in November 2016 at Leeds University. Data Carpentry workshops tend to be organized for quantitative work in the hard sciences and there were no lesson plans for dealing with social media data. Brenda stepped in to develop some of these materials based partly on the really good Library Carpentry resources and both Martin and Brenda (with additional help from Dr Andy Evans, Joanna Leng and Dr Viktoria Spaiser) made an excellent start towards seeding the lessons database with some social media specific exercises.

The two-day workshop centered on examples from Twitter data and participants worked with Python and other off-the-shelf tools to extract and analyze data. There were fourteen participants in the workshop ranging from PhD students to professors and from media and communications to sociology and social policy, music to law, earth and environment to translation studies. At the end of the workshop participants said that they felt they had received a strong grounding in Python and that the course was useful, interactive, open and not intimidating. There were suggestions, however, to make improvements to the Twitter lessons and to perhaps split up the group in the second day to move onto more advanced programming for some and to go over the foundations for beginners.

Also supported by the Institute was my participation in two conferences in Australia at the end of 2016. The first was a conference exploring the impact of automation on everyday life at the Queensland University of Technology in Brisbane, the second, the annual Crossroads in Cultural Studies conference in Sydney. Through my participation in these events (and via other information-gathering that I have been conducting in my travels) I have learned that many researchers in the social sciences and humanities suffer from a significant lack of local expertise and infrastructure. On multiple occasions I learned of PhD students and researchers running analyses of millions of tweets on their laptops, suffering from a lack of understanding when applying for ethical approval and conducting analyses that lack a consistent approach.

Centers of excellence in digital methods around the world share code and learnings where they can. One such program is the Digital Methods Initiative (DMI) at the University of Amsterdam. The DMI hosts regular summer and winter schools to train researchers in using digital methods tools and provides free access to some of the open source software tools that it has developed for collecting and analyzing digital data. Queensland University of Technology’s Social Media Group also hosts summer schools and has contributed to methodological scholarship employing interpretive approaches to social media and internet research. The common characteristic of such programmes are that they are collaborative (sharing resources across the university departments and between different universities) and innovative (breaking some of the traditional rules that govern traditional research in the university).

Many researchers who handle data in more interpretive studies tend to rely on these global hubs in the few universities where infrastructure is being developed. The UK could benefit from a similar hub for researchers locally, especially since software and code needs to be continually developed and maintained for a much wider variety of evolving methods. Alternatively, or alongside such hubs, Data Carpentry workshops could serve as an important virtual hub for sharing lesson plans and resources. Data Carpentry could, for example, host code that can be used to query APIs for doing social media research and workshops could also be used to collaboratively explore or experiment with methods for iterative, grounded investigation of social media practices.

Due to the rapid increase in the scale and velocity of social media data and because of the lack of technical expertise to manage such data, social scientists and humanities scholars have taken a backseat to the hard sciences in explaining new dimensions of social life online. This is disappointing because it means that much of the research coming out about social media, Big Data and the computation lacks a connection to important social questions about the world. Building from some of this momentum will be essential in the next few years if we are to see social scientists and humanities scholars adding their important insights into social phenomena online. Much more needs to be done to build flexible and agile resources for the rapidly advancing field of social media research if we are to benefit from the contributions of social science and humanities scholars in the field of digital cultures and politics.

[1] For an excellent introduction to the contribution of interpretive scholars to questions about data and the digital see ‘The Datafied Society’ just published by Amsterdam University Press http://en.aup.nl/books/9789462981362-the-datafied-society.html

Pic: Martin Callaghan displays the ‘Geeks and repetitive tasks’ model during the November 2016 Data Carpentry for the Social Sciences workshop at Leeds University.

Human-bot relations at ICA 2017 in San Diego

News this week that a panel I contributed to on political bots has been accepted for the annual International Communication Association (ICA) conference in San Diego with Amanda Clarke, Elizabeth Dubois, Jonas Kaiser and Cornelius Puschmann this May. Political bots are automated agents that are deployed on social media platforms like Twitter to perform a variety of functions that are having a significant impact on politics and public life. There is already some great work about the negative impact of bots that are used to “manipulate public opinion by megaphoning or repressing political content in various forms” (see politicalbots.org) but we were interested in the types of bots these bots are often compared to — the so-called “good” bots that expose the actions of particular types of actors (usually governments) and thereby bring about greater transparency of government activity.

Elizabeth, Cornelius and I worked on a paper about WikiEdits bots for ICA last year in the pre-conference: “Algorithms, Automation, Politics” (“Keeping Ottawa Honest — One Tweet at a Time?” Politicians, Journalists and their Twitter bots, PDF) where we found that the impact of these bots isn’t as simple as bringing about greater transparency. The new work that we will present in May is a deeper investigation of the types of relationships that are catalysed by the existence and ongoing development of transparency bots on Twitter. I’ll be working on the relationship between bots and their creators in both Canada and South Africa, attempting to investigate the relationship between the bots and the transparency that they promise. Cornelius is looking at the relationship between journalists and bots, Elizabeth and Amanda are looking at the relationship between bots and political staff/government employees, and Jonas will be looking more closely at bots and users. The awesome Stuart Geiger who has done some really great work on bots has kindly agreed to be a respondent to the paper.

You can read more about the panel and each of the papers below.

Do people make good bots bad?

Political bots are not necessarily good or bad. We argue the impact of transparency bots (a particular kind of political bot) rests largely on the relationships bots have with their creators, journalists, government and political staff, and the general public. In this panel each of these relationships is highlighted using empirical evidence and a respondent guides wider discussion about how these relationships interact in the wider political and media system.

This panel challenges the notion that political bots are necessarily good or bad by highlighting relationships between political actors and transparency bots. Transparency bots are automated social media accounts which report behaviour of political players/institutions and are normally viewed as a positive force for democracy. In contrast, bot activity such as astroturfing and the creation of fake followers or friends on social media has been examined and critiqued as nefarious in academic and popular literature. We assert that the impact of transparency bots rests largely on the relationships bots have with their creators, journalists, government and political staff, and the general public. Each panelist highlights one of these relationships (noting related interactions with additional actors) in order to answer the question “How do human-bot relationships shape bots’ political impact?”

Through comparative analysis of the Canadian and South African Wikiedits bots, Ford shows that transparency is not a potential affordance of the technology but rather of the conditions in place between actors. Puschmann considers the ways bots are framed and used by journalists in a content analysis of news articles. Dubois and Clarke articulate the ways public servants and political staff respond to the presence of Wikiedits bots revealing that internal institutional policies mediate the relationships these actors can have with bots. Finally, Kaiser asks how users who are not political elite actors frame transparency bots making use of a quantitative and qualitative analysis of Reddit content.

Geiger (respondent) then poses questions which cut across the relationships and themes brought out by panelists. This promotes a holistic view of the bot in their actual communicative system. Cross-cutting questions illustrate that the impact of bots is seen not simply in dyadic relationships but also in the ways various actors interact with each other as well as the bots in question.

This panel is a needed opportunity to critically consider the political role and impact of transparency bots considering the bot in context. Much current literature assumes political bots have significant agency, however, bots need to interact with other political actors in order to have an impact. A nuanced understanding of the different types of relationships among political actors and bots that exists is thus essential. The cohesive conversation presented by panelists allows for a comparison across the different kinds of bot-actor relationships, focusing in detail on particular types of actors and then zooming out to address the wider system inclusive of these relationships.

  1. Bots and their creators
    Heather Ford

Bots – particularly those with public functions such as government transparency – are often created and recreated collaboratively by communities of technologists who share a particular world view of democracy and of technology’s role in politics and social change. This paper will focus on the origins of bots in the motivations and practices of their creators focusing on a particular case of transparency bots. Wikipedia/Twitter bots are built to tweet every time an editor within a particular government IP range edits Wikipedia as a way of notifying others to check possible government attempts to manipulate facts on the platform. The outputs of Wikipedia/Twitter bots have been employed by journalists as sources in stories about governments manipulating information (Ford et al, 2016).

Investigating the relationship between bot creators and their bots in Canada and South Africa by following the bots and their networks using mixed methods, I ask: To what extent is transparency an affordance of the particular technology being employed? Or is transparency rather an affordance of the conditions in place between actors in the network? Building from theories of co-production (Jasanoff, 2004) and comparing the impact of Wikipedia/Twitter bots on the news media in Canada and South Africa, this paper begins to map out the relationships that seem to be required for bots to take on a particular function (such as government transparency). Findings indicate that bots can only become transparency bots through the enrolling of allies (Callon, 1986) and through particular local conditions that ensure success in achieving a particular outcome. This is a stark reminder of the connectedness of human-machine relations and the limitations on technologists to fully create the world they imagine when they build their bots.

 

2. Bots and Journalists
Cornelius Puschmann

Different social agents — human and non-human — compete for attention, spread information and contribute to political debates online. Journalism is impacted by digital automation in two distinct ways: Through its potentially manipulative influence on reporting and thus public opinion (Woolley & Howard, 2016, Woolley, 2016), and by providing journalists with a set of new tools for providing insight, disseminating information, and connecting with audiences (Graefe, 2016; Lokot & Diakopoulos, 2015). This contribution focuses primarily on the first aspect, but also takes the second into account, because we argue that fears of automation in journalism may fuel reservations among journalists regarding the role of bots more generally.

To address the first aspect, we present the results of a quantitative content analysis of English-language mainstream media discourse on bots. Building on prior research on the reception of Bots (Ford et al, 2016), we focus on the following aspects in particular:

– the context in which bots are discussed,

– the evaluation (“good” for furthering transparency, “bad” because they spread propaganda),

– the implications for public deliberation (if any).

Secondly, we discuss the usage of bots and automation for the news media, using a small set of examples from the context of automated journalism (Johri, Han & Mehta, 2016). Bots are increasingly used to automate particular aspects of journalism, such as the generation of news items and the dissemination of content. Building on these examples we point to the “myriad ways in which news bots are being employed for topical, niche, and local news, as well as for providing higher-order journalistic functions such as commentary, critique, or even accountability” (Lokot & Diakopoulos, 2015, p. 2).

 

3. Bots and Government/Political Staff
Elizabeth Dubois and Amanda Clarke

Wikiedits bots are thought to promote more transparent, accountable government because they expose the Wikipedia editing practices of public officials, especially important when those edits are part of partisan battles between political staff, or enable the spread of misinformation and propaganda by properly neutral public servants. However, far from bolstering democratic accountability, these bots may have a perverse effect on democratic governance. Early evidence suggests that the Canadian Wikiedits bot (@gccaedits) may be contributing to a chilling effect wherein public servants and political staff are editing Wikipedia less or editing in ways that are harder to track in order to avoid the scrutiny that these bots enable (Ford et al, 2016). The extent to which this chilling effect shapes public officials’ willingness to edit Wikipedia openly (or at all), and the role the bot plays in inducing this chilling effect, remain open questions ripe for investigation. Focusing on the bot tracking activity in the Government of Canada (@gccaedits), this paper reports on the findings of in-depth interviews with public and political officials responsible for Wikipedia edits as well as analysis of internal government documents related to the bot (retrieved through Access to Information requests).

We find that internal institutional policies, constraints of the Westminster system of democracy (which demands public servants remain anonymous, and that all communications be tightly managed in strict hierarchical chains of command), paired with primarily negative media reporting of the @gccaedits bot, have inhibited Wikipedia editing. This poses risks to the quality of democratic governance in Canada. First, many edits revealed by the bot are in fact useful contributions to knowledge, and reflect the elite and early insider insight of public officials. At a larger level, these edits represent novel and significant disruptions to a public sector communications culture that has not kept pace with the networked models of information production and dissemination that characterize the digital age. In this sense, the administrative and journalistic response to the bot’s reporting sets back important efforts to bolster Open Government and digital era public service renewal. Detailing these costs, and analysing the role of the bot and human responses to it, this paper suggests how wikiedit bots shape digital era governance.

4. Bots and Users
Jonas Kaiser

Users interact online with bots on a daily basis. They tweet, upvote or comment, in short: participate in many different communities and are involved in shaping the user’s perceptions. Based on this experience the users’ perspective on bots may differ significantly from journalists, bot creators or political actors. Yet it is being ignored in the literature up to now. As such we are missing an integral perspective on bots that may help us to understand how the societal discourse surrounding bots is structured. To analyze how and in which context users talk about transparency bots specifically a content analysis and topic analysis of Reddit comments from 86 posts in 48 subreddits on the issue of Wikiedits bots will be conducted. This proposal’s research focuses on two major aspects: how Reddit users 1) frame and with what other 2) topics they associate transparency bots.

Framing in this context is understood as “making sense of relevant events, suggesting what is at issue” (Gamson & Modigliani, 1989, p. 3). Even though some studies have shown, for example, how political actors frame bots (Ford, Dubois, & Puschmann, 2016) a closer look at the user’s side is missing. But this perspective is important as non-elite users may have a different view than the more elite political actors that can help us understand in how they interpret bots. This overlooked perspective, then, could have meaningful implications for political actors or bot creators. At the same time it is important to understand the broader context of the user discourse on transparency bots to properly connect the identified frames with overarching topics. Hence an automated topic modeling approach (Blei, Ng & Jordan, 2003) is chosen to identify the underlying themes within the comments. By combining frame analysis with topic modeling this project will highlight the way users talk about transparency bots and in which context they do so and thus emphasize the role of the users within the broader public discourse on bots.

Bibliography

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

Callon, M. (1986). “Some Elements of a Sociology of Translation: Domestication of the Scallops and the Fishermen of St Brieuc Bay”. In John Law (ed.), Power, Action and Belief: A New Sociology of Knowledge (London: Routledge & Kegan Paul).

Ford, H., Dubois, E., & Puschmann, C. (2016). Automation, Algorithms, and Politics | Keeping Ottawa Honest—One Tweet at a Time? Politicians, Journalists, Wikipedians and Their Twitter Bots. International Journal of Communication, 10, 24.

Gamson, W. A., & Modigliani, A. (1989). Media Discourse and Public Opinion on Nuclear Power: A Constructionist Approach. American Journal of Sociology, 95(1), 1-37.

Graefe, A. (2016). Guide to automated journalism. http://towcenter.org/research/guide-to-automated-journalism/

Jasanoff, S. (2004). States of Knowledge: The Co-Production of Science and the Social Order. (London: Routledge Chapman & Hall)

Johri et al. (2016). Domain specific newsbots. Live automated reporting systems involving natural language communication. Paper presented at 2016 Computation + Journalism Symposium.

Lokot, T. & Diakopoulos, N. (2015). News bots: Automating news and information dissemination on Twitter. Digital Journalism. doi: 10.1080/21670811.2015.1081822

Woolley, S. C. (2016). Automating power: Social bot interference in global politics. First Monday. doi: 10.5210/fm.v21i4.6161

Woolley, S. C., & Howard, P. (2016). Bots unite to automate the presidential election. Retrieved Jun. 5, 2016, from http://www.wired.com/2016/05/twitterbots-2/

How Wikipedia’s silent coup ousted our traditional sources of knowledge

[Reposted from The Conversation, 15 January 2016]

As Wikipedia turns 15, volunteer editors worldwide will be celebrating with themed cakes and edit-a-thons aimed at filling holes in poorly covered topics. It’s remarkable that a user-editable encyclopedia project that allows anyone to edit has got this far, especially as the website is kept afloat through donations and the efforts of thousands of volunteers. But Wikipedia hasn’t just become an important and heavily relied-upon source of facts: it has become an authority on those facts.

Through six years of studying Wikipedia I’ve learned that we are witnessing a largely silent coup, in which traditional sources of authority have been usurped. Rather than discovering what the capital of Israel is by consulting paper copies of Encyclopedia Britannica or geographical reference books, we source our information online. Instead of learning about thermonuclear warfare from university professors, we can now watch a YouTube video about it.

The ability to publish online cheaply has led to an explosion in the number and range of people putting across facts and opinion than was traditionally delivered through largely academic publishers. But rather than this leading to an increase in the diversity of knowledge and the democratisation of expertise, the result has actually been greater consolidation in the number of knowledge sources considered authoritative. Wikipedia, particularly in terms of its alliance with Google and other search engines, now plays a central role. Continue reading “How Wikipedia’s silent coup ousted our traditional sources of knowledge”

What I’m talking about in 2016

Authority and authoritative sources, critical data studies, digital methods, the travel of facts online, bot politics and social media and politics. These are some of the things I’m talking about in 2016. (Just in case you thought the #sunselfies only indicated fun and aimless loafing).  

15 January Fact factories: How Wikipedia’s logics determine what facts are represented online. Wikipedia 15th birthday event, Oxford Internet Institute. [Webcast, OII event page, OII’s Medium post, The Conversation article]

29 January Wikipedia and me: A story in four acts. TEDx Leeds University. [Video, TEDx Leeds University site]

Abstract: This is a story about how I came to be involved in Wikipedia and how I became a critic. It’s a story about hope and friendship and failure, and what to do afterwards. In many ways this story represents the relationship that many others like me have had with the Internet: a story about enormous hope and enthusiasm followed by disappointment and despair. Although similar, the uniqueness of these stories is in the final act – the act where I tell you what I now think about the future of the Internet after my initial despair. This is my Internet love story in four acts: 1) Seeing the light 2) California rulz 3) Doubting Thomas 4) Critics unite. 

17 February. Add data to methods and stir. Digital Methods Summer School. CCI, Queensland University of Technology, Brisbane [QUT Digital Methods Summer School website]

Abstract: Are engagements with real humans necessary to ethnographic research? In this presentation, I argue for methods that connect data traces to the individuals who produce them by exploring examples of experimental methods featured on the site ‘EthnographyMatters.net’, such as live fieldnoting, collaborative mapmaking and ‘sensory postcards’.  This presentation will serve as an inspiration for new work that expands beyond disciplinary and methodological boundaries and connects the stories we tell about our things with the humans who create them.  

Continue reading “What I’m talking about in 2016”

Max Klein on Wikidata, “botpedia” and gender classification

Max Klein defines himself on his blog as a ‘Mathematician-Programmer, Wikimedia-Enthusiast, Burner-Yogi’ who believes in ‘liberty through wikis and logic’. I interviewed him a few weeks ago when he was in the UK for Wikimania 2014. He then wrote up some of his answers so that we could share with it others. Max is a long-time volunteer of Wikipedia who has occupied a wide range of roles as a volunteer and as a Wikipedian in residence for OCLC, among others. He has been working on Wikidata from the beginning but it hasn’t always been plain sailing. Max is outspoken about his ideas and he is respected for that, as well as for his patience in teaching those who want to learn. This interview serves as a brief introduction to Wikidata and some of its early disagreements. 

Max Klein in 2011. CC BY SA, Wikimedia Commons
Max Klein in 2011. CC BY SA, Wikimedia Commons

How was Wikidata originally seeded?
In the first days of Wikidata we used to call it a ‘botpedia’ because it was basically just an echo chamber of bots talking to each other. People were writing bots to import information from infoboxes on Wikipedia. A heavy focus of this was data about persons from authority files.

Authority files?
An authority file is a Library Science term that is basically a numbering system to assign authors unique identifiers. The point is to avoid a “which John Smith?” problem. At last year’s Wikimania I said that Wikidata itself has become a kind of “super authority control” because now it connects so many other organisations’ authority control (e.g. Library of Congress and IMDB). In the future I can imagine Wikidata being the one authority control system to rule them all.

In the beginning, each Wikipedia project was supposed to be able to decide whether it wanted to integrate Wikidata. Do you know how this process was undertaken?
It actually wasn’t decided site-by-site. At first only Hungarian, Italian, and Hebrew Wikipedias were progressive enough to try. But once English Wikipedia approved the migration to use Wikidata, soon after there was a global switch for all Wikis to do so (see the announcement here).

Do you think it will be more difficult to edit Wikipedia when infoboxes are linking to templates that derive their data from Wikidata? (both editing and producing new infoboxes?)
It would seem to complicate matters that infobox editing becomes opaque to those who aren’t Wikidata aware. However at Wikimania 2014, two Sergeys from Russian Wikipedia demonstrated a very slick gadget that made this transparent again – it allowed editing of the Wikidata item from the Wikipedia article. So with the right technology this problem is a nonstarter.

Can you tell me about your opposition to the ways in which Wikidata editors decided to structure gender information on Wikidata?
In Wikidata you can put a constraint to what values a property can have. When I came across it the “sex or gender” property said “only one of ‘male, female, or intersex'”. I was opposed to this because I believe that any way the Wikidata community structure the gender options, we are going to imbue it with our own bias. For instance already the property is called “sex or gender”, which shows a lack of distinction between the two, which some people would consider important. So I spent some time arguing that at least we should allow any value. So if you want to say that someone is “third gender” or even that their gender is “Sodium” that’s now possible. It was just an early case of heteronormativity sneaking into the ontology.

Wikidata uses a CC0 license which is less restrictive than the CC BY SA license that Wikipedia is governed by. What do you think the impact of this decision has been in relation to others like Google who make use of Wikidata in projects like the Google Knowledge Graph?
Wikidata being CC0 at first seemed very radical to me. But one thing I noticed was that increasingly this will mean where the Google Knowledge Graph now credits their “info-cards” to Wikipedia, the attribution will just start disappearing. This seems mostly innocent until you consider that Google is a funder of the Wikidata project. So in some way it could seem like they are just paying to remove a blemish on their perceived omniscience.

But to nip my pessimism I have to remind myself that if we really believe in the Open Source, Open Data credo then this rising tide lifts all boats.

Code and the (Semantic) City

Mark Graham and I have just returned from Maynooth in Ireland where we participated in a really great workshop called Code and the City organised by Rob Kitchin and his team at the Programmable City project. We presented a draft paper entitled, ‘Semantic Cities: Coded Geopolitics and Rise of the Semantic Web’ where we trace how the city of Jerusalem is represented across Wikipedia and through WikiData, Freebase and to Google’s Knowledge Graph in order to answer questions about how linked data and the semantic web changes a user’s interactions with the city. We’ve been indebted to the folks from all of these projects who have helped us navigate questions about the history and affordances of these projects so that we can better understand the current Web ecology. The paper is currently being revised and will be available soon, we hope!

Infoboxes and cleanup tags: Artifacts of Wikipedia newsmaking

Screen Shot 2014-09-02 at 2.06.05 PM
Infobox from the first version of the 2011 Egyptian Revolution (then ‘protests’) article on English Wikipedia, 25 January, 2011

My article about Wikipedia infoboxes and cleanup tags and their role in the development of the 2011 Egyptian Revolution article has just been published in the journal, ‘Journalism: Theory, Practice and Criticism‘ (a pre-print is available on Academia.edu). The article forms part of a special issue of the journal edited by C W Anderson and Juliette de Meyer who organised the ‘Objects of Journalism’ pre-conference at the International Communications Association conference in London that I attended last year. The issue includes a number of really interesting articles from a variety of periods in journalism’s history – from pica sticks to interfaces, timezones to software, some of which we covered in the August 2013 edition of ethnographymatters.net

My article is about infoboxes and cleanup tags as objects of Wikipedia journalism, objects that have important functions in the coordination of editing and writing by distributed groups of editors. Infoboxes are summary tables on the right hand side of an article that enable readability and quick reference, while cleanup tags are notices at the head of an article warning readers and editors of specific problems with articles. When added to an article, both tools simultaneously notify editors about missing or weak elements of the article and add articles to particular categories of work.

The article contains an account of the first 18 days of the protests that resulted in the resignation of then-president Hosni Mubarak based on interviews with a number of the article’s key editors as well as traces in related articles, talk pages and edit histories. Below is a selection from what happened on day 1:

Day 1: 25 January, 2011 (first day of the protests)

The_Egyptian_Liberal published the article on English Wikipedia on the afternoon of what would become a wave of protests that would lead to the unseating of President Hosni Mubarak. A template was used to insert the ‘uprising’ infobox to house summarised information about the event including fields for its ‘characteristics’, the number of injuries and fatalities. This template was chosen from a range of other infoboxes relating to history and events on Wikipedia, but has since been deleted in favor of the more recently developed ‘civil conflict’ infobox with fields for ‘causes’, ‘methods’ and ‘results’.

The first draft included the terms ‘demonstration’, ‘riot’ and ‘self-immolation’ in the ‘characteristics’ field and was illustrated by the Latuff cartoon of Khaled Mohamed Saeed and Hosni Mubarak with the caption ‘Khaled Mohamed Saeed holding up a tiny, flailing, stone-faced Hosni Mubarak’. Khaled Mohamed Saeed was a young Egyptian man who was beaten to death reportedly by Egyptian security forces and the subject of the Facebook group ‘We are all Khaled Said’ moderated by Wael Ghonim that contributed to the growing discontent in the weeks leading up to 25 January, 2011. This would ideally have been a filled by a photograph of the protests, but the cartoon was used because the article was uploaded so soon after the first protests began. It also has significant emotive power and clearly represented the perspective of the crowd of anti-Mubarak demonstrators in the first protests.

Upon publishing, three prominent cleanup tags were automatically appended to the head of the article. These included the ‘new unreviewed article’ tag, the ‘expert in politics needed’ tag and the ‘current event’ tag, warning readers that information on the page may change rapidly as events progress. These three lines of code that constituted the cleanup tags initiated a complex distribution of tasks to different groups of users located in work groups throughout the site: page patrollers, subject experts and those interested in current events.

The three cleanup tags automatically appended to the article when it was published at UTC 13:27 on 25 January, 2011
The three cleanup tags automatically appended to the article when it was published at UTC 13:27 on 25 January, 2011

Looking at the diffs in the first day of the article’s growth, it becomes clear that the article is by no means a ‘blank slate’ that editors fill progressively with prose. Much of the activity in the first stage of the article’s development consisted of editors inserting markers or frames in the article that acted to prioritize and distribute work. Cleanup tags alerted others about what they believed to be priorities (to improve weak sections or provide political expertise, for example) while infoboxes and tables provided frames for editors to fill in details iteratively as new information became available.

By discussing the use of these tools in the context of Bowker and Star’s theories of classification (2000), I argue that these tools are not only material but also conceptual and symbolic. They facilitate collaboration by enabling users to fill in details according to a pre-defined set of categories and by catalyzing notices that alert others to the work that they believe needs to be done on the article. Their power, however, cannot only be seen in terms of their functional value. These artifacts are deployed and removed as acts of social and strategic power play among Wikipedia editors who each want to influence the narrative about what happened and why it happened. Infoboxes and tabular elements arise as clean, simple, well-referenced numbers out of the messiness and conflict that gave rise to them. When cleanup tags are removed, the article develops an implicit authority, appearing to rise above uncertainty, power struggles and the impermanence of the compromise that it originated from.

This categorization practice enables editors to collaborate iteratively with one another because each object signals work that needs to be done by others in order to fill in the gaps of the current content. In addition to this functional value, however, categorization also has a number of symbolic and political consequences. Editors are engaged in a continual practice of iterative summation that contributes to an active construction of the event as it happens rather than a mere assembling of ‘reliable sources’. The deployment and removal of cleanup tags can be seen as an act of power play between editors that affects readers’ evaluation of the article’s content. Infoboxes are similar sites of struggle whose deployment and development result in an erasure of the contradictions and debates that gave rise to them. These objects illuminate how this novel journalistic practice has important implications for the way that political events are represented.