Max Klein on Wikidata, “botpedia” and gender classification

Max Klein defines himself on his blog as a ‘Mathematician-Programmer, Wikimedia-Enthusiast, Burner-Yogi’ who believes in ‘liberty through wikis and logic’. I interviewed him a few weeks ago when he was in the UK for Wikimania 2014. He then wrote up some of his answers so that we could share with it others. Max is a long-time volunteer of Wikipedia who has occupied a wide range of roles as a volunteer and as a Wikipedian in residence for OCLC, among others. He has been working on Wikidata from the beginning but it hasn’t always been plain sailing. Max is outspoken about his ideas and he is respected for that, as well as for his patience in teaching those who want to learn. This interview serves as a brief introduction to Wikidata and some of its early disagreements. 

Max Klein in 2011. CC BY SA, Wikimedia Commons
Max Klein in 2011. CC BY SA, Wikimedia Commons

How was Wikidata originally seeded?
In the first days of Wikidata we used to call it a ‘botpedia’ because it was basically just an echo chamber of bots talking to each other. People were writing bots to import information from infoboxes on Wikipedia. A heavy focus of this was data about persons from authority files.

Authority files?
An authority file is a Library Science term that is basically a numbering system to assign authors unique identifiers. The point is to avoid a “which John Smith?” problem. At last year’s Wikimania I said that Wikidata itself has become a kind of “super authority control” because now it connects so many other organisations’ authority control (e.g. Library of Congress and IMDB). In the future I can imagine Wikidata being the one authority control system to rule them all.

In the beginning, each Wikipedia project was supposed to be able to decide whether it wanted to integrate Wikidata. Do you know how this process was undertaken?
It actually wasn’t decided site-by-site. At first only Hungarian, Italian, and Hebrew Wikipedias were progressive enough to try. But once English Wikipedia approved the migration to use Wikidata, soon after there was a global switch for all Wikis to do so (see the announcement here).

Do you think it will be more difficult to edit Wikipedia when infoboxes are linking to templates that derive their data from Wikidata? (both editing and producing new infoboxes?)
It would seem to complicate matters that infobox editing becomes opaque to those who aren’t Wikidata aware. However at Wikimania 2014, two Sergeys from Russian Wikipedia demonstrated a very slick gadget that made this transparent again – it allowed editing of the Wikidata item from the Wikipedia article. So with the right technology this problem is a nonstarter.

Can you tell me about your opposition to the ways in which Wikidata editors decided to structure gender information on Wikidata?
In Wikidata you can put a constraint to what values a property can have. When I came across it the “sex or gender” property said “only one of ‘male, female, or intersex'”. I was opposed to this because I believe that any way the Wikidata community structure the gender options, we are going to imbue it with our own bias. For instance already the property is called “sex or gender”, which shows a lack of distinction between the two, which some people would consider important. So I spent some time arguing that at least we should allow any value. So if you want to say that someone is “third gender” or even that their gender is “Sodium” that’s now possible. It was just an early case of heteronormativity sneaking into the ontology.

Wikidata uses a CC0 license which is less restrictive than the CC BY SA license that Wikipedia is governed by. What do you think the impact of this decision has been in relation to others like Google who make use of Wikidata in projects like the Google Knowledge Graph?
Wikidata being CC0 at first seemed very radical to me. But one thing I noticed was that increasingly this will mean where the Google Knowledge Graph now credits their “info-cards” to Wikipedia, the attribution will just start disappearing. This seems mostly innocent until you consider that Google is a funder of the Wikidata project. So in some way it could seem like they are just paying to remove a blemish on their perceived omniscience.

But to nip my pessimism I have to remind myself that if we really believe in the Open Source, Open Data credo then this rising tide lifts all boats.

Infoboxes and cleanup tags: Artifacts of Wikipedia newsmaking

Screen Shot 2014-09-02 at 2.06.05 PM
Infobox from the first version of the 2011 Egyptian Revolution (then ‘protests’) article on English Wikipedia, 25 January, 2011

My article about Wikipedia infoboxes and cleanup tags and their role in the development of the 2011 Egyptian Revolution article has just been published in the journal, ‘Journalism: Theory, Practice and Criticism‘ (a pre-print is available on Academia.edu). The article forms part of a special issue of the journal edited by C W Anderson and Juliette de Meyer who organised the ‘Objects of Journalism’ pre-conference at the International Communications Association conference in London that I attended last year. The issue includes a number of really interesting articles from a variety of periods in journalism’s history – from pica sticks to interfaces, timezones to software, some of which we covered in the August 2013 edition of ethnographymatters.net

My article is about infoboxes and cleanup tags as objects of Wikipedia journalism, objects that have important functions in the coordination of editing and writing by distributed groups of editors. Infoboxes are summary tables on the right hand side of an article that enable readability and quick reference, while cleanup tags are notices at the head of an article warning readers and editors of specific problems with articles. When added to an article, both tools simultaneously notify editors about missing or weak elements of the article and add articles to particular categories of work.

The article contains an account of the first 18 days of the protests that resulted in the resignation of then-president Hosni Mubarak based on interviews with a number of the article’s key editors as well as traces in related articles, talk pages and edit histories. Below is a selection from what happened on day 1:

Day 1: 25 January, 2011 (first day of the protests)

The_Egyptian_Liberal published the article on English Wikipedia on the afternoon of what would become a wave of protests that would lead to the unseating of President Hosni Mubarak. A template was used to insert the ‘uprising’ infobox to house summarised information about the event including fields for its ‘characteristics’, the number of injuries and fatalities. This template was chosen from a range of other infoboxes relating to history and events on Wikipedia, but has since been deleted in favor of the more recently developed ‘civil conflict’ infobox with fields for ‘causes’, ‘methods’ and ‘results’.

The first draft included the terms ‘demonstration’, ‘riot’ and ‘self-immolation’ in the ‘characteristics’ field and was illustrated by the Latuff cartoon of Khaled Mohamed Saeed and Hosni Mubarak with the caption ‘Khaled Mohamed Saeed holding up a tiny, flailing, stone-faced Hosni Mubarak’. Khaled Mohamed Saeed was a young Egyptian man who was beaten to death reportedly by Egyptian security forces and the subject of the Facebook group ‘We are all Khaled Said’ moderated by Wael Ghonim that contributed to the growing discontent in the weeks leading up to 25 January, 2011. This would ideally have been a filled by a photograph of the protests, but the cartoon was used because the article was uploaded so soon after the first protests began. It also has significant emotive power and clearly represented the perspective of the crowd of anti-Mubarak demonstrators in the first protests.

Upon publishing, three prominent cleanup tags were automatically appended to the head of the article. These included the ‘new unreviewed article’ tag, the ‘expert in politics needed’ tag and the ‘current event’ tag, warning readers that information on the page may change rapidly as events progress. These three lines of code that constituted the cleanup tags initiated a complex distribution of tasks to different groups of users located in work groups throughout the site: page patrollers, subject experts and those interested in current events.

The three cleanup tags automatically appended to the article when it was published at UTC 13:27 on 25 January, 2011
The three cleanup tags automatically appended to the article when it was published at UTC 13:27 on 25 January, 2011

Looking at the diffs in the first day of the article’s growth, it becomes clear that the article is by no means a ‘blank slate’ that editors fill progressively with prose. Much of the activity in the first stage of the article’s development consisted of editors inserting markers or frames in the article that acted to prioritize and distribute work. Cleanup tags alerted others about what they believed to be priorities (to improve weak sections or provide political expertise, for example) while infoboxes and tables provided frames for editors to fill in details iteratively as new information became available.

By discussing the use of these tools in the context of Bowker and Star’s theories of classification (2000), I argue that these tools are not only material but also conceptual and symbolic. They facilitate collaboration by enabling users to fill in details according to a pre-defined set of categories and by catalyzing notices that alert others to the work that they believe needs to be done on the article. Their power, however, cannot only be seen in terms of their functional value. These artifacts are deployed and removed as acts of social and strategic power play among Wikipedia editors who each want to influence the narrative about what happened and why it happened. Infoboxes and tabular elements arise as clean, simple, well-referenced numbers out of the messiness and conflict that gave rise to them. When cleanup tags are removed, the article develops an implicit authority, appearing to rise above uncertainty, power struggles and the impermanence of the compromise that it originated from.

This categorization practice enables editors to collaborate iteratively with one another because each object signals work that needs to be done by others in order to fill in the gaps of the current content. In addition to this functional value, however, categorization also has a number of symbolic and political consequences. Editors are engaged in a continual practice of iterative summation that contributes to an active construction of the event as it happens rather than a mere assembling of ‘reliable sources’. The deployment and removal of cleanup tags can be seen as an act of power play between editors that affects readers’ evaluation of the article’s content. Infoboxes are similar sites of struggle whose deployment and development result in an erasure of the contradictions and debates that gave rise to them. These objects illuminate how this novel journalistic practice has important implications for the way that political events are represented.