Towards a “knowledge gap index” for Wikipedia

In January 2019, prompted by the Wikimedia Movement’s 2030 strategic direction, the Research team at the Wikimedia Foundation identified the need to develop a knowledge gaps index—a composite index to support the decision makers across the Wikimedia movement by providing: a framework to encourage structured and targeted brainstorming discussions; data on the state of the knowledge gaps across the Wikimedia projects that can inform decision making and assist with measuring the long term impact of large scale initiatives in the Movement.

Below is the text of my response to the first draft of the release. We were encouraged to submit examples of our own research to support our critique. The Research team responded to many of my comments in their final version, as the first step towards building the knowledge gap index.

This is a really great effort and it’s no small feat to gather all this research together in one frame. My comments all surround one missing piece in this, and that is the issue of power. For me, it is power that is missing from the document as a whole – the recognition that editing Wikipedia is as much about power as it is about internet access or demographic features. Framing Wikipedia’s problems as gaps that need to be filled is a mistake because it doesn’t enable us to see how Wikipedia is a system governed by unequal power dynamics that determine who is able to be an effective contributor. More specific comments below:

  • In Section 3, you leave out technical contributors from your definition of contributor. I understand why you might do this but I think it is a mistake as you note: “software and choices made in its design certainly are highly impactful on what types of contributors feel supported and what content is created.” As argued in my paper with Judy Wajcman (Ford, H., & Wajcman, J. (2017). ‘Anyone can edit’, not everyone does: Wikipedia’s infrastructure and the gender gap. Social Studies of Science, 47(4), 511–527. https://doi.org/10.1177/0306312717692172) gendering on Wikipedia happens at the level of infrastructure and code and it matters who is developing software tools.
  • In Section 3.1.4, you frame language fluency as less important given that “lower fluency individuals can be important for effective patrolling in small wikis [114], increase the diversity of contributors, and allow for the cross-pollination of content that might otherwise remain locked up in other languages [74]”. But it is important to recognise that there are potential problems when editors from powerful language groups (Europe and North America) contribute to small language encyclopedias (e.g. see Cebuano Wikipedia). https://www.quora.com/Why-are-there-so-many-articles-in-the-Cebuano-language-on-Wikipedia
  • In Section 2.1.7 you write about “ethnicity and race” in the context of “Sociodemographic gaps”. I worry that we have virtually no critical race scholarship of Wikipedia and that the sentence you begin with “Ethnicity and race are very contextual as to what it means about an individual’s status and access to resources” downplays the extent to which Wikipedia is a project that prioritises knowledges from white, European cultures. It seems to be a significant gap in our research, one which this strategy will not solve given the emphasis on metrics as an evaluation tool. I urge the group to discuss *this* severe gap with critical race scholars and to start a conversation about race and Wikipedia.
  • In Section 3.3.3, you write about the “tech skills gap” and the research that has found that “high internet skills are associated with an increase in awareness that Wikipedia can be edited and having edited Wikipedia” so that “edit-a-thons in particular can help to bridge this gap”. In some early work with Stuart Geiger, we noted that it isn’t just tech skills that are required to become an empowered member of the Wikimedia community. Rather, it is about “trace literacy” – “the organisational literacy that is necessary to be an empowered, literate member of the Wikimedia community”. We wrote that Literacy is a means of exercising power in Wikipedia. Keeping traces obscure help the powerful to remain in power and to keep new editors from being able to argue effectively or even to know that there is a space to argue or who to argue with in order to have their edits endure.” Our recommendation was that “Wikipedia literacy needs to engage with the social and cultural aspects of article editing, with training materials and workshops provided the space to work through particularly challenging scenarios that new editors might find themselves in and to work out how this fits within the larger organizational structure.” Again, this is about power not skills. (There is a slideshow of the paper from OpenSym conference and the paper is at https://www.opensym.org/ws2012/p21wikisym2012.pdf and https://dl.acm.org/doi/10.1145/2462932.2462954)
  • Section 4.1 looks at “Policy Gaps” although I’m not sure it is appropriate to talk about policies here as gaps? What’s missing here is notability policies and it is in the notability guidelines where the most power to keep Other voices out is exercised. More work needs to be done to investigate this but the paper above is a start (and perhaps there are others).
  • Section 4.2.2 talks about “Structured data” as a way of improving knowledge diversity and initiatives such as Abstract Wikipedia aiming to “close (the) gap”. Authors should recognise that structured data is not a panacea and that there have been critiques of these programmes within Wikipedia and by social scientists (see, for example, https://journals.sagepub.com/doi/full/10.1177/0263775816668857 Ford, H., & Graham, M. (2016). Provenance, power and place: Linked data and opaque digital geographies. Environment and Planning D: Society and Space, 34(6), 957–970. https://doi.org/10.1177/0263775816668857 open access at https://ora.ox.ac.uk/catalog/uuid:b5756cd4-6d1e-4da1-971e-37b384cd18ca/download_file?file_format=pdf&safe_filename=EPD_final.pdf&type_of_work=Journal+article)
  • The authors point to metrics and studies of the underlying causes of Wikipedia’s gaps in order to evaluate where the gaps are and where they come from. It is very important to recognise that metrics alone will not solve the problem, but I’m dismayed to see how little has been cited in terms of causes and interventions and that the only two papers cited are a literature review and a quantitative study. Quantitative research alone will not enable us to understand causes of Wikipedia’s inequality problems and qualitative and mixed methods research are, indeed, more appropriate methods for asking why questions here. For example, this study that I conducted with Wikipedians helped us to understand that the usual interventions such as editathons and training would not help to fill targeted gaps in articles relating to the South African education curriculum. Instead, the focus needed to be on bringing outsiders in – but not by forcing them to edit directly on wiki – this simply wouldn’t happen, but to find ways of negotiation required for engaging with new editor groups in the long-term project of filling Wikipedia’s gaps. Again, the focus is on the social and cultural aspects of Wikipedia and an emphasis on power. (See https://journals.sagepub.com/doi/full/10.1177/1461444818760870 and https://osf.io/preprints/socarxiv/qn5xd)
  • In terms of the methodology of this review, I noticed that the focus is on the field of “computational social science” which “tries to characterise and quantify different aspects of Wikimedia communities using a computational approach”. I strongly urge the authors to look beyond computational social science to the social science and humanities venues (including STS journals like Science, Technology and Human Values and the Social Studies of Science as well as media studies venues such as New Media and Society).

Also, I’m unsure what this document means in terms of research strategy, but I recommend three main gaps that could be addressed in a future version of this:

  1. A closer engagement with social science literature including critical data studies, media studies, STS to think about causes of Wikipedia inequality.
  2. A dialogue with critical race scholars in order to chart a research agenda to investigate this significant gap in Wikipedia research.
  3. A moment to think about the framing of the problem in terms of “gaps” is the most effective way of understanding the system-wide inequalities within Wikipedia and Wikimedia.

Finally, I believe that regular demographic surveys of Wikimedia users would be incredibly helpful for research and would move us beyond the data that we can regularly access (i.e. metrics) which, as you point out, does not reveal the diversity of our communities. I wish that I had more time to point to other research here, but I work for a university that is under severe strain at the moment and this was all I managed to find time for. I hope it is useful and I look forward to the next version of this!–Hfordsa (talk) 00:47, 30 September 2020 (UTC)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s