Last month’s Wired magazine ran an infographic headlined ‘History’s most influential people, ranked by Wikipedia reach’, which arranged 20 men in hierarchical order, from Jesus at number 1 to Stalin at number 20. Curious, I wondered how ‘influence’ and ‘Wikipedia reach’ were being decided. According to the article, ‘Rankings (were) based on parameters such as the number of language editions in which that person has a page, and the number of people known to speak those languages’. What really surprised me was not the particular arrangement of figures on the page but the conclusions being drawn from it.
According to the piece, César Hidalgo, head of the Media Lab’s Macro Connections group, who researched the data, made the following claims about what the Wikipedia data shows:
a) “It shows you how the world perceives your own national culture.”
b) “It’s a socio-cultural mirror.”
c) “We use historical characters as proxies for culture.”
Finally, and perhaps most surprising of all, are the closing lines of the story:
Using this quantitative approach, Hidalgo is now testing hypotheses such as whether cultural development is structured or random. “Can you have a Steve Jobs in a country that has not generated enough science or technology?” he wonders. “Ultimately we want to know how culture assembles itself.”
It is difficult to comment on the study’s method because the article offers little more than the diagram and a few paragraphs of analysis, and the journalist may have misquoted Hidalgo. Still, I want to draw attention to these statements because I think they represent a growing phenomenon: big data analysts using Wikipedia data to make assumptions about ‘culture’.
Continue reading “Why Wikipedia is no ‘proxy for culture’ (Part 1 of 3)”