Wikipedia Isn’t Journalism, But Are Wikipedians Reluctant Journalists?

Cross-posted from PBS Idea Lab

Wikipedia articles on breaking news stories dominate page views on the world’s sixth-largest website. Perhaps more importantly, these articles attract the most editor contributions, especially from new editors.


In the first three months of this year, the English Wikipedia articles with the most contributors were those on the 2011 Tucson shooting (460 editors), the 2011 Egyptian revolution (405) and the 2011 Tōhoku earthquake and tsunami (785).

Interestingly, a number of Wikipedia policies discourage writing articles on breaking news. One of Wikipedia’s 42 policies, titled “What Wikipedia is not” (or WP:NOT), highlights that the site is, above all, an encyclopedia, not a newspaper (Wikipedia:NotNewspaper). The policy states that although the encyclopedia needs to include current and up-to-date information as well as standalone articles on “significant current events,” not all verifiable events are suitable for inclusion in Wikipedia.

Wikipedia articles are not journalism

According to the policy, “Wikipedia should not offer first-hand news reports on breaking stories” because “Wikipedia is not a primary source.” The encyclopedia has a tenuous relationship with primary sources. Policy describes them as “accounts written by people who are directly involved in an event, offering an insider’s view of an event” and treats them as (mostly) inappropriate, because Wikipedia strives to represent a “Neutral Point of View” (NPOV) and primary sources can be misused to present a fringe theory as mainstream. NPOV is one of the five pillars of Wikipedia and largely determines what is allowed into the encyclopedia and what is left out.

Covering a breaking news story requires Wikipedians to use primary sources to update rapidly evolving details such as the death count after an earthquake. A journalist can weigh primary sources to judge the death count at the time of publication, then repeat the exercise with new sources for each follow-up story. Wikipedians must do the same collectively and iteratively, as new versions of the article are saved every few seconds.

In the Japanese earthquake article, this challenge resulted in contradictory figures for the height of the tsunami and the death toll within the same article. In response, one editor (“Dcoetzee”) created templates for the numbers of dead and missing that could be edited once, with the change immediately reflected everywhere they appeared on the page (see Keegan, Gergle and Contractor’s Hot off the Wiki: Dynamics, Practices, and Structures in Wikipedia’s Coverage of the Tohoku Catastrophes).
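The idea behind those templates is a single source of truth: the figures are stored once, and every part of the page that mentions them pulls them from that one place. The Python sketch below is only an analogy, using `string.Template` rather than MediaWiki's actual template syntax; the figures, section names and section text are hypothetical placeholders, not data from the article.

```python
from string import Template

# Hypothetical placeholder figures, not real casualty data. This dict plays
# the role of the shared template: the one place where the numbers are edited.
figures = {"dead": "1,000", "missing": "2,000"}

# Different parts of a hypothetical article refer to the shared figures
# instead of hard-coding their own copies of the numbers.
sections = {
    "infobox": Template("Casualties: $dead dead, $missing missing"),
    "lead": Template("The disaster left $dead people dead and $missing missing."),
    "aftermath": Template("Officials have confirmed $dead deaths; $missing people remain missing."),
}

def render(figures):
    """Render every section from the same set of figures."""
    return {name: tpl.substitute(figures) for name, tpl in sections.items()}

before = render(figures)
figures["dead"] = "1,500"  # a single edit to the shared figures...
after = render(figures)    # ...shows up in every section at once

for name, text in after.items():
    print(f"{name}: {text}")
```

Because no section keeps its own copy of the numbers, one update changes all three and they cannot drift apart; on Wikipedia, a template transcluded into several places plays the same role.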

Wikipedia articles are not news reports

The barrier to entry for a Wikipedia article is notability: subjects must be notable enough to sustain enduring articles in the encyclopedia. According to policy, while news reporting covers announcements, sports news or celebrities, the fact that something is “in the news” is not a sufficient basis for inclusion in the encyclopedia. Notability is difficult, perhaps impossible, to predict directly after an event. The result can be historical events described in purely modern terms, or an article created about something noteworthy at a particular moment that later fails to meet notability requirements.

Wikipedians call this “recentism” and have a tag to make it transparent to readers that the article might be skewed towards “recent perspectives.” In an essay on “recentism,” Wikipedians describe the phenomenon as “writing or editing without a long-term, historical view, thereby inflating the importance of a topic that has received recent public attention.”

Both the “Wikipedia articles are not: Journalism” and “Wikipedia articles are not: News reports” policies recommend moving timely news subjects to Wikinews, a sister project to Wikipedia that allows the use of primary sources and is intended to be a primary source. But Wikinews has suffered from a small contributor base and from disagreement among contributors about how best to build the news portal.

In September, a large portion of the Wikinews contributor base announced on the Foundation-l mailing list that they had forked the project and started OpenGlobe after becoming “deeply dissatisfied with Wikinews.”

Wikipedia articles are not who’s who

The third item of “Wikipedia:NotNewspaper” explains that, even when an event is notable, individuals involved in it may not be. This policy speaks to the need for enduring articles that will still be notable in the years after the event. While newspapers are often concerned with explaining events through the people affected by such events, Wikipedia wishes to take the long-term view, attempting to avoid cases that give undue weight to the person or event and thus conflict with NPOV.


The first rough draft of history?

It took just 11 minutes for the Japanese Wikipedia to create an article after the 9.0-magnitude undersea megathrust earthquake struck off the coast of Japan on March 11. Twenty-one minutes later, the English Wikipedia article was created. Although the wire services reported the earthquake within minutes, The New York Times did not file a full story until more than three hours after the earthquake hit.

Despite the clear discouragement of reporting on current news items, for the reasons mentioned above, Wikipedia has become the site of major activity around large news events like this one. Because anyone can edit the encyclopedia, there are no restrictions on editing articles, and notability is a relative concept, Wikipedia policy cannot stop the hundreds of editors who flock to the encyclopedia with a single purpose: to work on a particular page.

But if Wikipedia, and not the news media, is the first rough draft of history, what does this mean for Neutral Point of View? If Wikipedians are evaluating and synthesizing primary sources rather than relying on sources that have already evaluated the importance of an event, is Wikipedia at risk of becoming subjective? Consensus may be easier to reach when the event is a natural disaster, but when it is a war or a revolution and editors’ motivations diverge, the same architectural flexibility can lock articles into disagreement.

Wikipedia may be a reluctant journalist, but its influence on the media landscape is unmistakable.

Why Wikipedia articles are deleted

Stuart Geiger and I just presented some research at WikiSym on why Wikipedia articles are deleted through two processes: speedy deletion, or “CSD”, in which administrators can unilaterally delete problematic articles without discussion, and Articles for Deletion, or “AfD”, in which editors discuss whether an article should be deleted. You might imagine that most speedily deleted articles are removed for spam or vandalism, but interestingly, we found that the largest group of speedy deletions (the blue chunk in the chart on the left) cite a lack of any ‘indication of importance’. We also found that the deletion process is dominated by a relatively small number of longstanding users.

Our key findings include:

1. About half of all deleted articles from June ’07 to Jan ’11 were unilaterally deleted by administrators via the CSD process.
2. Surprisingly, spam, vandalism and patent nonsense account for only 8.00%, 5.69% and 5.36% of CSDs respectively, while the more subjective ‘No indication of importance’ accounts for 38.47% of all CSD criteria cited.
3. With some outliers, AfD discussions have few participants, and those participants are overwhelmingly regulars in the process: 74% of all AfDs are made up entirely of users who have previously participated in an AfD, and 18% have only one newcomer.

You can read more in the PDFs below, but there is also a lot of great research by other authors at WikiSym and from the Wikimedia Foundation’s Summer of Research program.
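The reported percentages invite a quick arithmetic check. This Python sketch adds up the three clear-cut criteria, compares them with ‘No indication of importance’, and derives the share of AfDs left after the two reported categories; that last step assumes “only one newcomer” means exactly one and that the figures are rounded. The `classify_afd` helper is hypothetical, included only to show how a single discussion might be bucketed by its number of newcomers.

```python
# Shares of all CSD criteria reported in finding 2 (percent).
spam, vandalism, nonsense = 8.00, 5.69, 5.36
no_importance = 38.47

clear_cut = spam + vandalism + nonsense
ratio = no_importance / clear_cut
print(f"Spam + vandalism + patent nonsense: {clear_cut:.2f}%")
print(f"'No indication of importance' is {ratio:.2f}x that combined share")

# AfD shares reported in finding 3 (percent). Assumption: "only one newcomer"
# means exactly one, so the two categories do not overlap.
all_regulars, one_newcomer = 74, 18
two_or_more = 100 - all_regulars - one_newcomer
print(f"AfDs with two or more newcomers: roughly {two_or_more}%")

def classify_afd(participants, prior_participants):
    """Hypothetical helper: bucket one AfD discussion by how many of its
    participants had never taken part in an earlier AfD."""
    newcomers = set(participants) - set(prior_participants)
    if not newcomers:
        return "all regulars"
    if len(newcomers) == 1:
        return "one newcomer"
    return "two or more newcomers"

print(classify_afd(["A", "B", "C"], {"A", "B"}))
```

Taken together, the three clear-cut criteria cover about 19% of speedy deletions, roughly half the share of ‘No indication of importance’.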

Poster (PDF) | Participation in Wikipedia’s Article Deletion Processes (WikiSym accepted poster research) (PDF)

Collaborative filtering: the problem with classification

In the wake of anxiety about the “trustworthiness” of Wikipedia articles and their varying levels of quality, a number of tools have been developed to automatically gauge how trustworthy Wikipedia content is. They follow two broad approaches: ‘content-based filtering’ and ‘collaborative filtering’. Content-based filtering uses properties of the text itself to automatically gauge trustworthiness. WikiTrust, a MediaWiki extension, for example, lets users view a Wikipedia article with text highlighted in different shades according to the reputation of its author, which in turn is based on the number of times that author’s contributions have been reverted (lower reputation) or preserved (higher reputation). Collaborative filtering, on the other hand, is based on users’ subjective evaluations of Wikipedia articles. The Wikipedia Recommender System (WRS), for example, uses the ratings of users similar to you to predict how you might evaluate articles in the future.
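As a rough illustration of the collaborative-filtering idea, here is a generic user-based recommender in Python: it measures how similar two users’ past ratings are and predicts a rating as a similarity-weighted average of other users’ ratings. This is a textbook technique, not the WRS’s actual algorithm; the users, articles and 1-5 rating scale are hypothetical.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, over the articles both have rated."""
    common = set(a) & set(b)
    dot = sum(a[x] * b[x] for x in common)
    norm_a = sqrt(sum(a[x] ** 2 for x in common))
    norm_b = sqrt(sum(b[x] ** 2 for x in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap (or all-zero ratings): treat as unrelated
    return dot / (norm_a * norm_b)

def predict(ratings, user, article):
    """Predict `user`'s rating of `article` as the similarity-weighted average
    of the ratings given by other users who have rated it."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or article not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        if sim <= 0:
            continue  # ignore unrelated users
        num += sim * their[article]
        den += sim
    return num / den if den else None

# Hypothetical users and ratings on a 1-5 scale.
ratings = {
    "alice": {"Drag racing": 5, "Giotto": 2, "Tides": 4},
    "bob": {"Drag racing": 5, "Giotto": 2, "Tides": 4, "Masaccio": 3},
    "carol": {"Drag racing": 1, "Giotto": 5, "Masaccio": 5},
}

print(round(predict(ratings, "alice", "Masaccio"), 2))
```

Alice’s ratings match Bob’s exactly on the articles they share, so the prediction for Masaccio lands close to Bob’s 3, pulled upward somewhat by Carol’s 5. Cosine similarity on raw positive ratings rarely drops to zero, so even dissimilar users keep some weight; mean-centred measures such as Pearson correlation are a common alternative.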

Christian Jensen, Povilas Pilkauskas and Thomas Lefevre are behind the WRS and have just published ‘Classification of Recommender Expertise in the Wikipedia Recommender System’, detailing the next version of their prototype. Jensen et al. explain that the key problem of collaborative filtering systems is that a single user has varying levels of expertise in different areas and will therefore be good at rating some articles and not-so-good at rating others. You may have provided an accurate rating of an article about drag racing, for example, but that does not mean you should automatically be believed when you rate articles about painters of the Italian Renaissance. The first version of the WRS used this kind of simplistic, one-size-fits-all approach, and the team wanted to extend it to take account of a user’s expertise in different areas.
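One way to picture this refinement is to keep a separate trust score for each recommender in each topic instead of one global score. The Python sketch below does this by measuring, within each category, how closely a recommender’s past ratings match the user’s own, then weighting their rating of a new article by the trust earned in that article’s category. The category mapping, names, ratings and agreement measure are all hypothetical; the paper’s own classification and trust model may differ.

```python
from collections import defaultdict

RATING_RANGE = 4  # assumption: ratings on a 1-5 scale, so the largest gap is 4

def category_trust(my_ratings, their_ratings, category_of):
    """Per-category agreement between me and one recommender, computed from
    articles we have both rated: 1.0 = identical ratings, 0.0 = maximally apart."""
    gaps = defaultdict(list)
    for article, mine in my_ratings.items():
        if article in their_ratings:
            gap = abs(mine - their_ratings[article]) / RATING_RANGE
            gaps[category_of[article]].append(gap)
    return {cat: 1 - sum(g) / len(g) for cat, g in gaps.items()}

def predict_with_expertise(my_ratings, recommenders, category_of, article):
    """Average recommenders' ratings of `article`, weighting each by my trust in
    them for that article's category (0 if we share no history there)."""
    category = category_of[article]
    num = den = 0.0
    for their_ratings in recommenders.values():
        if article not in their_ratings:
            continue
        weight = category_trust(my_ratings, their_ratings, category_of).get(category, 0.0)
        num += weight * their_ratings[article]
        den += weight
    return num / den if den else None

# Hypothetical topic classification and ratings, for illustration only.
category_of = {
    "Drag racing": "Motorsport", "Formula One": "Motorsport",
    "Giotto": "Renaissance art", "Masaccio": "Renaissance art",
    "Botticelli": "Renaissance art",
}
me = {"Drag racing": 4, "Giotto": 5, "Masaccio": 4}
recommenders = {
    "dan": {"Drag racing": 4, "Formula One": 5, "Giotto": 1, "Masaccio": 1, "Botticelli": 1},
    "eve": {"Drag racing": 1, "Formula One": 1, "Giotto": 5, "Masaccio": 4, "Botticelli": 5},
}

print(predict_with_expertise(me, recommenders, category_of, "Botticelli"))
print(predict_with_expertise(me, recommenders, category_of, "Formula One"))
```

Here “dan” agrees with the user on motorsport and “eve” on Renaissance art, so each dominates the prediction in their own area, which a single global trust score could not capture. The hard part, as the next paragraph notes, is deciding what the categories should be in the first place.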

Now comes the hard part. For the system to keep track of which topics users were proficient in, the team had to decide which classification system to use for all the information on Wikipedia (i.e. all “human knowledge”). They looked at hundreds of classification schemes in order to …