Collaborative filtering: the problem with classification

In the wake of anxiety about the “trustworthiness” of Wikipedia and the uneven quality of its articles, a number of tools have been developed to automatically determine how trustworthy content on Wikipedia is. These tools take two broad approaches: ‘content-based filtering’ and ‘collaborative filtering’. Content-based filtering uses properties of the text itself to gauge trustworthiness. WikiTrust, a MediaWiki extension, for example, enables users to view a Wikipedia article with text highlighted in different shades according to the reputation of its author, which in turn is based on how often that author’s contributions have been reverted (lowering reputation) or preserved (raising reputation). Collaborative filtering, on the other hand, is based on users’ subjective evaluations of Wikipedia articles. The Wikipedia Recommender System (WRS), for example, uses the ratings of users similar to you to predict how you might evaluate articles in the future.
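To make the collaborative filtering idea concrete, here is a minimal sketch of a generic user-based approach: predict a user’s rating of an article as a similarity-weighted average of the ratings given by other, similar users. This is not the WRS’s actual algorithm, just an illustration of the general technique, and all function and variable names are my own.

```python
from math import sqrt

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the articles both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[k] * ratings_b[k] for k in shared)
    norm_a = sqrt(sum(ratings_a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(ratings_b[k] ** 2 for k in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(my_ratings, other_users, article):
    """Predict my rating of `article` as a similarity-weighted average
    of the ratings given by other users who have already rated it."""
    weighted_sum = weight_total = 0.0
    for their_ratings in other_users:
        if article not in their_ratings:
            continue
        sim = cosine_similarity(my_ratings, their_ratings)
        weighted_sum += sim * their_ratings[article]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None
```

In other words, a user whose past ratings closely track mine counts for more when predicting how I would rate an article I have not yet seen.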

Christian Jensen, Povilas Pilkauskas and Thomas Lefevre are behind the WRS and have just published ‘Classification of Recommender Expertise in the Wikipedia Recommender System’, detailing the next version of their prototype. Jensen et al. explain that the key problem of collaborative filtering systems is that a single user has varying levels of expertise in different areas and will therefore be good at rating some articles and not-so-good at rating others. You may have provided an accurate rating of an article about drag racing, for example, but that does not mean that you should automatically be believed when you provide feedback about painters from the Italian Renaissance. The first version of the WRS worked with this kind of simplistic scheme, treating a user’s reliability as uniform across topics, and the team wanted to advance it to take account of a user’s expertise in different areas.

Now comes the hard part. In order for the system to take note of which topics users were proficient in, they had to decide what classification system they were going to use for all the information on Wikipedia (i.e. all “human knowledge”). They looked at hundreds of classification schemes in order to find the “best”: one that was intuitive, easy to use, complete, concise, useful and, most importantly, unambiguous. If they were going to match up a user’s scores in a particular area, they needed to make sure that they were defining that area accurately enough, and on an encyclopedia like Wikipedia, where there are no definitive categories for articles, this would prove pretty difficult.

They ran a series of tests to see how users would classify particular articles; an article on Albert Einstein, for example, might be classified as natural science by a physics student but as biography by a history student. The online survey, completed by 130 people from “different countries and continents” (not sure which), asked informants to read four Wikipedia articles and categorize them according to one of four classification schemes. The schemes were the categories used by Wikiportals, Citizendium, the Dewey Decimal Classification and the Open Directory Project (Dmoz), with Dmoz proving the least ambiguous to users.

They then outline the new version of the WRS prototype, which now includes an assessment of recommenders’ expertise according to the Dmoz scheme. Trust in a recommender is adjusted according to two parameters: the rating of article quality (the primary parameter) and the category (the secondary parameter). If the user and the recommender agree on the quality of the article but disagree on the category, a value of +1/2 is applied to the interaction. If they disagree on the rating as well as the category, the majority view of the other recommenders is consulted: if the majority agrees with the user, the interaction is scored -3/2; if the majority agrees with the recommender, -1/2 is applied.
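The scoring rules above can be sketched roughly as follows. This is my own reading of the rules, not code from the paper; the function and argument names are hypothetical, and the value for full agreement is not stated in the paper as summarised here, so the +1 below is a placeholder assumption.

```python
def trust_adjustment(ratings_match, categories_match, majority_with_user):
    """Hypothetical sketch of the WRS trust-update rules described above.

    ratings_match: whether the user and recommender agree on article quality.
    categories_match: whether they agree on the Dmoz category.
    majority_with_user: whether the majority of other recommenders sides
    with the user (only consulted when the ratings disagree).
    """
    if ratings_match and categories_match:
        # Full agreement: no value is given for this case in the summary;
        # +1 is a placeholder assumption.
        return 1.0
    if ratings_match:
        # Agree on quality (primary), disagree on category (secondary): +1/2.
        return 0.5
    # Ratings disagree: fall back on the majority of the other recommenders.
    return -1.5 if majority_with_user else -0.5
```

Read this way, the asymmetry seems sensible: a recommender who is out of step with both the user and the wider crowd is penalised more heavily than one the crowd happens to side with.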

It’s an interesting paper and the algorithms are always fascinating, but it’s important to realise that these systems are limited and should not hold too much power over us. The Dmoz categories are a very specific and limited framework (arts, business, health, recreation, games etc.) whereas our individual and communal expertise and interests often span very different categories. I may be an expert in biographies but weak in scientific theories; I may be an expert in a broad range of topics from my country but understand little about topics in the same broad fields from other countries. When looking at designing similar systems for Ushahidi, we need to make sure that there is an opportunity for feedback beyond the limited categories as well. That means encouraging people to make notes about why they rate one source more highly than another in a particular context and situation. It’s good to start somewhere and refine as we go along, and so projects like the WRS are incredibly helpful as we go down this road.
