Collaborative filtering: the problem with classification

In the wake of anxiety about the “trustworthiness” of Wikipedia articles and their varying quality, a number of tools have been developed to automatically determine how trustworthy content on Wikipedia is. These tools take one of two approaches: ‘content-based filtering’ or ‘collaborative filtering’. Content-based filtering uses properties of the text itself to automatically gauge trustworthiness. WikiTrust, a MediaWiki extension, for example, lets users view a Wikipedia article with text highlighted in different shades according to the reputation of its author, which in turn is based on how often that author’s contributions have been reverted (lower reputation) or preserved (higher reputation). Collaborative filtering, on the other hand, is based on the subjective evaluations of Wikipedia articles by users. The Wikipedia Recommender System (WRS), for example, uses the ratings of users similar to you to predict how you might evaluate articles in the future.
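
To make the collaborative filtering idea concrete, here is a minimal sketch of a similarity-weighted prediction. It is not the WRS algorithm, whose details are not described here. The users, articles, ratings and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {article title: rating on a 1-5 scale}.
ratings = {
    "you":   {"Drag racing": 5, "Titian": 2, "NASCAR": 4},
    "alice": {"Drag racing": 5, "Titian": 1, "NASCAR": 4, "Funny Car": 5},
    "bob":   {"Drag racing": 1, "Titian": 5, "NASCAR": 2, "Funny Car": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the articles both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def predict(ratings, user, article):
    """Average other users' ratings of `article`, weighted by similarity to `user`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or article not in their:
            continue
        weight = cosine_similarity(ratings[user], their)
        num += weight * their[article]
        den += weight
    return num / den if den else None  # None: nobody comparable rated it

prediction = predict(ratings, "you", "Funny Car")
```

Because "you" have rated articles much as "alice" has, her rating of the unseen article counts for more than "bob"’s, so the prediction lands above the plain average of the two.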

Christian Jensen, Povilas Pilkauskas and Thomas Lefevre are behind the WRS and have just published ‘Classification of Recommender Expertise in the Wikipedia Recommender System’, detailing the next version of their prototype. Jensen et al. explain that the key problem of collaborative filtering systems is that a single user has varying levels of expertise in different areas and will therefore be good at rating some articles and not-so-good at rating others. You may have provided an accurate rating of an article about drag racing, for example, but that does not mean that you should automatically be believed when you provide feedback about painters from the Italian Renaissance. The first version of WRS treated every rating from a user the same way, regardless of topic, and the team wanted to advance it to take account of a user’s expertise in different areas.
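
One way to picture expertise-aware rating is to weight each rating by the rater’s expertise in the article’s topic. The sketch below is only an illustration of that idea, not the method from the paper. The topic labels, expertise scores and function name are hypothetical.

```python
# Hypothetical per-topic expertise scores in [0, 1] for each rater.
expertise = {
    "you":   {"motorsport": 0.9, "renaissance art": 0.1},
    "carla": {"motorsport": 0.2, "renaissance art": 0.8},
}

# Hypothetical ratings of one article about an Italian Renaissance painter.
article_topic = "renaissance art"
article_ratings = [("you", 5), ("carla", 2)]

def topic_weighted_rating(article_ratings, topic, expertise, default=0.0):
    """Average the ratings, weighting each by the rater's expertise in `topic`."""
    num = den = 0.0
    for rater, rating in article_ratings:
        # Raters with no recorded expertise in this topic get the default weight.
        weight = expertise.get(rater, {}).get(topic, default)
        num += weight * rating
        den += weight
    return num / den if den else None

plain_average = sum(r for _, r in article_ratings) / len(article_ratings)
weighted = topic_weighted_rating(article_ratings, article_topic, expertise)
```

Here the drag-racing expert’s opinion of the painter still counts, but much less than that of the rater with a track record in Renaissance art, so the weighted score sits closer to the latter’s rating than the plain average does.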

Now comes the hard part. In order for the system to take note of which topics users were proficient in, the team had to decide what classification system to use for all the information on Wikipedia (i.e. all “human knowledge”). They looked at the hundreds of classification schemes in order to …