Comparing Recommendation Lists
In my research, I am trying to understand how different recommender algorithms behave in different situations. We’ve known for a while that ‘different recommenders are different’, to paraphrase Sean McNee. However, we lack thorough data on how they differ across a variety of contexts. Our RecSys 2014 paper, User Perception of Differences in Recommender Algorithms (by myself, Max Harper, Martijn Willemsen, and Joseph Konstan), reports on an experiment that we ran to collect some of this data.
I have already done some work on this subject in offline contexts; my When Recommenders Fail paper looked at contexts in which different algorithms make different mistakes. LensKit makes it easy to test many different algorithms in the same experimental setup and context. This experiment brings my research goals back into the realm of user experiments: directly measuring the ways in which users experience the output of different algorithms as being different.