Biased Lift for Related-Item Recommendation

Effectively computing non-personalized recommendations can be annoyingly subtle. If we do naïve things like sorting by average rating, we get a top-N list dominated by items that one user really liked. Sorting by overall popularity doesn’t have this problem; as soon as we want contextual related-product recommendations, however (e.g. “people who bought this also bought”), and don’t want those recommendations to be dominated by the most popular items overall, the problem comes roaring back.
Preliminaries
I’m going to use my usual notation in this blog post: we have a set of users $U$ who interact with items forming the set $I$.
For convenience, let $|U_i|$ be the number of users who have interacted with item $i$ (where $U_i \subseteq U$ is the set of such users). We can directly sort by this value, in descending order, to find the most popular items (those that have been interacted with by the most users). We can also frame overall popularity as the probability that a user will interact with that item (assuming, for the moment, that all users are equally likely or representative of future incoming users):

$$P(i) = \frac{|U_i|}{|U|}$$
When we have explicit ratings, user $u$ gives item $i$ the rating $r_{ui}$, and the average rating for an item is:

$$\bar{r}_i = \frac{1}{|U_i|} \sum_{u \in U_i} r_{ui}$$
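To make the preliminaries concrete, here is a minimal sketch of how these counts, probabilities, and mean ratings might be computed from an interaction log with pandas. The column names (`user`, `item`, `rating`) and the tiny example frame are hypothetical, not any particular data set or LensKit’s implementation.

```python
import pandas as pd

# Hypothetical interaction log: one row per user-item interaction.
interactions = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c"],
    "item": ["x", "y", "x", "z", "x"],
    "rating": [5, 3, 4, 2, 5],
})

n_users = interactions["user"].nunique()                       # |U|
item_users = interactions.groupby("item")["user"].nunique()    # |U_i| per item

popularity = item_users / n_users                              # P(i) = |U_i| / |U|
mean_rating = interactions.groupby("item")["rating"].mean()    # average rating per item

print(popularity.sort_values(ascending=False))
print(mean_rating.sort_values(ascending=False))
```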
To recommend items, we compute some score $s(i)$ and pick the items with the highest scores, or use these scores in some more sophisticated ranker such as a sampler or diversifier.
Biasing Lift
This raises a question: how do we apply the same principle to probabilities and lift? What would it mean to start each item out with some number of “neutral” interactions for the purpose of lift?²
Working this out is a little less obvious than starting each item with some neutral ratings, both because we need to define “neutral”, and because there will be some additional terms involved (the number of neutral interactions per item is not, by itself, sufficient for the math to work out).
Let’s define “neutral” as “independent”: our goal is for each item to start out with some number $\beta$ of virtual interactions that are independent of any other item’s virtual interactions. To compute probabilities, we also need to know the number of virtual users ($n_0$). We can set this directly, but it is more natural to derive it from $k$, the (average) number of interactions supplied by each virtual user. We’ll see at the end that we can actually ignore $n_0$ and $k$ and derive a sufficient approximation of biased lift with only $\beta$, but we’ll use them for now.
If each item has $\beta$ virtual interactions, we need $\beta |I|$ total virtual interactions. If each virtual user supplies $k$ virtual independent interactions, then the virtual user count $n_0 = \frac{\beta |I|}{k}$. With these in hand, we can define the prior probability of an item, or the probability that one of our virtual users will interact with it:

$$P_0(i) = \frac{\beta}{n_0} = \frac{k}{|I|}$$
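As a purely illustrative example (these numbers are hypothetical, not from any particular data set): with $|I| = 10{,}000$ items, $\beta = 5$ virtual interactions per item, and $k = 20$ interactions per virtual user, we would have $n_0 = \frac{5 \cdot 10{,}000}{20} = 2{,}500$ virtual users and $P_0(i) = \frac{20}{10{,}000} = 0.002$ for every item.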
Since all virtual interactions are independent, we can define the prior joint probability of individual item pairs:

$$P_0(i, j) = P_0(i) \, P_0(j) = \frac{k^2}{|I|^2}$$
To define a biased probability that incorporates our virtual clicks, an item with $|U_i|$ real clicks will have an additional $\beta$ virtual clicks from a pool of $n_0$ virtual users. This yields:

$$\hat{P}(i) = \frac{|U_i| + \beta}{|U| + n_0}$$
Next, we will define the biased joint probability $\hat{P}(i, j)$. For this, we need to know the expected number of virtual co-occurrences a pair of items will have: of our $n_0$ virtual users, how many will have interacted with both items? This is $n_0 P_0(i) P_0(j) = \frac{\beta k}{|I|}$, allowing us to compute the biased joint probability:

$$\hat{P}(i, j) = \frac{|U_{ij}| + \frac{\beta k}{|I|}}{|U| + n_0}$$

where $|U_{ij}|$ is the number of real users who have interacted with both $i$ and $j$.
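Continuing the illustrative numbers from above, and assuming (hypothetically) $|U| = 50{,}000$ real users: each pair of items gets $\frac{\beta k}{|I|} = \frac{5 \cdot 20}{10{,}000} = 0.01$ expected virtual co-occurrences, so a pair observed together by 30 real users would have $\hat{P}(i, j) = \frac{30 + 0.01}{50{,}000 + 2{,}500} \approx 5.7 \times 10^{-4}$.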
We can plug these biased joint and unconditional probabilities into the lift formula to get biased lift:

$$\widehat{\mathrm{lift}}(i, j) = \frac{\hat{P}(i, j)}{\hat{P}(i) \, \hat{P}(j)} = \frac{\left(|U_{ij}| + \frac{\beta k}{|I|}\right)\left(|U| + n_0\right)}{\left(|U_i| + \beta\right)\left(|U_j| + \beta\right)} \propto \frac{|U_{ij}| + \frac{\beta k}{|I|}}{\left(|U_i| + \beta\right)\left(|U_j| + \beta\right)}$$
The last simplification is not a precise computation of the biased lift, but can be used to identify the item pairs with the highest lift. We selected $\beta$ and $k$ to be constants, and $n_0$ is fixed given the data set (since it is computed from the fixed $|I|$ and the constants $\beta$ and $k$). Therefore, adding or multiplying by terms involving only $|U|$ and $n_0$ does not change the order of item pairs within a single data set.
Further, $\frac{\beta k}{|I|} \ll 1$ for any reasonable choices of $\beta$ and $k$ (since $k \ll |I|$), so adding it in the numerator is adding a very small quantity, and it can be effectively ignored for approximation purposes.
Since the final ordering score no longer depends on $k$ or $n_0$, we actually do not need to pick $k$ if we just want to use biased lift to select the most related recommendations for a reference item. We can select $\beta$ (higher values increase the number of real co-occurrences needed to obtain a high biased lift score), and sort by biased scores computed directly from interaction counts:

$$s(j \mid i) = \frac{|U_{ij}|}{\left(|U_i| + \beta\right)\left(|U_j| + \beta\right)}$$
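Here is a minimal sketch of that scoring rule in pandas. The function name `biased_lift_scores` and the `user`/`item` column names are hypothetical, and this is an illustration of the approximate score above, not LensKit’s API.

```python
import pandas as pd


def biased_lift_scores(interactions: pd.DataFrame, ref_item, beta: float = 5.0) -> pd.Series:
    """Score items by approximate biased lift against a reference item.

    `interactions` is assumed to have `user` and `item` columns, one row per
    user-item interaction; `beta` is the virtual interaction count per item.
    """
    # |U_i|: number of distinct users who interacted with each item
    item_users = interactions.groupby("item")["user"].nunique()

    # Users who interacted with the reference item
    ref_users = set(interactions.loc[interactions["item"] == ref_item, "user"])

    # |U_ij|: co-occurrence counts between the reference item and every other item
    co_rows = interactions[interactions["user"].isin(ref_users)]
    co_counts = co_rows.groupby("item")["user"].nunique().drop(ref_item, errors="ignore")

    # s(j | i) = |U_ij| / ((|U_i| + beta) * (|U_j| + beta))
    denom = (item_users.loc[co_counts.index] + beta) * (item_users.loc[ref_item] + beta)
    return (co_counts / denom).sort_values(ascending=False)
```

Calling `biased_lift_scores(interactions, "x", beta=5.0)` on an interaction frame like the earlier sketch ranks the remaining items by this score; as noted above, larger `beta` values require more real co-occurrences before an item can rank highly.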
And it works! I don’t know how well it works yet, overall, but in my preliminary trials in class, it did exactly what I hoped, computing related-book recommendations that weren’t dominated by either most-popular-overall or unpopular-but-that-one-user-also-read-the-reference-book recommendations. I’ll probably add it to LensKit soon.
We called this the damped mean in early writing and teaching; biased mean is perhaps better, but may be confusing with biases in the bias model coming in a couple of paragraphs.↩︎
I have to believe others have asked this question, given how old lift is as a metric, but I have been unable to find references to damping or biasing lift in this way. If someone knows one, please send it my way!↩︎