Biased Lift for Related-Item Recommendation

Photo: a man lifting a barbell with weights (by Victor Freitas on Unsplash).

Effectively computing non-personalized recommendations can be annoyingly subtle. If we do naïve things like sorting by average rating, we get a top-N list dominated by items that one user really liked. Sorting by overall popularity doesn’t have this problem; as soon as we want contextual related-product recommendations, however (e.g. “people who bought this also bought”), and don’t want those recommendations to be dominated by the most popular items overall, the problem comes roaring back.

Preliminaries

I’m going to use my usual notation in this blog post: we have a set $U$ of $M$ users who interact with $N$ items forming the set $I$.

For convenience, let $m_i = |U_i|$ be the number of users who have interacted with item $i \in I$. We can directly sort by this value, in descending order, to find the most popular items (those that have been interacted with by the most users). We can also frame overall popularity as the probability that a user will interact with that item (assuming, for the moment, that all users are equally likely or representative of future incoming users):

$$\operatorname{P}[i] = \operatorname{P}_{u \in U}[i \in I_u] = \frac{m_i}{M}$$
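For concreteness, here is a minimal sketch of this popularity computation in Python, assuming a pandas data frame `interactions` with hypothetical `user` and `item` columns (this is illustrative, not LensKit code):

```python
import pandas as pd

def item_popularity(interactions: pd.DataFrame) -> pd.Series:
    """Compute P[i] = m_i / M for each item, most popular first."""
    # m_i: the number of distinct users who interacted with each item
    m = interactions.groupby('item')['user'].nunique()
    # M: the total number of users in the data set
    M = interactions['user'].nunique()
    # P[i] = m_i / M, sorted descending so the head is the popularity list
    return (m / M).sort_values(ascending=False)
```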

When we have explicit ratings, user $u$ gives item $i$ the rating $r_{ui}$, and the average rating for an item is $\bar{r}_i$.

To recommend items, we compute some score $s(i \mid \dots)$ and pick the items with the highest score, or use these scores in some more sophisticated ranker such as a sampler or diversifier.

Biasing Lift

This raises a question: how do we apply the same principle to probabilities and lift? What would it mean to start each item out with some number of “neutral” interactions for the purpose of lift?² Recall that the lift of an item pair compares its joint probability to what we would expect if the items were independent: $\operatorname{Lift}[i,j] = \frac{\operatorname{P}[i,j]}{\operatorname{P}[i]\operatorname{P}[j]}$.

Working this out is a little less obvious than starting each item with some neutral ratings, both because we need to define “neutral”, and because there will be some additional terms involved (the number of neutral interactions per item is not sufficient for the math to work out).

Let’s define “neutral” as “independent”: our goal is for each item to start out with $\kappa$ virtual interactions that are independent of any other item’s virtual interactions. To compute probabilities, we also need to know the number of virtual users ($K$). We can set this directly, but it is more natural to derive it from $n_K$, the (average) number of interactions supplied by each virtual user. We’ll see at the end that we can actually ignore $n_K$ and $K$ and derive a sufficient approximation of biased lift with only $\kappa$, but we’ll use them for now.

If each item has $\kappa$ virtual interactions, we need $\kappa N$ total virtual interactions. If each virtual user supplies $n_K$ virtual independent interactions, then the virtual user count $K = \frac{\kappa N}{n_K}$. With these in hand, we can define the prior probability of an item, or the probability that one of our virtual users will interact with it:

$$\operatorname{P}_K[i] = \frac{\kappa}{K}$$
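To put some made-up numbers on this: with $N = 10{,}000$ items, $\kappa = 5$ virtual interactions per item, and $n_K = 20$ interactions per virtual user, we get $K = \frac{5 \cdot 10{,}000}{20} = 2500$ virtual users, and each item’s prior probability is $\operatorname{P}_K[i] = \frac{5}{2500} = 0.002$.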

Since all virtual interactions are independent, we can define the prior joint probability of individual item pairs:

$$\operatorname{P}_K[i,j] = \operatorname{P}_K[i] \operatorname{P}_K[j] = \frac{\kappa^2}{K^2}$$

To define a biased probability $\dot{\operatorname{P}}[i]$ that incorporates our virtual clicks, an item with $m_i$ real clicks will have an additional $\kappa$ virtual clicks from a pool of $K$ virtual users. This yields:

$$\dot{\operatorname{P}}[i] = \frac{m_i + \kappa}{M + K}$$
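Continuing the made-up numbers: in a data set of $M = 100{,}000$ real users, an item with $m_i = 50$ real clicks would have $\dot{\operatorname{P}}[i] = \frac{50 + 5}{100{,}000 + 2500} \approx 5.4 \times 10^{-4}$.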

Next, we will define the biased joint probability $\dot{\operatorname{P}}[i,j]$. For this, we need to know the expected number of virtual co-occurrences a pair of items will have: of our $K$ virtual users, how many will have interacted with both items? This is $K \operatorname{P}_K[i,j] = \frac{\kappa^2}{K}$, allowing us to compute the biased joint probability:

$$\dot{\operatorname{P}}[i,j] = \frac{m_{ij} + \frac{\kappa^2}{K}}{M + K}$$
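In the running example, the expected number of virtual co-occurrences is $\frac{\kappa^2}{K} = \frac{25}{2500} = 0.01$: each pair of items starts with just one hundredth of a virtual co-occurrence, a hint that this term will be negligible shortly.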

We can plug these biased joint and unconditional probabilities into the lift formula to get biased lift:

$$
\begin{align*}
\operatorname{BiasedLift}[i,j] & = \frac{\dot{\operatorname{P}}[i,j]}{\dot{\operatorname{P}}[i]\,\dot{\operatorname{P}}[j]} \\
& = \frac{m_{ij} + \frac{\kappa^2}{K}}{M + K} \cdot \frac{M+K}{m_i + \kappa} \cdot \frac{M+K}{m_j + \kappa} \\
& = (M + K) \frac{m_{ij} + \frac{\kappa^2}{K}}{(m_i + \kappa)(m_j + \kappa)} \\
& \approx M \frac{m_{ij}}{(m_i + \kappa)(m_j + \kappa)}
\end{align*}
$$
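To see what the final approximation does, here is a quick numeric sanity check in Python (all counts are made up, with $M$, $K$, and $\kappa$ from the running example); the exact and approximate scores differ by almost exactly the constant factor $\frac{M+K}{M}$, so they rank item pairs the same way:

```python
M, K, kappa = 100_000, 2_500, 5.0

def exact_biased_lift(m_ij, m_i, m_j):
    # (M + K) * (m_ij + kappa^2 / K) / ((m_i + kappa) * (m_j + kappa))
    return (M + K) * (m_ij + kappa ** 2 / K) / ((m_i + kappa) * (m_j + kappa))

def approx_score(m_ij, m_i, m_j):
    # M * m_ij / ((m_i + kappa) * (m_j + kappa))
    return M * m_ij / ((m_i + kappa) * (m_j + kappa))

for m_ij, m_i, m_j in [(40, 50, 60), (5, 50, 2000), (400, 5000, 6000)]:
    print(exact_biased_lift(m_ij, m_i, m_j) / approx_score(m_ij, m_i, m_j))
# Each ratio is within a fraction of a percent of (M + K) / M = 1.025.
```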

The last simplification is not a precise computation of the biased lift, but it can be used to identify the item pairs with the highest lift. We selected $\kappa$ to be a constant, and $K$ is fixed given the data set (since it is computed from the fixed $N$ and the constants $\kappa$ and $n_K$). Therefore, adding $K$ or terms involving only $K$ and $\kappa$ does not change the order of item pairs within a single data set.

Further, $K \gg \kappa^2$, so adding $\frac{\kappa^2}{K}$ in the numerator is adding a very small quantity, and can be effectively ignored for approximation purposes.

Since the final ordering score no longer depends on $K$, we actually do not need to pick $n_K$ if we just want to use biased lift to select the most related recommendations for a reference item. We can select $\kappa$ (higher values increase the number of real co-occurrences needed to obtain a high biased lift score), and sort by biased scores computed directly from interaction counts:

$$s(i|j) = M \frac{m_{ij}}{(m_i + \kappa)(m_j + \kappa)}$$
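Here is a minimal sketch of computing this score for all item pairs, assuming a binary SciPy sparse user-item matrix `X` (users in rows, items in columns); the function name and interface are my own illustration, not LensKit’s API:

```python
import numpy as np
import scipy.sparse as sps

def biased_lift_scores(X: sps.csr_matrix, kappa: float = 5.0) -> np.ndarray:
    """Approximate biased lift s(i|j) for every pair of items."""
    M = X.shape[0]                          # M: number of users
    m = np.asarray(X.sum(axis=0)).ravel()   # m_i: users per item
    C = (X.T @ X).toarray()                 # m_ij: co-occurrence counts
    # s(i|j) = M * m_ij / ((m_i + kappa) * (m_j + kappa))
    return M * C / np.outer(m + kappa, m + kappa)

# Related items for a reference item j: sort column j (the matrix is
# symmetric), drop item j itself, and take the top N.
```

This materializes a dense $N \times N$ score matrix, which is fine for a sketch; a real implementation would keep the co-occurrence counts sparse.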

And it works! I don’t know how well it works yet, overall, but in my preliminary trials in class, it did exactly what I hoped, computing related-book recommendations that weren’t dominated by either most-popular-overall or unpopular-but-that-one-user-also-read-the-reference-book recommendations. I’ll probably add it to LensKit soon.


  1. We called this the damped mean in early writing and teaching; biased mean is perhaps better, but may be confusing with biases in the bias model coming in a couple of paragraphs.↩︎

  2. I have to believe others have asked this question, given how old lift is as a metric, but I have been unable to find references to damping or biasing lift in this way. If someone knows one, please send it my way!↩︎