A recent paper from Spotify explores this problem and proposes a new approach for generating calibrated recommendations by modeling the task as a minimum-cost flow problem.
The problem and proposal
Traditional recommendation algorithms aim to maximize the relevance of the generated recs for the user – prioritizing the accuracy of the suggestions. This risks ignoring the diversity of the user's interests across genres/categories and optimizing only for the primary or popular ones. Calibration in recommender systems measures whether the generated recommendations align with the user's historical likes and interests.
It is intuitive to think that the solution could simply be constraining the recommended items to match the category ratio seen in the user's activity. While that is essentially the idea, the paper's approach integrates this constraint into the optimization itself and theoretically guarantees an optimal solution. The authors draw an equivalence between this problem and finding a maximum flow of minimal cost between two nodes in a graph. By modifying the optimization goal to minimize the gap between the distribution of user interests and that of the recommendations, they calibrate the recommended list.
Minimum-cost flow for calibration
To understand how the proposal is modelled, we can look at earlier work noted by the authors. One of their inspirations is an existing greedy approach that maximizes an objective balancing relevance and divergence (between the two distributions, denoted p and q), as shown below.
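Written roughly in the notation used in this post (a sketch rather than the paper's exact formula), with s(u, i) the relevance score of item i for user u, I the list built so far, and λ trading off the two terms, the greedy step picks

$$
i^{*} \;=\; \operatorname*{arg\,max}_{i \notin I} \;\; (1 - \lambda)\, s(u, i) \;-\; \lambda\, D_{\mathrm{KL}}\bigl(q(I \cup \{i\}) \,\|\, p\bigr).
$$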
The algorithm iteratively adds the most relevant and least miscalibrated item to an initially empty recommendation list. The weight given to relevance versus miscalibration is controlled by the parameter λ.
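To make the idea concrete, here is a minimal Python sketch of such a greedy re-ranker. The names (`greedy_calibrated`, `scores`, `item_category`) are illustrative rather than from the paper, and each item is assumed to belong to a single category:

```python
import math

def kl_divergence(q, p, eps=1e-9):
    """D_KL(q || p) over category distributions, with light smoothing of p."""
    return sum(qk * math.log(qk / (p.get(k, 0.0) + eps))
               for k, qk in q.items() if qk > 0)

def greedy_calibrated(scores, item_category, p, n, lam=0.5):
    """Pick n items maximising (1 - lam) * relevance - lam * miscalibration."""
    chosen, counts = [], {}
    for _ in range(n):
        best_item, best_val = None, -math.inf
        for item, s in scores.items():
            if item in chosen:
                continue
            # Category distribution q of the list if this item were added.
            c = dict(counts)
            c[item_category[item]] = c.get(item_category[item], 0) + 1
            q = {k: v / (len(chosen) + 1) for k, v in c.items()}
            val = (1 - lam) * s - lam * kl_divergence(q, p)
            if val > best_val:
                best_item, best_val = item, val
        if best_item is None:  # fewer candidates than n
            break
        chosen.append(best_item)
        counts[item_category[best_item]] = counts.get(item_category[best_item], 0) + 1
    return chosen

# Example: a user whose history is 70% "pop", 30% "rock".
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
cats = {"a": "pop", "b": "pop", "c": "rock", "d": "rock"}
print(greedy_calibrated(scores, cats, {"pop": 0.7, "rock": 0.3}, n=3, lam=0.5))
```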
The proposed approach using minimum-cost flow (MCF), which models the problem as a graph, derives its edge costs from the above equation. For n required recommendations, n units of flow must pass through the graph at minimal cost.
Generally, in recommender algorithms each user gets a personalised matrix A whose m rows are the candidate items available to recommend and whose n columns are the slots of the recommendation list; the entry A_{ij} is the expected value of recommending item i in the j-th slot. A binary matching matrix M of size m × n then selects the best recommendation list, ensuring each item is placed in the list at most once and exactly one item occupies each slot. The optimal M* can be found by solving a maximum-weight assignment problem.
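Writing the constraints just described explicitly (a sketch of the standard assignment formulation):

$$
M^{*} \;=\; \operatorname*{arg\,max}_{M \in \{0,1\}^{m \times n}} \; \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} M_{ij}
\qquad \text{s.t.} \quad \sum_{i=1}^{m} M_{ij} = 1 \;\; \forall j, \quad \sum_{j=1}^{n} M_{ij} \le 1 \;\; \forall i.
$$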
The term above encourages highly relevant items to be chosen for the recommendations. On the other hand, adding a divergence term penalises the final recommendation distribution q for deviating from the user distribution p – calibrating the choices based on their categories.
Every available item (out of m) belongs to one or more content categories (out of c in total). If the optimal solution picks j items from a category k, then q(k) = j/n. Thus, the divergence term (using the Kullback–Leibler divergence) D_{KL}(q||p) can be rewritten in terms of these per-category counts.
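One way to write this (a reconstruction consistent with the edge costs described in the next section), with n_k the number of recommended items from category k and E_{k,0} = 0:

$$
D_{\mathrm{KL}}(q \,\|\, p) \;=\; \sum_{k=1}^{c} E_{k, n_k},
\qquad
E_{k, j} \;=\; \frac{j}{n} \log \frac{j / n}{p(k)} \;=\; \sum_{i=1}^{j} \bigl(E_{k, i} - E_{k, i-1}\bigr).
$$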
The objective function balancing relevance and miscalibration thus combines the two terms.
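A sketch in the same notation, with λ again weighting relevance against miscalibration and n_k(M) the number of items of category k selected by M:

$$
M^{*} \;=\; \operatorname*{arg\,max}_{M} \;\; (1 - \lambda) \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} M_{ij} \;-\; \lambda \sum_{k=1}^{c} E_{k, n_k(M)},
$$

subject to the same matching constraints on M as before.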
Graph representation
In the flow network, flow starts at the source node, traverses the network via its edges, and reaches the sink node. The equation for M* boils down to edge costs representing relevance (between the slot and item nodes) and divergence (between the category nodes and the sink).
First, −A_{ij} (negated, since we minimise) is the cost of the edge connecting candidate item node i and slot node j. And E_{k,i} − E_{k,i−1} is the cost of the edge from the category node w_{k,i} to the sink (where E_{k,i} is as defined above). The cost of every other edge is zero. A sample graph is shown below.
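As a rough illustration of this construction (not the paper's implementation), here is a minimal sketch using networkx as a generic min-cost-flow solver. It assumes each item belongs to exactly one category, folds the λ trade-off into the edge costs, scales costs to integers for the network-simplex solver, and orients the flow as source → slots → items → category copies → sink:

```python
import math
import networkx as nx

def calibrated_mcf(A, item_category, p, n, lam=0.5, scale=10_000):
    """A[i][j]: expected value of recommending item i in slot j;
    item_category[i]: the single category of item i; p: target category
    distribution; n: list size; lam: relevance/miscalibration trade-off."""
    m = len(A)
    G = nx.DiGraph()

    def E(k, j):
        # Contribution of j items from category k (out of n) to D_KL(q || p).
        return 0.0 if j == 0 else (j / n) * math.log((j / n) / p[k])

    for j in range(n):
        # One unit of flow per recommendation slot.
        G.add_edge("source", f"slot_{j}", capacity=1, weight=0)
        for i in range(m):
            # Relevance cost, negated so that min-cost flow maximises relevance.
            G.add_edge(f"slot_{j}", f"item_{i}", capacity=1,
                       weight=-round(scale * (1 - lam) * A[i][j]))
    for i in range(m):
        # Node split: each item can be recommended at most once.
        G.add_edge(f"item_{i}", f"item_out_{i}", capacity=1, weight=0)
        for r in range(1, n + 1):
            G.add_edge(f"item_out_{i}", f"cat_{item_category[i]}_{r}",
                       capacity=1, weight=0)
    for k in set(item_category.values()):
        for r in range(1, n + 1):
            # Marginal miscalibration cost of the r-th item from category k.
            G.add_edge(f"cat_{k}_{r}", "sink", capacity=1,
                       weight=round(scale * lam * (E(k, r) - E(k, r - 1))))

    flow = nx.max_flow_min_cost(G, "source", "sink")
    # Read off which item fills each slot from the unit flows.
    return [next(i for i in range(m) if flow[f"slot_{j}"][f"item_{i}"] == 1)
            for j in range(n)]

# Example: 4 candidates, 2 slots, a user split 50/50 across two genres.
A = [[0.9, 0.9], [0.8, 0.8], [0.5, 0.5], [0.4, 0.4]]
cats = {0: "pop", 1: "pop", 2: "rock", 3: "rock"}
print(calibrated_mcf(A, cats, {"pop": 0.5, "rock": 0.5}, n=2, lam=0.8))
```

Because E_{k,j} is convex in j, the marginal costs E_{k,r} − E_{k,r−1} increase with r, so the solver fills a category's cheaper replica edges first and the accumulated edge cost matches the divergence term for however many items of that category end up in the list.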
Empirical studies and results
The authors compare MCF with a greedy method, a mixed-integer programming (MIP) method, and a baseline algorithm without calibration. On the MovieLens and Last.fm datasets, across different values of λ (from 0 to 0.9), MCF achieves higher relevance and lower miscalibration than the other methods overall. It also performs well for users with different levels of genre diversity (from few to many genres) and different popularity tendencies (from low to high interest in popular items).
If you are interested, the paper also gives a theoretical proof that this approach is guaranteed to find an optimal solution.
On the more practical side, they show that for larger recommendation list sizes, MCF does not reduce miscalibration as much and takes more time than the other methods. In real use cases, however, list sizes are around 10–20, a range in which MCF still works best.
To get more details about the theoretical proof, limitations, experiments, and results you can refer to the original paper:
Himan Abdollahpouri, Zahra Nazari, Alex Gain, Clay Gibson, Maria Dimakopoulou, Jesse Anderton, Benjamin Carterette, Mounia Lalmas, and Tony Jebara. 2023. Calibrated Recommendations as a Minimum-Cost Flow Problem. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (WSDM '23). Association for Computing Machinery, New York, NY, USA, 571–579.
Conclusion
It is important to account for the diversity of users' interests when generating good recommendations. The paper discussed here models calibration in recommendations as a minimum-cost flow problem and outperforms existing methods in generating recommendations that are both relevant and calibrated.