A recent paper from Spotify explores this problem and proposes a new approach for generating calibrated recommendations by modeling the task as a minimum-cost flow problem.
The problem and proposal
Traditional recommendation algorithms aim to maximize the relevance of the generated recs for the user – prioritizing the accuracy of the suggestions. This risks ignoring the diversity of the user's interests across genres/categories and optimizing only for the primary or popular ones. Calibration in recommender systems measures whether the generated recommendations align with the user's historical likes and interests.
It is intuitive to think that the solution could simply be constraining the recommended items to match the category ratio seen in the user's activity. While that is essentially the idea, the paper's approach integrates this constraint into the optimization itself and theoretically guarantees an optimal solution. The authors draw an equivalence between this problem and finding a maximum flow of minimal cost between two nodes in a graph. By modifying the optimization goal to minimize the gap between the distribution of user interests and that of the recommendations, they calibrate the recommended list.
Minimum-cost flow for calibration
To understand how the proposal is modelled, we can look at earlier work noted by the authors. One of their inspirations is an existing greedy approach that maximizes an objective balancing relevance and divergence (between the two distributions, denoted p and q), as shown below.
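Written roughly in the notation used in this post (a sketch rather than the paper's exact formula), with s(u, i) the relevance score of item i for user u, I the list built so far, and λ trading off the two terms, the greedy step picks

$$
i^{*} \;=\; \operatorname*{arg\,max}_{i \notin I} \;\; (1 - \lambda)\, s(u, i) \;-\; \lambda\, D_{\mathrm{KL}}\bigl(q(I \cup \{i\}) \,\|\, p\bigr).
$$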
The algorithm iteratively adds the most relevant and least miscalibrated item to an initially empty recommendation list. The weight given to relevance versus miscalibration is controlled by the parameter λ.
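To make the idea concrete, here is a minimal Python sketch of such a greedy re-ranker. The names (`greedy_calibrated`, `scores`, `item_category`) are illustrative rather than from the paper, and each item is assumed to belong to a single category:

```python
import math

def kl_divergence(q, p, eps=1e-9):
    """D_KL(q || p) over category distributions, with light smoothing of p."""
    return sum(qk * math.log(qk / (p.get(k, 0.0) + eps))
               for k, qk in q.items() if qk > 0)

def greedy_calibrated(scores, item_category, p, n, lam=0.5):
    """Pick n items maximising (1 - lam) * relevance - lam * miscalibration."""
    chosen, counts = [], {}
    for _ in range(n):
        best_item, best_val = None, -math.inf
        for item, s in scores.items():
            if item in chosen:
                continue
            # Category distribution q of the list if this item were added.
            c = dict(counts)
            c[item_category[item]] = c.get(item_category[item], 0) + 1
            q = {k: v / (len(chosen) + 1) for k, v in c.items()}
            val = (1 - lam) * s - lam * kl_divergence(q, p)
            if val > best_val:
                best_item, best_val = item, val
        if best_item is None:  # fewer candidates than n
            break
        chosen.append(best_item)
        counts[item_category[best_item]] = counts.get(item_category[best_item], 0) + 1
    return chosen

# Example: a user whose history is 70% "pop", 30% "rock".
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
cats = {"a": "pop", "b": "pop", "c": "rock", "d": "rock"}
print(greedy_calibrated(scores, cats, {"pop": 0.7, "rock": 0.3}, n=3, lam=0.5))
```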
The proposed approach using minimum-cost flow (MCF), which models the problem as a graph, derives its edge costs from the above equation. For n required recommendations, n units of flow must pass through the graph at minimal cost.
Generally, in recommender algorithms each user gets a personalised matrix A whose m rows are the candidate items available to recommend and whose n columns are the slots of the recommendation list; the entry A_{ij} is the expected value of recommending item i in the j-th slot. A binary matching matrix M of size m × n then selects the best recommendation list, ensuring each item is placed in the list at most once and exactly one item occupies each slot. The optimal M* can be found by solving a maximum-weight assignment problem.
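Writing the constraints just described explicitly (a sketch of the standard assignment formulation):

$$
M^{*} \;=\; \operatorname*{arg\,max}_{M \in \{0,1\}^{m \times n}} \; \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} M_{ij}
\qquad \text{s.t.} \quad \sum_{i=1}^{m} M_{ij} = 1 \;\; \forall j, \quad \sum_{j=1}^{n} M_{ij} \le 1 \;\; \forall i.
$$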
The term above encourages highly relevant items to be chosen for the recommendations. On the other hand, adding a divergence term penalises the final recommendation distribution q for deviating from the user distribution p – calibrating the choices based on their categories.
Every available item (out of m) belongs to one or more content categories (out of c in total). If the optimal solution picks j items from a category k, then q(k) = j/n. Thus, the divergence term (using the Kullback–Leibler divergence) D_{KL}(q||p) can be rewritten in terms of these per-category counts.
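One way to write this (a reconstruction consistent with the edge costs described in the next section), with n_k the number of recommended items from category k and E_{k,0} = 0:

$$
D_{\mathrm{KL}}(q \,\|\, p) \;=\; \sum_{k=1}^{c} E_{k, n_k},
\qquad
E_{k, j} \;=\; \frac{j}{n} \log \frac{j / n}{p(k)} \;=\; \sum_{i=1}^{j} \bigl(E_{k, i} - E_{k, i-1}\bigr).
$$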
The objective function balancing relevance and miscalibration thus combines the two terms.
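A sketch in the same notation, with λ again weighting relevance against miscalibration and n_k(M) the number of items of category k selected by M:

$$
M^{*} \;=\; \operatorname*{arg\,max}_{M} \;\; (1 - \lambda) \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} M_{ij} \;-\; \lambda \sum_{k=1}^{c} E_{k, n_k(M)},
$$

subject to the same matching constraints on M as before.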
Graph representation
In the flow network, flow starts at the source node, traverses the network via its edges, and reaches the sink node. The equation for M* boils down to edge costs representing relevance (between the slot and item nodes) and divergence (between the category nodes and the sink).
First, −A_{ij} (negated, since we minimise) is the cost of the edge connecting candidate item node i and slot node j. And E_{k,i} − E_{k,i−1} is the cost of the edge from the category node w_{k,i} to the sink (where E_{k,i} is as defined above). The cost of every other edge is zero. A sample graph is shown below.
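As a rough illustration of this construction (not the paper's implementation), here is a minimal sketch using networkx as a generic min-cost-flow solver. It assumes each item belongs to exactly one category, folds the λ trade-off into the edge costs, scales costs to integers for the network-simplex solver, and orients the flow as source → slots → items → category copies → sink:

```python
import math
import networkx as nx

def calibrated_mcf(A, item_category, p, n, lam=0.5, scale=10_000):
    """A[i][j]: expected value of recommending item i in slot j;
    item_category[i]: the single category of item i; p: target category
    distribution; n: list size; lam: relevance/miscalibration trade-off."""
    m = len(A)
    G = nx.DiGraph()

    def E(k, j):
        # Contribution of j items from category k (out of n) to D_KL(q || p).
        return 0.0 if j == 0 else (j / n) * math.log((j / n) / p[k])

    for j in range(n):
        # One unit of flow per recommendation slot.
        G.add_edge("source", f"slot_{j}", capacity=1, weight=0)
        for i in range(m):
            # Relevance cost, negated so that min-cost flow maximises relevance.
            G.add_edge(f"slot_{j}", f"item_{i}", capacity=1,
                       weight=-round(scale * (1 - lam) * A[i][j]))
    for i in range(m):
        # Node split: each item can be recommended at most once.
        G.add_edge(f"item_{i}", f"item_out_{i}", capacity=1, weight=0)
        for r in range(1, n + 1):
            G.add_edge(f"item_out_{i}", f"cat_{item_category[i]}_{r}",
                       capacity=1, weight=0)
    for k in set(item_category.values()):
        for r in range(1, n + 1):
            # Marginal miscalibration cost of the r-th item from category k.
            G.add_edge(f"cat_{k}_{r}", "sink", capacity=1,
                       weight=round(scale * lam * (E(k, r) - E(k, r - 1))))

    flow = nx.max_flow_min_cost(G, "source", "sink")
    # Read off which item fills each slot from the unit flows.
    return [next(i for i in range(m) if flow[f"slot_{j}"][f"item_{i}"] == 1)
            for j in range(n)]

# Example: 4 candidates, 2 slots, a user split 50/50 across two genres.
A = [[0.9, 0.9], [0.8, 0.8], [0.5, 0.5], [0.4, 0.4]]
cats = {0: "pop", 1: "pop", 2: "rock", 3: "rock"}
print(calibrated_mcf(A, cats, {"pop": 0.5, "rock": 0.5}, n=2, lam=0.8))
```

Because E_{k,j} is convex in j, the marginal costs E_{k,r} − E_{k,r−1} increase with r, so the solver fills a category's cheaper replica edges first and the accumulated edge cost matches the divergence term for however many items of that category end up in the list.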
Empirical studies and results
The authors compare MCF with a greedy method, a mixed-integer programming (MIP) method, and a baseline algorithm without calibration. On the MovieLens and Last.fm datasets, across different values of λ (from 0 to 0.9), MCF achieves higher relevance and lower miscalibration than the other methods overall. It also performs well for users with different levels of genre diversity (from few to many genres) and different popularity tendencies (from low to high interest in popular items).
If you are interested, the paper also gives a theoretical proof that this approach is guaranteed to find an optimal solution.
On the more practical side, they show that for larger recommendation list sizes, MCF does not reduce miscalibration as much and takes more time than the other methods. In real use cases, however, list sizes are around 10–20, a range in which MCF still works best.
To get more details about the theoretical proof, limitations, experiments, and results you can refer to the original paper:
Himan Abdollahpouri, Zahra Nazari, Alex Gain, Clay Gibson, Maria Dimakopoulou, Jesse Anderton, Benjamin Carterette, Mounia Lalmas, and Tony Jebara. 2023. Calibrated Recommendations as a Minimum-Cost Flow Problem. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (WSDM '23). Association for Computing Machinery, New York, NY, USA, 571–579.
Conclusion
It is important to account for the diversity of users' interests when generating good recommendations. The paper discussed here models calibration in recommendations as a minimum-cost flow problem and outperforms existing methods in generating recommendations that are both relevant and calibrated.