Balancing Exploration and Exploitation as a strategy for enhancing music recommendation systems

Presenter: Rodrigo Borges

Music recommender systems typically use historical listening information to make personalized recommendations. This approach, however, greedily keeps highly rated songs as the preferred candidates. We present a strategy for balancing safe (exploitation) and novel (exploration) recommendations in order to prevent suboptimal performance over the long term. The proposed solution is based on a reinforcement learning problem called the multi-armed bandit, which models a gambler playing several slot machines who needs to maximize their winnings. The player starts without any knowledge about the machines and, at each turn, has to choose between the currently best machine and untried alternatives. Practical results from the literature are presented, showing improved long-term recommendation quality as well as a solution to the cold-start problem of items newly added to the dataset.
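To make the exploration/exploitation trade-off concrete, below is a minimal epsilon-greedy bandit sketch (not the presenter's exact method). Each arm could stand for a candidate song; the reward probabilities, the value of EPSILON, and the reward signal (e.g., a click or a full listen) are illustrative assumptions.

```python
import random

# Minimal epsilon-greedy multi-armed bandit sketch.
# ARM_PROBS simulates the hidden reward probability of each arm
# (e.g., the chance a user enjoys a given song); these values are
# assumptions for illustration only.
ARM_PROBS = [0.2, 0.5, 0.7]
EPSILON = 0.1  # fraction of turns spent exploring (assumed value)

counts = [0] * len(ARM_PROBS)    # times each arm was played
values = [0.0] * len(ARM_PROBS)  # running mean reward per arm

def choose_arm():
    """Explore a random arm with probability EPSILON, else exploit the best."""
    if random.random() < EPSILON:
        return random.randrange(len(ARM_PROBS))
    return max(range(len(ARM_PROBS)), key=lambda a: values[a])

def update(arm, reward):
    """Incrementally update the running mean reward of the chosen arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

for _ in range(10_000):
    arm = choose_arm()
    reward = 1.0 if random.random() < ARM_PROBS[arm] else 0.0
    update(arm, reward)

print("estimated values:", [round(v, 2) for v in values])
```

Because a small fraction of turns is always spent on random arms, the estimates for rarely played (or newly added) arms keep improving instead of being starved by the greedy choice.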

(Video presentation in Portuguese)

When: June 18th, 2018

Where: Antonio Gilioli Auditorium, IME/USP