Reinforcement Learning Strategies applied to Personalized Music Recommendation

Author: Rodrigo Borges

Advisor: Marcelo Queiroz

The number of songs available in digital format on the internet grows every day and has reached a volume that no listener can explore unaided. A user who decides to listen to music has so many options that a tool to support the choice becomes necessary; without one, a large part of this material would plausibly never be accessed. Online platforms therefore offer automatic recommendation services that analyze each user’s listening history and keep available a sequence of songs matching that user’s personal taste.

Recommender Systems emerged as an independent area of research in the early 1990s, in the context of online services, when researchers began to focus on predicting how users would rate items they had not yet encountered. In its first formulation, therefore, the automatic recommendation problem boiled down to predicting such ratings and suggesting to the user the item with the highest predicted value. The problem has since expanded and diversified across different applications. Research was also driven by a prize offered by the North American company Netflix, which promised one million dollars to whoever improved the accuracy of its movie recommendation algorithm by 10%.

Making good recommendations, however, is not simply a matter of analyzing the user’s listening history and suggesting songs to which the user has reacted positively. It is also necessary to provide new material, that is, songs the user has not yet heard and is likely to enjoy. In addition, the system must be able to make relevant suggestions for a user who has just joined the platform, about whom it has no information. Finally, the system must register each user reaction as it occurs and incorporate it into subsequent recommendations, so that the process is dynamic.
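The three requirements above (suggesting unheard material, serving a user with no history, and updating after every reaction) can be illustrated with a very simple reinforcement learning strategy. The sketch below is an epsilon-greedy bandit, chosen here only for illustration and not necessarily the strategy adopted in this work. The class name `EpsilonGreedyRecommender`, its parameters, and the numeric reward (for example 1.0 for a full listen, 0.0 for a skip) are all assumptions.

```python
import random
from collections import defaultdict


class EpsilonGreedyRecommender:
    """Toy per-user epsilon-greedy bandit over a fixed song catalogue.

    Hypothetical illustration: names, reward encoding, and defaults are
    assumptions, not taken from the original text.
    """

    def __init__(self, songs, epsilon=0.1, seed=None):
        self.songs = list(songs)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-user tables. A user never seen before starts with empty
        # tables, so every song has an estimated reward of 0.0.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(lambda: defaultdict(float))

    def recommend(self, user):
        # Explore: with probability epsilon, suggest a random song,
        # which may be one the user has never heard.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.songs)
        # Exploit: suggest the song with the highest estimated reward.
        # Ties (e.g. for a brand-new user) fall back to catalogue order.
        values = self.values[user]
        return max(self.songs, key=lambda song: values[song])

    def update(self, user, song, reward):
        # Incorporate one user reaction as an incremental mean, so the
        # next call to recommend() already reflects it.
        self.counts[user][song] += 1
        n = self.counts[user][song]
        self.values[user][song] += (reward - self.values[user][song]) / n


if __name__ == "__main__":
    rec = EpsilonGreedyRecommender(["song_a", "song_b", "song_c"], epsilon=0.2, seed=42)
    for _ in range(5):
        song = rec.recommend("alice")
        # Assumed feedback: this user only enjoys "song_b".
        rec.update("alice", song, 1.0 if song == "song_b" else 0.0)
    print(dict(rec.values["alice"]))
```

In this sketch a new user is handled only by exploration and tie-breaking; a practical system would more likely fall back on aggregate information such as global popularity.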