Warm-up strategies in multi-armed bandits for recommendation (Estrategias de calentamiento en bandidos multi-brazo para recomendación)

López Ramos, Esther

UAM_Biblioteca

Author

López Ramos, Esther

Advisor

Castells Azpilicueta, Pablo

Entity

UAM. Departamento de Ingeniería Informática

Date

2021-02

Subjects

Reinforcement learning; recommender systems; multi-armed bandits; computer science

URI

http://hdl.handle.net/10486/695134

Note

Master's thesis (Trabajo Fin de Máster) in Research and Innovation in Computational Intelligence and Interactive Systems

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Abstract

Recommender systems have become an essential component of many online platforms, such as streaming services and e-commerce sites, in recent years, as they provide users with items they may find interesting and thus offer them a personalised experience. The recommendation problem has many open lines of research. One of them is the topic we tackle in this work: the cold-start problem. In the context of recommender systems, the cold-start problem refers to the situation in which a system does not have enough information to give proper suggestions to the user. It typically occurs for one of three reasons: the user receiving recommendations is new to the system, so there is no information about their preferences; some of the candidate items have been added recently and have no user ratings yet; or the system is completely new and there is no information about either the users or the items. Classical recommendation techniques come from machine learning and treat recommendation as a static process in which the system provides suggestions to the user and the user rates them. It is more convenient to understand recommendation as a cycle of constant interaction between the user and the system, in which every time a user rates an item, the system uses that rating to learn about the user. In that setting we can sacrifice immediate reward in order to gain information about the user and improve long-term reward. This scheme requires a balance between exploration (non-optimal recommendations made to learn about the user) and exploitation (optimal recommendations made to maximise the reward). Techniques known as multi-armed bandits are used to strike that balance, and we propose them to tackle the cold-start problem. Our hypothesis is that exploring during the first epochs of the recommendation cycle can lead to an improvement in the reward during later epochs.
To test this hypothesis we divide the recommendation loop into two phases: the warm-up, in which we follow a more exploratory approach to gather as much information as possible; and exploitation, in which the system uses the knowledge acquired during the warm-up to maximise the reward. For these two phases we combine different recommendation strategies, among which we consider both multi-armed bandits and classical algorithms. We evaluate them offline on three datasets: CM100K (music), MovieLens1M (films) and Twitter. We also study how the warm-up duration affects the exploitation phase. Results show that on two datasets (MovieLens and Twitter) classical algorithms perform better during the exploitation phase in terms of recall after a mainly exploratory warm-up phase.
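The two-phase loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a toy Bernoulli bandit with known success probabilities, an epsilon-greedy policy for the warm-up and a purely greedy policy for exploitation, and every name in it (`run_two_phase`, `true_probs`, `warmup_steps`, `warmup_epsilon`) is hypothetical.

```python
import random


def run_two_phase(true_probs, warmup_steps, total_steps,
                  warmup_epsilon=1.0, seed=0):
    """Simulate a Bernoulli bandit with a warm-up and an exploitation phase.

    Assumption: each arm stands for an item whose (hidden) probability of
    being liked is given by true_probs; the reward is 1 for a hit, else 0.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms      # times each arm was recommended
    hits = [0] * n_arms       # positive rewards collected per arm
    rewards = []

    def empirical_mean(arm):
        # Arms that were never tried are scored 0.0 (an arbitrary choice).
        return hits[arm] / pulls[arm] if pulls[arm] else 0.0

    for step in range(total_steps):
        # Warm-up: explore with probability warmup_epsilon.
        # Exploitation: epsilon is 0, so the policy is purely greedy.
        epsilon = warmup_epsilon if step < warmup_steps else 0.0
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=empirical_mean)

        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        hits[arm] += reward
        rewards.append(reward)

    return rewards, pulls


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.6]  # hypothetical item appeal, not thesis data
    for warmup in (0, 50, 200):
        rewards, _ = run_two_phase(probs, warmup, total_steps=1000)
        print(warmup, sum(rewards[warmup:]) / (1000 - warmup))
```

Running the script prints the average exploitation-phase reward for several warm-up lengths, which is one simple way to probe how warm-up duration affects the later phase. With no warm-up at all, the greedy policy in this sketch can lock onto the first arm it tries, which is the kind of behaviour an exploratory warm-up is meant to avoid.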
