Estrategias de calentamiento en bandidos multi-brazo para recomendación
Author: López Ramos, Esther
Advisor: Castells Azpilicueta, Pablo
Entity: UAM. Departamento de Ingeniería Informática
Date: 2021-02
Subjects: Reinforcement learning; recommender systems; multi-armed bandits; computer science
Note: Master's thesis (Trabajo Fin de Máster) in Research and Innovation in Computational Intelligence and Interactive Systems
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
In recent years, recommender systems have become an essential component of many online platforms,
such as streaming services and e-commerce sites, because they suggest items that users may find
interesting and thus give them a personalised experience. The recommendation problem still has many
open lines of research. One of them is the topic we tackle in this work: the cold-start problem.
In the context of recommender systems, the cold-start problem refers to the situation in which a
system does not have enough information to give proper suggestions to the user. It typically occurs
for one of three reasons: the user receiving recommendations is new to the system, so there is no
information about their preferences; some of the candidate items have been added recently and have
no user ratings yet; or the system is completely new and there is no information about either the
users or the items.
Classical recommendation techniques come from machine learning and treat recommendation as a static
process in which the system provides suggestions to the user and the user rates them. It is more
convenient to view recommendation as a cycle of constant interaction between the user and the
system: every time a user rates an item, the system uses that rating to learn about the user. In
that sense, we can sacrifice immediate reward in order to gain information about the user and
improve long-term reward. This scheme establishes a balance between exploration (non-optimal
recommendations made to learn about the user) and exploitation (optimal recommendations made to
maximise the reward). Techniques known as multi-armed bandits are used to strike that balance
between exploration and exploitation, and we propose them to tackle the cold-start problem.
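The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy bandit. This is only a sketch under stated assumptions, not the method used in the thesis: the function name `epsilon_greedy`, the simulated Bernoulli ratings in `true_rates`, and the parameter values are all hypothetical.

```python
import random


def epsilon_greedy(true_rates, epsilon=0.1, rounds=5000, seed=0):
    """Simulate an epsilon-greedy bandit over items with Bernoulli rewards.

    true_rates: hypothetical per-item probability of a positive rating.
    Returns (estimated mean reward per item, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_items = len(true_rates)
    counts = [0] * n_items
    means = [0.0] * n_items
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: recommend a random item to learn about it.
            item = rng.randrange(n_items)
        else:
            # Exploitation: recommend the item with the best estimate so far.
            item = max(range(n_items), key=lambda i: means[i])
        # Simulated user feedback (assumption: binary like / no like).
        reward = 1 if rng.random() < true_rates[item] else 0
        counts[item] += 1
        # Incremental update of the running mean reward for this item.
        means[item] += (reward - means[item]) / counts[item]
        total += reward
    return means, counts, total
```

A call such as `epsilon_greedy([0.1, 0.5, 0.9])` returns the learned estimates alongside the number of times each item was recommended; a larger `epsilon` spends more rounds exploring at the cost of immediate reward.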
Our hypothesis is that exploring during the first epochs of the recommendation cycle can improve
the reward in later epochs. To test this hypothesis we divide the recommendation loop into two
phases: the warm-up, in which we follow a more exploratory approach to gather as much information
as possible; and exploitation, in which the system uses the knowledge acquired during the warm-up
to maximise the reward. For these two phases we combine different recommendation strategies, among
which we consider both multi-armed bandits and classical algorithms. We evaluate them offline on
three datasets: CM100K (music), MovieLens1M (films) and Twitter. We also study how the warm-up
duration affects the exploitation phase. Results show that on two datasets (MovieLens and Twitter)
classical algorithms perform better during the exploitation phase in terms of recall after a mainly
exploratory warm-up phase.
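The two-phase loop can be sketched as follows. This is a simplified illustration, not the thesis's experimental setup: the warm-up here is uniformly random recommendation, the exploitation phase is a greedy choice on the estimated mean rating (standing in for a classical algorithm), the metric is average reward rather than recall, and the name `run_two_phases` and the simulated `true_rates` are hypothetical.

```python
import random


def run_two_phases(true_rates, warmup_rounds, exploit_rounds, seed=0):
    """Run a warm-up phase followed by an exploitation phase.

    true_rates: hypothetical per-item probability of a positive rating.
    Returns the average reward obtained during the exploitation phase only.
    """
    rng = random.Random(seed)
    n_items = len(true_rates)
    counts = [0] * n_items
    means = [0.0] * n_items

    def recommend(item):
        # Simulated feedback, then an incremental update of the item's mean.
        reward = 1 if rng.random() < true_rates[item] else 0
        counts[item] += 1
        means[item] += (reward - means[item]) / counts[item]
        return reward

    # Warm-up: purely exploratory, every item is equally likely.
    for _ in range(warmup_rounds):
        recommend(rng.randrange(n_items))

    # Exploitation: always recommend the item with the best estimate,
    # while still updating the estimates with the new feedback.
    exploit_reward = 0
    for _ in range(exploit_rounds):
        best = max(range(n_items), key=lambda i: means[i])
        exploit_reward += recommend(best)
    return exploit_reward / exploit_rounds if exploit_rounds else 0.0


if __name__ == "__main__":
    # Compare several warm-up durations on the same simulated items.
    for warmup in (0, 10, 100, 1000):
        score = run_two_phases([0.2, 0.4, 0.6, 0.8], warmup, 2000)
        print(f"warm-up={warmup:5d}  exploitation reward={score:.3f}")
```

With no warm-up, the greedy phase in this sketch starts from all-zero estimates and can lock onto a poor item, which is the kind of effect the warm-up duration study is meant to expose.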
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/
Related items
Showing items related by title, author, creator and subject.
- Bandidos multi-brazo en sistemas de recomendación
  López Ramos, Esther
  2019-06
- Falsos positivos en recomendación con bandidos multi-brazo
  Cuesta Fernández, Emilio
  2021-01
- Librería de bandidos multi-brazo para recomendación.
  Cabornero Pascual, David
  2021-05