Advanced machine learning methods based on Gaussian processes
Title (trans.): Métodos avanzados de aprendizaje automático basados en procesos Gaussianos
Author: Villacampa Calvo, Carlos
Advisor: Hernández Lobato, Daniel
Entity: UAM. Departamento de Ingeniería Informática
Date: 2022-07-05
Subjects: Informática
Note: Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 05-07-2022. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
Machine learning is a set of methods that learn patterns from observed data
and make predictions about previously unseen data. These methods can be divided into
supervised and unsupervised learning. In supervised learning, the observed data have
associated labels representing the target that we want to learn, which can be categorical
(classification) or real-valued (regression).
Bayesian models have become popular in recent years due to their ability to
provide uncertainty estimates for the predictions they make, which is critical in some
applications such as autonomous cars. These models are also less prone to overfitting
than other popular models such as neural networks.
Gaussian processes (GPs) are a type of model that can be used to address both supervised
and unsupervised learning problems. Besides being Bayesian models, they are
also non-parametric, so their expressiveness grows with the number of training data
points. Furthermore, prior knowledge can be introduced by means of a covariance
function, as in kernel methods. This makes GPs more interpretable than other models,
since the parameters of the covariance function characterize the properties of the function
that we are trying to learn.
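As a concrete illustration of these ideas, the following is a minimal exact GP regression sketch in NumPy (not code from the thesis; the kernel and its hyper-parameter names are illustrative assumptions). The lengthscale and variance of the squared-exponential covariance are the interpretable parameters the abstract refers to:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance; its hyper-parameters encode the
    smoothness (lengthscale) and amplitude (variance) of the functions
    we expect a priori."""
    sq_dists = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
                - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=0.1, lengthscale=1.0):
    """Exact GP regression. The Cholesky factorization of the n x n Gram
    matrix is the source of the O(n^3) cost mentioned below."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                      # predictive mean
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_test, X_test, lengthscale) - v.T @ v
    return mean, np.diag(var)                 # latent predictive variance
```

Note that the predictive variance comes for free, which is the uncertainty-quantification advantage of Bayesian models highlighted above.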
However, GPs suffer from several limitations. First, their computational cost is cubic
in the number of training points. Moreover, exact inference is only feasible for
regression problems. Sparse GPs combined with approximate inference techniques,
such as variational inference (VI) or expectation propagation (EP), allow these models
to scale to larger datasets and to be applied to other types of problems. Both VI and
EP rely on minimizing the Kullback-Leibler (KL) divergence between the posterior
distribution and its approximation.
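For reference, with $q$ the approximate distribution and $p(f \mid \mathbf{y})$ the exact posterior, the KL divergence in question is

```latex
\mathrm{KL}\bigl(q(f) \,\|\, p(f \mid \mathbf{y})\bigr)
  = \int q(f) \, \log \frac{q(f)}{p(f \mid \mathbf{y})} \, \mathrm{d}f .
```

VI minimizes $\mathrm{KL}(q \,\|\, p)$ globally, while EP instead minimizes the reversed divergence $\mathrm{KL}(p \,\|\, q)$ locally, factor by factor; this difference in the direction of the divergence is what the next paragraph's $\alpha$-divergence framework unifies.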
This thesis proposes several GP models based on an approximate inference algorithm
called power EP, which minimizes a family of divergence measures called α-divergences
that generalizes the KL divergence. We first apply this framework to multi-class
classification problems. We then extend it to a generalization of GPs
called deep GPs which, unlike GPs, can be useful for problems where the functions that
we are modeling are non-smooth or non-stationary, or where the predictive distribution
is not Gaussian.
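The family of α-divergences (written here in Amari's parameterization; the thesis may use a different but equivalent convention) is

```latex
D_{\alpha}\bigl[p \,\|\, q\bigr]
  = \frac{1}{\alpha (1 - \alpha)}
    \left( 1 - \int p(x)^{\alpha} \, q(x)^{1-\alpha} \, \mathrm{d}x \right),
```

which recovers $\mathrm{KL}(p \,\|\, q)$ in the limit $\alpha \to 1$ and $\mathrm{KL}(q \,\|\, p)$ in the limit $\alpha \to 0$, so VI-like and EP-like behavior are the two extremes of a single continuum that power EP can interpolate between.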
Next, this thesis proposes a new model based on sparse GPs and VI that takes into
account noise in the inputs for multi-class classification problems. It was motivated by
a problem from astrophysics, where noise in the inputs is common
due to errors in experimental measurements.
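A common way to formalize input noise (the notation here is ours, not necessarily the thesis's) is to treat each observed input $\tilde{\mathbf{x}}_i$ as a noisy version of a latent true input $\mathbf{x}_i$:

```latex
\tilde{\mathbf{x}}_i = \mathbf{x}_i + \boldsymbol{\epsilon}_i,
  \qquad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_i),
```

where $\boldsymbol{\Sigma}_i$ is the (possibly per-point) covariance of the measurement error, which in the astrophysics setting would come from the reported experimental error bars.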
Finally, this thesis proposes a new method that improves the training cost of sparse
GPs. This method can drastically reduce the number of inducing points needed to
train a model by making them input dependent through a non-linear transformation,
e.g., a neural network.
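The shape of such a transformation can be sketched as follows (a hypothetical illustration under our own assumptions, not the thesis's architecture): a small neural network maps each input to its own set of inducing-point locations, instead of sharing one fixed global set.

```python
import numpy as np

def make_mlp(d_in, d_hidden, n_inducing, d_x, rng):
    """Initialize a one-hidden-layer MLP (hypothetical architecture)."""
    return {
        "W1": rng.normal(size=(d_in, d_hidden)) * 0.1,
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(size=(d_hidden, n_inducing * d_x)) * 0.1,
        "b2": np.zeros(n_inducing * d_x),
    }

def inducing_locations(x, params, n_inducing):
    """Map a single input point x to its own inducing-point locations Z(x),
    making the inducing points input dependent."""
    h = np.tanh(x @ params["W1"] + params["b1"])     # hidden layer
    z = h @ params["W2"] + params["b2"]              # flat output
    return z.reshape(n_inducing, -1)                 # (n_inducing, d_x)
```

Because each input gets its own locations $Z(x)$, a much smaller `n_inducing` can suffice than with a single global set, which is the source of the training-cost reduction described above.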