Advanced machine learning methods based on Gaussian processes
Title (trans.): Métodos avanzados de aprendizaje automático basados en procesos Gaussianos
Author: Villacampa Calvo, Carlos
Advisor: Hernández Lobato, Daniel
Entity: UAM. Departamento de Ingeniería Informática
Date: 2022-07-05
Subjects: Informática
Note: Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 05-07-2022. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
Machine learning is a set of methods that learn patterns from observed data
and make predictions about previously unseen data. These methods can be divided into
supervised and unsupervised learning. In supervised learning, the observed data have
associated labels representing the target that we want to learn, which can be categorical
(classification) or real-valued (regression).
Bayesian models have become popular in recent years due to their ability to
provide uncertainty estimates for the predictions they make, which is critical in some
applications such as autonomous cars. These models are also less prone to overfitting
than other popular models such as neural networks.
Gaussian processes (GPs) are a type of model that can be used to address both supervised
and unsupervised learning problems. Besides being Bayesian models, they are
also non-parametric, so their expressiveness grows with the number of training data
points. Furthermore, prior knowledge can be introduced by means of a covariance
function, as in kernel methods. This makes GPs more interpretable than other models,
since the parameters of the covariance function characterize the properties of the function
that we are trying to learn.
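As a concrete illustration of these ideas, the following is a minimal exact GP regression sketch in NumPy (not code from the thesis; the kernel and its hyper-parameter names are illustrative assumptions). The lengthscale and variance of the squared-exponential covariance are the interpretable parameters the abstract refers to:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance; its hyper-parameters encode the
    smoothness (lengthscale) and amplitude (variance) of the functions
    we expect a priori."""
    sq_dists = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
                - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=0.1, lengthscale=1.0):
    """Exact GP regression. The Cholesky factorization of the n x n Gram
    matrix is the source of the O(n^3) cost mentioned below."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                      # predictive mean
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(X_test, X_test, lengthscale) - v.T @ v
    return mean, np.diag(var)                 # latent predictive variance
```

Note that the predictive variance comes for free, which is the uncertainty-quantification advantage of Bayesian models highlighted above.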
However, GPs suffer from several limitations. First, their computational cost is cubic
in the number of training points. Moreover, exact inference is only feasible for
regression problems. Sparse GPs combined with approximate inference techniques,
such as variational inference (VI) or expectation propagation (EP), allow these models
to scale to larger datasets and to be applied to other types of problems. Both VI and
EP rely on minimizing the Kullback-Leibler (KL) divergence between the posterior
distribution and its approximation.
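For reference, with $q$ the approximate distribution and $p(f \mid \mathbf{y})$ the exact posterior, the KL divergence in question is

```latex
\mathrm{KL}\bigl(q(f) \,\|\, p(f \mid \mathbf{y})\bigr)
  = \int q(f) \, \log \frac{q(f)}{p(f \mid \mathbf{y})} \, \mathrm{d}f .
```

VI minimizes $\mathrm{KL}(q \,\|\, p)$ globally, while EP instead minimizes the reversed divergence $\mathrm{KL}(p \,\|\, q)$ locally, factor by factor; this difference in the direction of the divergence is what the next paragraph's $\alpha$-divergence framework unifies.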
This thesis proposes several GP models based on an approximate inference algorithm
called power EP, which minimizes a family of divergence measures called α-divergences
that generalizes the KL divergence. We first apply this framework to multi-class
classification problems. We then extend it to a generalization of GPs
called deep GPs which, unlike GPs, can be useful for problems where the functions that
we are modeling are non-smooth or non-stationary, or where the predictive distribution
is not Gaussian.
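The family of α-divergences (written here in Amari's parameterization; the thesis may use a different but equivalent convention) is

```latex
D_{\alpha}\bigl[p \,\|\, q\bigr]
  = \frac{1}{\alpha (1 - \alpha)}
    \left( 1 - \int p(x)^{\alpha} \, q(x)^{1-\alpha} \, \mathrm{d}x \right),
```

which recovers $\mathrm{KL}(p \,\|\, q)$ in the limit $\alpha \to 1$ and $\mathrm{KL}(q \,\|\, p)$ in the limit $\alpha \to 0$, so VI-like and EP-like behavior are the two extremes of a single continuum that power EP can interpolate between.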
Next, this thesis proposes a new model based on sparse GPs and VI that takes into
account noise in the inputs for multi-class classification problems. It was motivated by
a problem from astrophysics, where noise in the inputs is common
due to errors in experimental measurements.
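A common way to formalize input noise (the notation here is ours, not necessarily the thesis's) is to treat each observed input $\tilde{\mathbf{x}}_i$ as a noisy version of a latent true input $\mathbf{x}_i$:

```latex
\tilde{\mathbf{x}}_i = \mathbf{x}_i + \boldsymbol{\epsilon}_i,
  \qquad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_i),
```

where $\boldsymbol{\Sigma}_i$ is the (possibly per-point) covariance of the measurement error, which in the astrophysics setting would come from the reported experimental error bars.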
Finally, this thesis proposes a new method that improves the training cost of sparse
GPs. This method can drastically reduce the number of inducing points needed to
train a model by making them input dependent through a non-linear transformation,
e.g., a neural network.
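The shape of such a transformation can be sketched as follows (a hypothetical illustration under our own assumptions, not the thesis's architecture): a small neural network maps each input to its own set of inducing-point locations, instead of sharing one fixed global set.

```python
import numpy as np

def make_mlp(d_in, d_hidden, n_inducing, d_x, rng):
    """Initialize a one-hidden-layer MLP (hypothetical architecture)."""
    return {
        "W1": rng.normal(size=(d_in, d_hidden)) * 0.1,
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(size=(d_hidden, n_inducing * d_x)) * 0.1,
        "b2": np.zeros(n_inducing * d_x),
    }

def inducing_locations(x, params, n_inducing):
    """Map a single input point x to its own inducing-point locations Z(x),
    making the inducing points input dependent."""
    h = np.tanh(x @ params["W1"] + params["b1"])     # hidden layer
    z = h @ params["W2"] + params["b2"]              # flat output
    return z.reshape(n_inducing, -1)                 # (n_inducing, d_x)
```

Because each input gets its own locations $Z(x)$, a much smaller `n_inducing` can suffice than with a single global set, which is the source of the training-cost reduction described above.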