Ensemble Learning in the Presence of Noise
Author
Sabzevari, Maryam
Entity
UAM. Departamento de Ingeniería Informática
Date
2015-06
Subjects
Machine learning; Computer science
Note
Master's Degree in Research and Innovation in Information and Communication Technologies (Máster Universitario en Investigación e Innovación en Tecnologías de la Información y las Comunicaciones)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
Learning in the presence of noise is an important issue in machine learning. The design
and implementation of effective strategies for automatic induction from noisy data is
particularly important in real-world problems, where noise from defective data collection
processes, data contamination, or intrinsic fluctuations is ubiquitous. There are two
general strategies to address this problem. One is to design a robust learning method.
The other is to identify noisy instances and eliminate or correct them.
In this thesis we propose to use ensembles to mitigate the negative impact of mislabelled
data on the learning process. In ensemble learning, the predictions of individual learners
are combined to obtain a final decision. Effective combinations take advantage of the
complementarity of these base learners. In this manner, the errors incurred by one learner
can be compensated by the predictions of the other learners in the combination.
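
As a minimal illustration of this compensation effect, the sketch below combines three hypothetical base learners by simple majority voting. The combination rule, the function name `majority_vote`, and the toy predictions are assumptions made for illustration. The abstract does not specify how the thesis combines predictions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of base learners."""
    # most_common(1) yields [(label, count)] for the most frequent label.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three base learners on four instances.
# Each learner makes exactly one mistake, on a different instance.
true_labels = [1, 1, 1, 1]
learner_outputs = [
    [1, 0, 1, 1],  # learner A errs on the second instance
    [1, 1, 0, 1],  # learner B errs on the third instance
    [0, 1, 1, 1],  # learner C errs on the first instance
]

# zip(*...) regroups the predictions per instance instead of per learner.
combined = [majority_vote(votes) for votes in zip(*learner_outputs)]
```

Because the three learners err on different instances, each individual error is outvoted by the other two learners, which is the complementarity the paragraph above refers to.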
A first contribution of this work is the use of subsampling to build bootstrap ensembles,
such as bagging and random forest, that are resilient to class-label noise. By using lower
sampling rates, the detrimental effect of mislabelled examples on the final ensemble
decisions can be tempered. The reason is that each labelled instance is present in a
smaller fraction of the training sets used to build the individual learners. Ensembles can
also be used as a noise detection procedure to improve the quality of the data used for
training. In this strategy, one attempts to identify noisy instances and either correct them
(by switching their class label) or discard them. A particular example is identified as noise
if a specified percentage (greater than 50%) of the learners disagree with the given label
for this example. Using an extensive empirical evaluation, we demonstrate the use of
subsampling as an effective tool to detect and handle noise in classification problems.
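
The detection rule described above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the procedure evaluated in the thesis: the base learner is a one-nearest-neighbour rule on one-dimensional data (the thesis works with bagging and random forest), subsamples are drawn without replacement, and the sampling rate (0.25), ensemble size (301), threshold (0.5), and all function names are hypothetical choices.

```python
import random


def nearest_neighbour_label(x, sample):
    """Predict the label of the point in `sample` closest to x (1-NN)."""
    return min(sample, key=lambda point: abs(point[0] - x))[1]


def subsampled_ensemble_votes(xs, labels, n_learners, rate, rng):
    """votes[t][i] is the label learner t predicts for instance i.

    Each learner is built on a random subsample of the training data,
    drawn without replacement at the given sampling rate.
    """
    n = len(xs)
    size = max(1, int(rate * n))
    votes = []
    for _ in range(n_learners):
        idx = rng.sample(range(n), size)
        sample = [(xs[i], labels[i]) for i in idx]
        votes.append([nearest_neighbour_label(x, sample) for x in xs])
    return votes


def flag_noisy(votes, labels, threshold=0.5):
    """Indices whose given label more than `threshold` of learners contradict."""
    n_learners = len(votes)
    flagged = []
    for i, given in enumerate(labels):
        disagree = sum(1 for learner_votes in votes if learner_votes[i] != given)
        if disagree / n_learners > threshold:
            flagged.append(i)
    return flagged


# Toy binary problem: two well-separated clusters, with two labels flipped.
xs = list(range(10)) + list(range(100, 110))
clean = [0] * 10 + [1] * 10
noisy = list(clean)
noisy[3] = 1   # injected label noise
noisy[15] = 0  # injected label noise

rng = random.Random(0)
votes = subsampled_ensemble_votes(xs, noisy, n_learners=301, rate=0.25, rng=rng)
flagged = flag_noisy(votes, noisy, threshold=0.5)

# Correction by switching the class label (binary case); discarding the
# flagged instances instead would be the other option mentioned above.
corrected = [1 - y if i in flagged else y for i, y in enumerate(noisy)]
```

With a sampling rate of 0.25, each instance appears in about a quarter of the subsamples, so most learners judge a mislabelled point without having seen its label and tend to contradict it. Raising `rate` towards 1 makes a 1-NN learner reproduce the given labels more often, which weakens detection.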
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/