Empirical analysis and evaluation of approximate techniques for pruning regression bagging ensembles
Entity: UAM. Departamento de Ingeniería Informática
Publisher: Elsevier
Date: 2011-06
Citation: Neurocomputing 74.12-13 (2011): 2250–2264
ISSN: 0925-2312 (print); 1872-8286 (online)
DOI: 10.1016/j.neucom.2011.03.001
Funded by: The authors acknowledge support from the Spanish Ministerio de Ciencia e Innovación, Project TIN2010-21575-C02-02.
Editor's Version: http://dx.doi.org/10.1016/j.neucom.2011.03.001
Subjects: Bagging; Boosting; Ensemble learning; Ensemble pruning; Regression; Semidefinite programming; Informática
Note: This is the author's version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing 74.12-13 (2011), DOI: 10.1016/j.neucom.2011.03.001.
Rights: © 2011 Elsevier. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
Identifying the optimal subset of regressors in a regression bagging ensemble
is a difficult task that has exponential cost in the size of the ensemble. In this
article we analyze two approximate techniques specifically devised to address
this problem. The first strategy constructs a relaxed version of the problem
that can be solved using Semidefinite Programming (SDP). The second one is based
on modifying the order of aggregation of the regressors. Ordered Aggregation
is a simple forward selection algorithm that incorporates at each step the regressor
that reduces the training error of the current subensemble the most.
Both techniques can be used to identify subensembles that are close to the
optimal ones, which can be obtained by exhaustive search at a larger computational
cost. Experiments on a wide variety of synthetic and real-world
regression problems show that pruned ensembles composed of only 20% of the
initial regressors often have better generalization performance than the original
bagging ensembles. These improvements are due to a reduction in the
bias and the covariance components of the generalization error. Subensembles
obtained using either SDP or Ordered Aggregation generally outperform
subensembles obtained by other ensemble pruning methods and ensembles
generated by the AdaBoost.R2 algorithm, negative correlation learning or
regularized linear stacked generalization. Ordered Aggregation has a slightly better overall performance than SDP in the problems investigated. However,
the difference is not statistically significant. Ordered Aggregation has
the further advantage that it produces a nested sequence of near-optimal
subensembles of increasing size with no additional computational cost.
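The abstract does not spell out the SDP formulation, but the underlying combinatorial problem, choosing the k of T regressors that minimize a quadratic measure of collective training error, admits a standard semidefinite relaxation in the spirit of Zhang, Burer and Street's ensemble-pruning approach. The sketch below is an illustration under that assumption, not the authors' code: G is a hypothetical T x T matrix of pairwise training-error products, and sdp_prune is an invented helper that relaxes the binary selection vector via a rank-one lift and rounds the result.

```python
import numpy as np
import cvxpy as cp

def sdp_prune(G, k):
    """SDP relaxation of min x^T G x s.t. sum(x) = k, x in {0,1}^n.

    G: (n, n) symmetric matrix of pairwise training-error products
       between regressors (illustrative assumption).
    k: number of regressors to retain.
    """
    n = G.shape[0]
    # Lift x to Y ~ [1; x][1; x]^T and relax the rank-1 constraint to PSD.
    Y = cp.Variable((n + 1, n + 1), symmetric=True)
    constraints = [
        Y >> 0,                      # positive semidefinite relaxation
        Y[0, 0] == 1,
        cp.diag(Y)[1:] == Y[0, 1:],  # x_i^2 = x_i for binary indicators
        cp.sum(Y[0, 1:]) == k,       # select exactly k regressors
    ]
    objective = cp.Minimize(cp.trace(G @ Y[1:, 1:]))  # x^T G x = tr(G xx^T)
    cp.Problem(objective, constraints).solve()
    # Round the fractional solution: keep the k largest indicator entries.
    scores = np.asarray(Y.value)[0, 1:]
    return np.argsort(scores)[-k:]
```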
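Ordered Aggregation, by contrast, follows directly from its description above: starting from the empty subensemble, repeatedly add the regressor that most reduces the training error of the running average. A minimal sketch, assuming the bagged regressors' training-set predictions are precomputed; ordered_aggregation, preds and y are illustrative names:

```python
import numpy as np

def ordered_aggregation(preds, y):
    """Greedily order regressors by incorporating, at each step, the one
    that most reduces the training MSE of the running subensemble average.

    preds: (T, N) array; preds[t] holds regressor t's training predictions.
    y:     (N,) array of training targets.
    Returns the regressor indices in order of incorporation.
    """
    T = preds.shape[0]
    remaining = list(range(T))
    order = []
    running_sum = np.zeros_like(y, dtype=float)
    for size in range(1, T + 1):
        # Training MSE of the size-regressor average after adding each candidate.
        errors = [np.mean(((running_sum + preds[t]) / size - y) ** 2)
                  for t in remaining]
        best = remaining.pop(int(np.argmin(errors)))
        order.append(best)
        running_sum += preds[best]
    return order
```

Because the full ordering is computed once, every prefix (e.g. order[: T // 5], roughly the 20% subensemble size the abstract highlights) is available at no extra cost, which is the nested-sequence advantage noted above.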
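The bias and covariance components mentioned above presumably refer to the standard bias-variance-covariance decomposition of the averaged ensemble's squared error (Ueda and Nakano, 1996); stated here as an assumption for reference:

```latex
% For an ensemble average \bar{f} = (1/M) \sum_m f_m of M regressors, the
% expected squared error decomposes into averaged bias, variance, and
% pairwise covariance terms:
\[
  \mathbb{E}\big[(\bar{f} - y)^2\big]
    = \overline{\mathrm{bias}}^2
    + \frac{1}{M}\,\overline{\mathrm{var}}
    + \Big(1 - \frac{1}{M}\Big)\,\overline{\mathrm{covar}}
\]
```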
Authors: Hernández Lobato, Daniel; Martínez Muñoz, Gonzalo; Suárez González, Alberto