Subsampling and aggregation: A solution to the scalability problem in distance-based prediction for mixed-type data
Entity
UAM. Departamento de Matemáticas
Publisher
MDPI
Publication date
2021-09-13
Citation
10.3390/math9182247
Mathematics 9.18 (2021): 2247
ISSN
2227-7390 (online)
DOI
10.3390/math9182247
Publisher's version
https://doi.org/10.3390/math9182247
Subjects
Classification; Dissimilarities; Ensemble; Big Data; Generalized Linear Model; Gower’s Metric; Machine Learning; Mathematics
Rights
© 2021 by the authors. Licensee MDPI, Basel, Switzerland
Abstract
The distance-based linear model (DB-LM) extends classical linear regression to settings with mixed-type predictors, or where the only available information is a distance matrix between regressors (as sometimes happens with big data). The main drawback of these DB methods is their computational cost, particularly due to the eigendecomposition of the Gram matrix. In this context, ensemble regression techniques provide a useful alternative to fitting the model to the whole sample. This work analyzes the performance of three subsampling and aggregation techniques in DB regression on two large, real datasets. We also analyze, via simulations, the performance of bagging and DB logistic regression in the classification problem with mixed-type features and large sample sizes.
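The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not say which three aggregation schemes the paper compares; the sketch assumes plain averaging of predictions over subsamples drawn without replacement. It also assumes the squared Gower distance is taken as one minus Gower's similarity, that new points are projected with Gower's interpolation formula, and all function names (`gower_sq_dist`, `fit_dblm`, `predict_dblm`, `subsample_aggregate_predict`) are hypothetical. The point it shows is that each fit eigendecomposes only a `size` x `size` Gram matrix instead of the full n x n one.

```python
import numpy as np

def gower_sq_dist(num_a, cat_a, num_b, cat_b, ranges):
    """Squared Gower distance (assumed here to be 1 - Gower similarity)."""
    # Numeric variables: absolute differences scaled by each variable's range.
    num = np.abs(num_a[:, None, :] - num_b[None, :, :]) / ranges
    # Categorical variables: 0 if the categories match, 1 otherwise.
    cat = (cat_a[:, None, :] != cat_b[None, :, :]).astype(float)
    p = num_a.shape[1] + cat_a.shape[1]
    return (num.sum(axis=2) + cat.sum(axis=2)) / p

def fit_dblm(D2, y, k):
    """Regress y on the first k principal coordinates of D2."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ D2 @ J                    # doubly centered Gram matrix
    vals, vecs = np.linalg.eigh(G)           # the costly step for large n
    idx = np.argsort(vals)[::-1][:k]         # k largest eigenvalues
    lam, U = vals[idx], vecs[:, idx]
    keep = lam > 1e-10                       # drop numerically null directions
    lam, U = lam[keep], U[:, keep]
    X = U * np.sqrt(lam)                     # principal coordinates
    ybar = y.mean()
    beta = (X.T @ (y - ybar)) / lam          # least squares, since X'X = diag(lam)
    return {"X": X, "lam": lam, "g": np.diag(G), "ybar": ybar, "beta": beta}

def predict_dblm(model, D2_new):
    """Predict from squared distances (m x n) between new and training points."""
    # Gower's interpolation formula gives coordinates of the new points.
    X_new = 0.5 * (model["g"] - D2_new) @ model["X"] / model["lam"]
    return model["ybar"] + X_new @ model["beta"]

def subsample_aggregate_predict(num, cat, y, num_new, cat_new,
                                n_sub, size, k, rng):
    """Average DB-LM predictions over n_sub random subsamples of given size."""
    ranges = num.max(axis=0) - num.min(axis=0)
    preds = []
    for _ in range(n_sub):
        idx = rng.choice(len(y), size=size, replace=False)
        D2 = gower_sq_dist(num[idx], cat[idx], num[idx], cat[idx], ranges)
        model = fit_dblm(D2, y[idx], k)
        D2_new = gower_sq_dist(num_new, cat_new, num[idx], cat[idx], ranges)
        preds.append(predict_dblm(model, D2_new))
    return np.mean(preds, axis=0)
```

With n observations, a single full fit requires an eigendecomposition of cost roughly cubic in n, whereas the sketch performs `n_sub` decompositions of cost roughly cubic in `size`, which is the trade-off the abstract refers to.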
Authors
Baíllo Moreno, Amparo
Grané, Aurea