The effect of low number of points in clustering validation via the negentropy increment
Entity
UAM. Departamento de Ingeniería Informática
Publisher
Elsevier B.V.
Publication date
2011-09
Citation
10.1016/j.neucom.2011.03.023
Neurocomputing 74.16 (2011): 2657-2664
ISSN
0925-2312 (print); 1872-8286 (online)
DOI
10.1016/j.neucom.2011.03.023
Funded by
This work has been funded by DGUI-CAM/UAM (Project CCG10-UAM/TIC-5864)
Publisher's version
http://dx.doi.org/10.1016/j.neucom.2011.03.023
Subjects
Artificial intelligence; Crisp clustering; Cluster validation; Negentropy increment
Note
This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 74, 16, (2011) DOI: 10.1016/j.neucom.2011.03.023
Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009)
Rights
© 2011 Elsevier B.V. All rights reserved. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
Abstract
We recently introduced the negentropy increment, a validity index for crisp clustering that uses the negentropy to quantify the average normality of the clustering partitions. This index handles clusters with heterogeneous orientations, scales and densities satisfactorily. One of its main advantages is the simplicity of its calculation, which requires only the log-determinants of the cluster covariance matrices and the prior probability of each cluster. The negentropy increment provides validation results that are in general better than those of other classic cluster validity indices. However, when the number of data points in a partition region is small, the estimate of the log-determinant of the covariance matrix can be very poor. This distorts the value of the index and therefore the quality of the clustering, so additional requirements, such as a minimum number of points in each region, are needed. Although constraints of this kind can provide good results, they must be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with a small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and we show how to correct this error to obtain an unbiased estimator of the negentropy increment. We also quantify the uncertainty in the estimation. As we show, both for 2D synthetic problems and for multidimensional real benchmark problems, these results substantially improve the validation of clustering partitions.
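The abstract says the index needs only the log-determinants of the cluster covariance matrices and the cluster priors. The sketch below illustrates that computation. It assumes the index has the form ΔJ = ½ Σ_i p_i log|Σ_i| − ½ log|Σ_0| − Σ_i p_i log p_i, where Σ_0 is the covariance of the whole data set and lower values indicate a better partition. This form is recalled from the authors' earlier definition of the index and is not stated in this record, so treat it as an assumption. The function names `covariance`, `log_det` and `negentropy_increment` are hypothetical, and the sketch applies no small-sample correction of the kind the article derives.

```python
import math


def covariance(points):
    """Sample covariance matrix (divisor n - 1) of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
             for b in range(d)] for a in range(d)]


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(matrix)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    # Typical when a region has too few points for its dimension.
                    raise ValueError("covariance matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(d))


def negentropy_increment(points, labels):
    """Assumed form of the index (see lead-in); lower values mean a better partition."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Term for the whole data set; it is the same for every partition.
    total = -0.5 * log_det(covariance(points))
    for members in clusters.values():
        prior = len(members) / n  # prior probability of the cluster
        total += 0.5 * prior * log_det(covariance(members)) - prior * math.log(prior)
    return total


# Two small, well-separated 2D blobs (made-up data for illustration only).
offsets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
points = offsets + [(x + 10.0, y + 10.0) for x, y in offsets]

single = negentropy_increment(points, [0] * 10)            # one cluster
good = negentropy_increment(points, [0] * 5 + [1] * 5)     # one label per blob
bad = negentropy_increment(points, [i % 2 for i in range(10)])  # labels mix the blobs
print(single, good, bad)
```

Under the assumed formula, the single-cluster partition scores zero, the partition that follows the two blobs scores below it, and the partition that mixes them scores above it. With only five points per region, the log-determinant estimates in this toy example are subject to exactly the small-sample error the article studies.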
Authors
Lago Fernández, Luis Fernando; Sánchez-Montañés Isla, Manuel Antonio; Corbacho Abelaira, Fernando