The effect of low number of points in clustering validation via the negentropy increment

Lago Fernández, Luis Fernando; Sánchez-Montañés Isla, Manuel Antonio; Corbacho Abelaira, Fernando

UAM_Biblioteca

Author(s)

Lago Fernández, Luis Fernando; Sánchez-Montañés Isla, Manuel Antonio; Corbacho Abelaira, Fernando

Entity

UAM. Departamento de Ingeniería Informática

Publisher

Elsevier B.V.

Publication date

2011-09

Citation

Neurocomputing 74.16 (2011): 2657-2664

ISSN

0925-2312 (print); 1872-8286 (online)

DOI

10.1016/j.neucom.2011.03.023

Funded by

This work has been funded by DGUI-CAM/UAM (Project CCG10-UAM/TIC-5864)

Publisher's version

http://dx.doi.org/10.1016/j.neucom.2011.03.023

Subjects

Artificial intelligence; Crisp clustering; Cluster validation; Negentropy increment

URI

http://hdl.handle.net/10486/9171

Note

This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 74, 16, (2011) DOI: 10.1016/j.neucom.2011.03.023
Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009)

Rights

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Abstract

We recently introduced the negentropy increment, a validity index for crisp clustering that uses the negentropy to quantify the average normality of the clustering partitions. This index deals satisfactorily with clusters of heterogeneous orientations, scales and densities. One of its main advantages is the simplicity of its calculation, which requires only the log-determinants of the covariance matrices and the prior probabilities of each cluster. The negentropy increment provides validation results that are in general better than those of other classic cluster validity indices. However, when the number of data points in a partition region is small, the estimate of the log-determinant of the covariance matrix can be very poor. This degrades the quantification of the index and therefore the quality of the clustering, so additional requirements, such as a minimum number of points in each region, are needed. Although constraints of this kind can provide good results, they must be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with a small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and we show how to correct this error to obtain an unbiased estimator of the negentropy increment. We also quantify the uncertainty in the estimation. As we show for both 2D synthetic problems and multidimensional real benchmark problems, these results can be used to validate clustering partitions with a substantial improvement.
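
The claim that the estimation error depends on the number of points but not on scale or orientation can be illustrated with a minimal Python sketch. It assumes Gaussian data and uses the standard expectation of the log-determinant of a Wishart matrix; this record does not state whether the paper's correction takes exactly this form, and all function names (`digamma`, `logdet_bias`, `empirical_bias`) are hypothetical, introduced here only for illustration.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def logdet_bias(n, d):
    """E[log|S|] - log|Sigma| for n Gaussian points in d dimensions.

    S is the sample covariance with the n-1 denominator. The value
    depends only on n and d (assumption: the data are Gaussian).
    """
    if n <= d:
        raise ValueError("need n > d for a non-singular sample covariance")
    psi_sum = sum(digamma((n - j) / 2.0) for j in range(1, d + 1))
    return psi_sum - d * math.log((n - 1) / 2.0)


def sample_logdet_2d(n, sx, sy, angle, rng):
    """log|S| for n points from a 2D Gaussian with scales sx, sy, rotated."""
    c, s = math.cos(angle), math.sin(angle)
    pts = []
    for _ in range(n):
        u, v = sx * rng.gauss(0, 1), sy * rng.gauss(0, 1)
        pts.append((c * u - s * v, s * u + c * v))
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
    return math.log(sxx * syy - sxy * sxy)


def empirical_bias(n, sx, sy, angle, trials=20000, seed=0):
    """Monte Carlo estimate of E[log|S|] - log|Sigma| in 2D."""
    rng = random.Random(seed)
    true_logdet = 2.0 * math.log(sx * sy)  # |Sigma| = sx^2 * sy^2
    total = sum(sample_logdet_2d(n, sx, sy, angle, rng) for _ in range(trials))
    return total / trials - true_logdet
```

Under these assumptions, subtracting `logdet_bias(n, d)` from each region's estimated log-determinant removes the systematic error. Comparing `empirical_bias(5, 1.0, 1.0, 0.0)` with `empirical_bias(5, 10.0, 0.1, 0.7)` shows both simulated biases agreeing with `logdet_bias(5, 2)` up to Monte Carlo noise, even though the two distributions differ strongly in scale and orientation.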

File list

Name

effect_lago-fernandez_2011_ps.pdf

Size

1.004Mb

Format

PDF

Collections containing this item

Producción científica en acceso abierto de la UAM
