Reducing the loss of information through annealing text distortion

Granados Fontecha, Ana; Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja

dc.contributor.author	Granados Fontecha, Ana
dc.contributor.author	Cebrián Ramos, Manuel
dc.contributor.author	Camacho, David
dc.contributor.author	Rodríguez Ortiz, Francisco Borja
dc.contributor.other	UAM. Departamento de Ingeniería Informática	es_ES
dc.date.accessioned	2015-01-27T17:54:24Z
dc.date.available	2015-01-27T17:54:24Z
dc.date.issued	2011-07
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering 23.7 (2011)	en_US
dc.identifier.issn	1041-4347 (print)	en_US
dc.identifier.issn	1558-2191 (online)	en_US
dc.identifier.uri	http://hdl.handle.net/10486/663413
dc.description	Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011	en_US
dc.description.abstract	Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting.	en_US
dc.description.sponsorship	This work was supported by the Spanish Ministry of Education and Science under TIN2010-19872 and TIN2010-19607 projects.	en_US
dc.format.extent	13 pages	en_US
dc.format.mimetype	application/pdf	en
dc.language.iso	eng	en
dc.publisher	IEEE Computer Soc.	en_US
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.rights	© 2010 IEEE	en_US
dc.subject.other	Information distortion	en_US
dc.subject.other	Kolmogorov complexity	en_US
dc.subject.other	Clustering by compression	en_US
dc.subject.other	Data compression	en_US
dc.subject.other	Normalized compression distance	en_US
dc.title	Reducing the loss of information through annealing text distortion	en_US
dc.type	article	en_US
dc.subject.eciencia	Informática	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1109/TKDE.2010.173
dc.identifier.doi	10.1109/TKDE.2010.173
dc.identifier.publicationfirstpage	1090
dc.identifier.publicationissue	7
dc.identifier.publicationlastpage	1102
dc.identifier.publicationvolume	23
dc.type.version	info:eu-repo/semantics/publishedVersion	en
dc.contributor.group	Neurocomputación Biológica (ING EPS-005)	es_ES
dc.contributor.group	Análisis de Datos e Inteligencia Aplicada (ING EPS-012)	es_ES
dc.contributor.group	Herramientas Interactivas Avanzadas (ING EPS-003)	es_ES
dc.rights.accessRights	openAccess	en
dc.authorUAM	Camacho Fernández, David (261274)
dc.facultadUAM	Escuela Politécnica Superior
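
The abstract refers to compression distances and to progressively removing words from documents. As a rough illustration only, the sketch below computes the standard normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), using zlib (a Lempel-Ziv family compressor) as C. The helper `remove_most_frequent_words` is hypothetical: it assumes words are selected by their frequency within the document itself, which is a simplification and not necessarily the distortion procedure used in the paper.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def remove_most_frequent_words(text: str, k: int) -> str:
    """Hypothetical distortion: drop the k most frequent words of `text`.

    Assumption: frequency is counted inside the document itself and
    words are split on whitespace only.
    """
    words = text.split()
    frequent = {w for w, _ in Counter(words).most_common(k)}
    return " ".join(w for w in words if w not in frequent)


if __name__ == "__main__":
    doc_a = "the cat sat on the mat and the dog sat on the rug " * 20
    doc_b = "compression distances are used in data mining tasks " * 20
    for k in (0, 1, 2):
        da = remove_most_frequent_words(doc_a, k).encode()
        db = remove_most_frequent_words(doc_b, k).encode()
        print(k, round(ncd(da, db), 3))
```

Increasing `k` step by step gives a crude analogue of gradually distorting the texts before clustering them by pairwise NCD; any other compressor (for example `bz2` for the block-sorting family) could be swapped into `compressed_size`.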