Reducing the loss of information through annealing text distortion

Biblos-e Archivo/Manakin Repository

dc.contributor.author Granados Fontecha, Ana
dc.contributor.author Cebrián Ramos, Manuel
dc.contributor.author Camacho, David
dc.contributor.author Rodríguez, Francisco de Borja
dc.contributor.other UAM. Departamento de Ingeniería Informática es_ES
dc.date.accessioned 2015-01-27T17:54:24Z
dc.date.available 2015-01-27T17:54:24Z
dc.date.issued 2011-07
dc.identifier.citation IEEE Transactions on Knowledge and Data Engineering 23.7 (2011) en_US
dc.identifier.issn 1041-4347 (print) en_US
dc.identifier.issn 1558-2191 (online) en_US
dc.description Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011 en_US
dc.description.abstract Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting. en_US
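The compression distance listed among the subject keywords is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length of its argument. The sketch below illustrates NCD plus a word-removal distortion; it is a minimal example under stated assumptions, not the authors' implementation. It assumes zlib (DEFLATE) as the Lempel-Ziv stand-in and bz2 as the Block-Sorting stand-in, and omits the Statistical family because Python's standard library has no PPM compressor. The helpers `distort` and `keep_most_frequent` are hypothetical: they replace every word outside a kept vocabulary with asterisks of equal length, with the vocabulary ranked by corpus frequency.

```python
import bz2
import re
import zlib
from collections import Counter

# Assumed stand-ins for two compressor families named in the abstract:
# zlib (DEFLATE) for Lempel-Ziv, bz2 (Burrows-Wheeler) for Block-Sorting.
COMPRESSORS = {
    "lempel-ziv": lambda data: zlib.compress(data, 9),
    "block-sorting": lambda data: bz2.compress(data, 9),
}

WORD = re.compile(r"[A-Za-z]+")


def ncd(x: bytes, y: bytes, compress) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort(text: str, keep: set) -> str:
    """Hypothetical distortion: replace each word whose lowercase form is
    not in `keep` with asterisks of the same length (length is preserved)."""
    def repl(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)
    return WORD.sub(repl, text)


def keep_most_frequent(corpus: list, fraction: float) -> set:
    """Hypothetical ranking: keep the top `fraction` of distinct words,
    ranked by frequency across the whole corpus."""
    counts = Counter(w.lower() for doc in corpus for w in WORD.findall(doc))
    n = int(len(counts) * fraction)
    return {word for word, _ in counts.most_common(n)}


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat while the dog slept",
        "the dog sat on the log while the cat slept",
        "stock prices fell sharply as markets closed today",
    ]
    # Progressively shrink the kept vocabulary and watch the distances change.
    for level in (1.0, 0.5, 0.25):
        keep = keep_most_frequent(docs, level)
        distorted = [distort(d, keep).encode() for d in docs]
        for name, compress in COMPRESSORS.items():
            d01 = ncd(distorted[0], distorted[1], compress)
            d02 = ncd(distorted[0], distorted[2], compress)
            print(f"keep={level:.2f} {name}: d(0,1)={d01:.3f} d(0,2)={d02:.3f}")
```

Replacing words with same-length asterisks, rather than deleting them, is one plausible design choice: it lowers the information content of a document while keeping its length and layout roughly constant, so the compressor sees less vocabulary rather than less text. Clustering would then run on the pairwise NCD matrix at each distortion level.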
dc.description.sponsorship This work was supported by the Spanish Ministry of Education and Science under TIN2010-19872 and TIN2010-19607 projects. en_US
dc.format.extent 13 p. es_ES
dc.format.mimetype application/pdf en
dc.language.iso eng en
dc.publisher IEEE Computer Soc. en_US
dc.relation.ispartof IEEE Transactions on Knowledge and Data Engineering en_US
dc.rights © 2010 IEEE en_US
dc.subject.other Information distortion en_US
dc.subject.other Kolmogorov complexity en_US
dc.subject.other Clustering by compression en_US
dc.subject.other Data compression en_US
dc.subject.other Normalized compression distance en_US
dc.title Reducing the loss of information through annealing text distortion en_US
dc.type article en_US
dc.subject.eciencia Informática (Computer Science) es_ES
dc.identifier.doi 10.1109/TKDE.2010.173
dc.identifier.publicationfirstpage 1090
dc.identifier.publicationissue 7
dc.identifier.publicationlastpage 1102
dc.identifier.publicationvolume 23
dc.type.version info:eu-repo/semantics/publishedVersion en
dc.contributor.group Neurocomputación Biológica (ING EPS-005) es_ES
dc.contributor.group Análisis de Datos e Inteligencia Aplicada (ING EPS-012) es_ES
dc.contributor.group Herramientas Interactivas Avanzadas (ING EPS-003) es_ES
dc.rights.accessRights openAccess en
dc.authorUAM Camacho Fernández, David (261274)
