dc.contributor.author: Granados Fontecha, Ana
dc.contributor.author: Cebrián Ramos, Manuel
dc.contributor.author: Camacho, David
dc.contributor.author: Rodríguez Ortiz, Francisco Borja
dc.contributor.other: UAM. Departamento de Ingeniería Informática [es_ES]
dc.date.accessioned: 2015-01-27T17:54:24Z
dc.date.available: 2015-01-27T17:54:24Z
dc.date.issued: 2011-07
dc.identifier.citation: IEEE Transactions on Knowledge and Data Engineering 23.7 (2011) [en_US]
dc.identifier.issn: 1041-4347 (print) [en_US]
dc.identifier.issn: 1558-2191 (online) [en_US]
dc.identifier.uri: http://hdl.handle.net/10486/663413
dc.description: Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011 [en_US]
dc.description.abstract: Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting. [en_US]
dc.description.sponsorship: This work was supported by the Spanish Ministry of Education and Science under TIN2010-19872 and TIN2010-19607 projects. [en_US]
dc.format.extent: 13 pág. [es_ES]
dc.format.mimetype: application/pdf [en]
dc.language.iso: eng [en]
dc.publisher: IEEE Computer Soc. [en_US]
dc.relation.ispartof: IEEE Transactions on Knowledge and Data Engineering [en_US]
dc.rights: © 2010 IEEE [en_US]
dc.subject.other: Information distortion [en_US]
dc.subject.other: Kolmogorov complexity [en_US]
dc.subject.other: Clustering by compression [en_US]
dc.subject.other: Data compression [en_US]
dc.subject.other: Normalized compression distance [en_US]
dc.title: Reducing the loss of information through annealing text distortion [en_US]
dc.type: article [en_US]
dc.subject.eciencia: Informática [es_ES]
dc.relation.publisherversion: http://dx.doi.org/10.1109/TKDE.2010.173
dc.identifier.doi: 10.1109/TKDE.2010.173
dc.identifier.publicationfirstpage: 1090
dc.identifier.publicationissue: 7
dc.identifier.publicationlastpage: 1102
dc.identifier.publicationvolume: 23
dc.type.version: info:eu-repo/semantics/publishedVersion [en]
dc.contributor.group: Neurocomputación Biológica (ING EPS-005) [es_ES]
dc.contributor.group: Análisis de Datos e Inteligencia Aplicada (ING EPS-012) [es_ES]
dc.contributor.group: Herramientas Interactivas Avanzadas (ING EPS-003) [es_ES]
dc.rights.accessRights: openAccess [en]
dc.authorUAM: Camacho Fernández, David (261274)
dc.facultadUAM: Escuela Politécnica Superior
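
The abstract and subject terms refer to the normalized compression distance (NCD) and to distorting text by removing words. As a rough illustration only, the sketch below computes the standard NCD, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), using Python standard-library compressors. The mapping of compressors to families is an assumption (zlib and lzma for Lempel-Ziv, bz2 for block-sorting; the standard library has no statistical/PPM compressor), and `distort` is a hypothetical masking scheme, not necessarily the procedure used in the paper.

```python
import bz2
import lzma
import re
import zlib
from collections import Counter

# Assumed stand-ins for the compression families named in the abstract:
# zlib and lzma (Lempel-Ziv), bz2 (block-sorting). No statistical (PPM)
# compressor is available in the standard library.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """Return C(data): the length in bytes of the compressed input."""
    return len(COMPRESSORS[compressor](data))


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort(text: str, n_words: int) -> str:
    """Hypothetical distortion: mask the n_words most frequent words.

    Each masked word is replaced by asterisks of the same length, so the
    text length and word boundaries are preserved.
    """
    words = re.findall(r"[a-z]+", text.lower())
    frequent = {word for word, _ in Counter(words).most_common(n_words)}

    def mask(match: re.Match) -> str:
        word = match.group()
        return "*" * len(word) if word.lower() in frequent else word

    return re.sub(r"[A-Za-z]+", mask, text)
```

Raising `n_words` step by step and recomputing `ncd` on the masked documents gives a simple way to observe how a distance matrix changes as more words are removed.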

