Common Pitfalls Using the Normalized Compression Distance: What to Watch Out for in a Compressor

dc.contributor.author Cebrián Ramos, Manuel
dc.contributor.author Alfonseca, Manuel
dc.contributor.author Ortega, Alfonso
dc.contributor.other UAM. Departamento de Ingeniería Informática es_ES
dc.date.accessioned 2015-01-14T19:14:18Z
dc.date.available 2015-01-14T19:14:18Z
dc.date.issued 2005
dc.identifier.citation Communications in Information and Systems 5.4 (2005): 367-384 en_US
dc.identifier.issn 1526-7555 (print) en_US
dc.identifier.issn 2163-4548 (online) en_US
dc.identifier.uri http://hdl.handle.net/10486/663140
dc.description.abstract Building on the mathematical theory of algorithmic complexity developed by Kolmogorov in the 1960s, Cilibrasi and Vitanyi designed a similarity measure, the normalized compression distance, that can be applied to the clustering of objects of any kind, such as music, texts, or gene sequences. Under certain conditions, the normalized compression distance is a quasi-universal normalized admissible distance. This paper shows that the compressors used to compute the normalized compression distance are not idempotent in some cases: their behavior is strongly skewed by the size of the objects and by the window size, which causes a deviation in the identity property of the distance unless care is taken that the objects to be compressed fit within the window. The relationship between the precision of the distance and the size of the objects is analyzed for several well-known compressors, and in particular depth for three of them, bzip2, gzip, and PPMZ, which are examples of the three main types of compressors: block-sorting, Lempel-Ziv, and statistical. en_US
dc.description.sponsorship This work was partially supported by grant TSI 2005-08255-C07-06 of the Spanish Ministry of Education and Science. en_US
dc.format.extent 18 pp. es_ES
dc.format.mimetype application/pdf en
dc.language.iso eng en
dc.publisher International Press of Boston en_US
dc.relation.ispartof Communications in Information and Systems en_US
dc.rights © International Press 2005 en_US
dc.title Common Pitfalls Using the Normalized Compression Distance: What to Watch Out for in a Compressor en_US
dc.type article en_US
dc.subject.eciencia Informática es_ES
dc.relation.publisherversion http://dx.doi.org/10.4310/CIS.2005.v5.n4.a1
dc.identifier.doi 10.4310/CIS.2005.v5.n4.a1
dc.identifier.publicationfirstpage 367
dc.identifier.publicationissue 4
dc.identifier.publicationlastpage 384
dc.identifier.publicationvolume 5
dc.type.version info:eu-repo/semantics/publishedVersion en
dc.contributor.group Herramientas Interactivas Avanzadas (ING EPS-003) es_ES
dc.rights.accessRights openAccess en
dc.authorUAM Alfonseca Moreno, Manuel (258923)
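
The identity deviation described in the abstract can be illustrated with a minimal sketch. It assumes the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and it uses Python's zlib module (DEFLATE, whose sliding window is 32 KiB) as a stand-in for gzip. The helper names and the pseudo-random test objects are hypothetical choices made for this illustration; they are not the data or the code used in the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) output at compression level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib standing in for the compressor C."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_object(size: int, seed: int = 0) -> bytes:
    """Hypothetical test object: `size` pseudo-random (incompressible) bytes."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


if __name__ == "__main__":
    # One object smaller than the 32 KiB DEFLATE window, one larger.
    for size in (10_000, 100_000):
        x = random_object(size)
        print(size, round(ncd(x, x), 3))
```

For an ideal compressor NCD(x, x) would be close to 0. With this sketch the value should stay small while x fits inside the window and move toward 1 once x is larger than the window, because the second copy of x can no longer be matched against the first.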

