Common Pitfalls Using the Normalized Compression Distance: What to Watch Out for in a Compressor
Entity
UAM. Departamento de Ingeniería Informática
Publisher
International Press of Boston
Date
2005
Citation
10.4310/CIS.2005.v5.n4.a1
Communications in Information and Systems 5.4 (2005): 367-384
ISSN
1526-7555 (print); 2163-4548 (online)
DOI
10.4310/CIS.2005.v5.n4.a1
Funded by
This work was partially supported by grant TSI 2005-08255-C07-06 of the Spanish Ministry of Education and Science.
Editor's Version
http://dx.doi.org/10.4310/CIS.2005.v5.n4.a1
Subjects
Computer science
Rights
© International Press 2005
Abstract
Using the mathematical background for algorithmic complexity developed by Kolmogorov in the 1960s, Cilibrasi and Vitanyi have designed a similarity distance, the normalized compression distance, that can be applied to the clustering of objects of any kind, such as music, texts or gene sequences. Under certain conditions, the normalized compression distance is a quasi-universal normalized admissible distance. This paper shows that the compressors used to compute the normalized compression distance are not idempotent in some cases: their behavior is strongly skewed by the size of the objects and by the window size, which causes a deviation in the identity property of the distance unless care is taken that the objects to be compressed fit within the windows. The relationship between the precision of the distance and the size of the objects has been analyzed for several well-known compressors, and in particular depth for three of them, bzip2, gzip and PPMZ, which are examples of the three main types of compressors: block-sorting, Lempel-Ziv, and statistical.
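The identity deviation described in the abstract can be illustrated with a minimal sketch. It assumes the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and it uses Python's `zlib` module (DEFLATE, the Lempel-Ziv scheme behind gzip, with a 32 KiB sliding window) as the compressor. The helper names `ncd` and `self_distance` and the use of pseudo-random test data are assumptions made for illustration; they are not taken from the paper's experiments.

```python
import random
import zlib


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized compression distance, using compressed length as C(.)."""
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def self_distance(size: int, seed: int = 0) -> float:
    """NCD(x, x) for a pseudo-random x of `size` bytes (hypothetical helper)."""
    rng = random.Random(seed)
    # Pseudo-random bytes are essentially incompressible, so any saving on
    # the concatenation xx must come from matching the second copy of x
    # against the first one inside the compressor's window.
    x = bytes(rng.getrandbits(8) for _ in range(size))
    return ncd(x, x)


if __name__ == "__main__":
    # Sizes below and above the 32 KiB DEFLATE window.
    for size in (4_000, 16_000, 64_000, 100_000):
        print(f"{size:>7} bytes  NCD(x, x) = {self_distance(size):.3f}")
```

With an ideal (idempotent) compressor, NCD(x, x) would be 0. In this sketch the value should stay close to 0 while x fits inside the window and move close to 1 once x is larger than the window, because the second copy of x can no longer be matched against the first.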
Authors
Cebrián Ramos, Manuel
Alfonseca, Manuel
Ortega de la Puente, Alfonso