Detecting the same text in different languages
Entity
UAM. Departamento de Ingeniería Informática
Publisher
IEEE
Date
2006
Citation
10.1109/ITW2.2006.323816
Information Theory Workshop, 2006. ITW '06 Punta del Este. IEEE, 2006, pp. 337-341
ISBN
1-4244-0035-X (print); 1-4244-0036-8 (online)
DOI
10.1109/ITW2.2006.323816
Funded by
This work was partially supported by grants TIN 2004-07676-G01 and TSI 2005-08255-C07-06 of the Spanish Ministry of Education and Culture.
Editor's Version
http://dx.doi.org/10.1109/ITW.2006.322834
Subjects
Compression algorithms; Computer science education; Data mining; Entropy; H infinity control; Humans; Length measurement; Testing; Tin; Topology; Computer science
Note
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. K. Koroutchev and M. Cebrian, "Detecting the same text in different languages", in Information Theory Workshop, 2006. ITW '06 Punta del Este. IEEE, Punta del Este, Uruguay, 2006, pp. 337-341
Rights
© 2006 IEEE
Abstract
Compression-based similarity distances have the main drawback of requiring the same coding scheme for the objects being compared. When a text is translated into another language, the original and the translation remain significantly similar even though they share no literal content. In this article, we present an algorithm that compares the redundancy structure of the data, extracted by means of a Lempel-Ziv compression scheme. Each text is represented as a graph, and two texts are considered similar under our measure if they have the same referential topology when compressed. We give empirical evidence that this measure detects similarity between data coded in different languages.
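The abstract only outlines the method. To illustrate what a "referential topology" extracted from Lempel-Ziv compression might look like, the following Python sketch assumes a naive greedy LZ77-style parse over characters; the helper names `lz77_parse` and `reference_graph` and the `min_len` threshold are hypothetical and do not come from the paper. Each copied phrase is linked to the earlier phrases whose text it refers to. The sketch stops at building the graph; it does not reproduce the authors' graph comparison or similarity measure.

```python
from typing import List, Set, Tuple

# A phrase is (start, length, source): source is the start of the earlier
# match it copies, or -1 for a single literal character.
Phrase = Tuple[int, int, int]


def lz77_parse(text: str, min_len: int = 3) -> List[Phrase]:
    """Hypothetical greedy LZ77-style parse (naive search, illustration only)."""
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        # Search every earlier start position for the longest match;
        # matches may overlap the current position, as in LZ77.
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len >= min_len:
            phrases.append((i, best_len, best_src))
            i += best_len
        else:
            phrases.append((i, 1, -1))  # emit a literal
            i += 1
    return phrases


def reference_graph(phrases: List[Phrase]) -> Set[Tuple[int, int]]:
    """Edge (p, q) means phrase p copies text that lies inside phrase q."""
    edges: Set[Tuple[int, int]] = set()
    for p, (_, length, src) in enumerate(phrases):
        if src < 0:
            continue  # literals refer to nothing
        lo, hi = src, src + length
        for q, (q_start, q_len, _) in enumerate(phrases):
            # Interval overlap test; self-references from overlapping
            # matches are skipped to keep the graph acyclic.
            if q != p and q_start < hi and q_start + q_len > lo:
                edges.add((p, q))
    return edges
```

For example, `lz77_parse("abcabcabc")` yields three literals followed by one copied phrase of length 6 starting at position 0, so `reference_graph` links that fourth phrase (index 3) to phrases 0, 1 and 2. The quadratic-per-position match search is chosen for readability, not speed.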
Authors
Nedeltchev Koroutchev, Kostadin
Cebrián Ramos, Manuel