Cepstral trajectories in linguistic units for text-independent speaker recognition
Entity
UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher
Springer Berlin Heidelberg
Date
2012-12
Citation
10.1007/978-3-642-35292-8_3
Advances in Speech and Language Technologies for Iberian Languages: IberSPEECH 2012 Conference. Communications in Computer and Information Science, Volume 328. Springer, 2012. pp. 20-29
ISSN
1865-0929
ISBN
978-3-642-35291-1 (print); 978-3-642-35292-8 (online)
DOI
10.1007/978-3-642-35292-8_3
Funded by
Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.
Editor's Version
http://dx.doi.org/10.1007/978-3-642-35292-8_3
Subjects
Automatic speaker recognition; Cepstral trajectories; Forensic speaker identification; Linguistic units; Temporal contours; Telecommunications
Note
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-35292-8_3
Proceedings of IberSPEECH, held in Madrid (Spain) in 2012.
Rights
© Springer-Verlag Berlin Heidelberg 2012
Abstract
In this paper, the contributions of different linguistic units to the speaker recognition task are explored by means of temporal trajectories of their MFCC features. Inspired by successful work in forensic speaker identification, we extend the approach based on temporal contours of formant frequencies in linguistic units to design a fully automatic system that brings together the forensic and automatic speaker recognition worlds. The combination of MFCC features and unit-dependent trajectories provides a powerful tool for extracting speaker-individualizing information. At a fine-grained level, we provide a calibrated likelihood ratio for each linguistic unit under analysis (extremely useful in applications such as forensics), and at a coarse-grained level, we combine the individual contributions of the different units to obtain a single, highly discriminative system. This approach has been tested with NIST SRE 2006 datasets and protocols, consisting of 9,720 trials from 219 male speakers for the 1side-1side English-only task, with development data extracted from 1,808 conversations by 367 male speakers in the NIST SRE 2004 and 2005 datasets.
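As a rough illustration of what a "temporal trajectory of MFCC features in a linguistic unit" could look like in practice, the sketch below turns a variable-length sequence of MFCC frames for one unit into a fixed-length vector. The abstract does not say how trajectories are parameterized; the use of a truncated DCT here, along with the names `dct_trajectory`, `encode_unit` and `n_coeffs`, are assumptions made purely for illustration and should not be read as the authors' method.

```python
import math


def dct_trajectory(values, n_coeffs):
    """Return the first n_coeffs orthonormal DCT-II coefficients of a sequence.

    Assumption: a truncated DCT is used as the trajectory code; the source
    abstract does not specify the parameterization.
    """
    n = len(values)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(
            v * math.cos(math.pi * (i + 0.5) * k / n)
            for i, v in enumerate(values)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def encode_unit(frames, n_coeffs=5):
    """Encode one linguistic unit as a fixed-length trajectory vector.

    frames: list of MFCC vectors, one per frame, all of the same dimension.
    Returns n_coeffs DCT coefficients per MFCC dimension, concatenated.
    """
    if len(frames) < n_coeffs:
        raise ValueError("unit has fewer frames than requested coefficients")
    dims = len(frames[0])
    vector = []
    for d in range(dims):
        # Time series of the d-th MFCC coefficient across the unit.
        series = [frame[d] for frame in frames]
        vector.extend(dct_trajectory(series, n_coeffs))
    return vector
```

Under these assumptions, two occurrences of the same unit with different durations (say 8 and 20 frames) both map to vectors of length `n_coeffs` times the MFCC dimension, which is what makes them directly comparable in a per-unit scoring stage.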
Authors
Franco-Pedroso, Javier; Espinoza Cuadros, Fernando Manuel; González Rodríguez, Joaquín