A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units
Entity
UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher
IEEE
Date
2012
Citation
10.1109/ICASSP.2012.6288892
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012. 4389 - 4392
ISSN
1520-6149
ISBN
978-1-4673-0045-2 (print); 978-1-4673-0044-5 (online)
DOI
10.1109/ICASSP.2012.6288892
Funded by
Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.
Editor's Version
http://dx.doi.org/10.1109/ICASSP.2012.6288892
Subjects
Speaker recognition; Linguistic units; Temporal trajectories; Session variability; Feature compensation; Telecomunicaciones
Note
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. González-Rodríguez, J. González-Domínguez, J. Franco-Pedroso, D. Ramos, "A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units" in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto (Japan), 2012, 4389 - 4392
Rights
© 2012 IEEE
Abstract
This paper presents a new linguistically-motivated front-end that shows major performance improvements from the use of session variability compensated cepstral trajectories in phone units. Extending our recent work on temporal contours in linguistic units (TCLU), we combine the potential of those unit-dependent trajectories with the ability of feature-domain factor analysis techniques to compensate for session variability effects, which results in consistent and discriminant phone-dependent trajectories across different recording sessions. Evaluating on the NIST SRE04 English-only 1s1s task, we report EERs as low as 5.40% from the trajectories in a single phone, with 29 different phones each producing EERs below 10%, and additionally showing excellent calibration performance per unit. The combination of different units shows significant complementarity, with EERs as low as 1.63% (100×DCF=0.732) from a simple sum fusion of the 23 best phones, or 0.68% (100×DCF=0.304) when fusing them through logistic regression.
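As an illustration of the two quantities the abstract relies on, the following is a minimal sketch of an equal error rate (EER) estimate and of a simple sum fusion of per-unit scores. It is not the authors' implementation: the function names `equal_error_rate` and `sum_fusion`, the number of units, and the Gaussian scores are all assumptions made for the example, and the scores are synthetic rather than taken from NIST SRE04.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the point where miss and false-alarm rates meet."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # Miss rate: fraction of target trials scoring below the threshold.
    p_miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above it.
    p_fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[idx] + p_fa[idx])


def sum_fusion(per_unit_scores):
    """Sum fusion: add the scores of all units for each trial.

    per_unit_scores has shape (n_units, n_trials).
    """
    return np.asarray(per_unit_scores, dtype=float).sum(axis=0)


# Synthetic demonstration (assumed setup, not data from the paper):
# 5 hypothetical units, each giving a noisy score per trial.
rng = np.random.default_rng(0)
n_units, n_trials = 5, 2000
target = rng.normal(loc=1.0, scale=1.0, size=(n_units, n_trials))
nontarget = rng.normal(loc=0.0, scale=1.0, size=(n_units, n_trials))

single_eers = [equal_error_rate(target[u], nontarget[u]) for u in range(n_units)]
fused_eer = equal_error_rate(sum_fusion(target), sum_fusion(nontarget))

print("per-unit EERs:", np.round(single_eers, 3))
print("sum-fusion EER:", round(float(fused_eer), 3))
```

In this toy setup the units carry independent noise, so summing them lowers the EER relative to any single unit. Real phone units are correlated, so the gain from fusion depends on how complementary they are.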
Authors
González Rodríguez, Joaquín; González Domínguez, Javier; Franco-Pedroso, Javier; Ramos Castro, Daniel