A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units

González Rodríguez, Joaquín; González Domínguez, Javier; Franco-Pedroso, Javier; Ramos Castro, Daniel

UAM_Biblioteca

Author

González Rodríguez, Joaquín; González Domínguez, Javier; Franco-Pedroso, Javier; Ramos Castro, Daniel

Entity

UAM. Departamento de Tecnología Electrónica y de las Comunicaciones

Publisher

IEEE

Date

2012

Citation

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012. 4389 - 4392

ISSN

1520-6149

ISBN

978-1-4673-0045-2 (print); 978-1-4673-0044-5 (online)

DOI

10.1109/ICASSP.2012.6288892

Funded by

Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.

Editor's Version

http://dx.doi.org/10.1109/ICASSP.2012.6288892

Subjects

Speaker recognition; Linguistic units; Temporal trajectories; Session variability; Feature compensation; Telecommunications

URI

http://hdl.handle.net/10486/664867

Note

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. González-Rodríguez, J. González-Domínguez, J. Franco-Pedroso, D. Ramos, "A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units" in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto (Japan), 2012, 4389 - 4392

Abstract

This paper presents a new linguistically-motivated front-end that yields major performance improvements by using session variability compensated cepstral trajectories in phone units. Extending our recent work on temporal contours in linguistic units (TCLU), we combine the potential of those unit-dependent trajectories with the ability of feature-domain factor analysis techniques to compensate for session variability effects, which results in consistent and discriminant phone-dependent trajectories across different recording sessions. Evaluating on the NIST SRE04 English-only 1s1s task, we report EERs as low as 5.40% from the trajectories in a single phone, with 29 different phones each producing EERs below 10%, and additionally show excellent calibration performance per unit. Combining different units reveals significant complementarity, with EERs of 1.63% (100×DCF=0.732) from a simple sum fusion of the 23 best phones, or 0.68% (100×DCF=0.304) when fusing them through logistic regression.
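
The two quantities the abstract relies on, the equal error rate (EER) and sum fusion of per-phone scores, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `equal_error_rate` and `sum_fusion` are hypothetical, the scores are synthetic Gaussian draws rather than SRE04 results, and the units are assumed to be statistically independent, which real phone-level scores are not.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep every observed score as a threshold and
    return the operating point where miss and false-alarm rates are closest."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss: a target trial scores below the threshold.
    p_miss = np.array([(target < t).mean() for t in thresholds])
    # False alarm: a non-target trial scores at or above the threshold.
    p_fa = np.array([(nontarget >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])


def sum_fusion(scores_per_unit):
    """Fuse scores of shape (units, trials) by plain summation over units."""
    return np.asarray(scores_per_unit, dtype=float).sum(axis=0)


# Synthetic, illustrative data only (assumption: independent units).
rng = np.random.default_rng(0)
n_units, n_trials = 5, 2000
target = rng.normal(1.0, 1.0, size=(n_units, n_trials))
nontarget = rng.normal(-1.0, 1.0, size=(n_units, n_trials))

single_eer = equal_error_rate(target[0], nontarget[0])
fused_eer = equal_error_rate(sum_fusion(target), sum_fusion(nontarget))
print(f"single-unit EER: {single_eer:.4f}, sum-fusion EER: {fused_eer:.4f}")
```

In this toy setting the fused EER falls below the single-unit EER because summation averages out independent noise. The logistic regression fusion mentioned in the abstract would instead learn one weight per unit plus an offset from held-out scores, which is not shown here.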

Files in this item

Name

linguistically-motivated_gonzalez-rodriguez_ICASSP_2012_ps.pdf

Size

387.3Kb

Format

PDF

This item appears in the following Collection(s)

Producción científica en acceso abierto de la UAM [20478]
