Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009

González Domínguez, Javier; López Moreno, Ignacio; Franco-Pedroso, Javier; Ramos Castro, Daniel; Toledano, Doroteo T.; González Rodríguez, Joaquín

UAM_Biblioteca

dc.contributor.author	González Domínguez, Javier
dc.contributor.author	López Moreno, Ignacio
dc.contributor.author	Franco-Pedroso, Javier
dc.contributor.author	Ramos Castro, Daniel
dc.contributor.author	Toledano, Doroteo T.
dc.contributor.author	González Rodríguez, Joaquín
dc.contributor.other	UAM. Departamento de Tecnología Electrónica y de las Comunicaciones	es_ES
dc.date.accessioned	2015-05-07T11:23:13Z
dc.date.available	2015-05-07T11:23:13Z
dc.date.issued	2010-12-01
dc.identifier.citation	IEEE Journal of Selected Topics in Signal Processing 4.6 (2010): 1084 – 1093	en_US
dc.identifier.issn	1932-4553 (print)	en_US
dc.identifier.issn	1941-0484 (online)	en_US
dc.identifier.uri	http://hdl.handle.net/10486/666039
dc.description	Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. Gonzalez-Dominguez, I. Lopez-Moreno, J. Franco-Pedroso, D. Ramos, D. T. Toledano, and J. Gonzalez-Rodriguez, "Multilevel and Session Variability Compensated Language Recognition: ATVS-UAM Systems at NIST LRE 2009" IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, pp. 1084 – 1093, December 2010	en_US
dc.description.abstract	This work presents the systems submitted by the ATVS Biometric Recognition Group to the 2009 Language Recognition Evaluation (LRE’09), organized by NIST. New challenges included in this LRE edition can be summarized by three main differences with respect to past evaluations. Firstly, the number of languages to be recognized expanded to 23 languages from 14 in 2007, and 7 in 2005. Secondly, the data variability has been increased by including telephone speech excerpts extracted from Voice of America (VOA) radio broadcasts over the Internet in addition to Conversational Telephone Speech (CTS). The third difference was the volume of data, involving in this evaluation up to 2 terabytes of speech data for development, which is an order of magnitude greater than past evaluations. LRE’09 thus required participants to develop robust systems able not only to successfully face the session variability problem but also to do it with reasonable computational resources. ATVS participation consisted of state-of-the-art acoustic and high-level systems focussing on these issues. Furthermore, the problem of finding a proper combination and calibration of the information obtained at different levels of the speech signal was widely explored in this submission. In this work, two original contributions were developed. The first contribution was applying a session variability compensation scheme based on Factor Analysis (FA) within the statistics domain to an SVM-supervector (SVM-SV) approach. The second contribution was the employment of a novel backend based on anchor models in order to fuse individual systems prior to one-vs-all calibration via logistic regression. Results both in development and evaluation corpora show the robustness and excellent performance of the submitted systems, exemplified by our system ranked 2nd in the 30 second open-set condition, with remarkably scarce computational resources.	en_US
dc.description.sponsorship	This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01. Javier Gonzalez-Dominguez also thanks the Spanish Ministry of Education for supporting his doctoral research under project TEC2006-13141-C03-03. Special thanks are given to Dr. David Van Leeuwen from TNO Human Factors (Utrecht, The Netherlands) for his strong collaboration, valuable discussions and ideas. The authors also thank Dr. Patrick Lucey for his final support on the (non-target) Australian English review of the manuscript.	en_US
dc.format.extent	11 pág.	es_ES
dc.format.mimetype	application/pdf	en
dc.language.iso	eng	en
dc.publisher	IEEE	en_US
dc.relation.ispartof	IEEE Journal of Selected Topics in Signal Processing	en_US
dc.rights	© 2010 IEEE	en_US
dc.subject.other	Anchor models	en_US
dc.subject.other	Calibration	en_US
dc.subject.other	Factor analysis (FA)	en_US
dc.subject.other	Language recognition	en_US
dc.subject.other	Linear scoring	en_US
dc.subject.other	Sufficient statistics	en_US
dc.title	Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009	en_US
dc.type	article	en_US
dc.subject.eciencia	Telecomunicaciones	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1109/JSTSP.2010.2076071
dc.identifier.doi	10.1109/JSTSP.2010.2076071
dc.identifier.publicationfirstpage	1084
dc.identifier.publicationissue	6
dc.identifier.publicationlastpage	1093
dc.identifier.publicationvolume	4
dc.type.version	info:eu-repo/semantics/acceptedVersion	en
dc.contributor.group	Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002)	es_ES
dc.rights.accessRights	openAccess	en
dc.authorUAM	González Domínguez, Javier (261826)
dc.facultadUAM	Escuela Politécnica Superior

Files in this item

Name: multilevel_gonzalez-dominguez_ ...
Size: 1.688Mb
Format: PDF

This item appears in the following Collection(s)

Producción científica en acceso abierto de la UAM [20343]

Related items

ATVS-UAM NIST LRE 2009 System Description

ATVS-UAM NIST SRE 2010 System Description

A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units