Frame-by-frame language identification in short utterances using deep neural networks

González Domínguez, Javier; López-Moreno, Ignacio; Moreno, Pedro J.; González Rodríguez, Joaquín

UAM_Biblioteca

dc.contributor.author	González Domínguez, Javier
dc.contributor.author	López-Moreno, Ignacio
dc.contributor.author	Moreno, Pedro J.
dc.contributor.author	González Rodríguez, Joaquín
dc.contributor.other	UAM. Departamento de Tecnología Electrónica y de las Comunicaciones	es_ES
dc.date.accessioned	2016-10-18T17:43:15Z
dc.date.available	2016-10-18T17:43:15Z
dc.date.issued	2015-04-01
dc.identifier.citation	Neural Networks 64 (2015): 49-58	en_US
dc.identifier.issn	0893-6080 (print)	en_US
dc.identifier.issn	1879-2782 (online)	en_US
dc.identifier.uri	http://hdl.handle.net/10486/674271
dc.description	This is the author’s version of a work that was accepted for publication in Neural Networks. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neural Networks, VOL 64, (2015) DOI 10.1016/j.neunet.2014.08.006	en_US
dc.description.abstract	This work addresses the use of deep neural networks (DNNs) in automatic language identification (LID) focused on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from the short-term acoustic features. We show how DNNs are particularly suitable to perform LID in real-time applications, due to their capacity to emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of required training data, the number of hidden layers, the relevance of contextual information and the effect of the test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two different datasets: the public NIST Language Recognition Evaluation 2009 (3 s task) and a much larger corpus (of 5 million utterances) known as Google 5M LID, obtained from different Google Services. Reported results show relative improvements of the DNNs over the i-vector system of 40% on the LRE09 3 s task and 76% on Google 5M LID.	en_US
dc.format.extent	16 pag.	es_ES
dc.format.mimetype	application/pdf	en
dc.language.iso	eng	en
dc.publisher	Elsevier Ltd	en_US
dc.relation.ispartof	Neural Networks	en_US
dc.rights	© 2015 Elsevier B.V. All rights reserved	en_US
dc.subject.other	DNNs	en_US
dc.subject.other	I-vectors	en_US
dc.subject.other	Real-time LID	en_US
dc.title	Frame-by-frame language identification in short utterances using deep neural networks	en_US
dc.type	article	en_US
dc.subject.eciencia	Telecomunicaciones	es_ES
dc.date.embargoend	2017-04-01
dc.relation.publisherversion	http://dx.doi.org/10.1016/j.neunet.2014.08.006
dc.identifier.doi	10.1016/j.neunet.2014.08.006
dc.identifier.publicationfirstpage	49
dc.identifier.publicationlastpage	58
dc.identifier.publicationvolume	64
dc.type.version	info:eu-repo/semantics/acceptedVersion	en
dc.contributor.group	Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002)	es_ES
dc.rights.accessRights	openAccess	en
dc.authorUAM	González Domínguez, Javier (261826)
dc.facultadUAM	Escuela Politécnica Superior

Files in this item

Name: frame_gonzalez-dominguez_NN_20 ...
Size: 585.6Kb
Format: PDF

This item appears in the following Collection(s)

Producción científica en acceso abierto de la UAM [20383]

Related items

Automatic language identification using deep neural networks

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

An end-to-end approach to language identification in short utterances using convolutional neural networks