
Show simple item record

dc.contributor.author: González Domínguez, Javier
dc.contributor.author: López-Moreno, Ignacio
dc.contributor.author: Moreno, Pedro J.
dc.contributor.author: González Rodríguez, Joaquín
dc.contributor.other: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones [es_ES]
dc.date.accessioned: 2016-10-18T17:43:15Z
dc.date.available: 2016-10-18T17:43:15Z
dc.date.issued: 2015-04-01
dc.identifier.citation: Neural Networks 64 (2015): 49-58 [en_US]
dc.identifier.issn: 0893-6080 (print) [en_US]
dc.identifier.issn: 1879-2782 (online) [en_US]
dc.identifier.uri: http://hdl.handle.net/10486/674271
dc.description: This is the author’s version of a work that was accepted for publication in Neural Networks. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neural Networks, VOL 64, (2015) DOI 10.1016/j.neunet.2014.08.006 [en_US]
dc.description.abstract: This work addresses the use of deep neural networks (DNNs) in automatic language identification (LID) focused on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from the short-term acoustic features. We show how DNNs are particularly suitable to perform LID in real-time applications, due to their capacity to emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of required training data, the number of hidden layers, the relevance of contextual information and the effect of the test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two different datasets: the public NIST Language Recognition Evaluation 2009 (3 s task) and a much larger corpus (of 5 million utterances) known as Google 5M LID, obtained from different Google Services. Reported results show relative improvements of DNNs versus the i-vector system of 40% in LRE09 3 second task and 76% in Google 5M LID. [en_US]
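The abstract describes a DNN that emits a language posterior at every frame and mentions combining these frame-by-frame posteriors into an utterance-level decision. A minimal sketch of two generic combination rules (simple averaging and log-posterior averaging) is given below; the function name and the specific rules are illustrative assumptions, not necessarily the exact methods evaluated in the paper.

```python
import numpy as np

def combine_frame_posteriors(frame_posteriors, method="mean_log"):
    """Combine per-frame language posteriors into an utterance-level decision.

    frame_posteriors: array of shape (num_frames, num_languages),
        each row a posterior distribution over candidate languages.
    method: "mean" averages the posteriors; "mean_log" averages the
        log-posteriors (equivalent to a product-of-posteriors rule).
        Both are generic rules sketched here for illustration.
    Returns the index of the winning language and the score vector.
    """
    p = np.asarray(frame_posteriors, dtype=float)
    eps = 1e-12  # guard against log(0) for hard 0/1 posteriors
    if method == "mean":
        scores = p.mean(axis=0)
    elif method == "mean_log":
        scores = np.log(p + eps).mean(axis=0)
    else:
        raise ValueError(f"unknown method: {method}")
    return int(np.argmax(scores)), scores
```

Because the combination is a running average over frames, the decision can be updated incrementally as each new frame arrives, which is what makes the frame-by-frame setup attractive for real-time use.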
dc.format.extent: 16 pages [es_ES]
dc.format.mimetype: application/pdf [en]
dc.language.iso: eng [en]
dc.publisher: Elsevier Ltd [en_US]
dc.relation.ispartof: Neural Networks [en_US]
dc.rights: © 2015 Elsevier B.V. All rights reserved [en_US]
dc.subject.other: DNNs [en_US]
dc.subject.other: I-vectors [en_US]
dc.subject.other: Real-time LID [en_US]
dc.title: Frame-by-frame language identification in short utterances using deep neural networks [en_US]
dc.type: article [en_US]
dc.subject.eciencia: Telecomunicaciones [es_ES]
dc.date.embargoend: 2017-04-01
dc.relation.publisherversion: http://dx.doi.org/10.1016/j.neunet.2014.08.006
dc.identifier.doi: 10.1016/j.neunet.2014.08.006
dc.identifier.publicationfirstpage: 49
dc.identifier.publicationlastpage: 58
dc.identifier.publicationvolume: 64
dc.type.version: info:eu-repo/semantics/acceptedVersion [en]
dc.contributor.group: Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002) [es_ES]
dc.rights.accessRights: openAccess [en]
dc.authorUAM: González Domínguez, Javier (261826)
dc.facultadUAM: Escuela Politécnica Superior


