Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks
Entity: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher: Public Library of Science
Date: 2016-01-29
Citation: PLOS ONE 11(1) (2016): e0146917
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0146917
Funded by: This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.
Project: Gobierno de España. TEC2012-37585-C02-01
Editor's Version: http://dx.doi.org/10.1371/journal.pone.0146917
Subjects: Telecommunications
Note: Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, Toledano DT, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917
Rights: © 2016 Zazo et al.
Abstract
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (~3 s). In this contribution we present an open-source, end-to-end, LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s down to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved.
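To make the core mechanism concrete, the following is a minimal sketch (not the authors' implementation) of a single LSTM cell's forward step in plain Python: the gating equations that let the recurrent state accumulate evidence frame by frame, which is what lets such a system produce a language decision from only a fraction of a second of speech. All weights and the scalar "feature" sequence here are toy placeholders; a real end-to-end LID system would use vector inputs, many units, and a softmax layer over the target languages on top of the final hidden state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # w maps each gate name to (input weight, recurrent weight, bias);
    # scalar toy version of the standard LSTM equations.
    pre = {name: wx * x + wh * h_prev + b
           for name, (wx, wh, b) in w.items()}
    i = sigmoid(pre["i"])        # input gate: how much new evidence enters
    f = sigmoid(pre["f"])        # forget gate: how much old state is kept
    o = sigmoid(pre["o"])        # output gate: how much state is exposed
    g = math.tanh(pre["g"])      # candidate cell update
    c = f * c_prev + i * g       # new cell (memory) state
    h = o * math.tanh(c)         # new hidden state
    return h, c

# Toy "utterance": a short sequence of scalar acoustic features.
# In an end-to-end LID setup, the hidden state after the last frame
# would feed a softmax over the target languages.
weights = {name: (0.5, 0.1, 0.0) for name in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for frame in [0.2, -0.4, 0.9]:
    h, c = lstm_step(frame, h, c, weights)
```

Because the hidden state is updated at every frame, a per-frame (or final-frame) score is available at any truncation point of the input, which is the property the paper exploits when evaluating test segments as short as 0.1 s.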
Authors: Zazo, Ruben; Lozano Díez, Alicia; Gonzalez-Dominguez, Javier; Toledano, Doroteo T.; González Rodríguez, Joaquín