Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks
Entity: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher: Public Library of Science
Date: 2016-01-29
Citation: PLOS ONE 11(1) (2016): e0146917
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0146917
Funded by: This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.
Project: Gobierno de España. TEC2012-37585-C02-01
Editor's Version: http://dx.doi.org/10.1371/journal.pone.0146917
Subjects: Telecommunications
Note: Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, Toledano DT, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917
Rights: © 2016 Zazo et al.
Abstract
Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (~3 s). In this contribution we present an open-source, end-to-end, LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s down to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved.
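To make the core mechanism concrete, the following is a minimal sketch (not the authors' implementation) of a single LSTM cell's forward step in plain Python: the gating equations that let the recurrent state accumulate evidence frame by frame, which is what lets such a system produce a language decision from only a fraction of a second of speech. All weights and the scalar "feature" sequence here are toy placeholders; a real end-to-end LID system would use vector inputs, many units, and a softmax layer over the target languages on top of the final hidden state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # w maps each gate name to (input weight, recurrent weight, bias);
    # scalar toy version of the standard LSTM equations.
    pre = {name: wx * x + wh * h_prev + b
           for name, (wx, wh, b) in w.items()}
    i = sigmoid(pre["i"])        # input gate: how much new evidence enters
    f = sigmoid(pre["f"])        # forget gate: how much old state is kept
    o = sigmoid(pre["o"])        # output gate: how much state is exposed
    g = math.tanh(pre["g"])      # candidate cell update
    c = f * c_prev + i * g       # new cell (memory) state
    h = o * math.tanh(c)         # new hidden state
    return h, c

# Toy "utterance": a short sequence of scalar acoustic features.
# In an end-to-end LID setup, the hidden state after the last frame
# would feed a softmax over the target languages.
weights = {name: (0.5, 0.1, 0.0) for name in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for frame in [0.2, -0.4, 0.9]:
    h, c = lstm_step(frame, h, c, weights)
```

Because the hidden state is updated at every frame, a per-frame (or final-frame) score is available at any truncation point of the input, which is the property the paper exploits when evaluating test segments as short as 0.1 s.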
Authors: Zazo, Ruben; Lozano Díez, Alicia; Gonzalez-Dominguez, Javier; Toledano, Doroteo T.; González Rodríguez, Joaquín