Bottleneck features based on gammatone frequency cepstral coefficients
Entity
UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher
International Speech Communication Association
Date
2013
Citation
INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association. Ed. F. Bimbot, C. Cerisara, C. Fougeron, G. Gravier, L. Lamel, F. Pellegrino, and P. Perrier. ISCA, 2013. 1751-1755
ISSN
1990-9772
Editor's Version
http://www.isca-speech.org/archive/interspeech_2013/i13_1751.html
Subjects
Gammatone filters; Bottleneck feature; Robust speech recognition; Informática; Telecomunicaciones
Rights
© 2013 ISCA
Abstract
Recent work demonstrates impressive success of the bottleneck
(BN) feature in speech recognition, particularly with deep
networks plus appropriate pre-training. A widely acknowledged advantage
of the BN feature is that the network structure
can learn multiple environmental conditions with abundant
training data. For tasks with limited training data, however, this
multi-condition training is unavailable, and so the networks tend
to be over-fitted and sensitive to acoustic condition changes. A
possible solution is to base the BN features on a channel-robust
primary feature.
In this paper, we propose to derive the BN feature based
on Gammatone frequency cepstral coefficients (GFCCs). The
GFCC feature has shown good robustness against acoustic
changes, owing to its ability to model the human auditory
system. The idea is to combine the acoustic robustness of the
GFCC feature with the strength of the BN feature in signal
representation, so that the BN feature performs better
when the training and test channels are mismatched.
This is particularly useful for small-scale tasks for which
the training data are often limited. Experiments are conducted
on the WSJCAM0 database, where the test utterances
are mixed with noise at various signal-to-noise ratio (SNR)
levels to simulate
channel change. The results confirm that the GFCC-based BN
feature is much more robust than the BN features based on the
MFCC and the PLP. Furthermore, the primary GFCC feature
and the GFCC-based BN feature can be concatenated, leading
to a more robust combined feature which provides considerable
performance gains in all the tested noise conditions.
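As an illustration only, the two mechanical steps mentioned in the abstract, mixing an utterance with noise at a target SNR and concatenating two per-frame feature streams, can be sketched as follows. This record does not describe the authors' actual mixing procedure, noise types, or feature dimensions, so the function names (`mix_at_snr`, `concat_features`, `measured_snr_db`) and the synthetic signals are hypothetical, and the GFCC and BN extraction steps are not shown.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise so the mix has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose gain g so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def concat_features(primary_frames, bn_frames):
    """Concatenate two per-frame feature streams frame by frame."""
    if len(primary_frames) != len(bn_frames):
        raise ValueError("both streams must have the same number of frames")
    return [list(p) + list(b) for p, b in zip(primary_frames, bn_frames)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a mix, given the clean speech it was built from."""
    noise_power = sum((y - s) ** 2 for s, y in zip(speech, noisy)) / len(speech)
    speech_power = sum(s * s for s in speech) / len(speech)
    return 10 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins (assumptions): a 440 Hz tone as "speech", Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    noisy = mix_at_snr(speech, noise, snr_db=10)
    print(round(measured_snr_db(speech, noisy), 2))
```

The noise gain is derived from average signal powers over the whole utterance, which is one common convention; other conventions, such as measuring power on speech-active segments only, would give different mixes.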
Authors
Qi, Jun; Wang, Dong; Xu, Ji; Tejedor Noguerales, Javier
Related items
Showing items related by title, author, creator and subject.
-
Subspace models for bottleneck features
Qi, Jun; Wang, Dong; Tejedor Noguerales, Javier
2013