dc.contributor.author: Qi, Jun
dc.contributor.author: Wang, Dong
dc.contributor.author: Xu, Ji
dc.contributor.author: Tejedor Noguerales, Javier
dc.contributor.other: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones [es_ES]
dc.date.accessioned: 2015-05-27T16:34:49Z
dc.date.available: 2015-05-27T16:34:49Z
dc.date.issued: 2013
dc.identifier.citation: INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association. Ed. F. Bimbot, C. Cerisara, C. Fougeron, G. Gravier, L. Lamel, F. Pellegrino, and P. Perrier. ISCA, 2013. 1751-1755 [en_US]
dc.identifier.issn: 1990-9772
dc.identifier.uri: http://hdl.handle.net/10486/666441
dc.description.abstract: Recent work demonstrates impressive success of the bottleneck (BN) feature in speech recognition, particularly with deep networks plus appropriate pre-training. A widely admitted advantage associated with the BN feature is that the network structure can learn multiple environmental conditions with abundant training data. For tasks with limited training data, however, this multi-condition training is unavailable, and so the networks tend to be over-fitted and sensitive to acoustic condition changes. A possible solution is to base the BN features on a channel-robust primary feature. In this paper, we propose to derive the BN feature based on Gammatone frequency cepstral coefficients (GFCCs). The GFCC feature has shown nice robustness against acoustic change, due to its capability of simulating the auditory system of humans. The idea is to integrate the advantage of the GFCC feature in acoustic robustness and the advantage of the BN feature in signal representation, so that the BN feature can be improved in the condition of mismatched training/test channels. This is particularly useful for small-scale tasks for which the training data are often limited. The experiments are conducted on the WSJCAM0 database, where the test utterances are mixed with noises at various SNR levels to simulate the channel change. The results confirm that the GFCC-based BN feature is much more robust than the BN features based on the MFCC and the PLP. Furthermore, the primary GFCC feature and the GFCC-based BN feature can be concatenated, leading to a more robust combined feature which provides considerable performance gains in all the tested noise conditions. [en_US]
dc.format.extent: 5 pág. [es_ES]
dc.format.mimetype: application/pdf [en]
dc.language.iso: eng [en]
dc.publisher: International Speech Communication Association [en_US]
dc.relation.ispartof: Interspeech [en_US]
dc.rights: © 2013 ISCA [en_US]
dc.subject.other: Gammatone filters [en_US]
dc.subject.other: Bottleneck feature [en_US]
dc.subject.other: Robust speech recognition [en_US]
dc.title: Bottleneck features based on gammatone frequency cepstral coefficients [en_US]
dc.type: conferenceObject [en]
dc.subject.eciencia: Informática [es_ES]
dc.subject.eciencia: Telecomunicaciones [es_ES]
dc.relation.publisherversion: http://www.isca-speech.org/archive/interspeech_2013/i13_1751.html
dc.identifier.publicationfirstpage: 1751
dc.identifier.publicationlastpage: 1755
dc.relation.eventdate: August 25-29, 2013 [en_US]
dc.relation.eventnumber: 14
dc.relation.eventplace: Lyon (France) [en_US]
dc.relation.eventtitle: 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013 [en_US]
dc.type.version: info:eu-repo/semantics/publishedVersion [en]
dc.contributor.group: Laboratorio de Tecnología Hombre-Computador (ING EPS-010) [es_ES]
dc.rights.accessRights: openAccess [en]
dc.authorUAM: Tejedor Noguerales, Javier (261273)
dc.facultadUAM: Escuela Politécnica Superior

