
dc.contributor.author: Tejedor Noguerales, Javier
dc.contributor.author: Toledano, Doroteo T.
dc.contributor.author: Wang, Dong
dc.contributor.author: King, Simon
dc.contributor.author: Colás Pasamontes, José
dc.contributor.other: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones [es_ES]
dc.date.accessioned: 2014-12-05T18:18:13Z
dc.date.available: 2014-12-05T18:18:13Z
dc.date.issued: 2014-09
dc.identifier.citation: Computer Speech & Language 28.5 (2014): 1083-1114 [en_US]
dc.identifier.issn: 0885-2308 (print) [en_US]
dc.identifier.issn: 1095-8363 (online) [en_US]
dc.identifier.uri: http://hdl.handle.net/10486/662781
dc.description: This is the author’s version of a work that was accepted for publication in Computer Speech & Language. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Speech & Language, 28, 5, (2014) DOI: 10.1016/j.csl.2013.09.008 [en_US]
dc.description.abstract: Discriminative confidence based on multi-layer perceptrons (MLPs) and multiple features has shown a significant advantage over the widely used lattice-based confidence in spoken term detection (STD). Although the MLP-based framework can handle any features derived from a multitude of sources, choosing all possible features may lead to overly complex models and hence poorer generalization. In this paper, we design an extensive set of features and analyze their contribution to STD individually and as a group. The main goal is to choose a small set of features that are sufficiently informative while keeping the model simple and generalizable. We employ two established models to conduct the analysis: linear regression, which targets the most relevant features, and logistic linear regression, which targets the most discriminative features. We find that the most informative features are those derived from diverse sources (ASR decoding, duration and lexical properties) and that the two models deliver highly consistent feature ranks. STD experiments on both English and Spanish data demonstrate significant performance gains with the proposed feature sets. [en_US]
dc.description.sponsorship: This work has been partially supported by project PriorSPEECH (TEC2009-14719-C02-01) from the Spanish Ministry of Science and Innovation and by project MA2VICMR (S2009/TIC-1542) from the Community of Madrid. [en_US]
dc.format.extent: 58 pag.
dc.format.mimetype: application/pdf [en]
dc.language.iso: eng [en]
dc.publisher: Elsevier B.V.
dc.relation.ispartof: Computer Speech and Language [en_US]
dc.rights: © 2014 Elsevier B.V. All rights reserved [en_US]
dc.subject.other: Discriminative confidence [en_US]
dc.subject.other: Feature analysis [en_US]
dc.subject.other: Speech recognition [en_US]
dc.subject.other: Spoken term detection [en_US]
dc.title: Feature analysis for discriminative confidence estimation in spoken term detection [en_US]
dc.type: article [en_US]
dc.subject.eciencia: Telecomunicaciones [es_ES]
dc.date.embargoend: 2016-09-01
dc.relation.publisherversion: http://dx.doi.org/10.1016/j.csl.2013.09.008
dc.identifier.doi: 10.1016/j.csl.2013.09.008
dc.identifier.publicationfirstpage: 1083
dc.identifier.publicationissue: 5
dc.identifier.publicationlastpage: 1114
dc.identifier.publicationvolume: 28
dc.relation.projectID: Comunidad de Madrid. S2009/TIC-1542/MA2VICMR [es_ES]
dc.type.version: info:eu-repo/semantics/acceptedVersion [en]
dc.contributor.group: Análisis y Tratamiento de Voz y Señales Biométricas (ING EPS-002) [es_ES]
dc.contributor.group: Laboratorio de Tecnología Hombre-Computador (ING EPS-010) [es_ES]
dc.rights.cc: Reconocimiento – NoComercial – SinObraDerivada [es_ES]
dc.rights.accessRights: openAccess [en]
dc.authorUAM: Tejedor Noguerales, Javier (261273)
dc.facultadUAM: Escuela Politécnica Superior

