Biblos-e Archivo

A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge

Author
De Benito-Gorron, Diego; Ramos Castro, Daniel; Toledano, Doroteo T.
Entity
UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher
Institute of Electrical and Electronics Engineers Inc. (IEEE)
Date
2021-06-14
Citation
IEEE Access 9 (2021): 89029-89042
ISSN
2169-3536 (online)
DOI
10.1109/ACCESS.2021.3088949
Funded by
This work was supported in part by the Project Deep Speech for Forensics and Security (DSForSec) under Grant RTI2018-098091-B-I00, in part by the Ministry of Science, Innovation and Universities of Spain, and in part by the European Regional Development Fund (ERDF)
Project
Gobierno de España. RTI2018-098091-B-I00
Editor's Version
https://doi.org/10.1109/ACCESS.2021.3088949
Subjects
DCASE 2020 Task 4; multi-resolution; Sound event detection; Telecomunicaciones
URI
http://hdl.handle.net/10486/701107
Rights
© The author(s)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Sound Event Detection is a task whose relevance has risen in recent years in the field of audio signal processing, due to the creation of specific datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and the introduction of competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). The different categories of acoustic events can present diverse temporal and spectral characteristics; however, most approaches use a fixed time-frequency resolution to represent the audio segments. This work proposes a multi-resolution analysis for feature extraction in Sound Event Detection, hypothesizing that different resolutions can be more adequate for the detection of different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of Sound Event Detection systems. Experiments are carried out on the DESED dataset in the context of the DCASE 2020 Challenge, concluding that the combination of up to five resolutions allows a neural network-based system to obtain better results than single-resolution models in terms of event-based F1-score in every event category, as well as in terms of PSDS (Polyphonic Sound Detection Score). Furthermore, we analyze the impact of score thresholding on the computation of F1-score results, finding that the standard value of 0.5 is suboptimal and proposing an alternative strategy based on the use of a specific threshold for each event category, which obtains further improvements in performance.
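
The abstract describes two concrete mechanisms: extracting features at several time-frequency resolutions and replacing the global 0.5 decision threshold with a per-class one. The following minimal Python sketch (using librosa and NumPy) only illustrates those ideas; the resolution settings, the averaging fusion rule, and the helper names are assumptions made for illustration, not the authors' actual system, in which each resolution would feed its own CRNN.

import numpy as np
import librosa

# Hypothetical set of time-frequency resolutions: a larger n_fft gives finer
# frequency resolution, a smaller n_fft (and hop) gives finer time resolution.
RESOLUTIONS = [
    {"n_fft": 4096, "hop_length": 512},
    {"n_fft": 2048, "hop_length": 256},
    {"n_fft": 1024, "hop_length": 128},
]

def multi_resolution_features(path, sr=16000, n_mels=128):
    """Return one log-mel spectrogram per resolution for a single audio clip."""
    y, sr = librosa.load(path, sr=sr)
    feats = []
    for res in RESOLUTIONS:
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_mels=n_mels,
            n_fft=res["n_fft"], hop_length=res["hop_length"])
        feats.append(librosa.power_to_db(mel, ref=np.max))
    return feats

def fuse_scores(per_resolution_scores):
    """Average frame-level class posteriors from one model per resolution.
    Each array has shape (n_frames, n_classes) and is assumed time-aligned."""
    return np.mean(np.stack(per_resolution_scores, axis=0), axis=0)

def detect_events(fused_scores, class_thresholds):
    """Binarize fused scores with a per-class threshold instead of a single
    global 0.5, in the spirit of the thresholding analysis in the abstract."""
    return fused_scores >= np.asarray(class_thresholds)[None, :]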

Files in this item

Name
8985608.pdf
Size
1.719 MB
Format
PDF

This item appears in the following Collection(s)

  • Producción científica en acceso abierto de la UAM [17777]

Related items

Showing items related by title, author, creator and subject.

  • An analysis of sound event detection under acoustic degradation using multi-resolution systems 

de Benito-Gorrón, Diego; Ramos Castro, Daniel; Toledano, Doroteo T.
    2021-12-06
  • Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset 

Benito Gorrón, Diego de; Lozano Díez, Alicia; Toledano, Doroteo T.; González Rodríguez, Joaquín
    2019-06-17
  • Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT 

Toledano, Doroteo T.; Fernández-Gallego, María Pilar; Lozano Díez, Alicia
    2018-10-01
All the documents from Biblos-e Archivo are protected by copyrights. Some rights reserved.
Universidad Autónoma de Madrid. Biblioteca