An analysis of sound event detection under acoustic degradation using multi-resolution systems
Entity: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publisher: MDPI
Date: 2021-12-06
Citation: Applied Sciences-Basel 11.23 (2021): 11561
ISSN: 2076-3417 (online)
DOI: 10.3390/app112311561
Funded by: This research and the APC were supported by project DSForSec (grant number RTI2018-098091-B-I00) funded by the Ministry of Science, Innovation and Universities of Spain and the European Regional Development Fund (ERDF)
Project: Gobierno de España. RTI2018-098091-B-I00
Editor's Version: https://doi.org/10.3390/app112311561
Subjects: Acoustic degradation; DCASE challenge 2020; Multiresolution; Sound event detection; Telecomunicaciones
Rights: © The author(s)

Abstract
The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, the relevance of this field has risen due to the introduction of datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). In this paper, we analyze the performance of Sound Event Detection systems under diverse artificial acoustic conditions, such as high- or low-pass filtering and clipping or dynamic range compression, as well as under a scenario of high overlap between events. For this purpose, the audio was obtained from the Evaluation subset of the DESED dataset, whereas the systems were trained in the context of the DCASE Challenge 2020 Task 4. Our systems are based upon the challenge baseline, which consists of a Convolutional-Recurrent Neural Network trained using the Mean Teacher method, and they employ a multiresolution approach that improves Sound Event Detection performance through the use of several resolutions during the extraction of Mel-spectrogram features. We provide insights on the benefits of this multiresolution approach in different acoustic settings, and compare the performance of the single-resolution systems in the aforementioned scenarios when using different resolutions. Furthermore, we complement the analysis of the performance in the high-overlap scenario by assessing the degree of overlap of each event category in sound event detection datasets.
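The multiresolution idea described in the abstract, extracting spectrogram features at several window/hop sizes so that short windows capture fine temporal detail and long windows capture fine frequency detail, can be sketched as below. This is a minimal illustration, not the authors' implementation: the actual window and hop lengths, sample rate, and Mel filterbank settings used by the paper's systems are not given in this record, so the values here are hypothetical and the Mel projection step is omitted for brevity.

```python
import numpy as np

def stft_mag(signal, win_length, hop_length):
    """Magnitude STFT with a Hann window; trailing partial frame is dropped."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # Shape: (n_frames, win_length // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_resolution_features(signal, resolutions):
    """One spectrogram per (win_length, hop_length) pair.

    Shorter windows -> better time resolution (more frames);
    longer windows -> better frequency resolution (more bins).
    """
    return [stft_mag(signal, w, h) for (w, h) in resolutions]

# Hypothetical resolution set for illustration only.
resolutions = [(256, 128), (1024, 512), (4096, 2048)]
audio = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz (toy input)
feats = multi_resolution_features(audio, resolutions)
for (w, h), f in zip(resolutions, feats):
    print(f"win={w:5d} hop={h:5d} -> frames x bins = {f.shape}")
```

In a multiresolution system along these lines, each of these feature streams would feed its own model (or branch), and the per-resolution detections would then be combined.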
Authors: de Benito-Gorrón, Diego; Ramos Castro, Daniel; Toledano, Doroteo T.