
A novel visual representation method for multi-dimensional sound scene analysis in source localization problem

Bibliographic Details
Published in: Mechanical Systems and Signal Processing, 2024-02, Vol. 208, Article 110977
Main Authors: Jung, In-Jee; Cho, Wan-Ho
Format: Article
Language: English
Description
Summary:
Graphical abstract (image omitted): real-time source localization test for moving multiple quadcopter drones; spatial sound scene analysis using an open-access audio dataset for machine learning.

Highlights:
•A novel visual representation method for multi-dimensional sound scene analysis.
•Representation of the estimated localization result as RGB color channels.
•Any localization algorithm can be adopted in the preprocessing stage for encoding.
•A human-interpretable dataset capable of quantitative analysis after decoding.
•Exported image files include metadata for decoding.

A visual representation method, DoAgram, for multi-dimensional sound scene analysis is suggested. The visual representation of the sound source localization result gives the end user intuitive information about the estimated source position. Moreover, image-based deep learning is now widely used in the acoustic field, so such a visual representation can also be used for data augmentation. To analyze the spatial sound scene of a moving source, the method displays the estimated azimuth and elevation angles of the source, together with the corresponding time stamps and frequencies, as RGB color channels and metadata by mapping the spatial coordinates to a color space. Although the suggested method is human-interpretable, decoding is needed for quantitative analysis; therefore, a time- and frequency-scanning method and a histogram-based estimator of the source's direction of arrival (DoA) are proposed. An experiment is conducted in an anechoic chamber to localize two quadcopter drones with mean angular velocities of 8°/s ± 9°/s (95 % CI) and 25°/s ± 31°/s (95 % CI), respectively, and the spatial sound scene analysis is carried out using the proposed methods. The test result shows that the time trajectories of the two sources are well separated. An additional test is conducted using an open-access audio dataset for machine learning: the cumulative source mapping method is adopted for the spatial sound scene analysis, and the decoded result shows that DoAgram is feasible to adopt for machine learning applications.
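As a concrete illustration of the encode/decode idea described above, the following is a minimal Python sketch. It assumes a simple linear mapping of azimuth to the red channel, elevation to the green channel, and a detection flag to the blue channel; the function names (encode_doagram, decode_doagram, histogram_doa) and this particular color mapping are illustrative assumptions, not the paper's definition, which specifies its own color-space mapping and the metadata embedded in the exported image.

import numpy as np

def encode_doagram(azimuth_deg, elevation_deg):
    """Sketch: encode per-(frequency, time) DoA estimates as an RGB image.
    Inputs are 2-D arrays of shape (n_freq_bins, n_time_frames) in degrees,
    with NaN marking bins where no source was detected."""
    az = np.asarray(azimuth_deg, dtype=np.float32)
    el = np.asarray(elevation_deg, dtype=np.float32)
    active = ~np.isnan(az)
    img = np.zeros(az.shape + (3,), dtype=np.float32)
    img[..., 0] = np.where(active, (az % 360.0) / 360.0, 0.0)   # R: azimuth
    img[..., 1] = np.where(active, (el + 90.0) / 180.0, 0.0)    # G: elevation
    img[..., 2] = active.astype(np.float32)                     # B: detection flag
    return img

def decode_doagram(img):
    """Invert the sketch above: recover azimuth/elevation wherever the
    blue channel flags an active detection."""
    active = img[..., 2] > 0.5
    az = np.where(active, img[..., 0] * 360.0, np.nan)
    el = np.where(active, img[..., 1] * 180.0 - 90.0, np.nan)
    return az, el

def histogram_doa(azimuth_deg, bin_width=5.0):
    """Crude histogram-based DoA estimate over decoded azimuths, in the
    spirit of the histogram estimator mentioned in the abstract: return
    the center of the most populated azimuth bin."""
    az = azimuth_deg[~np.isnan(azimuth_deg)]
    bins = np.arange(0.0, 360.0 + bin_width, bin_width)
    counts, edges = np.histogram(az, bins=bins)
    peak = np.argmax(counts)
    return 0.5 * (edges[peak] + edges[peak + 1])

In this sketch the image axes carry the frequency bins and time frames, so scanning a row or column recovers the source trajectory over time or frequency; an exported file would additionally need the mapping parameters stored as metadata (e.g., image text fields) for quantitative decoding, in line with the abstract.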
ISSN: 0888-3270
EISSN: 1096-1216
DOI: 10.1016/j.ymssp.2023.110977