
Ensemble width estimation in HRTF-convolved binaural music recordings using an auditory model and a gradient-boosted decision trees regressor

Bibliographic Details
Published in:EURASIP journal on audio, speech, and music processing, 2024-10, Vol.2024 (1), p.53-26, Article 53
Main Authors: Antoniuk, Paweł, Zieliński, Sławomir K., Lee, Hyunkook
Format: Article
Language:English
Description
Summary:Binaural audio recordings are becoming increasingly popular in multimedia repositories, posing new challenges for the indexing, searching, and retrieval of such excerpts in terms of their spatial audio scene characteristics. This paper presents a new method for the automatic estimation of one of the most important spatial attributes of binaural recordings of music, namely “ensemble width.” The method was developed using a repository of 23,040 binaural excerpts synthesized by convolving 192 multi-track music recordings with 30 sets of head-related transfer functions (HRTFs). The synthesized excerpts represented various spatial distributions of music sound sources along a frontal semicircle in the horizontal plane. A binaural auditory model was used to derive the standard binaural cues from the synthesized excerpts, yielding a dataset representing interaural level and time differences, complemented by interaural cross-correlation coefficients. Subsequently, a regression method based on gradient-boosted decision trees was applied to this dataset to estimate ensemble width values. According to the results, the mean absolute error of the ensemble width estimation, averaged across experimental conditions, amounts to 6.63° (SD 0.12°). The accuracy of the method is highest for recordings with ensembles narrower than 30°, with the mean absolute error ranging between 0.8° and 10.2°. The performance of the proposed algorithm is relatively uniform regardless of the horizontal position of an ensemble. However, its accuracy deteriorates for wider ensembles, with the error reaching 25.2° for music ensembles spanning 90°. The developed method exhibits satisfactory generalization properties when evaluated under both music-independent and HRTF-independent conditions. The proposed method outperforms the technique based on “spatiograms” recently introduced in the literature.
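The three binaural cues named in the summary (interaural level difference, interaural time difference, and interaural cross-correlation coefficient) can be illustrated with a minimal sketch. It is not the auditory model used in the paper, whose details this record does not give: it works on the broadband signal with no filterbank, and it assumes the common convention of searching the normalized cross-correlation within ±1 ms. The names `binaural_cues` and `mean_absolute_error` are hypothetical and chosen only for illustration.

```python
import math


def binaural_cues(left, right, fs, max_lag_ms=1.0):
    """Return (ild_db, itd_s, iacc) for two equal-length sample sequences.

    Simplified broadband version (assumption): no auditory filterbank.
    A positive ITD means the right-ear signal lags the left-ear signal.
    """
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("left and right must be non-empty and of equal length")

    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    if energy_l == 0.0 or energy_r == 0.0:
        raise ValueError("both channels must contain signal energy")

    # Interaural level difference: energy ratio in decibels.
    ild_db = 10.0 * math.log10(energy_l / energy_r)

    # Normalized cross-correlation, searched over lags within +/- max_lag_ms.
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    norm = math.sqrt(energy_l * energy_r)
    best_lag, best_abs = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        lo = max(0, -lag)
        hi = min(n, n - lag)
        c = sum(left[i] * right[i + lag] for i in range(lo, hi)) / norm
        if abs(c) > best_abs:
            best_lag, best_abs = lag, abs(c)

    itd_s = best_lag / fs  # lag of the correlation peak, in seconds
    iacc = best_abs        # peak magnitude of the normalized correlation
    return ild_db, itd_s, iacc


def mean_absolute_error(true_widths, predicted_widths):
    """Mean absolute error, in the same unit as the inputs (e.g. degrees)."""
    if len(true_widths) != len(predicted_widths) or not true_widths:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_widths, predicted_widths))
    return total / len(true_widths)


# Toy input: the right channel is the left channel attenuated and delayed.
signal = [math.sin(0.003 * i * i) for i in range(400)]
left_ear = signal + [0.0] * 5
right_ear = [0.0] * 5 + [0.5 * x for x in signal]
print(binaural_cues(left_ear, right_ear, fs=48000))
```

In a pipeline like the one the summary describes, cues of this kind would be computed per excerpt and passed as features to a regressor, with the mean absolute error in degrees used to score the predicted ensemble widths.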
ISSN:1687-4722
1687-4714
DOI:10.1186/s13636-024-00374-2