Estimating Ensemble Location and Width in Binaural Recordings of Music with Convolutional Neural Networks
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Fundamental Technological Research Polish Academy of Sciences 2025-02-01 |
Series: | Archives of Acoustics |
Subjects: | |
Online Access: | https://acoustics.ippt.pan.pl/index.php/aa/article/view/4069 |
Summary: | Binaural audio technology has been in existence for many years. However, its popularity has significantly increased over the past decade as a consequence of advancements in virtual reality and streaming techniques. Along with its growing popularity, the quantity of publicly accessible binaural audio recordings has also expanded. Consequently, there is now a need for automated and objective retrieval of spatial content information, with ensemble location and width being the most prominent. This study presents a novel method for estimating these ensemble parameters in binaural recordings of music. For this purpose, a dataset of 23 040 binaural recordings was synthesized from 192 publicly available music recordings using 30 head-related transfer functions. The synthesized excerpts were then used to train a multi-task spectrogram-based convolutional neural network model to estimate the ensemble location and width for unseen recordings. The results indicate that a model for estimating ensemble parameters can be successfully constructed with low prediction errors: 4.76° (±0.10°) for ensemble location and 8.57° (±0.19°) for ensemble width. The method developed in this study outperforms previous spatiogram-based techniques recently published in the literature and shows promise for future development as part of a novel tool for the analysis of binaural audio recordings. |
ISSN: | 0137-5075 2300-262X |
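
The summary mentions synthesizing binaural excerpts from music recordings with head-related transfer functions. As a rough illustration of that general idea only, the sketch below convolves a mono signal with a left/right head-related impulse response (HRIR) pair using NumPy. It is not the authors' pipeline: the function name `render_binaural`, the toy HRIRs (a pure delay and attenuation), and the random source signal are all assumptions made for the example. The last lines only check the arithmetic implied by the counts in the summary; what distinguishes the excerpts within each recording/HRTF pair is not stated in this record.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono source with an HRIR pair; returns an (n, 2) array.

    Assumes both HRIRs have the same length so the two channels align.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Hypothetical toy data: a noise burst and idealized impulse responses.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)

hrir_l = np.zeros(32)
hrir_l[0] = 1.0    # near ear: no delay, unit gain (toy assumption)
hrir_r = np.zeros(32)
hrir_r[10] = 0.5   # far ear: 10-sample delay, halved amplitude (toy assumption)

binaural = render_binaural(source, hrir_l, hrir_r)

# Counts taken from the summary: 23 040 excerpts, 192 recordings, 30 HRTFs.
excerpts_per_pair = 23040 // (192 * 30)
```

In practice, measured HRIRs (for example, loaded from a SOFA file) would replace the toy responses, and an FFT-based convolution would likely be preferred for long signals.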