
A TCN-based Primary Ambient Extraction in Generating Ambisonics Audio from Panorama Video

Bibliographic Details
Main Authors: Lv, Zhuliang; Zhou, Yi; Liu, Hongqing; Shu, Xiaofeng; Zhang, Nannan
Format: Conference Proceeding
Language: English
Description
Summary: Spatial audio is one of the most essential parts of immersive audio-visual experiences such as virtual reality (VR), as it reproduces the inherent spatiality of sound and preserves the correspondence between audio and visual content. Ambisonics is the dominant spatial audio solution due to its flexibility and fidelity. However, producing Ambisonics audio is difficult for the general public because it requires expensive equipment or professional music-production skills. In this work, an end-to-end Ambisonics generator for panorama video is proposed. To improve the perception of directional sound, we assume that the sound field is composed of a primary sound source and an ambient component without spatiality, and a Temporal Convolutional Network (TCN) based Primary Ambient Extractor (PAE) is proposed to separate these two parts of the sound field. The directional sound is spatially encoded with weights produced by an audio-visual fusion network, and the ambient part is then added. Our network is evaluated on panorama video clips with first-order Ambisonics. The results show that the proposed approach outperforms other methods in terms of objective evaluations.
ISSN: 2641-5542
DOI: 10.1109/ISSPIT51521.2020.9408696
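
As a rough illustration of the encoding step described in the summary above, the sketch below pans a separated primary (directional) signal into first-order Ambisonics B-format (W, X, Y, Z) from an estimated direction and folds the non-directional ambient residual into the omnidirectional channel. The function name encode_foa, the FuMa-style W gain, the use of a single fixed direction, and the choice to place the ambient part only in W are illustrative assumptions, not details taken from the paper.

    # Minimal NumPy sketch of first-order Ambisonics encoding of a
    # primary signal plus an ambient residual (assumptions noted above).
    import numpy as np

    def encode_foa(primary, ambient, azimuth, elevation):
        """Encode a mono primary signal and ambient residual into 4-channel FOA.

        primary, ambient : 1-D arrays of equal length (time-domain samples)
        azimuth, elevation : direction of the primary source in radians
            (scalars here; a real system would use per-frame directions
            predicted from the panorama video)
        Returns an array of shape (4, n_samples) ordered (W, X, Y, Z).
        """
        w = primary / np.sqrt(2.0) + ambient          # omnidirectional channel
        x = primary * np.cos(azimuth) * np.cos(elevation)
        y = primary * np.sin(azimuth) * np.cos(elevation)
        z = primary * np.sin(elevation)
        return np.stack([w, x, y, z])

    # Example: a 1 kHz primary tone arriving from 30 degrees to the left,
    # with low-level noise standing in for the extracted ambient part.
    fs = 16000
    t = np.arange(fs) / fs
    primary = np.sin(2 * np.pi * 1000 * t)
    ambient = 0.1 * np.random.randn(fs)
    foa = encode_foa(primary, ambient, azimuth=np.deg2rad(30), elevation=0.0)
    print(foa.shape)  # (4, 16000)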