Improving Acoustic Echo Cancellation by Exploring Speech and Echo Affinity with Multi-Head Attention

Bibliographic Details
Main Authors: Zhang, Yiqun, Xu, Xinmeng, Tu, Weiping
Format: Conference Proceeding
Language: English
Description
Summary: Deep learning-based approaches formulate acoustic echo cancellation (AEC) as a supervised speech separation task in which the mixture signal and the far-end signal are combined directly, either before or after the encoding stage. However, the two signals are not integrated sufficiently, because the affinity between speech and echo in a noisy mixture lacks interpretability. In this paper, we propose DCA-Net, a dual-branch cross-attention neural network, to improve AEC performance by exploring the affinities between speech and echo in the representation space. The two branches predict speech and echo, respectively, and an interaction module placed at several intermediate feature domains learns the correlations between the features of the two branches. Using multi-head cross attention to compute the similarity between these features, the interaction lets each branch leverage features learned by the other to restore missing information or counteract undesired information. Evaluation results show that the proposed DCA-Net effectively suppresses acoustic echo and noise while preserving good speech quality.
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10446389
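
The summary describes an interaction step in which the speech branch and the echo branch exchange information through multi-head cross attention. The following is a minimal sketch of that general idea, not the DCA-Net implementation: the function names, the omission of learned query/key/value projections, and the additive way each branch merges what it attends to are all assumptions made for illustration. Features are plain lists of frames, each frame a list of floats.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_head(queries, keys, values):
    """Scaled dot-product attention: each query frame attends over all key frames."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        output.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return output


def multi_head_cross_attention(target, source, num_heads):
    """Attend from `target` frames (queries) to `source` frames (keys and values).

    Assumption: learned projections are omitted, so each head simply works on
    its own slice of the feature dimension.
    """
    dim = len(target[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = [frame[lo:hi] for frame in target]
        kv = [frame[lo:hi] for frame in source]
        heads.append(cross_attention_head(q, kv, kv))
    # Concatenate the per-head outputs back along the feature dimension.
    return [sum((head[t] for head in heads), []) for t in range(len(target))]


def interact(speech_feats, echo_feats, num_heads=2):
    """Hypothetical interaction step: each branch adds what it attends to in the other."""
    from_echo = multi_head_cross_attention(speech_feats, echo_feats, num_heads)
    from_speech = multi_head_cross_attention(echo_feats, speech_feats, num_heads)
    new_speech = [[a + b for a, b in zip(s, e)] for s, e in zip(speech_feats, from_echo)]
    new_echo = [[a + b for a, b in zip(s, e)] for s, e in zip(echo_feats, from_speech)]
    return new_speech, new_echo
```

Calling `interact(speech_feats, echo_feats)` returns updated features for both branches with the same shapes as the inputs. A trained model would typically learn the projections and the merge operation rather than use a fixed addition.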