The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

Bibliographic Details
Published in: arXiv.org 2023-10
Main Authors: Wang, Ruoyu, He, Maokui, Du, Jun, Zhou, Hengshun, Niu, Shutong, Chen, Hang, Yue, Yanyan, Yang, Gaobin, Wu, Shilong, Sun, Lei, Tu, Yanhui, Tang, Haitao, Qian, Shuangqing, Gao, Tian, Wang, Mengzhi, Wan, Genshun, Pan, Jia, Gao, Jianqing, Lee, Chin-Hui
Format: Article
Language: English
Description
Summary: This technical report details our submission to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. The challenge also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy based on multi-channel spatial information. This approach significantly reduced the word error rate (WER). For recognition, we used publicly available pre-trained models as foundation models to train our end-to-end speech recognition models. Our system attained a macro-averaged diarization-attributed WER (DA-WER) of 21.01% on the CHiME-7 evaluation set, a relative improvement of 62.04% over the official baseline system.
ISSN: 2331-8422