Two-pass Endpoint Detection for Speech Recognition

Bibliographic Details
Published in: arXiv.org 2024-01
Main Authors: Raju, Anirudh; Khare, Aparna; He, Di; Sklyar, Ilya; Long, Chen; Alptekin, Sam; Trinh, Viet Anh; Zhang, Zhe; Vaz, Colin; Ravichandran, Venkatesh; Maas, Roland; Rastrow, Ariya
Format: Article
Language: English
Description
Summary: Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade off accuracy against latency, since waiting longer reduces the number of cases in which users are cut off early but increases latency. We propose a novel two-pass solution for endpointing, in which the utterance endpoint detected by a first-pass endpointer is verified by a second-pass model termed the EP Arbitrator. Our method improves the trade-off between early cut-offs and latency over a baseline endpointer, as tested on datasets including voice-assistant transactional queries, conversational speech, and the public SLURP corpus. We demonstrate that our method yields improvements regardless of the first-pass EP model used.
ISSN: 2331-8422
DOI: 10.48550/arxiv.2401.08916