
Slot-Triggered Contextual Biasing For Personalized Speech Recognition Using Neural Transducers


Bibliographic Details
Main Authors: Tong, Sibo, Harding, Philip, Wiesler, Simon
Format: Conference Proceeding
Language: English
Description
Summary: End-to-end (E2E) automatic speech recognition (ASR) models have been found to perform well on general transcription tasks but often fail to correctly recognize words that occur infrequently in the training data. Personalization is important for a variety of tasks, including virtual assistants, where recall of infrequently observed words such as contact names, song titles and place names is critical. In these cases, contextual information is often available which can be used to bias the E2E ASR model. Contextual biasing (CB) has been shown to be effective for this task; however, most existing work focuses on biasing for a single domain, so in this work we focus on the application of biasing to multiple domains. We propose a method whereby the E2E ASR model is trained to emit opening and closing tags around slot content, which are used both to selectively enable biasing and to decide which catalog to use for biasing. Our method is shown not only to scale efficiently to multiple slots but also to further improve accuracy on slot content.
ISSN: 2379-190X
DOI: 10.1109/ICASSP49357.2023.10096677
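
The tag mechanism described in the summary can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify the tag format, the catalog structure, or how biasing is applied inside the neural transducer. The sketch assumes a hypothetical `<slot>` / `</slot>` tag syntax, a plain dictionary of catalogs keyed by slot name, and a fixed additive score bonus standing in for the biasing step. It only shows how emitted tags could switch biasing on and off and select the catalog.

```python
from typing import Dict, List, Optional, Set, Tuple


def parse_tag(token: str) -> Optional[Tuple[str, bool]]:
    """Return (slot_name, is_opening) if token is a slot tag, else None.

    The "<slot>" / "</slot>" syntax is an assumption made for illustration.
    """
    if token.startswith("</") and token.endswith(">"):
        return token[2:-1], False
    if token.startswith("<") and token.endswith(">"):
        return token[1:-1], True
    return None


def active_slots(tokens: List[str]) -> List[Optional[str]]:
    """For each token, report which slot (if any) encloses it."""
    current: Optional[str] = None
    out: List[Optional[str]] = []
    for tok in tokens:
        tag = parse_tag(tok)
        if tag is None:
            out.append(current)   # ordinary word: inherits the open slot
            continue
        name, is_opening = tag
        if is_opening:
            current = name        # opening tag enables biasing for this slot
        elif current == name:
            current = None        # matching closing tag disables biasing
        out.append(None)          # the tags themselves are never biased
    return out


def biased_score(
    word: str,
    base_score: float,
    slot: Optional[str],
    catalogs: Dict[str, Set[str]],
    bonus: float = 2.0,
) -> float:
    """Add a hypothetical fixed bonus when the word is in the active catalog."""
    if slot is not None and word in catalogs.get(slot, set()):
        return base_score + bonus
    return base_score


# Hypothetical hypothesis and catalogs, invented for this example.
tokens = ["call", "<contact>", "alex", "</contact>",
          "and", "play", "<song>", "yesterday", "</song>"]
catalogs = {"contact": {"alex"}, "song": {"yesterday"}}
slots = active_slots(tokens)
```

In this sketch, a word receives the bonus only while a slot is open and only from that slot's catalog, so a contact name is not boosted outside a contact slot or inside a song slot. A real system would apply the bias during decoding rather than to a finished token sequence.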