14.8 KASP: A 96.8% 10-Keyword Accuracy and 1.68μJ/Classification Keyword Spotting and Speaker Verification Processor Using Adaptive Beamforming and Progressive Wake-Up

Bibliographic Details
Main Authors: Xiao, Jianbiao, Zhang, Xuhui, Zhu, Shijian, Yang, Zhengwei, Du, Meng, Ji, Chunsheng, Long, Yu, Chen, Xiao, Miao, Xiaoyu, Zhou, Liang, Chang, Liang, Liu, Shanshan, Zhou, Jun
Format: Conference Proceeding
Language: English
Description
Summary: Keyword spotting (KWS) processors have been proposed and used in voice-control applications such as smart homes, intelligent robots and smart wearables, as shown in Fig. 14.8.1. Existing KWS processors have three issues: 1) they are sensitive to human-voice noise (e.g., nearby individuals talking, TV or radio), which degrades their accuracy in real-life applications; 2) they do not sufficiently exploit domain-specific features for energy reduction and accuracy improvement; 3) they do not support multi-user speaker verification (SV) without speaker-specific training. To address these issues, this work proposes a high-accuracy, ultra-energy-efficient KWS and SV processor (named KASP) with the following features: 1) a dynamically reconfigurable KWS and SV processing architecture supporting KWS-driven adaptive direction-of-arrival (DoA) estimation and beamforming to improve accuracy in the presence of human-voice noise; 2) an adaptive DoA frequency-channel selection technique and a lightweight frequency-domain beamforming (Lite-FDBF) technique to reduce energy consumption and hardware overhead; 3) a four-stage progressive wake-up processing architecture with KWS-aware adaptive voice-activity detection (VAD) to reduce energy consumption and improve accuracy across different SNRs; 4) a lightweight X-Vector (Lite-X-Vector)-based SV for low-energy multi-user speaker verification without speaker-specific training.
ISSN: 2376-8606
DOI: 10.1109/ISSCC49657.2024.10454492