Robust and efficient keyword spotting using a bidirectional attention LSTM

Speech recognition and voice assistants have become integral in various advanced electronic devices and human-computer interfaces ranging from smartphones to self-driving cars. Personal assistants like Google Assistant and Amazon Alexa are state-of-the-art examples of personal voice assistants trigg...

Bibliographic Details
Published in:	International journal of speech technology 2023-12, Vol.26 (4), p.919-931
Main Authors:	Swain, Om Prakash; Hemanth, H.; Saran, Puneet; Kothandaraman, Mohanaprasad; Ravi, Logesh; Sailor, Hardik; Rajesh, K. S.
Format:	Article
Language:	English
Subjects:	Accuracy; Artificial Intelligence; Artificial neural networks; Attention; Autonomous cars; Denoising; Engineering; Human-computer interface; Imperative sentences; Keywords; Machine learning; Mass media; Model accuracy; Noise reduction; Signal, Image and Speech Processing; Social Sciences; Speech recognition; Training; Voice recognition

cited_by
cites	cdi_FETCH-LOGICAL-c1854-a9a818858bd1247d70b50eb92e767cbaa3b4a91bea8e61ec5c2c359b5d5ec973
container_end_page	931
container_issue	4
container_start_page	919
container_title	International journal of speech technology
container_volume	26
creator	Swain, Om Prakash; Hemanth, H.; Saran, Puneet; Kothandaraman, Mohanaprasad; Ravi, Logesh; Sailor, Hardik; Rajesh, K. S.
description	Speech recognition and voice assistants have become integral in various advanced electronic devices and human-computer interfaces ranging from smartphones to self-driving cars. Personal assistants like Google Assistant and Amazon Alexa are state-of-the-art examples of personal voice assistants triggered by simple wake words. Much research has been conducted on improving wake word detection and keyword spotting techniques. Along those lines, this research aimed to build an enhanced keyword-spotting model with high accuracy and reduced training time. The proposed model involves a Convolutional Neural Network and a bidirectional attention LSTM network. Another critical area of focus in this paper is audio pre-processing, where the spectral gating technique has been introduced for denoising the data. The aforementioned CRNN model exploiting the abilities of CNN and LSTM was trained on this denoised data. The model was trained on 35 keywords from the Google Speech Commands dataset and can identify each of them. The model training was performed using Google Colab. The model proposed in this paper achieved a training accuracy of 93.44% and a test accuracy of 92.94%.
doi_str_mv	10.1007/s10772-023-10067-4
format	article
addtitle	Int J Speech Technol
collection	CrossRef; Linguistics and Language Behavior Abstracts (LLBA)
date	2023-12-01
eissn	1572-8110
orcid	https://orcid.org/0000-0003-3938-7495
publisher	New York: Springer US
rights	The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
toplevel	peer_reviewed; online_resources
tpages	13
fulltext	fulltext
identifier	ISSN: 1381-2416
ispartof	International journal of speech technology, 2023-12, Vol.26 (4), p.919-931
issn	1381-2416 1572-8110
language	eng
recordid	cdi_proquest_journals_2913468157
source	Springer Nature:Jisc Collections:Springer Nature Read and Publish 2023-2025: Springer Reading List; Linguistics and Language Behavior Abstracts (LLBA)
subjects	Accuracy; Artificial Intelligence; Artificial neural networks; Attention; Autonomous cars; Denoising; Engineering; Human-computer interface; Imperative sentences; Keywords; Machine learning; Mass media; Model accuracy; Noise reduction; Signal, Image and Speech Processing; Social Sciences; Speech recognition; Training; Voice recognition
title	Robust and efficient keyword spotting using a bidirectional attention LSTM
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T09%3A57%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20and%20efficient%20keyword%20spotting%20using%20a%20bidirectional%20attention%20LSTM&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Swain,%20Om%20Prakash&rft.date=2023-12-01&rft.volume=26&rft.issue=4&rft.spage=919&rft.epage=931&rft.pages=919-931&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-023-10067-4&rft_dat=%3Cproquest_cross%3E2913468157%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c1854-a9a818858bd1247d70b50eb92e767cbaa3b4a91bea8e61ec5c2c359b5d5ec973%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2913468157&rft_id=info:pmid/&rfr_iscdi=true