Auditory inspired Acoustic Model for Hybrid ASR System using Gammatone based Gabor Filters
Spectro-temporal features have recently yielded substantial performance improvements in robust Automatic Speech Recognition (ASR) tasks. Gabor filters are well known for extracting the spectro-temporal cues of speech. A spectro-temporal representation is an essential ingredient of two-dimensional Gabor-based...
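Main Authors: | Dutta, Anirban ; Ashishkumar, G ; Rama Rao, Ch V |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | deep neural network ; Feature extraction ; Frequency modulation ; gabor filters ; gammatone filters ; Hidden Markov models ; Mel frequency cepstral coefficient ; spectro-temporal features ; Time-frequency analysis ; Training |
Online Access: | Request full text |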
cited_by | |
---|---|
cites | |
container_end_page | 6 |
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Dutta, Anirban ; Ashishkumar, G ; Rama Rao, Ch V |
description | Spectro-temporal features have recently yielded substantial performance improvements in robust Automatic Speech Recognition (ASR) tasks. Gabor filters are well known for extracting the spectro-temporal cues of speech. A spectro-temporal representation is an essential ingredient of two-dimensional Gabor-based feature extraction methods. State-of-the-art spectro-temporal features are mostly based on the Mel spectrogram. However, the time-frequency representation based on the Mel scale is not accurate enough to model the human auditory system. This paper concentrates on obtaining the spectro-temporal representation by incorporating physiologically and psychoacoustically motivated gammatone filters, yielding a representation called the gammatonegram. According to the literature, the gammatonegram better approximates the auditory perception of speech. The spectro-temporal features obtained using gammatonegram-based Gabor filters are fed to a hybrid Deep Neural Network (DNN)-Hidden Markov Model (HMM) framework to develop the acoustic model of an ASR system. Experimental analysis is carried out using the NOISEX-92 noise database applied to TIMIT. The experimental results show a performance gain for the proposed features over conventional feature extraction methods. |
doi_str_mv | 10.1109/ISSPIT47144.2019.9001859 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_9001859</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9001859</ieee_id><sourcerecordid>9001859</sourcerecordid><originalsourceid>FETCH-LOGICAL-i203t-ea16a4d3a43b1817d374dfb13c038b48d0fd8cddf15f3754158683a74e4628f83</originalsourceid><addsrcrecordid>eNotkL1OwzAUhQ0SEqX0CVj8Aim-udexM1YV_ZGKQKQsLJUT28goaSo7HfL2VGqnM5zvfMNhjIOYA4jydVtVn9s9KSCa5wLKeSkEaFnesSdQuQaJBOqeTfKCIJOS8kc2S-lPCIGgdYE0YT-Lsw1DH0cejukUorN80fTnNISGv_fWtdz3kW_GOoZLU33xakyD6_g5heMvX5uuM0N_dLw26TJdm_pCr0I7uJie2YM3bXKzW07Z9-ptv9xku4_1drnYZSEXOGTOQGHIoiGsQYOyqMj6GrARqGvSVnirG2s9SI9KEkhdaDSKHBW59hqn7OXqDc65wymGzsTxcLsC_wHHL1QB</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Auditory inspired Acoustic Model for Hybrid ASR System using Gammatone based Gabor Filters</title><source>IEEE Xplore All Conference Series</source><creator>Dutta, Anirban ; Ashishkumar, G ; Rama Rao, Ch V</creator><creatorcontrib>Dutta, Anirban ; Ashishkumar, G ; Rama Rao, Ch V</creatorcontrib><description>Spectro-temporal features have recently shown much performance improvement for robust Automatic Speech Recognition (ASR) tasks. Gabor filters are best known to extract the spectro-temporal cues of speech. Spectro-temporal representation becomes an essential ingredient for two dimensional Gabor based feature extraction methods. State of the art spectro-temporal features is mostly based on Mel spectrogram. However, the time-frequency representation based on the Mel scale is not accurate enough to model the human auditory system. This paper concentrates on obtaining the spectro-temporal representation by incorporating a physiologically and psychoacoustically motivated gammatone filter called gammatonegram. 
From literature, gammatonegram is found to better approximate the auditory perception of speech. The spectro-temporal features obtained using gammatonegram based Gabor filters are fed to a hybrid Deep Neural Network (DNN)-Hidden Markov Model (HMM) framework to develop the acoustic model of an ASR system. Experimental analysis is carried out with NOISEX-92 database implemented on TIMIT. The experimental results show the better performance gain obtained with the proposed features compared with conventional feature extraction methods.</description><identifier>EISSN: 2641-5542</identifier><identifier>EISBN: 1728153417</identifier><identifier>EISBN: 9781728153414</identifier><identifier>DOI: 10.1109/ISSPIT47144.2019.9001859</identifier><language>eng</language><publisher>IEEE</publisher><subject>deep neural network ; Feature extraction ; Frequency modulation ; gabor filters ; gammatone filters ; Hidden Markov models ; Mel frequency cepstral coefficient ; spectro-temporal features ; Time-frequency analysis ; Training</subject><ispartof>2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2019, p.1-6</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9001859$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,23930,23931,25140,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9001859$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Dutta, Anirban</creatorcontrib><creatorcontrib>Ashishkumar, G</creatorcontrib><creatorcontrib>Rama Rao, Ch V</creatorcontrib><title>Auditory inspired Acoustic Model for Hybrid ASR System using Gammatone based Gabor Filters</title><title>2019 IEEE International Symposium on Signal 
Processing and Information Technology (ISSPIT)</title><addtitle>ISSPIT</addtitle><description>Spectro-temporal features have recently shown much performance improvement for robust Automatic Speech Recognition (ASR) tasks. Gabor filters are best known to extract the spectro-temporal cues of speech. Spectro-temporal representation becomes an essential ingredient for two dimensional Gabor based feature extraction methods. State of the art spectro-temporal features is mostly based on Mel spectrogram. However, the time-frequency representation based on the Mel scale is not accurate enough to model the human auditory system. This paper concentrates on obtaining the spectro-temporal representation by incorporating a physiologically and psychoacoustically motivated gammatone filter called gammatonegram. From literature, gammatonegram is found to better approximate the auditory perception of speech. The spectro-temporal features obtained using gammatonegram based Gabor filters are fed to a hybrid Deep Neural Network (DNN)-Hidden Markov Model (HMM) framework to develop the acoustic model of an ASR system. Experimental analysis is carried out with NOISEX-92 database implemented on TIMIT. 
The experimental results show the better performance gain obtained with the proposed features compared with conventional feature extraction methods.</description><subject>deep neural network</subject><subject>Feature extraction</subject><subject>Frequency modulation</subject><subject>gabor filters</subject><subject>gammatone filters</subject><subject>Hidden Markov models</subject><subject>Mel frequency cepstral coefficient</subject><subject>spectro-temporal features</subject><subject>Time-frequency analysis</subject><subject>Training</subject><issn>2641-5542</issn><isbn>1728153417</isbn><isbn>9781728153414</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2019</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkL1OwzAUhQ0SEqX0CVj8Aim-udexM1YV_ZGKQKQsLJUT28goaSo7HfL2VGqnM5zvfMNhjIOYA4jydVtVn9s9KSCa5wLKeSkEaFnesSdQuQaJBOqeTfKCIJOS8kc2S-lPCIGgdYE0YT-Lsw1DH0cejukUorN80fTnNISGv_fWtdz3kW_GOoZLU33xakyD6_g5heMvX5uuM0N_dLw26TJdm_pCr0I7uJie2YM3bXKzW07Z9-ptv9xku4_1drnYZSEXOGTOQGHIoiGsQYOyqMj6GrARqGvSVnirG2s9SI9KEkhdaDSKHBW59hqn7OXqDc65wymGzsTxcLsC_wHHL1QB</recordid><startdate>201912</startdate><enddate>201912</enddate><creator>Dutta, Anirban</creator><creator>Ashishkumar, G</creator><creator>Rama Rao, Ch V</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201912</creationdate><title>Auditory inspired Acoustic Model for Hybrid ASR System using Gammatone based Gabor Filters</title><author>Dutta, Anirban ; Ashishkumar, G ; Rama Rao, Ch V</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i203t-ea16a4d3a43b1817d374dfb13c038b48d0fd8cddf15f3754158683a74e4628f83</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2019</creationdate><topic>deep neural network</topic><topic>Feature 
extraction</topic><topic>Frequency modulation</topic><topic>gabor filters</topic><topic>gammatone filters</topic><topic>Hidden Markov models</topic><topic>Mel frequency cepstral coefficient</topic><topic>spectro-temporal features</topic><topic>Time-frequency analysis</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Dutta, Anirban</creatorcontrib><creatorcontrib>Ashishkumar, G</creatorcontrib><creatorcontrib>Rama Rao, Ch V</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Dutta, Anirban</au><au>Ashishkumar, G</au><au>Rama Rao, Ch V</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Auditory inspired Acoustic Model for Hybrid ASR System using Gammatone based Gabor Filters</atitle><btitle>2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)</btitle><stitle>ISSPIT</stitle><date>2019-12</date><risdate>2019</risdate><spage>1</spage><epage>6</epage><pages>1-6</pages><eissn>2641-5542</eissn><eisbn>1728153417</eisbn><eisbn>9781728153414</eisbn><abstract>Spectro-temporal features have recently shown much performance improvement for robust Automatic Speech Recognition (ASR) tasks. Gabor filters are best known to extract the spectro-temporal cues of speech. Spectro-temporal representation becomes an essential ingredient for two dimensional Gabor based feature extraction methods. State of the art spectro-temporal features is mostly based on Mel spectrogram. 
However, the time-frequency representation based on the Mel scale is not accurate enough to model the human auditory system. This paper concentrates on obtaining the spectro-temporal representation by incorporating a physiologically and psychoacoustically motivated gammatone filter called gammatonegram. From literature, gammatonegram is found to better approximate the auditory perception of speech. The spectro-temporal features obtained using gammatonegram based Gabor filters are fed to a hybrid Deep Neural Network (DNN)-Hidden Markov Model (HMM) framework to develop the acoustic model of an ASR system. Experimental analysis is carried out with NOISEX-92 database implemented on TIMIT. The experimental results show the better performance gain obtained with the proposed features compared with conventional feature extraction methods.</abstract><pub>IEEE</pub><doi>10.1109/ISSPIT47144.2019.9001859</doi><tpages>6</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2641-5542 |
ispartof | 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2019, p.1-6 |
issn | 2641-5542 |
language | eng |
recordid | cdi_ieee_primary_9001859 |
source | IEEE Xplore All Conference Series |
subjects | deep neural network ; Feature extraction ; Frequency modulation ; gabor filters ; gammatone filters ; Hidden Markov models ; Mel frequency cepstral coefficient ; spectro-temporal features ; Time-frequency analysis ; Training |
title | Auditory inspired Acoustic Model for Hybrid ASR System using Gammatone based Gabor Filters |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T07%3A16%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Auditory%20inspired%20Acoustic%20Model%20for%20Hybrid%20ASR%20System%20using%20Gammatone%20based%20Gabor%20Filters&rft.btitle=2019%20IEEE%20International%20Symposium%20on%20Signal%20Processing%20and%20Information%20Technology%20(ISSPIT)&rft.au=Dutta,%20Anirban&rft.date=2019-12&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.eissn=2641-5542&rft_id=info:doi/10.1109/ISSPIT47144.2019.9001859&rft.eisbn=1728153417&rft.eisbn_list=9781728153414&rft_dat=%3Cieee_CHZPO%3E9001859%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-ea16a4d3a43b1817d374dfb13c038b48d0fd8cddf15f3754158683a74e4628f83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9001859&rfr_iscdi=true |