SNDCNN: Self-Normalizing Deep CNNs with Scaled Exponential Linear Units for Speech Recognition
Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) acoustic model topology, which removes the SC/BN and replaces the typical ReLU activations with the scaled exponential linear unit (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.
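The abstract's central claim is that SELU keeps each layer's activation statistics near zero mean and unit variance on their own, which is the job that batch normalization and shortcut connections otherwise do in a 50-layer network. The record carries no code, so below is a minimal NumPy sketch of the SELU activation using the standard constants from Klambauer et al.'s "Self-Normalizing Neural Networks" (the paper the abstract cites as inspiration); the sanity check at the end is our own illustration, not an experiment from the paper.

```python
import numpy as np

# SELU constants from Klambauer et al. (2017); they are chosen so that
# zero mean / unit variance is a stable fixed point of the activation's
# output statistics, which is what makes the network "self-normalizing".
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x: np.ndarray) -> np.ndarray:
    """SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    return LAMBDA * np.where(x > 0.0, x, ALPHA * (np.exp(x) - 1.0))

# Illustration only (not from the paper): standard-normal pre-activations
# pass through SELU with mean staying near 0 and variance near 1, so no
# batch normalization is needed to re-center and re-scale them.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre_activations = rng.standard_normal(1_000_000)
    out = selu(pre_activations)
    print(f"mean={out.mean():+.3f}  var={out.var():.3f}")  # ~ +0.000 / ~1.000
```

For scale, note that the reported WER gain is relative: a 4.5% relative reduction would take, say, a 10.0% WER down to roughly 9.55%.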
Main Authors: | Huang, Zhen; Ng, Tim; Liu, Leo; Mason, Henry; Zhuang, Xiaodan; Liu, Daben |
---|---|
Format: | Conference Proceeding |
Language: | English |
Published in: | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6854-6858 |
Publisher: | IEEE |
DOI: | 10.1109/ICASSP40776.2020.9053973 |
EISSN: | 2379-190X |
EISBN: | 9781509066315; 1509066314 |
Source: | IEEE Xplore All Conference Series |
Subjects: | Acoustics; batch normalization; Neural networks; Optimization; Production; ResNet; scaled exponential linear units; self-normalization; shortcut connection; Speech recognition; Topology; Training; very deep CNNs |