
SNDCNN: Self-Normalizing Deep CNNs with Scaled Exponential Linear Units for Speech Recognition

Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) based acoustic model topology, built by removing the SC/BN and replacing the typical ReLU activations with scaled exponential linear units (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.

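To make the idea concrete, here is a minimal sketch (not the authors' implementation) of one ResNet-50-style bottleneck stage rewritten in the SNDCNN spirit: the shortcut connection and batch-norm layers are dropped, and every ReLU is replaced by SELU, defined as selu(x) = λ·x for x > 0 and λ·α·(exp(x) − 1) otherwise, with fixed constants λ ≈ 1.0507 and α ≈ 1.6733. The PyTorch framing, the channel sizes, and the LeCun-normal initialization (the standard choice for self-normalizing networks) are illustrative assumptions, not details taken from the paper.

# Minimal sketch (assumed, not the authors' code): a ResNet-50-style
# bottleneck stage with the shortcut connection and BatchNorm removed
# and ReLU replaced by SELU, as described in the abstract.
import math
import torch
import torch.nn as nn

class SNDCNNBottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        # Plain feed-forward stack: no residual add, no BatchNorm2d.
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.SELU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, bias=False),
            nn.SELU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
            nn.SELU(inplace=True),
        )
        # LeCun-normal init (assumed): keeps pre-activations near zero mean
        # and unit variance, the regime in which SELU self-normalizes.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                fan_in = m.in_channels * m.kernel_size[0] * m.kernel_size[1]
                nn.init.normal_(m.weight, mean=0.0, std=math.sqrt(1.0 / fan_in))

    def forward(self, x):
        return self.body(x)

# Shape check on a dummy log-mel feature map: (batch, channels, freq, time).
x = torch.randn(2, 64, 40, 100)
y = SNDCNNBottleneck(64, 64, 256)(x)
print(y.shape)  # torch.Size([2, 256, 40, 100])

With zero-mean, unit-variance inputs and LeCun-normal weights, SELU's fixed point pulls activations back toward that regime layer by layer, so the stabilizing role that batch normalization usually plays is taken over by the activation function itself; dropping the BN and shortcut machinery is what plausibly accounts for the reported 60%-80% speedup.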

Bibliographic Details
Main Authors: Huang, Zhen; Ng, Tim; Liu, Leo; Mason, Henry; Zhuang, Xiaodan; Liu, Daben
Format: Conference Proceeding
Language: English
Subjects: Acoustics; batch normalization; Neural networks; Optimization; Production; ResNet; scaled exponential linear units; self-normalization; shortcut connection; Speech recognition; Topology; Training; very deep CNNs
Published in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, p. 6854-6858
Publisher: IEEE
DOI: 10.1109/ICASSP40776.2020.9053973
EISSN: 2379-190X
EISBN: 9781509066315, 1509066314
Source: IEEE Xplore All Conference Series