SNDCNN: Self-Normalizing Deep CNNs with Scaled Exponential Linear Units for Speech Recognition
Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspired by Self-Normalizing Neural Networks, we propose the self-normalizing deep CNN (SNDCNN) acoustic model topology, which removes the SC/BN and replaces the typical ReLU activations with the scaled exponential linear unit (SELU) in ResNet-50. SELU activations make the network self-normalizing and remove the need for both shortcut connections and batch normalization. Compared to ResNet-50, we can achieve the same or lower (up to 4.5% relative) word error rate (WER) while boosting both training and inference speed by 60%-80%. We also explore other model inference optimization schemes to further reduce latency for production use.
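The abstract's central claim is that SELU keeps each layer's activation statistics near zero mean and unit variance on their own, which is the job that batch normalization and shortcut connections otherwise do in a 50-layer network. The record carries no code, so below is a minimal NumPy sketch of the SELU activation using the standard constants from Klambauer et al.'s "Self-Normalizing Neural Networks" (the paper the abstract cites as inspiration); the sanity check at the end is our own illustration, not an experiment from the paper.

```python
import numpy as np

# SELU constants from Klambauer et al. (2017); they are chosen so that
# zero mean / unit variance is a stable fixed point of the activation's
# output statistics, which is what makes the network "self-normalizing".
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x: np.ndarray) -> np.ndarray:
    """SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    return LAMBDA * np.where(x > 0.0, x, ALPHA * (np.exp(x) - 1.0))

# Illustration only (not from the paper): standard-normal pre-activations
# pass through SELU with mean staying near 0 and variance near 1, so no
# batch normalization is needed to re-center and re-scale them.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre_activations = rng.standard_normal(1_000_000)
    out = selu(pre_activations)
    print(f"mean={out.mean():+.3f}  var={out.var():.3f}")  # ~ +0.000 / ~1.000
```

For scale, note that the reported WER gain is relative: a 4.5% relative reduction would take, say, a 10.0% WER down to roughly 9.55%.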
Main Authors: | Huang, Zhen; Ng, Tim; Liu, Leo; Mason, Henry; Zhuang, Xiaodan; Liu, Daben |
---|---|
Format: | Conference Proceeding |
Language: | English |
Published in: | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6854-6858 |
Publisher: | IEEE |
DOI: | 10.1109/ICASSP40776.2020.9053973 |
EISSN: | 2379-190X |
EISBN: | 9781509066315; 1509066314 |
Source: | IEEE Xplore All Conference Series |
Subjects: | Acoustics; batch normalization; Neural networks; Optimization; Production; ResNet; scaled exponential linear units; self-normalization; shortcut connection; Speech recognition; Topology; Training; very deep CNNs |