
Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning

The activation function is a key component in deep learning that performs non-linear mappings between the inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherently predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive, leading to a bias-shift effect in the network layers; and 4) the multilinear structure of ReLU restricts the non-linear approximation power of the networks. To address these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. With ReLU as the baseline, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieved the highest mean rank among the compared methods. The proposed PFTS exhibited higher non-linear approximation power during training and thereby improved the predictive performance of the networks.


Bibliographic Details
Published in: arXiv.org, 2022-02
Main Authors: Chieng, Hock Hung; Noorhaniza Wahid; Ong, Pauline
Format: Article
Language: English
Subjects: Approximation; Artificial neural networks; Deep learning; Machine learning; Mathematical analysis; Performance degradation; Performance prediction; Training
Online Access: Get full text
DOI: 10.48550/arxiv.2011.03155
EISSN: 2331-8422
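The abstract does not spell out the functional form of PFTS, so the short PyTorch sketch below is only illustrative: it assumes the Flatten-T Swish form f(x) = x * sigmoid(x) + T for x >= 0 and f(x) = T otherwise, with the threshold T treated as a trainable parameter (the "parametric" part of the name). The class name, the default value of T, and the exact formulation are assumptions for illustration, not taken from the paper.

# Minimal sketch (not the paper's reference implementation) of a learnable
# flatten-T-swish-style activation. Assumed form: x * sigmoid(x) + T for
# x >= 0, and T otherwise, with T trained jointly with the network weights.
import torch
import torch.nn as nn


class ParametricFlattenTSwish(nn.Module):
    def __init__(self, init_t: float = -0.20):  # init_t is an illustrative default
        super().__init__()
        # T is a learnable scalar that shifts activations below zero, letting
        # the mean activation move away from strictly positive values (one of
        # the ReLU issues the abstract lists).
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        swish = x * torch.sigmoid(x)
        # Positive inputs follow the swish curve shifted by T; negative inputs
        # are flattened to the (trainable) value T instead of being zeroed out.
        return torch.where(x >= 0, swish + self.t, self.t.expand_as(x))


if __name__ == "__main__":
    act = ParametricFlattenTSwish()
    x = torch.randn(4, 8)
    y = act(x)
    print(y.shape)  # torch.Size([4, 8])

Because T is an nn.Parameter, it receives gradients like any other weight, which is what distinguishes an adaptive activation of this kind from a fixed, predefined one such as ReLU.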